#22141: Deprecate `DescriptorFile` and add relevant information to `Descriptor` ---------------------------------+----------------------------------- Reporter: karsten | Owner: karsten Type: enhancement | Status: assigned Priority: Medium | Milestone: metrics-lib 1.9.0 Component: Metrics/metrics-lib | Version: Severity: Normal | Resolution: Keywords: | Actual Points: Parent ID: | Points: Reviewer: | Sponsor: ---------------------------------+-----------------------------------
Comment (by karsten): I'm afraid I found two flaws in the design above and the suggestion on #22196: 1. We're using the `Iterable<Descriptor>` and `InvalidDescriptor` as one possible returned element as channel for all kinds of exceptions thrown while reading descriptor files. But we don't have such a channel in `DescriptorCollector#collectDescriptors` which returns, well, `void`. That means that the design above doesn't solve anything of [https://trac.torproject.org/projects/tor/ticket/16225#comment:7 this issue raised on the related ticket #16225]. And ideally we'd handle exceptions in the same way in all descriptor sources. 1. The idea of replacing the parse history with a `minLastModified` timestamp doesn't handle I/O problems very well. For example, in the current implementation, if there's a network problem with downloading a potentially large file from CollecTor, we would not include that in the parse history and retry collecting and reading it next time. But with only a timestamp we would simply skip that descriptor file, which is pretty bad. New plan: - We rename `InvalidDescriptor` to `UnparseableDescriptor` and let it return the `DescriptorParseException` that made it unparseable in our view. This instance ''might'' be useful to the application, at least by containing the raw descriptor `byte[]` or descriptor `File` reference. And if the application produced the input descriptor itself, like CollecTor, knowing about it being unparseable is more important than for a consumer, like Onionoo! - We leave the parse history in place and postpone the idea of stateless, overloaded methods. Sad. - We add a method `removeFromHistoryFile(File)` that can be called by the application if it wants to reprocess a descriptor file containing an `UnparseableDescriptor`. This would not be necessary for I/O exceptions, because those descriptor files would not go into the parse history and retried in the next execution anyway. - We log any exceptions on `warn` level that we caught while collecting or reading or parsing descriptors. The application can't ''do'' anything to handle these issues anyway, except for telling the operator that something went wrong. `DescriptorParseException`s thrown while parsing invalid/unparseable descriptors are exempt from this and will only be logged on info level. - We ''could'' introduce an `ExceptionListener` for exceptions are than `DescriptorParseException`, or the "ParseExceptionsLog" that you mentioned (even though I'm not entirely certain what that would be). I don't think it's necessary though, because we wouldn't it use that ourselves, and we don't know whether it would be used by anyone. Before I write more code (or longer comments!), what do you think? :) -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22141#comment:13> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online _______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs