Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-28 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  closed
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:  implemented
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-
Changes (by iwakeh):

 * status:  needs_review => closed
 * resolution:   => implemented


--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-28 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 With the 101st comment:

 Closing.

 Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-28 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:99 iwakeh]:
 > Thanks for fixing this!

 Sure.  Also, tests looked good, I'd say we're ready to release as soon as
 #20380 is in.

 > I added #20489 in order to also create tests for these issues and any
 others that do not have tests yet. The path-stuff will surely be touched
 again in future and then we'll be able catch problems in the unit-tests.

 Sounds good, thanks!

--
Ticket URL: 

Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-28 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Thanks for fixing this!

 I added #20489 in order to also create tests for these issues and any
 others that do not have tests yet. The path-stuff will surely be touched
 again in future and then we'll be able catch problems in the unit-tests.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-27 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:97 karsten]:
 > Replying to [comment:96 iwakeh]:
 > > Please review
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep-1027=7a33c3ed9cb1fa39b3f50e6540ca045522b46944 this
 commit].
 > > This changes the top-level name to marker and host, i.e. 'Relay-
 collector.torproject.org/...'
 >
 > Looks good, merged, and resumed testing.

 FYI, pushed another commit as fixup of that last commit.  (The path had to
 be changed in two places.)  Testing more over night.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-27 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:96 iwakeh]:
 > Please review
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep-1027=7a33c3ed9cb1fa39b3f50e6540ca045522b46944 this
 commit].
 > This changes the top-level name to marker and host, i.e. 'Relay-
 collector.torproject.org/...'

 Looks good, merged, and resumed testing.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-27 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please review
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep-1027=7a33c3ed9cb1fa39b3f50e6540ca045522b46944 this
 commit].
 This changes the top-level name to marker and host.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-27 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 New bug, though I'm not certain yet where to fix it: the
 `DescriptorIndexCollector` instance of one module deletes the synchronized
 descriptors stored by the instance of another module.  Look at the
 following (new) code:

 {{{
 DescriptorCollector dc =
 DescriptorSourceFactory.createDescriptorCollector();
 dc.collectDescriptors("https://collector.torproject.org;,
 new String[] { "/recent/torperf/" }, 0L, new File("sync"), true);
 dc.collectDescriptors("https://collector.torproject.org;,
 new String[] { "/recent/exit-lists/" }, 0L, new File("sync"),
 true);
 }}}

 What would you expect that code to do?  Populate two local directories
 `sync/recent/torperf/` and `sync/recent/exit-lists/`?  Not quite, the last
 line clears all descriptors in `sync/recent/torperf/`.  Uhm.  And I'm
 not even sure what `DescriptorCollectorImpl` would do, possibly the same
 thing.

 We can either work around this in CollecTor by using separate
 subdirectories for the modules, like
 `sync/Relay/collector.torproject.org/recent/relay-descriptors/...`, or we
 can decide this is a (usability) bug in metrics-lib, put out a 1.5.1, and
 switch.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-27 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:93 iwakeh]:
 > Please find another commit on this
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep=d339c3931eb3721024f0982186ba94445019e3d3 branch].

 Looks good, merged to master and starting a new test run now.  Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please find another commit on this
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep=d339c3931eb3721024f0982186ba94445019e3d3 branch].

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:91 iwakeh]:
 > Replying to [comment:90 karsten]:
 > > Well, the sync runs don't take just 3 or 20 minutes here, but many
 hours.  I could imagine that it's the backup daemon trying to capture all
 file system changes for the next backup, or something.  I could imagine
 that similar things can happen on servers.
 >
 > Well, please  investigate, if the backup is the reason.  A backup
 shouldn't hamper the productive system.

 This was just a guess.  I didn't further investigate after finding out
 what the code was doing.  And even if backup daemons were not the reason
 for the slowness here, we shouldn't ship this code, now that we know about
 its inefficiency.

 > Just to make sure we're talking about the same use case:
 > * A fresh installation without previous data is not what sync is for.
 Here the archived data of the last three days should be provided and one
 or two regular download runs. After that, sync can be turned on.
 > * A running instance like a mirror can be enhanced with the sync.

 Why would we discourage turning on sync right from the start?  In this
 case, it was not the first run that was slow, but later sync runs were
 even slower.

 > > I don't understand your reasoning about `index.json` not being a true
 picture of `recent/`.  We're skipping `*.tmp` files when creating that
 file, and we always append to `.tmp` and only rename to the destination
 file when we're sure the file won't change anymore.  Where does that get
 inaccurate?
 >
 > Now I see we're talking about different accuracies:
 >
 > You're referring to the single file assembled in one download (or
 hopefully soon sync-run).  Thus, the index.json of a syncing instance
 could become inaccurate in this case.

 Yes, that's what I was referring to.

 > I'm referring to the accuracy lost by regular operation after creation
 or download of index.json.  For example ticket #20430, the syncing
 instance retrieves index.json, shortly after that the main instance has a
 clean-up run, and thus files listed in index.json don't exist anymore.

 True, that's unrelated to what I meant above.

 > > By the way, we'll need to merge #20380 before putting out the release.
 And I'd want to start a test run over night before releasing, so the
 release cannot happen today anyway. :(
 >
 > That is vital information.
 > When postponing the release there is time for changing the code and
 testing, that's what I said before.
 > I'll take a look.

 If you have some code for me today, I'll run it over night, and maybe we
 can release tomorrow! :)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:90 karsten]:
 > Well, the sync runs don't take just 3 or 20 minutes here, but many
 hours.  I could imagine that it's the backup daemon trying to capture all
 file system changes for the next backup, or something.  I could imagine
 that similar things can happen on servers.

 Well, please  investigate, if the backup is the reason.  A backup
 shouldn't hamper the productive system.

 Just to make sure we're talking about the same use case:
 * A fresh installation without previous data is not what sync is for.
 Here the archived data of the last three days should be provided and one
 or two regular download runs. After that, sync can be turned on.
 * A running instance like a mirror can be enhanced with the sync.

 > I don't understand your reasoning about `index.json` not being a true
 picture of `recent/`.  We're skipping `*.tmp` files when creating that
 file, and we always append to `.tmp` and only rename to the destination
 file when we're sure the file won't change anymore.  Where does that get
 inaccurate?

 Now I see we're talking about different accuracies:

 You're referring to the single file assembled in one download (or
 hopefully soon sync-run).  Thus, the index.json of a syncing instance
 could become inaccurate in this case.

 I'm referring to the accuracy lost by regular operation after creation or
 download of index.json.  For example ticket #20430, the syncing instance
 retrieves index.json, shortly after that the main instance has a clean-up
 run, and thus files listed in index.json don't exist anymore.

 >
 > ...
 >
 > By the way, we'll need to merge #20380 before putting out the release.
 And I'd want to start a test run over night before releasing, so the
 release cannot happen today anyway. :(

 That is vital information.
 When postponing the release there is time for changing the code and
 testing, that's what I said before.
 I'll take a look.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Well, the sync runs don't take just 3 or 20 minutes here, but many hours.
 I could imagine that it's the backup daemon trying to capture all file
 system changes for the next backup, or something.  I could imagine that
 similar things can happen on servers.  The issue is that the update run
 cannot start if the sync run keeps running, and that means we'll lose
 data.  I'm afraid we can't ship this yet.

 I don't understand your reasoning about `index.json` not being a true
 picture of `recent/`.  We're skipping `*.tmp` files when creating that
 file, and we always append to `.tmp` and only rename to the destination
 file when we're sure the file won't change anymore.  Where does that get
 inaccurate?

 I'd rather want to change the sync code to do the same as the update code.
 I can give that a try if you don't want to touch that code anymore.

 By the way, we'll need to merge #20380 before putting out the release.
 And I'd want to start a test run over night before releasing, so the
 release cannot happen today anyway. :(

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 1. Valid points about file-io.  My local runs identified the network as
 the bottleneck rather than file-io when doing an initial sync to an empty
 CollecTor instance.  Subsequent runs were way shorter.  The mirror as a
 running Collector instance needed only 20 min on the first sync-run and
 now way less (3 to 1 min).
  But anyway, it's true that some of the copying could and should be
 reduced.

 2. Ideally index.json should be a true picture of 'recent', but actually,
 it'll always only be a snapshot, even if it's updated with each change,
 b/c then the syncing instance cannot update index.json continuously.  So,
 CollecTor's sync should accommodate the possible differences, which it
 does currently, I think.


 How to proceed?
 Do you think this is a halt to the release?
 I think it can be released as is, because the current set-up increases
 descriptor availability a lot and is tested.
 I'm wary of tuning it now without a release delay.  And, regarding both
 writing and parsing there are duplicate and trip-licate implementations in
 the code-base, which should be streamlined and can be tuned in that
 process.

 I'd suggest to release and have new tickets (which will be part of the
 other tickets for planning the streamlining, modularization, and other
 improvements):
 1. streamline writing all over the code-base with an emphasis on reducing
 file-io for CollecTor;
 2. Make index.json as close to the current state as necessary and
 feasible, which includes pondering about how accurate it should be with
 the given use-cases. Maybe have a clean-up module before index-run.

 Is that an ok plan?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Here's a fresh bug report: synchronizing relay descriptors takes a few
 ''hours'' on average on my laptop, and I think I know why.  Whenever we
 append to an existing file in `recent/`, we copy the entire file content
 to a temporary file, append the new content there, and then rename.  And
 we do that for every single server descriptor or extra-info descriptor we
 receive.  That is bad for two reasons.  First reason is that it obviously
 doesn't scale, with some sync runs taking 4 hours or longer.  Second
 reason is that files in `recent/` shouldn't change after they're included
 in `index.json` (with the inglorious exception of Torperf measurements),
 which is why we should append all descriptors to a temporary file and
 rename when we're sure we're done.  See also
 `ArchiveWriter.cleanUpRsyncDirectory()` where we're renaming temporary
 files to destination files at the very end of the update run.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:85 iwakeh]:
 > Please find four more commits on
 [https://gitweb.torproject.org/user/iwakeh/collector.git/?h=task-18910
 -release-prep the above branch] addressing
 >
 > * the microcons-path
 > * the sync history files
 > * and the storage.

 Looks good, merged to master!

 > For the latter I added a PersistenceUtils test class and corrected the
 behavior you described in an earlier comment:80 (will the comment count
 reach 100 here ;-)
 >
 > Thanks for looking so carefully through all the code!

 Sure!  I'll start a new test and let it run tonight.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please find four more commits on
 [https://gitweb.torproject.org/user/iwakeh/collector.git/?h=task-18910
 -release-prep the above branch] addressing

 * the microcons-path
 * the sync history files
 * and the storage.

 For the latter I added a PersistenceUtils test class and corrected the
 behavior you described in an earlier comment:80 (will the comment count
 reach 100 here ;-)

 Thanks for looking so carefully through all the code!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Addendum to the hist-file topic:
 The added string from the path is just a work-around.  There should be
 different interfaces for consensus flavors (#17861 and #19640) before
 there are too many flavors.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:83 iwakeh]:
 > Replying to [comment:81 iwakeh]:
 > > Replying to [comment:79 karsten]:
 > > > First bug report: configuration property `"ExitlistSources"` is
 disregarded.  I believe we're looking at the non-existent property
 `"ExitlistsSources"` (plural).
 > >
 > > Good catch!  I'll work on a patch and a test preventing this in
 future.
 >
 > Here is
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep=b8386ce0e39aeadff82bdb9bc155ad6e69c23ec7 the branch] with
 a patch and test.  Please review.

 Looks good.  Merged to master!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:81 iwakeh]:
 > Replying to [comment:79 karsten]:
 > > First bug report: configuration property `"ExitlistSources"` is
 disregarded.  I believe we're looking at the non-existent property
 `"ExitlistsSources"` (plural).
 >
 > Good catch!  I'll work on a patch and a test preventing this in future.

 Here is
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -release-prep=b8386ce0e39aeadff82bdb9bc155ad6e69c23ec7 the branch] with
 a patch and test.  Please review.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Here's another one.  The path for microdescriptor consensuses is wrong,
 which leads to us not syncing any of those:

 {{{
 diff --git
 a/src/main/java/org/torproject/collector/relaydescs/ArchiveWriter.java
 b/src/main/java/org/torproject/collector/relaydescs/ArchiveWriter.java
 index 30b6569..20d0663 100644
 --- a/src/main/java/org/torproject/collector/relaydescs/ArchiveWriter.java
 +++ b/src/main/java/org/torproject/collector/relaydescs/ArchiveWriter.java
 @@ -99,7 +99,7 @@ public class ArchiveWriter extends CollecTorMain {
  this.mapPathDescriptors.put("recent/relay-descriptors/consensuses",
  RelayNetworkStatusConsensus.class);
  this.mapPathDescriptors.put(
 -"relay-descriptors/microdescs/consensus-microdesc",
 +"recent/relay-descriptors/microdescs/consensus-microdesc",
  RelayNetworkStatusConsensus.class);
  this.mapPathDescriptors.put("recent/relay-descriptors/server-
 descriptors",
  RelayServerDescriptor.class);
 }}}

 Related to this, I believe that we'd use `sync/sync-history-
 collector.torproject.org-Relay-RelayNetworkStatusConsensus` for the parse
 history, which would be the exact same file as for unflavored consensuses.
 We should use a different parse history file here.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:79 karsten]:
 > First bug report: configuration property `"ExitlistSources"` is
 disregarded.  I believe we're looking at the non-existent property
 `"ExitlistsSources"` (plural).

 Good catch!  I'll work on a patch and a test preventing this in future.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Second bug report: I believe that we're disregarding
 `StandardOpenOption.CREATE_NEW` in `PersistenceUtils.storeToFileSystem()`
 since the switch to temporary files.  We're checking that the temporary
 file does not exist, writing to it, and later renaming to the original
 file, thus replacing it.  What we should really do, when storing a file
 with `StandardOpenOption.CREATE_NEW`, is check whether the file exists and
 return false.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-25 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 First bug report: configuration property `"ExitlistSources"` is
 disregarded.  I believe we're looking at the non-existent property
 `"ExitlistsSources"` (plural).

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-24 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Cherry-picked all new commits from your branch and pushed to master.  Yay!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-24 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 All is working fine on the mirror :-)

 And, the availability of descriptors already increased tremendously even
 though sync has been running less than three days yet.
 I'll add a wiki page with numbers and findings later.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-24 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:75 iwakeh]:
 > There's an additional commit on top of my rebased branch replacing a
 character that made SanitizedBridgesWriterTest fail (in an en_US.UTF-8
 environment).

 Good catch!

 How's the testing going?  Should I merge to master?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-21 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 There's an additional commit on top of my rebased branch replacing a
 character that made SanitizedBridgesWriterTest fail (in an en_US.UTF-8
 environment).

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-21 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Sorry, just wanted to look into this, but you were faster. :)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-21 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Here is a
 [https://gitweb.torproject.org/user/iwakeh/collector.git?h=task-18910-5-rebased
 rebased branch 5], which can be used for testing.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Thanks!
 Could you rebase on master to also catch the changes for #20162 and other
 merged things for testing and operator manual review?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-5 ​branch task-18910-5 in my repository] which
 contains the exact same code as task-18910-4 but with squashed commits.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Great, that fixes the tests.  Cherry-picked, amended to remove unused
 import, pushed to my task-18910-4 branch.

 Next steps: test some more on a server, then squash commits, then merge to
 master?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please review the extended tests
 
[https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910-5=7302996e85275e06501ddeda0662ee773095a8e8
 here].

 The reason was that at the beginning I simply relied on receiving the same
 list order and just verifying the first path in the out list.  The
 breaking tests reminded me of that.
 The expanded tests verify all paths created for 'out'.

 (Actually, I did clone your branch anew (new folder even) before
 comment:62, but only now received the commits that broke the tests. Slow
 update? Hmm.)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 I'll post a fix for the tests soon.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Please take a look at the new commits in my updated task-18910-4 branch.

 Remaining tasks:

  - Find out why unit tests fail.  (See below.)
  - Test on a server.
  - Squash commits and merge.

 Here's the test output with 4 failing tests, both on OS X and on Linux:

 {{{
 [junit] Testsuite: org.torproject.collector.sync.SyncPersistenceTest
 [junit] Tests run: 40, Failures: 4, Errors: 0, Skipped: 0, Time
 elapsed: 30.708 sec
 [junit]
 [junit] Testcase: testDescWriteOutput[0] took 2.044 sec
 [junit] Testcase: testDescWriteRecent[0] took 0.523 sec
 [junit] Testcase: testOutFileContent[0] took 0.396 sec
 [junit] Testcase: testRecentFileContent[0] took 0.389 sec
 [junit] Testcase: testDescWriteOutput[1] took 0.454 sec
 [junit] FAILED
 [junit] data used: relay-descriptors/server-
 descriptor/2016/10/e/3/e381ce74a1a592f6d375706665aba6d4d22923f1,
 relay-2016-10-02-16-05-00-server-descriptors, resulting list:
 [/tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/c/8/c8c3588019f7c896eb4185cfc1074cfe5eb405ea,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/4/1/4179c50d3c764bc85c9d719e14e55a6cc232a10d,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/2/0/2091f76a8256e479cbe4f57be85f87909af07236,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/b/b/bbca7ed70ba6ea88f995b067a004f5a4d0903d6e,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/d/a/dae8966ca600b46bc75ed5efb97286481e9a6876,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/e/3/e381ce74a1a592f6d375706665aba6d4d22923f1,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/e/1/e1142337dee5b890393a0891acbde51577c2b743,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/a/0/a0ed9227a9413f140445002ce412f8828591e7ec,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/5/b/5b202650802a916f1ec3a1ef36b98706e3747701,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/5/a/5a536243bf056cd7177ddfd8eb363fec978f3343]
 
expected:<...-descriptor/2016/10/[e/3/e381ce74a1a592f6d375706665aba6d4d22923f1]>
 but
 was:<...-descriptor/2016/10/[c/8/c8c3588019f7c896eb4185cfc1074cfe5eb405ea]>
 [junit] junit.framework.AssertionFailedError: data used: relay-
 descriptors/server-
 descriptor/2016/10/e/3/e381ce74a1a592f6d375706665aba6d4d22923f1,
 relay-2016-10-02-16-05-00-server-descriptors, resulting list:
 [/tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/c/8/c8c3588019f7c896eb4185cfc1074cfe5eb405ea,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/4/1/4179c50d3c764bc85c9d719e14e55a6cc232a10d,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/2/0/2091f76a8256e479cbe4f57be85f87909af07236,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/b/b/bbca7ed70ba6ea88f995b067a004f5a4d0903d6e,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/d/a/dae8966ca600b46bc75ed5efb97286481e9a6876,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/e/3/e381ce74a1a592f6d375706665aba6d4d22923f1,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/e/1/e1142337dee5b890393a0891acbde51577c2b743,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/a/0/a0ed9227a9413f140445002ce412f8828591e7ec,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/5/b/5b202650802a916f1ec3a1ef36b98706e3747701,
 /tmp/junit1420314177894412746/out/relay-descriptors/server-
 descriptor/2016/10/5/a/5a536243bf056cd7177ddfd8eb363fec978f3343]
 
expected:<...-descriptor/2016/10/[e/3/e381ce74a1a592f6d375706665aba6d4d22923f1]>
 but
 was:<...-descriptor/2016/10/[c/8/c8c3588019f7c896eb4185cfc1074cfe5eb405ea]>
 [junit] at
 
org.torproject.collector.sync.SyncPersistenceTest.testDescWriteOutput(SyncPersistenceTest.java:179)
 [junit]
 [junit] Testcase: testDescWriteRecent[1] took 0.057 sec
 [junit] Testcase: testOutFileContent[1] took 0.1 sec
 [junit] Testcase: 

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 All tests also pass using the pre-release for descriptor-1.5.0, incl. the
 usually ignored test.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Here's the diff for build.xml
 {{{
 diff --git a/build.xml b/build.xml
 index 7266e2d..a13a97e 100644
 --- a/build.xml
 +++ b/build.xml
 @@ -7,6 +7,7 @@



 +  



 @@ -17,11 +18,11 @@



 -  
 -  
 -  
 +  
 +  
 +  

 +value="${dist}/collector-${release.version}.tar.gz" />



 @@ -90,10 +91,6 @@

  
  
 -
 -
 -
 -


  https://trac.torproject.org/projects/tor/ticket/18910#comment:65>
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-20 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 All unit tests pass on the branch as it is.  I assume there was a
 'collector.properties' left in your path; the test output should state
 that.  Which reminds me, that the path for jars should be 'generated/dist'
 as in metrics-lib, for example.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:62 iwakeh]:
 > Replying to [comment:61 karsten]:
 > > Alright, please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-4 ​branch task-18910-4 in my repository].
 >
 > This branch doesn't exist yet.

 Oops, it should exist now.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:61 karsten]:
 > Alright, please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-4 ​branch task-18910-4 in my repository].

 This branch doesn't exist yet.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Alright, please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-4 ​branch task-18910-4 in my repository].

 Things left to do:

  - Let me know if the change log looks okay.
  - Find out why unit tests fail.
  - Upgrade to next metrics-lib as soon as it's released.
  - Test on a server.
  - Squash commits and merge.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:59 karsten]:
 > Those three commits look good!  Mind if I merge the first two into
 
[https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910-4=b998073dbf8ce8e6e728b6154b2b3ad8dcfc69ed
 b998073]?

 That's fine.

 >
 > (Do you know how to use `git commit --fixup b998073` or `git commit
 --squash b998073`?  I find those really helpful to avoid changing existing
 commits and yet making it clear how commits may be squashed during the
 merge for future generations to make sense of them.)

 Yes, but just didn't think of using them.

 >
 > And yes, we'll need to update to the next metrics-lib release once it's
 out.
 >
 > Oh, here's another thing we still need to do: we should make the change
 log more comprehensible for other operators.  I have some ideas for
 improving what's there, but if you'd like to give that a try first, please
 feel free.

 Just go ahead, I'm still too much in the technical mindset right now.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Those three commits look good!  Mind if I merge the first two into
 
[https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910-4=b998073dbf8ce8e6e728b6154b2b3ad8dcfc69ed
 b998073]?

 (Do you know how to use `git commit --fixup b998073` or `git commit
 --squash b998073`?  I find those really helpful to avoid changing existing
 commits and yet making it clear how commits may be squashed during the
 merge for future generations to make sense of them.)

 And yes, we'll need to update to the next metrics-lib release once it's
 out.

 Oh, here's another thing we still need to do: we should make the change
 log more comprehensible for other operators.  I have some ideas for
 improving what's there, but if you'd like to give that a try first, please
 feel free.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 The changes look good.

 Please take a look at the top three commits
 [https://gitweb.torproject.org/user/iwakeh/collector.git?h=task-18910-4
 here].

 Just to not forget:
 The metrics-lib version needs to be updated after the 1.4.1 release.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Thanks for working through all that code so quickly!

 Replying to [comment:56 karsten]:
 > Alright!  Please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-3 branch task-18910-3 in my repository].  I
 created fixup/squash commits for most things I found.  Please take a close
 look and challenge anything you don't like to be squashed into your
 commit.  In particular, please take a look why the unit tests don't pass
 anymore. ;)

 Fine. I'll start testing after finishing this reply.

 >
 > More comments below:
 >
 >  - I believe that index.json would include the `.tmp` files written
 during sync, because we're only skipping files starting with `.`.  And
 that's not even a new issue, because the `.tmp` files written during
 normal updating would also be included.  The issue will only become more
 prevalent.  I'd say we adapt the indexer, which can happen in a separate
 commit.  Or we could change temporary files to start with `.`, but that
 seems a bit more error-prone.  Suggestion: '''fix'''

 That's ok.  Eventually the regular modules should use the classes in
 o.t.c.persist for storing descriptors (that will reduce the line count in
 them tremendously and finally unify writing descriptors).  So, using the
 "." prefix here now is fine.

 >
 >  - As mentioned before somewhere, we're appending server/extra-info
 descriptors to different files in the synchronization run than in the
 earlier update run.  The reason is that `SyncManager` creates its own
 `collectionDate` in the constructor.  However, we cannot just pass a
 `Date` instance to both `SyncManager` and `ArchiveWriter`, because then
 `SyncManager` would append to a file that we're already serving, in which
 case the file size would change once we're done synchronizing, and we
 should try to avoid that.  Let's fix this as soon as the current updaters
 are modules just like the synchronization module, and let's keep in mind
 not to close and rename temporary files to their final names between
 modules.  Suggestion: '''postpone'''

 Agreed.

 >
 >  - As said earlier, let's release metrics-lib and use that new release
 as basis for this patch.  Then we won't have to import and instantiate
 `org.torproject.descriptor.index.DescriptorIndexCollector` anymore, and we
 can change defaults for the `*SyncOrigins` properties in
 `collector.properties` back to `https://collector.torproject.org`.
 Suggestion: '''change'''

 Fine!
 All these changes are reasons for metrics-lib release 1.4.1.

 >
 >  - Regarding comment 25 above, I'm still a fan of using `printf` as
 opposed to creating `SimpleDateFormat` instances and concatenating
 strings.  I also believe it would be more efficient.  But I'm happy to
 postpone that change.  Suggestion: '''postpone'''

 Yes, it might be good to think a little about that.  The timestamp to
 string functionality might also be a candidate for 'metrics-tools' (cf.
 #20405), b/c it's used throughout Metrics' products.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Alright!  Please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-3 branch task-18910-3 in my repository].  I
 created fixup/squash commits for most things I found.  Please take a close
 look and challenge anything you don't like to be squashed into your
 commit.  In particular, please take a look why the unit tests don't pass
 anymore. ;)

 More comments below:

  - I believe that index.json would include the `.tmp` files written during
 sync, because we're only skipping files starting with `.`.  And that's not
 even a new issue, because the `.tmp` files written during normal updating
 would also be included.  The issue will only become more prevalent.  I'd
 say we adapt the indexer, which can happen in a separate commit.  Or we
 could change temporary files to start with `.`, but that seems a bit more
 error-prone.  Suggestion: '''fix'''

  - As mentioned before somewhere, we're appending server/extra-info
 descriptors to different files in the synchronization run than in the
 earlier update run.  The reason is that `SyncManager` creates its own
 `collectionDate` in the constructor.  However, we cannot just pass a
 `Date` instance to both `SyncManager` and `ArchiveWriter`, because then
 `SyncManager` would append to a file that we're already serving, in which
 case the file size would change once we're done synchronizing, and we
 should try to avoid that.  Let's fix this as soon as the current updaters
 are modules just like the synchronization module, and let's keep in mind
 not to close and rename temporary files to their final names between
 modules.  Suggestion: '''postpone'''

  - As said earlier, let's release metrics-lib and use that new release as
 basis for this patch.  Then we won't have to import and instantiate
 `org.torproject.descriptor.index.DescriptorIndexCollector` anymore, and we
 can change defaults for the `*SyncOrigins` properties in
 `collector.properties` back to `https://collector.torproject.org`.
 Suggestion: '''change'''

  - Regarding comment 25 above, I'm still a fan of using `printf` as
 opposed to creating `SimpleDateFormat` instances and concatenating
 strings.  I also believe it would be more efficient.  But I'm happy to
 postpone that change.  Suggestion: '''postpone'''

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 metrics-lib 1.4.1 sounds good, I just created a milestone for that.
 Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Yes, a release would be great!
 All three changes you name are important for CollecTor 1.1.0.

 Should it be metrics-lib 1.4.1?
 We need a new milestone for tracking the release.

 I'll look again, but I think these are the most relevant for now.  And,
 it's better to keep the release small in order to have it out quickly.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-18 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 So, I reviewed about half of your -3 branch now but still more time
 tomorrow to finish the review.  Here's a quick suggestion though:

 Should we put out a new metrics-lib release tomorrow that this branch will
 be "based on"?  I could imagine having the following changes in that
 metrics-lib release:

  - `DescriptorIndexCollector` is the new default.

  - That same class uses `/index/index.json[.gz]` as path if it only
 receives a base URL.  I believe I suggested such a thing on #20039.

  - metrics-lib master already contains a fix that would be useful to have
 for new CollecTor operators: "Avoid running into an IOException and
 logging a warning for it." (#20320)

 Note that I'm not suggesting to fix the 2G file size thing in that
 release.  Let's stay realistic. ;)

 If we do this, are there any other metrics-lib changes that we should
 include in the release that would help us with this CollecTor change?

 More feedback on the -3 branch tomorrow.  Please don't amend the commits
 in that branch until then.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-18 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 I also adapted the sync-urls for index.json.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-18 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please review this
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-3=b998073dbf8ce8e6e728b6154b2b3ad8dcfc69ed third branch].

 It contains two minor commits and a large big one (on top).  The final
 sync functionality with disabled ReferenceChecker.

 I added tests for also checking the content of descriptor files.  Thus,
 the tests can now catch annotation issues.  To have one place for all
 annotation I introduced o.t.c.conf.Annotation, which at some point should
 become metrics-lib code.

 The persist package is corrected and the early line breaks in the paths
 descriptions are there for more readability.

 Files to 'recent' are first written to a temporary file and then renamed.

 The current paths obey the new path structure.  The way the paths are
 defined now it is an easy task to change the structure in case if
 necessary.

 I think it is better to have it in one commit as all things belong
 together and should be read and reviewed as a whole.

 Well, can't think of anything else right now.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-17 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:49 iwakeh]:
 > Replying to [comment:46 karsten]:
 >
 > I found the a good reference for the paths topic: it's in the
 [https://gitweb.torproject.org/collector.git/tree/src/main/resources
 /create-tarballs.sh#n68 create tars script]: [...]

 Indeed, that script reflects what's on the server.

 > The above makes an inconsistency between relay descriptor and bridge
 descriptor paths visible.

 Right.  The reason is that we recently switched from bridge descriptor
 tarballs containing all bridge descriptors to separate tarballs for each
 type.  I didn't change the directory structure to avoid breaking stuff.
 In theory, it would have been an easy change, but going back to a previous
 version would have become more difficult.  And it didn't seem necessary to
 change the directory structure, because nobody would look at it anyway.

 > As the new structure from #20228, i.e. votes also grouped and all dates
 in recent paths are derived from the acquisition time, will be in place
 soon, I'd like to use the 'new way' already for the merge.
 >
 > With these as background the test diff has some soon to be outdated
 corrections (comments added after '//'):
 > > [...]
 >
 > When changing the structure now shouldn't relays and bridges be stored
 in similar ways?  I think the current difference (in 'out') caused some of
 the confusion here.

 Agreed on the acquisition time thing.  We could change that now.

 Regarding the directory structure, as I said above, I didn't think this
 was necessary a few weeks ago, but I see your point that it's confusing
 for contributors.

 So, if we change paths in `out/` now, should we use the exact same
 directory structure as for the `recent/` directory?  Whoever runs a
 CollecTor instance now will have to stop it, move and rename some
 directories, upgrade, and restart.  What we should avoid is change some
 paths now and some more paths in a few weeks.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-17 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:46 karsten]:

 I found the a good reference for the paths topic: it's in the
 [https://gitweb.torproject.org/collector.git/tree/src/main/resources
 /create-tarballs.sh#n68 create tars script]:
 {{{
   $OUTDIR/exit-lists/$YEARONE/$MONTHONE/
   $OUTDIR/exit-lists/$YEARTWO/$MONTHTWO/
   $OUTDIR/torperf/$YEARONE/$MONTHONE/
   $OUTDIR/torperf/$YEARTWO/$MONTHTWO/
   $OUTDIR/relay-descriptors/certs/
   $OUTDIR/relay-descriptors/microdesc/$YEARONE/$MONTHONE
   $OUTDIR/relay-descriptors/microdesc/$YEARTWO/$MONTHTWO
   $OUTDIR/relay-descriptors/consensus/$YEARONE/$MONTHONE
   $OUTDIR/relay-descriptors/consensus/$YEARTWO/$MONTHTWO
   $OUTDIR/relay-descriptors/vote/$YEARONE/$MONTHONE/
   $OUTDIR/relay-descriptors/vote/$YEARTWO/$MONTHTWO/
   $OUTDIR/relay-descriptors/server-descriptor/$YEARONE/$MONTHONE/
   $OUTDIR/relay-descriptors/server-descriptor/$YEARTWO/$MONTHTWO/
   $OUTDIR/relay-descriptors/extra-info/$YEARONE/$MONTHONE/
   $OUTDIR/relay-descriptors/extra-info/$YEARTWO/$MONTHTWO/
   $OUTDIR/bridge-descriptors/$YEARONE/$MONTHONE/statuses/
   $OUTDIR/bridge-descriptors/$YEARTWO/$MONTHTWO/statuses/
   $OUTDIR/bridge-descriptors/$YEARONE/$MONTHONE/server-descriptors/
   $OUTDIR/bridge-descriptors/$YEARTWO/$MONTHTWO/server-descriptors/
   $OUTDIR/bridge-descriptors/$YEARONE/$MONTHONE/extra-infos/
   $OUTDIR/bridge-descriptors/$YEARTWO/$MONTHTWO/extra-infos/
 }}}

 The above makes an inconsistency between relay descriptor and bridge
 descriptor paths visible.

 As the new structure from #20228, i.e. votes also grouped and all dates in
 recent paths are derived from the acquisition time, will be in place soon,
 I'd like to use the 'new way' already for the merge.

 With these as background the test diff has some soon to be outdated
 corrections (comments added after '//'):
 >
 > {{{
 > diff --git
 a/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 b/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 > index ad5ee6a..cdb90b8 100644
 > ---
 a/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 > +++
 b/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 > @@ -58,15 +58,15 @@ public class SyncPersistenceTest {
 >@Parameters
 >public static Collection pathFilename() {
 >  return Arrays.asList(new Object[][] {
 > -{"exit-lists/2016-10-05-19-06-17", // expected recent path  //
 acqu-date, which will be ok
 > +{"exit-lists/2016-09-20-13-02-00", // expected recent path
 >   "exit-lists/2016/09/20/2016-09-20-13-02-00", // expected out
 path
 >   "2016-09-20-13-02-00", // test-filename
 >   Integer.valueOf(1), // expected recent count of descs files
 >   Integer.valueOf(1)}, // expected output count of descs files
 >
 >  {"relay-descriptors/microdescs/consensus-microdesc/"
 > - + "2016-10-05-19-06-17-consensus-microdesc", // acqu-date,
 which will be ok
 > - "relay-descriptors/microdescs/2016/10/consensus-microdesc/"
 > + + "2016-10-02-17-00-00-consensus-microdesc",
 > + "relay-descriptors/microdesc/2016/10/consensus-microdesc/"
 >   + "02/2016-10-02-17-00-00-consensus-microdesc",
 >   "2016-10-02-17-00-00-consensus-microdesc",
 >   Integer.valueOf(1),
 > @@ -74,7 +74,7 @@ public class SyncPersistenceTest {
 >
 >  {"bridge-descriptors/server-descriptors/"
 >   + "2016-10-05-19-06-17-server-descriptors",
 > - "bridge-descriptors/server-descriptor/2016/10/"
 > + "bridge-descriptors/2016/10/server-descriptor/"
 >   + "A/8/A8A5509AD1393C8F36ABD2D8F0DE1BB751926872",
 >   "bridge-2016-10-02-16-09-00-server-descriptors",
 >   Integer.valueOf(1),
 > @@ -88,28 +88,30 @@ public class SyncPersistenceTest {
 >   Integer.valueOf(1),
 >   Integer.valueOf(10)},
 >
 > -{"relay-descriptors/consensuses/2016-10-05-19-06-17-consensus",
 // acqu-date, which will be ok
 > +{"relay-descriptors/consensuses/2016-09-20-13-00-00-consensus",
 >   "relay-
 descriptors/consensus/2016/09/20/2016-09-20-13-00-00-consensus",
 >   "2016-09-20-13-00-00-consensus",
 >   Integer.valueOf(1),
 >   Integer.valueOf(1)},
 >
 >  {"bridge-descriptors/statuses/"
 > 

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-14 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:47 karsten]:
 > Regarding the protocol, you're probably right that it's still not
 entirely correct.  To be honest, I'm having a hard time understanding the
 format there.  Would you mind looking at the test path changes above and
 adapting the protocol accordingly?  I can take another look at the
 protocol changes.

 Thanks for adding the test cases!
 I'll adapt the protocol after finishing the implementation changes and
 might add examples to clarify the definition.

 >
 > Okay, after reading/skimming this ticket, I think it would make sense
 for me to review that second branch with all the fixes, but I only see 1
 commit there that is not yet in master.  Can you put together a branch
 with all the changes somewhere?  Or if that's not possible, can you point
 to specific commits that I should be looking at?

 Ok, just wait with the review for the completed changes.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-14 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Regarding the protocol, you're probably right that it's still not entirely
 correct.  To be honest, I'm having a hard time understanding the format
 there.  Would you mind looking at the test path changes above and adapting
 the protocol accordingly?  I can take another look at the protocol
 changes.

 Okay, after reading/skimming this ticket, I think it would make sense for
 me to review that second branch with all the fixes, but I only see 1
 commit there that is not yet in master.  Can you put together a branch
 with all the changes somewhere?  Or if that's not possible, can you point
 to specific commits that I should be looking at?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-14 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Here's a diff of paths in the test class to reflect paths in the currently
 deployed code:

 {{{
 diff --git
 a/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 b/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 index ad5ee6a..cdb90b8 100644
 --- a/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 +++ b/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java
 @@ -58,15 +58,15 @@ public class SyncPersistenceTest {
@Parameters
public static Collection pathFilename() {
  return Arrays.asList(new Object[][] {
 -{"exit-lists/2016-10-05-19-06-17", // expected recent path
 +{"exit-lists/2016-09-20-13-02-00", // expected recent path
   "exit-lists/2016/09/20/2016-09-20-13-02-00", // expected out
 path
   "2016-09-20-13-02-00", // test-filename
   Integer.valueOf(1), // expected recent count of descs files
   Integer.valueOf(1)}, // expected output count of descs files

  {"relay-descriptors/microdescs/consensus-microdesc/"
 - + "2016-10-05-19-06-17-consensus-microdesc",
 - "relay-descriptors/microdescs/2016/10/consensus-microdesc/"
 + + "2016-10-02-17-00-00-consensus-microdesc",
 + "relay-descriptors/microdesc/2016/10/consensus-microdesc/"
   + "02/2016-10-02-17-00-00-consensus-microdesc",
   "2016-10-02-17-00-00-consensus-microdesc",
   Integer.valueOf(1),
 @@ -74,7 +74,7 @@ public class SyncPersistenceTest {

  {"bridge-descriptors/server-descriptors/"
   + "2016-10-05-19-06-17-server-descriptors",
 - "bridge-descriptors/server-descriptor/2016/10/"
 + "bridge-descriptors/2016/10/server-descriptor/"
   + "A/8/A8A5509AD1393C8F36ABD2D8F0DE1BB751926872",
   "bridge-2016-10-02-16-09-00-server-descriptors",
   Integer.valueOf(1),
 @@ -88,28 +88,30 @@ public class SyncPersistenceTest {
   Integer.valueOf(1),
   Integer.valueOf(10)},

 -{"relay-descriptors/consensuses/2016-10-05-19-06-17-consensus",
 +{"relay-descriptors/consensuses/2016-09-20-13-00-00-consensus",
   "relay-
 descriptors/consensus/2016/09/20/2016-09-20-13-00-00-consensus",
   "2016-09-20-13-00-00-consensus",
   Integer.valueOf(1),
   Integer.valueOf(1)},

  {"bridge-descriptors/statuses/"
 - +
 "20161005-190617-1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1",
 - "bridge-descriptors/statuses/2016/09/20/"
 + +
 "20160920-063816-1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1",
 + "bridge-descriptors/2016/09/statuses/20/"
   +
 "20160920-063816-1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1",
   "20160920-063816-1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1",
   Integer.valueOf(1),
   Integer.valueOf(1)},

  {"relay-descriptors/microdescs/micro/2016-10-05-19-06-17-micro",
 - "relay-descriptors/microdescs/2016/10/micro/f/b/"
 + "relay-descriptors/microdesc/2016/10/micro/f/b/"
   +
 "fbf0c4cb8216d950d267866ea879880e69216bd0b46878ff0f51941ee4eca707",
   "2016-09-20-15-31-19-micro",
   Integer.valueOf(1),
   Integer.valueOf(5)},

 -{"relay-descriptors/votes/2016-10-05-19-06-17-votes",
 +{"relay-descriptors/votes/2016/10/01/2016-10-01-16-00-00-vote"
 + + "-0232AF901C31A04EE9848595AF9BB7620D4C5B2E"
 + + "-FEE63B4AB7CE5A6BDD09E9A5C4F01BD61EB7E4F1",
   "relay-descriptors/vote/2016/10/01/2016-10-01-16-00-00-vote"
   + "-0232AF901C31A04EE9848595AF9BB7620D4C5B2E"
   + "-FEE63B4AB7CE5A6BDD09E9A5C4F01BD61EB7E4F1",
 @@ -118,7 +120,9 @@ public class SyncPersistenceTest {
   Integer.valueOf(1),
   Integer.valueOf(1)},

 -{"relay-descriptors/votes/2016-10-05-19-06-17-votes",
 +{"relay-descriptors/votes/2016/09/20/2016-09-20-13-00-00-vote-"
 + + "49015F787433103580E3B66A1707A00E60F2D15B"
 + + "-60ADC6BEC262AE921A1037D54C8A3976367DBE87",
   "relay-descriptors/vote/2016/09/20/2016-09-20-13-00-00-vote-"
   + "49015F787433103580E3B66A1707A00E60F2D15B"
   + "-60ADC6BEC262AE921A1037D54C8A3976367DBE87",
 @@ -128,7 +132,7 @@ 

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-13 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 I think below I reply to all of [comment:25] except the paths structures,
 which will have to wait for the commit of the tests.
 Changes can be found in the new commit of the
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=aafedaa74e8451303e8cdaa882fd2821927d29ab second branch].

 > ...  - All classes in the new `persist` package contains unused imports
 or unused attributes.  Can you remove those in a fixup commit?

 Removed in second branch.

 >
 >  - Can we somehow use `printf` patterns for putting together paths to
 make that bunch of `DescriptorPersistence` classes easier to read?  It
 should be possible to express all paths using placeholders for
 dates/times, strings, and chars.  And if we care about path separators
 being platform independent, we could split the resulting path at `/`
 characters and put it back together using `Paths.get()`.  Untested example
 for a bridge extra-info descriptor:
 >
 > {{{
 > String.format("bridge-descriptors/%tY/% desc.getPublishedMillis(),
 > desc.getExtraInfoDigest().charAt(0),
 > desc.getExtraInfoDigest().charAt(1),
 > desc.getExtraInfoDigest());
 > }}}
 >

 What makes this more readable to me, is the code formatting, i.e. one line
 per path element.  Whereas, I don't read string format patterns that
 often, which makes it less readable to me.  (String.format is also locale
 dependent and would have to have an additional argument. )
 Added the one line per path element to the new branch as well as some
 comments for the path elements. Hope this works for you too?

 >  - Building upon the previous idea, maybe we can avoid having
 `DescriptorPersistence` classes at all if all we need is a type annotation
 and two `printf` patterns.  We could define an order of `printf` arguments
 for all patterns, like: published, received, source, digest, first char of
 digest, second char of digest.  That would really save a few lines of
 code, wouldn't it?  Maybe something for later?

 The path deriving code surely can be done nicer.  The main goal for the
 current code was to extract the somewhat hidden (in huge methods/classes)
 path structures and have them in a defined location, which opens the door
 for future simplifications.
 Having persistence classes might be useful for other types of persistence
 or the statistics (cf. #20335, #20345).

 > ...
 >  - In `ConsensusPersistence`, the check for `null ==
 desc.getConsensusFlavor()` seems too fragile with respect to flavors we'll
 add in the future.  We wouldn't notice except for microdesc consensuses
 being overwritten.  We should check the actual consensus flavor if it's
 not `null`.

 If flavor is something other than "microdesc", the consensus type will be
 plain consensus.

 >
 >  - In `DescriptorPersistence`, I wonder if we should not accept
 `StandardOpenOption` or even `StandardOpenOption ...` to distinguish
 replace vs. append but rather just a boolean.  It seems like an
 implementation level detail we're handing out to the caller.  Are we sure
 we're implementing all options and in particular combination of options
 correctly?  Those checks where we're only looking at how many such option
 values we're being given look really fragile.  If we only want to support
 replace vs. append, we should only provide those two options in the
 interface.
 >
 >  - Same class, method `storeAll()` should be more explicit in the method
 name or at least documentation that `recent` will only be written if
 writing `archive` is successful.
 >

 The StandardOpenOptions are defined in the java api, so there is no
 implementation to be done.  There are more than two options of interest
 (afaict): create_new, append, and truncate_existing, i.e. do not do
 anything in case the file exists, append, and replace.
 I added more javadoc and overloaded the methods, so we have methods w/o
 options, and other methods with exactly one option per file.

 >  - Same class, methods `storeRecent()` and `storeOut()` might have wrong
 argument orders in their `Paths.get()` calls, but I'm not sure.

 I renamed the parameters to `recentRoot` and `outRoot` to better express
 their meaning.

 >
 >  - In `PersistenceUtils`, method `storeToFileSystem()`, that check for
 `append.length == 0` could be changed to `append.length <= i` to make the
 method a tiny bit 

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-13 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please clarify the path differences from comment:25 above.  I've been
 looking at the paths and the protocol for too long now that I caught some
 path-blindness (temporary).
 It would be very helpful, if you could use the test-cases listed
 
[https://gitweb.torproject.org/user/iwakeh/collector.git/tree/src/test/java/org/torproject/collector/sync/SyncPersistenceTest.java?h=task-18910
 -first-sync#n59 here] and indicate what should be changed, i.e. basically
 which paths are expected.  Then the tests can verify the structure.

 The protocol might need to be changed, if this path `./out/bridge-
 descriptors/2016/10/extra-
 infos/0/0/01764ef8b8b5bc9ed70b9e99225112ffd04` is  correct. It
 
[https://gitweb.torproject.org/collector.git/tree/src/main/resources/docs/PROTOCOL#n284
 says]:
 {{{
 5.2 'bridge-descriptors' below 'out'

'bridge-descriptors' contains the following subdirectories:

* extra-infos
* server-descriptors
* statuses

 5.2.1
'extra-infos' and 'server-descriptors' have the following
subdirectory structure

year SEP month SEP first SEP second

Where year is derived from the published date.
'first' and 'second' are the first and second symbol from the
router-digest, which also serves as the filename for the files
in the 'second' level directories.

Tars are named according to section 2.3 and have the following
substructure using the definitions from 2.3:

BRIDGE DASH marker DASH year DASH month SEP first SEP second

 5.2.2
'statuses' have a different substructure

year SEP month SEP day

 }}}

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-13 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:42 karsten]:
 >  ...
 > > >  - Totally unrelated to this commit: should we kill the updateindex
 module and instead have an `IndexManager` with a `synchronized update()`
 method that gets called at the end of each module?  Happy to move this to
 another ticket, but it just came to mind while looking at all the
 options...
 > >
 > > Currently, there is also the create tars script changing the file
 structure.  This should become java code.  => new ticket?
 >
 > Yes, please.
 >
 > > Index-update could then be changed from being a module into a feature
 of module runs.  => second ticket?
 >
 > Yes, please.

 Created #20350 and #20351.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-13 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:41 iwakeh]:
 > Replying to [comment:40 karsten]:
 > > ...
 > >  - Should we hold back the four `*Sync*` options until there's actual
 support for syncing?  I don't feel strongly about this, because we're
 going to add syncing support really soon.  But just in case you say "oh
 wow, I totally forgot to take them out, let me fix that!", okay.  Either
 way works for me.
 >
 > I extended the commit comment (see link below).
 >
 > >
 > >  - The new options `RelayLocalOrigins` and `BridgeLocalOrigins` each
 only accept a single path.  Should we rename them to singular then?
 Again, don't feel strongly.
 >
 > Thought about that , too, but finally decided to keep it a *Options.
 >
 > >
 > >  - The new option `BridgeSources` does not support `Remote` as stated
 in the comment of `collector.properties`.
 >
 > Corrected in
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=02681a4d9722bfb21516693366be194c90910e5b here] (amended).

 Cherry-picked and pushed to master.  Thanks!

 > >  - Totally unrelated to this commit: should we kill the updateindex
 module and instead have an `IndexManager` with a `synchronized update()`
 method that gets called at the end of each module?  Happy to move this to
 another ticket, but it just came to mind while looking at all the
 options...
 >
 > Currently, there is also the create tars script changing the file
 structure.  This should become java code.  => new ticket?

 Yes, please.

 > Index-update could then be changed from being a module into a feature of
 module runs.  => second ticket?

 Yes, please.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:40 karsten]:
 > ...
 >  - Should we hold back the four `*Sync*` options until there's actual
 support for syncing?  I don't feel strongly about this, because we're
 going to add syncing support really soon.  But just in case you say "oh
 wow, I totally forgot to take them out, let me fix that!", okay.  Either
 way works for me.

 I extended the commit comment (see link below).

 >
 >  - The new options `RelayLocalOrigins` and `BridgeLocalOrigins` each
 only accept a single path.  Should we rename them to singular then?
 Again, don't feel strongly.

 Thought about that , too, but finally decided to keep it a *Options.

 >
 >  - The new option `BridgeSources` does not support `Remote` as stated in
 the comment of `collector.properties`.

 Corrected in
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=02681a4d9722bfb21516693366be194c90910e5b here] (amended).

 >
 >  - Totally unrelated to this commit: should we kill the updateindex
 module and instead have an `IndexManager` with a `synchronized update()`
 method that gets called at the end of each module?  Happy to move this to
 another ticket, but it just came to mind while looking at all the
 options...

 Currently, there is also the create tars script changing the file
 structure.  This should become java code.  => new ticket?
 Index-update could then be changed from being a module into a feature of
 module runs.  => second ticket?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:35 iwakeh]:
 > Replying to [comment:31 karsten]:
 > > Regarding your comment 29, yes, picking shorter names sounds fine to
 me.  I would avoid renaming existing config options, but picking better
 names for new config options or values sounds great.  Please prepare a
 commit with the described changes.  Thanks!
 >
 >
 > Please review
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=36d0b44f95d0cbf4cd38eb1dfd8a0584ffb7f194 this commit]
 with the additional changes.

 Thanks!  Here are some minor comments on 36d0b44:

  - Should we hold back the four `*Sync*` options until there's actual
 support for syncing?  I don't feel strongly about this, because we're
 going to add syncing support really soon.  But just in case you say "oh
 wow, I totally forgot to take them out, let me fix that!", okay.  Either
 way works for me.

  - The new options `RelayLocalOrigins` and `BridgeLocalOrigins` each only
 accept a single path.  Should we rename them to singular then?  Again,
 don't feel strongly.

  - The new option `BridgeSources` does not support `Remote` as stated in
 the comment of `collector.properties`.

  - Totally unrelated to this commit: should we kill the updateindex module
 and instead have an `IndexManager` with a `synchronized update()` method
 that gets called at the end of each module?  Happy to move this to another
 ticket, but it just came to mind while looking at all the options...

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 microdesc-sync ticket #20345 added.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:37 karsten]:
 > And here are the last three commits from your initial branch, starting
 with 5eb89c1:

 Thanks for working your way through and all the useful
 comments/suggestions!
 (I still need to reply to some.)

 >
 >  - Looks good, modulo whitespace issues that should become visible with
 checkstyle.

 The commits in the second branch all are checkstyle-compliant.

 >
 > And 4722ba1:
 >
 >  - I'd want to avoid merging this and instead investigate the problem
 further.  How would I reproduce the OOM?  Run the previous commit with
 syncing from the main CollecTor instance and wait for how many hours?

 Yes, use the collector jar from the previous commit.  Run it in a new
 location, i.e. with empty recent and out.  Configure relaydescs and sync.
 RunOnce will be sufficient the OOM heap dump contains very many Reference
 objects.  (Your assumption in one of the previous comments hinted in the
 right direction, I think, but I couldn't verify that yet.)

 The sync into an empty place approximately takes an hour depending on the
 internet connection.

 >
 > Last and probably least, d2fcb1b:
 >
 >  - We should simply squash this one into the earlier commit that added
 these lines.  And we should add a note to `collector.properties` that
 microdescs are exempt from syncing.  And we should create a ticket as a
 reminder to add microdescs to the sync part later on.

 I just don't add microdescs-sync to the new branch in first place and will
 open the new ticket.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 And here are the last three commits from your initial branch, starting
 with 5eb89c1:

  - Looks good, modulo whitespace issues that should become visible with
 checkstyle.

 And 4722ba1:

  - I'd want to avoid merging this and instead investigate the problem
 further.  How would I reproduce the OOM?  Run the previous commit with
 syncing from the main CollecTor instance and wait for how many hours?

 Last and probably least, d2fcb1b:

  - We should simply squash this one into the earlier commit that added
 these lines.  And we should add a note to `collector.properties` that
 microdescs are exempt from syncing.  And we should create a ticket as a
 reminder to add microdescs to the sync part later on.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 The naming discussion should continue in #20162.  Please, see comment 6 in
 #20162.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:31 karsten]:
 > Regarding your comment 29, yes, picking shorter names sounds fine to me.
 I would avoid renaming existing config options, but picking better names
 for new config options or values sounds great.  Please prepare a commit
 with the described changes.  Thanks!


 Please review
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=36d0b44f95d0cbf4cd38eb1dfd8a0584ffb7f194 this commit]
 with the additional changes.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 1e07139 looks good.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Next in line is 4efd190:

  - In `SyncManager`, I think we'll have to obtain the `collectionDate`
 from the corresponding `CollecTorMain` subclass somehow, so that we're
 appending descriptors to the same files as previously downloaded or
 locally read descriptors.  Otherwise we'd have two files in the `recent/`
 directory per descriptor type, making the index larger than necessary.
 Let's avoid that if we can.

  - As stated in an earlier comment, there are unused variables and imports
 in this commit that should be removed.

  - Still in `SyncManager`, method `mergeWithLocalStorage()`, it seems that
 we're using the exact same path for parse history files for different
 descriptors.  Example: we're reading descriptor type A with history file
 X, and when we're done we're reading descriptor type B with the same
 history file X.  The effect is that we're not skipping any B descriptors
 and overwriting everything in X that has to do with A.  Basically, we're
 reading the entire directory of synced descriptors in every execution.
 What we need is a distinct history file per descriptor type.  Or we could
 use a single descriptor reader regardless of descriptor type with a single
 history file.  By the way, this could have caused your OOM troubles.

  - Your log statement "Done reading {} of type {}." is misleading.  The
 `readDescriptors()` method returns immediately, but that does not at all
 mean that there are any descriptors in the iterator at that point.  A
 better message would be "Started reading ...".

  - In `SyncPersistence`, method `storeDescs()`, the documentation says
 "OutputDirectory" where it should say "OutputPath".

  - Regarding the three `//` comments in `storeDescs()`: it's indeed
 unfortunate that we need to infer the authority fingerprint of a bridge
 network status from the file name.  We should consider adding a new line
 for that to sanitized bridge network statuses.  That's entirely unrelated
 to the changes here but probably worth a ticket.

  - The second comment there is probably moot with microdescriptor syncing
 being postponed.

  - Regarding the third comment about exit lists, we're currently using the
 initial download time for storing in `out/` and `recent/`.  But I think
 that's something to ensure in `ExitlistPersistence`, not here, so the code
 here can stay unchanged.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Regarding your comment 29, yes, picking shorter names sounds fine to me.
 I would avoid renaming existing config options, but picking better names
 for new config options or values sounds great.  Please prepare a commit
 with the described changes.  Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Adapted the changelog and pushed that other commit for `OutPath` to
 master.  Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Oh, I just noticed the comments 25 and 26.  But, I guess the my reply to
 26, 27 won't change after reading them.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 reply to comment 28:  adapted changelog, this can go to master.

 reply comment 27:
 Hmm, let me think:

 Well, `*Sources` is in use for the urls data is retrieved from, but these
 together with the path values for local imports could be named `*Origins`
 or similar.

 The naming of all descriptor sources should be more generic.  Actually,
 ImportDirectory, and SanitizedTarballs, all refer to a local import, i.e.
 the import SourceType enum provides: Cache, Local, Remote, Sync.
 The corresponding sources properties could be `Origin`, e.g.
 RelayCacheOrigins, BridgeLocalOrigins, BridgeSyncOrigins, etc.

 That would shorten the properties list.

 Ok, I can provide a commit with these changes, if we agree?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:24 iwakeh]:
 > Here is the
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=d8ae07bf9c6a86b9018889806b4d87920369443d second branch],
 so far only the properties changes.
 >
 > Checkstyle doesn't complain about these changes.

 Looks good!  The only request would be that we make the change log entry
 slightly more verbose, like: "Replace four properties for configuring
 where to write descriptors by a single 'OutPath' property."  Happy to make
 that change while merging.  Should I merge this commit to master now?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:26 karsten]:
 > Let's think about it.  We have 1 module with soon 4 descriptor sources
 and 2 modules with soon 2 descriptor sources.  How would you define config
 options to let operators activate each of the modules with any possible
 combination of descriptor sources?  And keep in mind that we might want to
 add a 5th or 3rd descriptor source in the future.

 Quick idea: how about we accept a list of descriptor sources for each
 module rather than have 4 or 2 separate boolean options?  Example:

 {{{
 RelaydescsSources ImportCached, ImportDirectory, Download, Sync
 BridgedescsSources SanitizeTarballs, Sync
 ExitlistSources Download, Sync
 }}}

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:23 iwakeh]:
 > This way of configuring only works for the current relaydescs module
 (whose properties are supposed to be removed partially in #20162).

 We're planning to remove properties there, but none of those properties
 would enable or disable a descriptor source.

 > The other two modules, bridgedescs and exitlists, don't have any
 additional properties that could be used for avoiding the 'SyncOnly'
 option.

 Yes, we'd have to add two boolean options for those rather than one tri-
 state.

 > In general, it is nicer to have a configuration value that expresses
 what it means.  The different download options for the relaydescs module
 where initially quite confusing.

 I admit, I forget to enable descriptor sources from time to time myself
 when setting up a new CollecTor with relaydescs module for testing.

 > But maybe, I complicate things?  So what other 'streamlined' way could
 achieve the SyncOnly setting?

 Let's think about it.  We have 1 module with soon 4 descriptor sources and
 2 modules with soon 2 descriptor sources.  How would you define config
 options to let operators activate each of the modules with any possible
 combination of descriptor sources?  And keep in mind that we might want to
 add a 5th or 3rd descriptor source in the future.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Alright, here's a review of f4026fc:

  - All classes in the new `persist` package contains unused imports or
 unused attributes.  Can you remove those in a fixup commit?

  - Can we somehow use `printf` patterns for putting together paths to make
 that bunch of `DescriptorPersistence` classes easier to read?  It should
 be possible to express all paths using placeholders for dates/times,
 strings, and chars.  And if we care about path separators being platform
 independent, we could split the resulting path at `/` characters and put
 it back together using `Paths.get()`.  Untested example for a bridge
 extra-info descriptor:

 {{{
 String.format("bridge-descriptors/%tY/%https://trac.torproject.org/projects/tor/ticket/18910#comment:25>
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Here is the
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-18910
 -first-sync-2=d8ae07bf9c6a86b9018889806b4d87920369443d second branch],
 so far only the properties changes.

 Checkstyle doesn't complain about these changes.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:22 karsten]:
 > ...
 > Here's how I'd explain configuring the relaydescs module to a new
 operator: "Want to collect relay descriptors?  Sure, just activate that
 module and also enable one or more descriptor sources: download from the
 directory authorities, read from a local directory, read from local cached
 descriptors, synchronize from other CollecTor instances.  If you don't
 enable any descriptor sources, you won't get any descriptors."
 >
 > This feels easier than: "Want to collect relay descriptors?  Sure, just
 activate that module and also enable between zero and three descriptors
 sources, and also consider synchronizing from other CollecTor instances."
 >

 This way of configuring only works for the current relaydescs module
 (whose properties are supposed to be removed partially in #20162).
 The other two modules, bridgedescs and exitlists, don't have any
 additional properties that could be used for avoiding the 'SyncOnly'
 option.
 In general, it is nicer to have a configuration value that expresses what
 it means.  The different download options for the relaydescs module where
 initially quite confusing.
 But maybe, I complicate things?  So what other 'streamlined' way could
 achieve the SyncOnly setting?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Replying to [comment:21 iwakeh]:
 > Replying to [comment:20 karsten]:
 > > [...] Shouldn't we instead remove those config options entirely, [...]
 >
 > This is only a warning that these will be deprecated.  They're still
 used, but already adressed in #20162.  But they should be removed here.
 That wouldn't be a small commit, though, and I'd rather have it separate.
 I'll add a new branch just up to that commit later today.

 Okay.  I was confused by the comment saying "only OutputPath is used".
 Looking forward to the new branch you're mentioning.

 > >  - Speaking of, can you include a change log entry for this commit?
 >
 > Sure!

 Thanks!

 > > [...] I wonder if we could simplify the configuration by avoiding that
 tri-state SyncType option and turning it into a boolean. [...]
 >
 > Sync-Options (this description should be added to package-info, when we
 agree):
 > There are currently three modules that can be synced: relaydescs,
 bridgedescs, and exitlists.
 >
 > A module can be turned on or off via its `*Activated` option, which is
 only configurable at start-up and determines if anything is run from this
 module.
 >
 > The runtime configurable tri-state `Sync*` option values are
 >
 > * `NoSync`:  for simply running the module w/o fetching additional data.
 Useful when just the directly (from the Tor network) accessible data is of
 interest, or when having access to bridgedescs etc.
 > * `Sync` performs the module run and then fetches data from other
 instances.
 > * `SyncOnly` just fetches data from other instances.  Useful for mirrors
 that don't have access to bridgedescs, or that fetch from a main instance
 they trust and don't have net access otherwise, etc.
 >
 > `Sync*` options can be adapted during runtime.  It is possible to switch
 to syncing and then turn it off during runtime.
 >
 > So, reducing the tri-state would complicate the combination of
 'activated' and sync-settings, which is separated from sync, i.e.
 `Scheduler` doesn't and shouldn't now about syncing.

 Hmm, I don't understand how the change I suggested would affect the
 ability to reconfigure at runtime.  I admit I didn't read that code yet
 (which comes in later commits, I think), but in theory it should still be
 possible to enable or disable syncing if it's yet another descriptor
 source.

 Here's how I'd explain configuring the relaydescs module to a new
 operator: "Want to collect relay descriptors?  Sure, just activate that
 module and also enable one or more descriptor sources: download from the
 directory authorities, read from a local directory, read from local cached
 descriptors, synchronize from other CollecTor instances.  If you don't
 enable any descriptor sources, you won't get any descriptors."

 This feels easier than: "Want to collect relay descriptors?  Sure, just
 activate that module and also enable between zero and three descriptors
 sources, and also consider synchronizing from other CollecTor instances."

 The three modules could still use the same code for synchronizing, so I'm
 not asking to change much code here.  But I'm worried that the tri-state
 config option will be harder to grasp for operators than it has to be.

 > >  - I found at least one long line that checkstyle should complain
 about, though I didn't run it myself.
 >
 > Oh, I usually run this at the last commit.  Will check again.

 Okay, no rush. :)

 > > In case you want to start working on any of these comments, can you
 please write `--fixup` commits that resolve those issues in this
 particular commit?  In any case, please don't modify commits 2 to 7 in
 that branch at this point, because I already started reviewing those.
 >
 > As said above, I'll add another branch with commits we agree on and in a
 way that the commits make sense in the final master version.

 Thanks!

 > > More tomorrow.  Thanks!
 >
 > Thanks, for reading your way through all that un-commented code!

 Still reviewing the other other commits...

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Replying to [comment:20 karsten]:
 > ...
 >  - It seems we're deprecating a few config options by adding a comment
 to the default `collector.properties` and not paying attention to those
 options in the code anymore.  Shouldn't we instead remove those config
 options entirely, so that operators notice for sure that they need to
 change these config options?  We could mention this in the change log as
 medium change.
 >

 This is only a warning that these will be deprecated.  They're still used,
 but already adressed in #20162.  But they should be removed here.  That
 wouldn't be a small commit, though, and I'd rather have it separate.  I'll
 add a new branch just up to that commit later today.

 >  - Speaking of, can you include a change log entry for this commit?
 >

 Sure!

 >  - I wonder if we could simplify the configuration by avoiding that tri-
 state SyncType option and turning it into a boolean.  Consider
 `ImportCachedRelayDescriptors`, `ImportDirectoryArchives`, and
 `DownloadRelayDescriptors` in the relaydescs module.  We could just add a
 fourth option `SyncRelayDescriptors` that can be `true` or `false`.
 Basically, syncing descriptors from other CollecTor instances would be the
 fourth source for collecting relay descriptors.  This shouldn't change
 much in the code you wrote, but it might make things a bit simpler for
 future operators.  For bridge descriptors and exit lists there would have
 to be two new options to activate the current sources, that is, sanitize
 bridge descriptors found in a local directory or download exit lists from
 the exit list server.
 >

 Sync-Options (this description should be added to package-info, when we
 agree):
 There are currently three modules that can be synced: relaydescs,
 bridgedescs, and exitlists.

 A module can be turned on or off via its `*Activated` option, which is
 only configurable at start-up and determines if anything is run from this
 module.

 The runtime configurable tri-state `Sync*` option values are

 * `NoSync`:  for simply running the module w/o fetching additional data.
 Useful when just the directly (from the Tor network) accessible data is of
 interest, or when having access to bridgedescs etc.
 * `Sync` performs the module run and then fetches data from other
 instances.
 * `SyncOnly` just fetches data from other instances.  Useful for mirrors
 that don't have access to bridgedescs, or that fetch from a main instance
 they trust and don't have net access otherwise, etc.

 `Sync*` options can be adapted during runtime.  It is possible to switch
 to syncing and then turn it off during runtime.

 So, reducing the tri-state would complicate the combination of 'activated'
 and sync-settings, which is separated from sync, i.e. `Scheduler` doesn't
 and shouldn't now about syncing.

 >  - I found at least one long line that checkstyle should complain about,
 though I didn't run it myself.

 Oh, I usually run this at the last commit.  Will check again.

 >
 > In case you want to start working on any of these comments, can you
 please write `--fixup` commits that resolve those issues in this
 particular commit?  In any case, please don't modify commits 2 to 7 in
 that branch at this point, because I already started reviewing those.

 As said above, I'll add another branch with commits we agree on and in a
 way that the commits make sense in the final master version.

 >
 > More tomorrow.  Thanks!

 Thanks, for reading your way through all that un-commented code!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-11 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by karsten):

 Okay, I started reviewing those seven commits, but I'm running out of
 brains here.  However, before getting back to reviewing tomorrow, I
 figured I could share my thoughts on the first of the seven commits,
 16e1c84:

  - It seems we're deprecating a few config options by adding a comment to
 the default `collector.properties` and not paying attention to those
 options in the code anymore.  Shouldn't we instead remove those config
 options entirely, so that operators notice for sure that they need to
 change these config options?  We could mention this in the change log as
 medium change.

  - Speaking of, can you include a change log entry for this commit?

  - I wonder if we could simplify the configuration by avoiding that tri-
 state SyncType option and turning it into a boolean.  Consider
 `ImportCachedRelayDescriptors`, `ImportDirectoryArchives`, and
 `DownloadRelayDescriptors` in the relaydescs module.  We could just add a
 fourth option `SyncRelayDescriptors` that can be `true` or `false`.
 Basically, syncing descriptors from other CollecTor instances would be the
 fourth source for collecting relay descriptors.  This shouldn't change
 much in the code you wrote, but it might make things a bit simpler for
 future operators.  For bridge descriptors and exit lists there would have
 to be two new options to activate the current sources, that is, sanitize
 bridge descriptors found in a local directory or download exit lists from
 the exit list server.

  - I found at least one long line that checkstyle should complain about,
 though I didn't run it myself.

 In case you want to start working on any of these comments, can you please
 write `--fixup` commits that resolve those issues in this particular
 commit?  In any case, please don't modify commits 2 to 7 in that branch at
 this point, because I already started reviewing those.

 More tomorrow.  Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-11 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_review
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-
Changes (by iwakeh):

 * status:  assigned => needs_review


--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-10-11 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  assigned
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-

Comment (by iwakeh):

 Please review the seven commits on top of
 [https://gitweb.torproject.org/user/iwakeh/collector.git?h=task-18910
 -first-sync this branch].

 There are two new packages:
 `sync` for sync-merge functionality and `persist` as new modular way to
 persist descriptors (currently file-system, but could be extended or
 changed in future).  The latter should step by step be used for all
 persisting of descriptors, i.e. be used instead of the `store*` methods
 throughout the various modules.  (that is useful for removing the tight
 circular coupling of ArchiveWriter, DescriptorDownloader and
 DescriptorParser for example).

 Persisting is based on `DescriptorPersistence` defining methods for
 storing.  The classes extending `DescriptorPersistence` just need to
 define the explicit storage path.  For convenience
 `PersistenceUtils`provides date-time to string methods. Thus, providing
 methods for code that is repeatedly defined in the current code base.

 `CollecTorMain` extends `SyncManager` in a way that all synchronization
 options can be configured during runtime, i.e. syncing of a module can be
 turned on or off and sources can be changed without restart.

 (I'll add package-info later for the two packages.)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-09-19 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+-
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  assigned
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+-
Changes (by iwakeh):

 * status:  needs_information => assigned


Comment:

 Thanks for the detailed replies!
 Setting to assigned.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-09-16 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by karsten):

 Alright, I read the wiki page and the comment above.  Just two
 clarifications of what I meant above:

  - The only code with permission to write to (and delete from)
 `SyncFolder` should be `DescriptorCollector`, which would also delete
 files as soon as they disappear remotely.  We shouldn't move away or
 delete files while going through that directory and looking at
 descriptors, because that would mean that `DescriptorCollector` would have
 to download them again next time.  Every time it runs.

  - There will now be two cases where we want to write a descriptor and
 need to check if we already have it: 1) when downloading it locally and 2)
 when syncing from another CollecTor instance.  In the first case, if a
 file already exists and has different contents, we now overwrite it.  It
 could be a descriptor we synced from another instance in a previous run or
 obtained earlier by downloading it from somewhere.  Note that we're
 looking at our `out/` directory to decide whether we already have a
 descriptor, not at our `recent/` directory.  In the second case, we simply
 don't store the descriptor.  I think that's fine as initial strategy.

 If this all makes sense, feel free to work on the code, and I'll take a
 look once there's something to review.  If it doesn't make sense yet, feel
 free to ask more questions.  Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-09-16 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by iwakeh):

 Thanks for the remarks and suggestions!
 I'm replying inline below and also add a wiki page
 [wiki:doc/CollecTor/DescriptorDistribution CollecTor Sync] that contains
 the current status of the discussion.  Please, take a look there to see
 the entire picture.

 Replying to [comment:14 karsten]:
 > Hmm, the suggested config options would imply that there's only one new
 sync manager module that syncs all descriptors from the various sources
 and that runs, say, once per hour?  I wonder how to schedule that in a way
 that it does not interfere with the other modules.  So far, modules were
 pretty much independent, but this new module would create a dependency
 between modules.

 You're right, they should stay independent.  I intended that, too, but I
 had a different (more complicated) architecture in mind.

 >
 > Alternative suggestion: we add four (sets of) configurations, one for
 each module, that internally re-use the same code for syncing descriptors
 and for importing them.  For example, `SyncRelayDescriptors`,
 `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`.

 Good idea! So we run the sync-function after or instead of the module run
 (see wiki page for more).

 > We could then provide a remote path where to find descriptor files (like
 `/recent/relay-descriptors/`) and could implictly only consider descriptor
 types that the respective module understands (like
 `RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not
 `BridgeServerDescriptor`).

 Actually, the directory structure of a CollecTor's 'recent' is given, i.e.
 the different mirrors won't or shouldn't use a different directory
 sructure than the main instance.  So, it suffices to activate the module
 and set the sync or sync-only option.  The path structure for the actual
 download is determined. The straightforward paths for torperf and
 exitlists and the more complex structure for bridge- and relay-
 descriptors.

 >
 > Here's a potential policy we could apply to decided whether to keep a
 local or remote descriptor: while syncing, if we find out that a remotely
 obtained descriptor would be stored under a file name that already exists
 locally, we always discard that;...

 So, //while syncing// means while retrieving descriptors from a different
 instance and writing them to the local `SyncFolder` structure.  And,
 during this process descriptors already available in the sync-folder are
 not replaced.

 > ... and while processing descriptors locally, if we find that we already
 have a file locally with different content, which we likely received while
 syncing, we always overwrite that.  This means that we're only adding data
 but never replacing data.

 This refers to the process of comparing the descriptors fetched from
 remote instances with descriptors already in the 'recent' folder of the
 syncing instance?  Such local descriptors could have been obtained by
 direct download or a different syncing operation. Did I miss something
 here?

 >
 > Regarding deleting synced descriptors, we should never do that, but we
 should rather let `DescriptorCollector` clean up the local directory when
 it finds that a local file does not exist anymore remotely.

 True, if this refers to descriptors in the SyncFolder.

 >
 > Here's something else to watch out for while writing this code: whenever
 we learn descriptors from syncing, we'll have to include them in our
 `/recent/` directory, too.  This wasn't entirely clear to me from the
 description above, so if this was already the plan, never mind.

 That was intended, but should be clearly stated; will be added to the wiki
 page.

 Hope I don't see things too complicated.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-09-15 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by karsten):

 Hmm, the suggested config options would imply that there's only one new
 sync manager module that syncs all descriptors from the various sources
 and that runs, say, once per hour?  I wonder how to schedule that in a way
 that it does not interfere with the other modules.  So far, modules were
 pretty much independent, but this new module would create a dependency
 between modules.

 Alternative suggestion: we add four (sets of) configurations, one for each
 module, that internally re-use the same code for syncing descriptors and
 for importing them.  For example, `SyncRelayDescriptors`,
 `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`.  We
 could then provide a remote path where to find descriptor files (like
 `/recent/relay-descriptors/`) and could implictly only consider descriptor
 types that the respective module understands (like
 `RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not
 `BridgeServerDescriptor`).

 (If we're worried that there are too many config options already, I'm more
 than happy to make a list of options that can go away!  But this shouldn't
 mean we should hold back useful new options.)

 Here's a potential policy we could apply to decided whether to keep a
 local or remote descriptor: while syncing, if we find out that a remotely
 obtained descriptor would be stored under a file name that already exists
 locally, we always discard that; and while processing descriptors locally,
 if we find that we already have a file locally with different content,
 which we likely received while syncing, we always overwrite that.  This
 means that we're only adding data but never replacing data.

 Regarding deleting synced descriptors, we should never do that, but we
 should rather let `DescriptorCollector` clean up the local directory when
 it finds that a local file does not exist anymore remotely.

 Here's something else to watch out for while writing this code: whenever
 we learn descriptors from syncing, we'll have to include them in our
 `/recent/` directory, too.  This wasn't entirely clear to me from the
 description above, so if this was already the plan, never mind.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-09-15 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  High   |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---
Changes (by iwakeh):

 * priority:  Medium => High


Comment:

 The following is a summary of the discussion above and elsewhere, and
 should give an overview of the first sync-version functionality.

 == Functionality and design of descriptor distribution in CollecTor 1.1.0
 === Configuration
 1. General settings
  Add a SyncManager configuration in the Scheduler section of the
 properties file.
  Property `SyncFolder` contains the path for storing the downloded
 descriptors.
 1. Choice of sync-sources
  Add a configuration property `SyncSources` containing an array of strings
 specifying a source name and source URL for each CollecTor instance to
 retireve descriptors from. This setup is similar to the current torperf
 configuration.
 1. Choice of descriptors
  Add a configuration property `SyncDescriptorLists`, which will contain
 comma separated lists (separated by space) with a source name defined in
 `SyncSources` and a list of descriptor designations.
 1. Backup of replaced local files
  if `KeepReplaceBackup` is set to true, keep a copy of the old local
 descriptors in `BackupFolder`.
 === SyncManager
 The SyncManager module will be started by the Scheduler accordinng to the
 configuration defined above.
 Each SyncManager run will perform the following steps:
 a. Retrieve descriptors from the CollecTor instances defined in
 `SyncSources`.  These descriptors are stored in `SyncFolder` under the
 host part of the instance's url, e.g. {{{my-sync-
 folder/collector.torproject.org/recent/exit-lists}}} for exitlists from
 the main instance.
 b. Following retrieval the fetched descriptors are examined:
   i. discard descriptor files that do not contain what they should (see
 comment:11) and log a warning with sync-source info and reason (see
 criteria).
   i. move valid descriptors (see criteria) without a pre-existing local
 copy to the localstore.
   i. if there is a local copy already, decide which copy to keep (see
 criteria).
 I. local copy is kept, log debug message with source and reason and
 delete fetched descriptor.
 I. local and fetched are identical, log debug message with source and
 reason and delete fetched descriptor.
 I. fetched copy should replace local descriptor. Depending on
 `KeepReplaceBackup` move local copy to `BackupFolder` and move fetched
 copy to main storage. If `KeepReplaceBackup` is false, replace local copy
 by fetched. In all cases log debug message with source and reason.

 === Replacement criteria
 As the replacement criteria are not fully defined yet and it is very
 likely that there will be more criteria in future a modular/pluggable
 approach seems useful, i.e.:
 1. define `KeepCriterium` and `ReplaceCriterium` interfaces
 1. register implementing classes with the SyncManager, which will apply
 these for the selection steps described above.

 == Open Questions
 A. Which `KeepCriterium` and `ReplaceCriterium` classes shuld be
 implemented initially?
  currently there are
  1. a `ReplaceCriterium` keep the consensus with more signatures and
  1. a `KeepCriterium` only keep descriptors that contain what they claim
 to be.
  1. More criteria that should be implemented with release 1.1.0?
 A. Should the applied criteria be configurable?  E.g. this could be done
 by listing the classes in collector.properties, but we have already more
 than fifty config settings, which is a lot.
 A. The data combination mentioned in comment:11 part two is not yet
 considered, but the design will be open to add this later.
  Anyway some questions: What kind of data enhancement could be there? What
 about descriptor signatures?

 -
 Set to high in order to solve the open questions quickly.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-08-18 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by iwakeh):

 depends on #19934 (and #19791 as mentioned above)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-08-01 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by karsten):

 Moving forward here, after thinking about this problem a bit more.  I'd
 say let's give up on the `@source` annotation idea and, for now, simply
 trust whatever we get from other (trusted) CollecTor instances.  The goal
 should be to start syncing data soon to finally turn the single point of
 failure into many.  And if the spam problem turns out to be a real
 problem, let's solve it.  However, let's keep potentially malicious
 CollecTor instances in mind by taking the following precautions:

  - Allow the operator not only to configure which CollecTor instances to
 sync from, but also let them configure which descriptor types to sync from
 a given instance.  This includes looking at synced descriptor contents and
 skipping unwanted descriptor types (example: bridge descriptor
 "accidentally" contained in synced relay descriptor files).  For example,
 it makes little sense for the primary CollecTor instance to sync bridge
 descriptors from anywhere, because it's the only source for them.  (Oh,
 while writing this, please disregard this suggestion if the plan was to
 limit this feature to relay descriptors anyway.)

  - Check whether the local instance already contains synced data and only
 store remote data if it's better than local data.  For example, it might
 be that a remotely obtained consensus contains fewer signatures than the
 local copy of that consensus, in which case the local copy should be kept.
 But in some cases it's worth adding parts of remote data or even replace
 local data, after being sure that no information gets lost.  Requires a
 per-case consideration.  (Note that this enhancement is not specific to
 syncing from CollecTor mirrors but that it also makes sense to make it for
 fetching from different directory authorities.  It just gets even more
 important now.)

 What precautions did I miss?  And what else is missing to build this?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-08-01 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by iwakeh):

 depends on #19791

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-06-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by karsten):

 Replying to [comment:7 iwakeh]:
 > Some thoughts:
 >
 > === The CollecTor side
 > Maybe CollecTor (or the Metrics Team) needs a data collection and
 handling policy?
 > (Or, is there anything like that I didn't find yet other than the
 license and of course the Tor-wide privacy goals?)

 There is no explicit policy like that, but it would be useful to document
 that in the medium term.

 I guess a CollecTor policy would make more sense than one that applies to
 all metrics-related products, because then we'd have to either enforce
 that policy for all metrics-related tools or manually confirm that a tool
 conforms to the policy.  Other tools could have their own policies.

 > In general, CollecTor shouldn't attempt to make received data better
 than it is
 > by dropping unwanted things.

 Agreed, and a nice way to phrase this. :)

 > At least not without some defined process.
 > And collected data should only be changed when there is a reason for
 obfuscation or
 > when it is enhanced (e.g. by adding the @source tag).

 Look, that's the beginning of a policy!  I like that.

 > === Handling of //unwanted// data
 > Incomplete unreferenced server descs could be stored differently:
 > * referenced server descs can be stored in the way it is done now and
 > * unreferenced can be kept, but stored seperately.
 >
 > The synch-process could first concentrate on the referenced descriptors.

 I'm not sold on this part with respect to the process.  I can see how
 we're switching from a model where we're trusting everyone (all relays and
 bridges, all directory authorities, all other CollecTor instances) to just
 a small set of nodes (for example, the set of directory authorities listed
 in tor.git at a certain point in time).  But doing so is a major
 engineering effort, whereas continuing to trust everyone and risking to
 get spammed is easy.  Also, once we limit trust we can always go through
 the tarballs and rip out everything we shouldn't have accepted.  Hence,
 I'd say let's handle all data, wanted or unwanted, the same for now.

 But in the future, yes, let's consider doing this.  Once we do we should
 talk to ln5 about his plans to apply certificate transparency concepts to
 create a Tor network data archive, where spam descriptors turned out to be
 a major issue, too.

 > === Regarding the repeated uploads:
 > What is the reason for all these server descriptors gabelmoo received?
 > Is there some benign explanation for the uploads?

 Probably not.  But even if we find the reason and fix this, we cannot undo
 that it happened in the past, we cannot guarantee that there will be no
 future bugs like this one, and we cannot prevent malicious relays from
 flooding the directory authorities with random descriptors without there
 being a bug.  Or did you mean that directory authorities shouldn't accept
 as many descriptors from a single source?  I'm not sure how that would
 work, and for the directory authorities it's not that much of a problem to
 get spammed temporarily.  So, I think we might not be able to fix our
 issue with spam descriptors in the tor daemon.

 > Maybe, we should actually search the old data for more upload frencies
 like the one triggering this discussion?

 We could, but what would we do once we find similar events?  When does a
 malicious descriptor flood begin and what's still expected behavior?  I
 think if we want to solve the descriptor spam problem we'll have to limit
 ourselves to descriptors published by trusted entities and descriptors
 referenced from such descriptors directly or indirectly.

 Sorry for the long response.  It's a difficult problem, it seems.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-06-26 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by iwakeh):

 Some thoughts:

 === The CollecTor side
 Maybe CollecTor (or the Metrics Team) needs a data collection and handling
 policy?
 (Or, is there anything like that I didn't find yet other than the license
 and of course the Tor-wide privacy goals?)

 In general, CollecTor shouldn't attempt to make received data better than
 it is
 by dropping unwanted things. At least not without some defined process.
 And collected data should only be changed when there is a reason for
 obfuscation or
 when it is enhanced (e.g. by adding the @source tag).

 === Handling of //unwanted// data
 Incomplete unreferenced server descs could be stored differently:
 * referenced server descs can be stored in the way it is done now and
 * unreferenced can be kept, but stored seperately.

 The synch-process could first concentrate on the referenced descriptors.

 === Regarding the repeated uploads:
 What is the reason for all these server descriptors gabelmoo received?
 Is there some benign explanation for the uploads?

 There are two routers uploading more than 5000
 [https://collector.torproject.org/recent/relay-descriptors/server-
 descriptors/2016-06-23-11-05-13-server-descriptors server-descriptors] in
 less than an hour:

 {{{
 router ThePuppetMasterIN 94.23.181.19 9001 0 9030
 router ThePuppetMasterMID 94.23.181.18 9001 0 9030

 grep -c "PuppetMasterIN" /tmp/2016-06-23-11-05-13-server-descriptors
 1800
 grep -c "PuppetMasterMID" /tmp/2016-06-23-11-05-13-server-descriptors
 3596
 }}}

 These two routers shouldn't upload descriptors again and again.
 The descriptors do not differ in relevant fields according to
 [https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n336 dir-
 spec].
 Is this not a problem that should be tackled on the Tor side?


 Maybe, we should actually search the old data for more upload frencies
 like the one triggering this discussion?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-06-24 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by karsten):

 I haven't fully made up my mind about the following, but maybe it's food
 for thought:

 Recently, gabelmoo's cached-descriptors file contained hundreds of server
 descriptors that had no corresponding extra-info descriptors.  We cannot
 blame gabelmoo for accepting these valid descriptors, and even if we were
 to add a @source tag to these descriptors saying they came only from
 gabelmoo, we wouldn't later go and delete all descriptors by gabelmoo.
 The real problem is that anyone can produce as many descriptors as they
 want.  Neither of the solutions above (which are based on our previous
 discussions) would help here.

 I believe the only fix is to discard relay and bridge descriptors that are
 not referenced from votes or consensuses.  And I know that I stated
 earlier that I'd also want to archive other descriptors.  But I don't see
 yet how to achieve both.

 From an implementation point of view we could build this in two phases: 1.
 we fetch from other CollecTor instances and believe everything we get
 without attaching @source tags, and 2. we create a staging area of some
 sort where we store descriptors that are not referenced yet and delete
 them after a week or so unless we see a descriptor we trust that
 references them.  It's probably smart to do 1. first in order to make
 CollecTor more robust.  We'll have to repackage old tarballs anyway after
 implementing 2., so there's no big rush there.

 Again, not sure yet what to do here.  Sorry for the confusion, but it
 seems it's not easy to do this right.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-06-23 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by iwakeh):

 Replying to [comment:4 karsten]:
 > ...
 > Overall, I think I'd rather want us to keep things simple here for now
 and think about optimizing later.  What do you think?
 Yes , agreed. Let's just go with the simple download solution and first
 focus on the tagging of the descriptors from different sources.

 There are several approaches here:
 1. Only one @source tag from the direct provider.  Implications:
  * a descriptor downloaded from an authority A will be tagged with A's IP.
  * a descriptor downloaded from another CollecTor B, which received it
 from authority A, will be tagged with B's IP.
  * a descriptor downloaded from another CollecTor B, which received it
 from CollecTor C, will be tagged with B's IP.
  * a known descriptor downloaded again from a different source will be
 ignored.
 1. Several @source tags (no duplicates) from various direct providers.
 Implications hereof are the same as above except for the last one: every
 time a descriptor is seen a @source tag is added to the descriptor.
 1. Create a structure of source tags that keeps the initial source. This
 quickly turns into a very complex situation.

 Wanting to keep it simple and considering that at first the
 trustworthiness of all synchronizing CollecTors is established externally
 1. or 2. might be the way to go. The design should allow for a later
 extension and more complex approach of source designation.


 One question about the initial description:
 > It's important that we'd only add those @source annotations to
 > archived descriptors, not to recent descriptors, or we'd serve those
 > descriptors as new every time we're adding a @source.
 How can the sources be determined when archiving?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-06-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---

Comment (by karsten):

 Replying to [comment:3 iwakeh]:
 > I agree, the protocol option is too much implementation effort.  The
 protocol design could be made simple by copying the existing protocol, but
 implementation of this protocol and having a service up and running all
 the time answering requests is a lot work and not really necessary.

 Agreed.

 > Regarding you're suggestion for the download-option from 'recent' I'm
 wondering if this could be designed a little more fine grained, in order
 to save a bit bandwidth, processing time, and memory?
 > Usually there are only a few descriptors missing and it is easy to
 determine which document to download. For votes and consensus the download
 url can be constructed directly and for the referenced descriptors it is
 possible to infer (using a directory listing from the remote collector
 instance, e.g. /recent/relay-descriptors/extra-infos/)
 which doc respective url should provide the missing information.
 > Would that be a feasible approach?

 My sense is that we shouldn't worry about bandwidth, processing time, and
 memory yet but instead go for the solution that takes the least
 engineering effort and is hence potentially more robust.

 But I also don't fully understand your suggestion above.  Sure, votes and
 consensuses and in general all files containing just a single descriptor
 could be skipped just from looking at file name, file size, or file last
 modified time.  But how would we handle files containing dozens or even
 hundreds of descriptors?  It seems that those files would be different in
 almost all cases, except when two instances download the exact same
 descriptors in a given hour, which won't happen if one instance reads
 cached-* descriptors or another instance fetches a missing descriptor from
 a third instance.

 Overall, I think I'd rather want us to keep things simple here for now and
 think about optimizing later.  What do you think?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-06-12 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+---
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  needs_information
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+---
Changes (by iwakeh):

 * status:  assigned => needs_information


Comment:

 I agree, the protocol option is too much implementation effort.  The
 protocol design could be made simple by copying the existing protocol, but
 implementation of this protocol and having a service up and running all
 the time answering requests is a lot work and not really necessary.

 Regarding you're suggestion for the download-option from 'recent' I'm
 wondering if this could be designed a little more fine grained, in order
 to save a bit bandwidth, processing time, and memory?
 Usually there are only a few descriptors missing and it is easy to
 determine which document to download. For votes and consensus the download
 url can be constructed directly and for the referenced descriptors it is
 possible to infer (using a directory listing from the remote collector
 instance, e.g. /recent/relay-descriptors/extra-infos/)
 which doc respective url should provide the missing information.
 Would that be a feasible approach?

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

2016-04-28 Thread Tor Bug Tracker & Wiki
#18910: distributing descriptors accross CollecTor instances
---+--
 Reporter:  iwakeh |  Owner:  iwakeh
 Type:  enhancement| Status:  assigned
 Priority:  Medium |  Milestone:
Component:  Metrics/CollecTor  |Version:
 Severity:  Normal | Resolution:
 Keywords:  ctip   |  Actual Points:
Parent ID: | Points:
 Reviewer: |Sponsor:
---+--

Comment (by karsten):

 Fine question.  I also thought about that a while ago and concluded (but
 apparently forgot to mention) that it's fine to simply use metrics-lib's
 DescriptorCollector to mirror the other instance's `recent/` directory and
 import that.  That directory is currently 2.8G and covers roughly 72 hours
 of data.  That would be about 40M new data per hour (per mirror), which we
 might be able to shrink when using compression.  The upside is that the
 engineering effort for this solution would be almost trivial, because the
 code already exists and is used by Onionoo, Metrics, and ExoneraTor.

 I admit that your option 2 is very tempting, mostly because designing
 protocols is fun, but also because it would be the more efficient approach
 with potentially other benefits we don't get with option 1.  But it's also
 a possible time sink of unknown depth.  I'd say (but can be convinced
 otherwise) that we should go for option 1.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs


  1   2   >