Re: [tor-bugs] #33502 [Metrics/CollecTor]: Do not let appended descriptor files grow too large

2020-04-21 Thread Tor Bug Tracker & Wiki
#33502: Do not let appended descriptor files grow too large
-------------------------------+-----------------------------
 Reporter:  karsten            |          Owner:  karsten
     Type:  enhancement        |         Status:  new
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------
Changes (by karsten):

 * status:  needs_review => new


Comment:

 irl and I just talked this over and concluded that producing tarballs is
 the better design here. It solves the large-file issue, and it might even
 fix data integrity/consistency issues that just haven't surfaced yet. I'm
 going to write a patch for the tarball idea sometime in the next few
 weeks. Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #33502 [Metrics/CollecTor]: Do not let appended descriptor files grow too large

2020-03-10 Thread Tor Bug Tracker & Wiki
#33502: Do not let appended descriptor files grow too large
-------------------------------+-----------------------------
 Reporter:  karsten            |          Owner:  karsten
     Type:  enhancement        |         Status:  needs_review
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------

Comment (by karsten):

 Here's another option: rather than appending multiple descriptors to a
 single flat file, we could produce a tarball containing the few hundred or
 few thousand descriptor files. Basically,

   `https://collector.torproject.org/recent/relay-descriptors/server-descriptors/2020-03-10-14-05-00-server-descriptors`

 containing 596 descriptors concatenated to a 1.4 MiB file would then be
 replaced by

   `https://collector.torproject.org/recent/relay-descriptors/server-descriptors/2020-03-10-14-05-00-server-descriptors.tar`

 containing 596 descriptor files.

 The advantage over the approach sketched out in the ticket description
 would be that we wouldn't have three output file formats anymore (flat
 file with 1 descriptor, flat file with >= 1 descriptors, tarball). The
 disadvantage might be that processing tarballs can be less convenient
 than processing flat files.
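
 To make the tarball option concrete, here is a minimal sketch in Python
 (not CollecTor's Java code) that writes each descriptor as its own member
 of an uncompressed tar file and reads the members back one at a time. The
 function names and the member naming are hypothetical, chosen only for
 illustration.

```python
import io
import tarfile


def write_descriptor_tarball(tar_path, descriptors):
    """Write each (name, bytes) pair as its own member of an uncompressed tar."""
    with tarfile.open(tar_path, "w") as tar:
        for name, data in descriptors:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar headers need the member size up front
            tar.addfile(info, io.BytesIO(data))


def iter_descriptor_tarball(tar_path):
    """Yield (name, bytes) for one regular-file member at a time."""
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()
```

 A reader only ever holds one descriptor in memory, which is the property
 the flat-file format lacks; the cost is that consumers need a tar reader
 rather than plain file I/O.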


[tor-bugs] #33502 [Metrics/CollecTor]: Do not let appended descriptor files grow too large

2020-03-02 Thread Tor Bug Tracker & Wiki
#33502: Do not let appended descriptor files grow too large
-----------------------------------+-------------------------
     Reporter:  karsten            |      Owner:  karsten
         Type:  enhancement        |     Status:  assigned
     Priority:  Medium             |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+-------------------------
 I revisited #20395 last week. The issue is that metrics-lib cannot handle
 large descriptor files, because it first reads the entire file into memory
 before splitting it into single descriptors and parsing them. While it
 would be possible to parse large descriptor files after making some major
 code changes (using `FileChannel` and doing lazy parsing), I don't think
 that we have to do that. After all, we're writing these large descriptor
 files ourselves in CollecTor, and it's up to us to stop doing that.
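
 For comparison, the lazy-parsing alternative mentioned above could look
 roughly like the following Python sketch. It is not metrics-lib code: it
 assumes, for illustration only, that every descriptor in the file begins
 with a line starting with a known keyword (here `router `), and the
 function name is hypothetical.

```python
import io


def iter_descriptors(stream, start_token="router "):
    """Yield one descriptor at a time instead of reading the whole file."""
    current = []
    for line in stream:
        # Assumption: a line starting with start_token opens a new descriptor.
        if line.startswith(start_token) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


sample = io.StringIO(
    "router a 192.0.2.1 9001 0 0\nbandwidth 1 2 3\n"
    "router b 192.0.2.2 9001 0 0\nbandwidth 4 5 6\n"
)
for descriptor in iter_descriptors(sample):
    print(len(descriptor))
```

 Memory use is then bounded by the largest single descriptor rather than
 by the file size.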

 Going back in time, the original reason for concatenating multiple
 descriptors into a single file was that rsyncing many tiny files from one
 host to another host was just slow. So we appended server descriptors and
 extra-info descriptors into a single file. This works well with server
 descriptors or extra-info descriptors published within 1 hour or even 10
 hours. It does not work that well anymore with all server descriptors or
 extra-info descriptors synced from another CollecTor instance when
 starting a new instance (#20335). It works even less well when importing
 one or more monthly tarballs containing server descriptors or extra-info
 descriptors (#27716).

 My suggestion is that we define a configurable limit for appended
 descriptor files of, say, 20 MiB. When storing a descriptor, we would
 check whether appending it to the existing descriptor file would exceed
 this limit and, if so, start a new descriptor file.
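
 A minimal sketch of that check, in Python rather than CollecTor's Java.
 The numbered-suffix naming scheme (`name.0`, `name.1`, ...) and the
 function name are assumptions made up for this example; how new files
 would actually be named is one of the details still to work out.

```python
import os

LIMIT_BYTES = 20 * 1024 * 1024  # the "say, 20 MiB" limit; meant to be configurable


def append_descriptor(directory, base_name, descriptor, limit=LIMIT_BYTES):
    """Append descriptor bytes to the newest file for base_name, starting a
    new file when the append would push the current one past the limit."""
    index = 0
    # Find the highest-numbered existing file (hypothetical naming scheme).
    while os.path.exists(os.path.join(directory, f"{base_name}.{index + 1}")):
        index += 1
    path = os.path.join(directory, f"{base_name}.{index}")
    current_size = os.path.getsize(path) if os.path.exists(path) else 0
    # Roll over only if the file already has content, so that a single
    # descriptor larger than the limit still gets a file of its own.
    if current_size > 0 and current_size + len(descriptor) > limit:
        index += 1
        path = os.path.join(directory, f"{base_name}.{index}")
    with open(path, "ab") as out:
        out.write(descriptor)
    return path
```

 With this rule no file exceeds the limit unless a single descriptor does
 so on its own.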

 There are some technical details to work out, but I think they can be
 solved. I also don't expect this to require a lot of code or complex
 code changes. The benefit would be that we could resolve #20395 and
 #27716 by implementing this.

 Thoughts on the general idea?


Re: [tor-bugs] #33502 [Metrics/CollecTor]: Do not let appended descriptor files grow too large

2020-03-02 Thread Tor Bug Tracker & Wiki
#33502: Do not let appended descriptor files grow too large
-------------------------------+-----------------------------
 Reporter:  karsten            |          Owner:  karsten
     Type:  enhancement        |         Status:  needs_review
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------
Changes (by karsten):

 * status:  assigned => needs_review

