Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-26 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  closed
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:  fixed
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by karsten):

 * status:  merge_ready => closed
 * resolution:   => fixed


Comment:

 Merged, closing. Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-26 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  merge_ready
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by karsten):

 * status:  needs_review => merge_ready


Comment:

 Looks good! Ready to be merged.


Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-26 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  needs_review
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by iwakeh):

 * status:  needs_revision => needs_review


Comment:

 Thanks for catching that!
 Please also review
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-25317
 the fixup commit].


Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-26 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  needs_revision
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by karsten):

 * status:  needs_review => needs_revision


Comment:

 Two issues:
  - Should `return new LocalDate[]{LocalDate.MAX, LocalDate.MIN};` have
 `MAX` and `MIN` exchanged? If not, can you document why this is correct?
  - In the block after `if (count >= LISTLIMIT) {`, the local variable
 `count` is not reset. This means that, once the limit is first reached,
 this code will run after every subsequent line! It should be sufficient
 to set `count = 0;` in that block.
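
 Both points can be illustrated with a minimal, self-contained sketch. It is
 not the code from the patch: the class name, the method names, and the tiny
 `LISTLIMIT` value are hypothetical. The first method shows why the counter
 has to be reset inside the block. The second shows one reading under which
 `{LocalDate.MAX, LocalDate.MIN}` would be correct, namely as inverted
 starting values for a min/max search; whether the patch intends that is not
 stated in this thread.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchSketch {

  /** Hypothetical batch size, kept tiny for the demo. */
  static final int LISTLIMIT = 3;

  /** Returns how often a full batch is flushed while reading the lines. */
  static int countFlushes(List<String> lines) {
    List<String> batch = new ArrayList<>();
    int flushes = 0;
    int count = 0;
    for (String line : lines) {
      batch.add(line);
      count++;
      if (count >= LISTLIMIT) {
        batch.clear();  // stand-in for processing and writing the batch
        flushes++;
        count = 0;      // without this reset, every later line flushes again
      }
    }
    return flushes;
  }

  /** Returns {earliest, latest}, starting from inverted sentinel values. */
  static LocalDate[] dateRange(List<LocalDate> dates) {
    // With {MAX, MIN} as the start, the first real date replaces both bounds.
    // An empty input leaves the inverted sentinels in place.
    LocalDate[] range = new LocalDate[]{LocalDate.MAX, LocalDate.MIN};
    for (LocalDate date : dates) {
      if (date.isBefore(range[0])) {
        range[0] = date;
      }
      if (date.isAfter(range[1])) {
        range[1] = date;
      }
    }
    return range;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
    System.out.println(countFlushes(lines));  // prints 2
    List<LocalDate> dates = Arrays.asList(
        LocalDate.of(2016, 6, 1), LocalDate.of(2016, 5, 31));
    System.out.println(Arrays.toString(dateRange(dates)));
  }
}
```

 With seven input lines and a limit of three, the batch is flushed twice and
 one line remains pending. If `count = 0;` were removed, lines three through
 seven would each trigger a flush.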


Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-26 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  needs_review
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by iwakeh):

 * status:  needs_revision => needs_review


Comment:

 Please review another
 [https://gitweb.torproject.org/user/iwakeh/collector.git/commit/?h=task-25317
 commit] on the current patch branch.

 Timing with the test batch is a little faster, and memory usage looks much
 healthier now: it stays between 2G and 9G, mostly around 5G, with very few
 peaks at 11G.  The build-up is more concave and often drops back down to
 less than 2G.  The higher memory usage occurs during the compressing and
 writing phase, which makes sense.

 Processing yearly batches for larger imports is much faster, by about a
 factor of three, on the aroides logs.


Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-24 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  needs_revision
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by iwakeh):

 * status:  needs_review => needs_revision


Comment:

 This patch needs revision.  There is one more memory issue to clean up.

 (That said, a yearly batched import could already be attempted.)


Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-23 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  needs_review
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by iwakeh):

 * status:  accepted => needs_review


Comment:

 Please review the two commits on
 [https://gitweb.torproject.org/user/iwakeh/collector.git/log/?h=task-25317
 this patch branch], which is based on the current master branch.  The first
 commit adapts CollecTor to the changes introduced in metrics-lib when adding
 the log line sub-interfaces.  The second tackles the memory issues; see the
 commit message for details.  There is also a speed-up of 50% compared to the
 previous version.

 This depends on the metrics-lib patch of #25329.


Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-21 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  accepted
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Changes (by iwakeh):

 * status:  assigned => accepted



Re: [tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-21 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  assigned
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------

Comment (by iwakeh):

 See #25329 for the metrics-lib changes.


[tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

2018-02-21 Thread Tor Bug Tracker & Wiki
#25317: Enable webstats to process large (> 2G) logfiles
-------------------------------+------------------------------
     Reporter:  iwakeh             |      Owner:  iwakeh
         Type:  defect             |     Status:  assigned
     Priority:  High               |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-------------------------------+------------------------------
 Quote from #25161, comment 12:
Looking at the stack trace and the input log files, I noticed that two
 log files are larger than 2G when decompressed:

 {{{
 3.2G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531
 584K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531.xz
 2.1G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601
 404K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601.xz
 }}}

   I just ran another bulk import with just those two files as input and
 ran into the same exception.

   It seems like we shouldn't attempt to decompress these files into a
 `byte[]` in `FileType.decompress`, because Java can only handle arrays
 with up to 2 billion elements:
 https://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays . Maybe we
 should work with streams there, not `byte[]`.
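
 To make the limit concrete: Java arrays are indexed by `int`, so a `byte[]`
 can hold at most `Integer.MAX_VALUE` (2^31 - 1 = 2,147,483,647) bytes, which
 is less than the 3.2G of the first file above. The following minimal sketch
 shows the stream-based alternative suggested here. It is not the CollecTor
 implementation: the class and method names are hypothetical, and it uses the
 JDK's `GZIPInputStream` as a stand-in, because the JDK has no built-in xz
 decoder. For `.xz` files, an xz-capable `InputStream` from a third-party
 library would take its place.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamingDecompressSketch {

  /**
   * Counts lines while decompressing on the fly. Only one line is held in
   * memory at a time, so the decompressed size never has to fit a byte[].
   */
  static long countLines(InputStream compressed) throws IOException {
    long lines = 0;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        lines++;  // a real importer would parse and aggregate the line here
      }
    }
    return lines;
  }

  /** Demo helper: gzip-compresses a small string in memory. */
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = gzip("line one\nline two\nline three\n");
    System.out.println(countLines(new ByteArrayInputStream(data)));  // prints 3
  }
}
```

 The line counter is a `long` on purpose: a multi-gigabyte log could, in
 principle, also exceed the `int` range in its number of lines.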
