#25317: Enable webstats to process large (> 2G) logfiles
-----------------------------------+----------------------
 Reporter:  iwakeh                 |      Owner:  iwakeh
     Type:  defect                 |     Status:  assigned
 Priority:  High                   |  Milestone:
Component:  Metrics/CollecTor      |    Version:
 Severity:  Normal                 |   Keywords:
Actual Points:                     |  Parent ID:
   Points:                         |   Reviewer:
  Sponsor:                         |
-----------------------------------+----------------------

Quote from #25161, comment 12:

Looking at the stack trace and the input log files, I noticed that two log
files are larger than 2G when decompressed:
{{{
3.2G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531
584K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531.xz
2.1G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601
404K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601.xz
}}}

I just ran another bulk import with just those two files as input and ran
into the same exception. It seems like we shouldn't attempt to decompress
these files into a `byte[]` in `FileType.decompress`, because a Java array
can hold at most 2^31 - 1 (about 2 billion) elements:
https://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays . Maybe we
should work with streams there, not `byte[]`.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25317>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
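The stream-based alternative suggested above could look roughly like the sketch below: instead of materializing the whole decompressed file as a `byte[]`, hand callers a reader over a decompressing input stream and process the log line by line, so heap use stays constant regardless of file size. This is a minimal illustration, not the actual CollecTor code; the class and method names are hypothetical, and it uses gzip from the JDK as a stand-in for xz (the real `.xz` files would need something like `XZInputStream` from the XZ for Java library instead of `GZIPInputStream`).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamDecompress {

  // Hypothetical stream-returning replacement for FileType.decompress:
  // the decompressed bytes are never collected into one array, so files
  // larger than 2^31 - 1 bytes are no longer a problem.
  static BufferedReader decompressedReader(Path file) throws IOException {
    InputStream in = new GZIPInputStream(Files.newInputStream(file));
    return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Write a small compressed sample log, then read it back line by line.
    Path tmp = Files.createTempFile("access.log-sample", ".gz");
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(tmp)), StandardCharsets.UTF_8)) {
      for (int i = 0; i < 1000; i++) {
        w.write("0.0.0.0 - - [31/May/2016:00:00:00 +0000] \"GET / HTTP/1.1\" 200 -\n");
      }
    }
    long lines = 0;
    try (BufferedReader reader = decompressedReader(tmp)) {
      while (reader.readLine() != null) {
        lines++;  // each line is handled and discarded; nothing accumulates
      }
    }
    System.out.println("lines=" + lines);
    Files.delete(tmp);
  }
}
```

With this shape, the per-line parsing code drives how much memory is used, not the size of the decompressed file.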
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs