#27716: Out of memory when loading in multiple years of relay descriptors -----------------------------------+-------------------------- Reporter: irl | Owner: metrics-team Type: defect | Status: new Priority: Medium | Milestone: Component: Metrics/CollecTor | Version: Severity: Normal | Keywords: Actual Points: | Parent ID: Points: | Reviewer: Sponsor: | -----------------------------------+-------------------------- The interesting thing here is that this happens **after** the descriptors have been imported and written into the `out/` and `recent/` directories. This implies that we're not that far off being able to handle huge imports of relay descriptors.
I believe this is happening in the `ReferenceChecker` but I haven't confirmed this. {{{ 2018-09-14 17:26:42,757 INFO o.t.m.c.c.Scheduler:150 New Thread created: CollecTor-Scheduled-Thread-1 2018-09-14 17:26:42,758 INFO o.t.m.c.c.CollecTorMain:66 Starting relaydescs module of CollecTor. 2018-09-14 17:26:42,762 INFO o.t.d.DescriptorSourceFactory:150 Serving implementation org.torproject.descriptor.impl.DescriptorParserImpl for descriptor.parser. 2018-09-14 18:32:21,639 INFO o.t.m.c.r.ArchiveReader:273 Finished importing relay descriptors from local directory: Parsed 5253040, ignored 0 files. 2018-09-14 18:32:21,759 INFO o.t.m.c.r.ArchiveWriter:484 Finished writing relay descriptors to disk. While importing relay descriptors from local directory, we stored 0 consensus(es), 0 microdesc consensus(es), 0 vote(s), 0 certificate(s), 5253040 server descriptor(s), 0 extra-info descriptor(s), and 0 microdescriptor(s) to disk. Statistics on the completeness of written relay descriptors: 2018-09-14 18:32:21,760 WARN o.t.m.c.r.ArchiveWriter:535 The last known relay server descriptor was published at 2018-09-13 19:32:29, which is more than 5:30 hours in the past. 2018-09-14 18:32:21,760 WARN o.t.m.c.r.ArchiveWriter:542 The last known relay extra-info descriptor was published at 2018-09-13 19:32:29, which is more than 5:30 hours in the past. 2018-09-14 18:32:21,761 WARN o.t.m.c.r.ArchiveWriter:549 The last known relay microdescriptor was contained in a microdesc consensus that was valid after 2018-09-13 14:00:00, which is more than 5:30 hours in the past. 2018-09-14 18:32:22,528 INFO o.t.d.DescriptorSourceFactory:150 Serving implementation org.torproject.descriptor.impl.DescriptorReaderImpl for descriptor.reader. 2018-09-14 18:32:22,558 ERROR o.t.d.i.DescriptorReaderImpl:165 Bug: uncaught exception or error while reading descriptors: Required array size too large java.lang.OutOfMemoryError: Required array size too large at java.nio.file.Files.readAllBytes(Files.java:3156) at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readDescriptorFile(DescriptorReaderImpl.java:333) at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readDescriptorFiles(DescriptorReaderImpl.java:255) at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:161) at java.lang.Thread.run(Thread.java:748) 2018-09-14 18:32:22,559 ERROR o.t.m.c.c.CollecTorMain:71 The relaydescs module failed: Saving history is only permitted after reading descriptors. java.lang.IllegalStateException: Saving history is only permitted after reading descriptors. at org.torproject.descriptor.impl.DescriptorReaderImpl.saveHistoryFile(DescriptorReaderImpl.java:116) at org.torproject.metrics.collector.relaydescs.ReferenceChecker.readNewDescriptors(ReferenceChecker.java:201) at org.torproject.metrics.collector.relaydescs.ReferenceChecker.check(ReferenceChecker.java:89) at org.torproject.metrics.collector.relaydescs.ArchiveWriter.startProcessing(ArchiveWriter.java:194) at org.torproject.metrics.collector.cron.CollecTorMain.run(CollecTorMain.java:67) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-09-14 18:32:22,561 INFO o.t.m.c.c.ShutdownHook:23 Shutdown in progress ... 2018-09-14 18:32:22,561 INFO o.t.m.c.c.Scheduler:127 Waiting at most 10 minutes for termination of running tasks ... 2018-09-14 18:32:22,562 INFO o.t.m.c.c.Scheduler:132 Shutdown of all scheduled tasks completed successfully. 2018-09-14 18:32:22,562 INFO o.t.m.c.c.ShutdownHook:25 Shutdown finished. Exiting. }}} {{{ -rw-r--r-- 1 collector collector 9.4G Sep 14 19:32 2018-09-14-17-26-42 -server-descriptors }}} -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/27716> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online
_______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs