[ https://issues.apache.org/jira/browse/OAK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Egli updated OAK-1605:
-----------------------------

    Attachment: OAK1605mp3Lookalike.bin

Further narrowed the problem down to the following:
* when the Lucene index stumbles across a binary property (jcr:content/jcr:data) that looks like the attached file (e.g. the 4 bytes FF'FF'C3'A9), it is detected as audio/mpeg
* when that binary is then parsed with the corresponding parser (Mp3Parser), the parser uses MpegStream, calls skipFrame, and in there runs into the endless loop already reported in TIKA-991

In short, it looks like certain MP3-like files can cause Tika to loop endlessly, and one fix is to switch to Tika 1.5.

To reproduce: upload the attached OAK1605mp3Lookalike.bin into the repository and watch CPU usage go to 100% (or more) and stay there indefinitely.

> Running into endless loop due to tika 1.4
> -----------------------------------------
>
>                 Key: OAK-1605
>                 URL: https://issues.apache.org/jira/browse/OAK-1605
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: oak-lucene
>    Affects Versions: 0.19
>            Reporter: Stefan Egli
>            Priority: Critical
>         Attachments: OAK1605mp3Lookalike.bin
>
>
> Narrowed down an endless loop [1], which happened in Oak 0.19, to TIKA-991:
> * Tika's mp3.MpegStream.skipStream calls InputStream.skip() until it has skipped far enough or that method returns -1
> * If that InputStream is a TailStream, a bug in Tika 1.4 means TailStream.skip(long) does not return -1 even though the end of the stream was reached
> Switching to Tika 1.5 should solve the issue, as TIKA-991 in [0] mentions the exact same endless loop and the tika-991_3.patch fixed the -1 problem.
> I'll check if I can create a test to reproduce with reasonable effort.
> --
> [0] https://issues.apache.org/jira/browse/TIKA-991?focusedCommentId=13579487&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579487
> [1] {code}"pool-8-thread-5" prio=5 tid=7f80a34ea800 nid=0x119cb8000 runnable [119cb6000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     - locked <7768956a0> (a java.io.BufferedInputStream)
>     at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at org.apache.tika.io.TailStream.read(TailStream.java:117)
>     at org.apache.tika.io.TailStream.skip(TailStream.java:140)
>     at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283) <- endless loop in here
>     at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.parseStringValue(LuceneIndexEditor.java:254)
>     at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.addBinaryValue(LuceneIndexEditor.java:245)
>     at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.makeDocument(LuceneIndexEditor.java:200)
>     at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.addOrUpdate(LuceneIndexEditor.java:178)
>     at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:108)
>     at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:64)
>     at org.apache.jackrabbit.oak.spi.commit.CompositeEditor.leave(CompositeEditor.java:74)
>     at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeAdded(EditorDiff.java:130)
>     at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
>     at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:385)
>     at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeAdded(EditorDiff.java:125)
>     at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:440)
>     at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
>     at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
>     at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:430)
>     at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
>     at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
>     at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:430)
>     at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
>     at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
>     at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:430)
>     at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
>     at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:52)
>     at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:143)
>     - locked <76c63aae0> (a org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate)
>     at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:105)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:695){code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
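The failure mode described in this issue (an MP3 look-alike whose "frame" is skipped by a loop that waits for skip() to return -1) can be illustrated with a small, self-contained Java sketch. This is not Tika code: the class and method names (SkipLoopSketch, naiveSkipIterations, robustSkip) are hypothetical, and the sketch assumes the bytes FF FF C3 A9 are decoded as a standard MPEG-1 Layer I frame header, which would announce a 420-byte frame inside a 4-byte file. Whether Tika 1.4 decodes the header exactly this way is an assumption. Per the java.io.InputStream contract, skip() may return 0 (not -1) at the end of the stream, so a loop that only exits on -1 never terminates; the iteration cap exists only so the demo ends.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not Tika code: how a 4-byte MP3 look-alike can trap a
// skip loop that only stops when InputStream.skip() returns -1.
public class SkipLoopSketch {

    // MPEG-1 Layer I bitrates in kbit/s, indexed by the 4-bit bitrate index.
    private static final int[] LAYER1_KBPS =
            {0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448};
    // MPEG-1 sample rates in Hz, indexed by the 2-bit sampling-rate index.
    private static final int[] MPEG1_RATES = {44100, 48000, 32000};

    // Packs the first four bytes big-endian into one 32-bit header word.
    static int header(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    // True if the word carries the 11-bit frame sync, version MPEG-1 and Layer I.
    static boolean isMpeg1Layer1(int h) {
        return (h >>> 21) == 0x7FF
                && ((h >>> 19) & 0x3) == 0x3   // version bits 11 = MPEG-1
                && ((h >>> 17) & 0x3) == 0x3;  // layer bits 11 = Layer I
    }

    // Standard Layer I frame length: (12 * bitrate / sampleRate + padding) * 4.
    // Assumes valid indices (no handling of reserved or free-format values).
    static int layer1FrameLength(int h) {
        int bitrate = LAYER1_KBPS[(h >>> 12) & 0xF] * 1000;
        int sampleRate = MPEG1_RATES[(h >>> 10) & 0x3];
        int padding = (h >>> 9) & 0x1;
        return (12 * bitrate / sampleRate + padding) * 4;
    }

    // Naive pattern: exits only when skip() returns -1. The cap exists solely
    // so this demo terminates; without it the loop would spin forever at EOF.
    static int naiveSkipIterations(InputStream in, long toSkip, int cap) throws IOException {
        long remaining = toSkip;
        int iterations = 0;
        while (remaining > 0 && iterations < cap) {
            long n = in.skip(remaining);
            iterations++;
            if (n == -1) {
                break; // a standard stream such as ByteArrayInputStream returns 0 at EOF instead
            }
            remaining -= n;
        }
        return iterations;
    }

    // Robust pattern: when skip() makes no progress, probe with read() to
    // tell "nothing skipped right now" apart from a real end of stream.
    static long robustSkip(InputStream in, long toSkip) throws IOException {
        long remaining = toSkip;
        while (remaining > 0) {
            long n = in.skip(remaining);
            if (n > 0) {
                remaining -= n;
            } else if (in.read() == -1) {
                break; // genuine end of stream
            } else {
                remaining--; // read() consumed one byte
            }
        }
        return toSkip - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] lookalike = {(byte) 0xFF, (byte) 0xFF, (byte) 0xC3, (byte) 0xA9};
        int h = header(lookalike);
        int frameLength = layer1FrameLength(h);
        System.out.println("MPEG-1 Layer I header: " + isMpeg1Layer1(h));
        System.out.println("claimed frame length: " + frameLength);

        // Both streams start just past the 4-byte header, i.e. already at EOF.
        InputStream a = new ByteArrayInputStream(lookalike);
        a.skip(4);
        System.out.println("naive loop iterations: "
                + naiveSkipIterations(a, frameLength - 4, 1_000_000));
        InputStream b = new ByteArrayInputStream(lookalike);
        b.skip(4);
        System.out.println("robust bytes skipped: " + robustSkip(b, frameLength - 4));
    }
}
```

Running main prints `true`, `420`, `1000000` and `0`: the naive loop stops only because of the artificial cap, while the robust loop notices the end of the stream on its first read() probe. Probing with read() when skip() makes no progress is one defensive option; the actual TIKA-991 patch may address the problem differently.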