Re: [COMPRESS] Gracefully handling multiple compressor streams with garbage at end
Hm, I see where it is throwing the exception. Would you create a Jira ticket for this feature request and attach at least one example gz file and a failing JUnit test?

TY,
Gary

On Tue, Aug 15, 2023, 12:31 PM Tim Allison wrote:
> Caused by: java.io.IOException: Garbage after a valid .gz stream
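Gary asks for an example gz file. A reproducible sample can be built with JDK classes alone: two complete gzip members followed by a few bytes that do not start a new member. `GarbageGzFixture`, the payload strings, and the five trailing bytes are hypothetical choices for illustration, not the files from TIKA-4048.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Hypothetical fixture builder: concatenated gzip members plus trailing non-gzip bytes. */
final class GarbageGzFixture {

    /** Compresses one string into a single, complete gzip member. */
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Two valid members ("hello " and "world"), then bytes that are not a gzip header. */
    static byte[] build() throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        all.write(gzipMember("hello "));
        all.write(gzipMember("world"));
        all.write(new byte[] {0, 0, 'x', 'y', 'z'}); // trailing "garbage": no 0x1f 0x8b magic
        return all.toByteArray();
    }

    /** Writes the fixture to a file that could be attached to a ticket. */
    static Path writeTo(Path file) throws IOException {
        return Files.write(file, build());
    }
}
```

Based on the stack trace in this thread, reading these bytes through `new GzipCompressorInputStream(in, true)` (the `decompressConcatenated` flag) would be expected to fail with "Garbage after a valid .gz stream" once the second member is finished.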
Re: [COMPRESS] Gracefully handling multiple compressor streams with garbage at end
Gary,

Sorry for the delay; I'm just back at the keyboard after some time away.

Here is an example stack trace from a .gz stream. We saw similar messages for some bzip2 and xz files.

    Caused by: java.io.IOException: Garbage after a valid .gz stream
        at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.init(GzipCompressorInputStream.java:240)
        at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:391)
        at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)

Thank you!

On 2023/07/29 14:49:23 Gary Gregory wrote:
> Hi Tim,
>
> Do you have a stack trace? Maybe this is an option we can add...
Re: [COMPRESS] Gracefully handling multiple compressor streams with garbage at end
Hi Tim,

Do you have a stack trace? Maybe this is an option we can add...

Gary

On Wed, Jul 26, 2023, 3:22 PM Tim Allison wrote:
> We recently had a request to change our default behavior to turn on
> processing multiple/concatenated compressor streams for gzip, bzip2, etc.
> When we made this change and compared the updated results with our previous
> results, we lost quite a few attachments because of the "garbage after a
> valid x" exception and because of how we're buffering/digesting the stream.
>
> Is there any way to turn on extraction of concatenated compressor streams,
> but have it silently stop reading instead of throwing a garbage exception?
>
> Thank you!
>
> Best,
>
> Tim
>
> [0] https://issues.apache.org/jira/browse/TIKA-4048
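The behavior Tim asks for can be sketched with the JDK alone. `LenientGzip` and `decompressConcatenated` are hypothetical names, not Commons Compress API. The sketch assumes the whole input fits in a byte array: it parses each RFC 1952 member header itself, inflates the body with `java.util.zip.Inflater` in raw (`nowrap`) mode, checks the CRC32/ISIZE trailer, and, once at least one member has been decoded, treats any bytes that do not begin a complete gzip header as trailing garbage and stops instead of throwing. Trailing bytes that do start with the gzip magic but are corrupt still raise an error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

/**
 * Hypothetical helper, NOT part of Commons Compress: decodes concatenated gzip
 * members and, after at least one complete member, stops quietly at bytes that
 * do not start a complete gzip header.
 */
final class LenientGzip {

    static byte[] decompressConcatenated(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int pos = 0;
        boolean firstMember = true;
        do {
            int body = headerEnd(data, pos);
            if (body < 0) {
                if (firstMember) {
                    throw new ZipException("Input is not in the .gz format");
                }
                break; // trailing garbage after a complete member: stop instead of throwing
            }
            Inflater inflater = new Inflater(true); // raw DEFLATE; gzip framing is parsed here
            CRC32 crc = new CRC32();
            long written;
            try {
                inflater.setInput(data, body, data.length - body);
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new ZipException("Truncated DEFLATE data");
                    }
                    crc.update(buf, 0, n);
                    out.write(buf, 0, n);
                }
                pos = data.length - inflater.getRemaining(); // first byte after the DEFLATE data
                written = inflater.getBytesWritten();
            } catch (DataFormatException e) {
                throw new ZipException("Corrupt DEFLATE data: " + e.getMessage());
            } finally {
                inflater.end();
            }
            // 8-byte trailer: CRC32, then ISIZE (uncompressed size mod 2^32), both little-endian.
            if (data.length - pos < 8
                    || readLE32(data, pos) != crc.getValue()
                    || readLE32(data, pos + 4) != (written & 0xFFFFFFFFL)) {
                throw new ZipException("Bad or truncated gzip trailer");
            }
            pos += 8;
            firstMember = false;
        } while (pos < data.length);
        return out.toByteArray();
    }

    /** Returns the offset just past a complete gzip header at {@code p}, or -1 if there is none. */
    private static int headerEnd(byte[] d, int p) {
        if (d.length - p < 10 || (d[p] & 0xFF) != 0x1F || (d[p + 1] & 0xFF) != 0x8B || d[p + 2] != 8) {
            return -1; // wrong magic, or compression method is not DEFLATE
        }
        int flags = d[p + 3] & 0xFF;
        int q = p + 10; // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
        if ((flags & 0x04) != 0) { // FEXTRA: 2-byte little-endian length, then that many bytes
            if (d.length - q < 2) {
                return -1;
            }
            q += 2 + ((d[q] & 0xFF) | (d[q + 1] & 0xFF) << 8);
        }
        if ((flags & 0x08) != 0) q = skipZeroTerminated(d, q); // FNAME
        if ((flags & 0x10) != 0) q = skipZeroTerminated(d, q); // FCOMMENT
        if ((flags & 0x02) != 0) q += 2;                       // FHCRC
        return q < d.length ? q : -1; // header must be complete and followed by body bytes
    }

    private static int skipZeroTerminated(byte[] d, int q) {
        while (q < d.length && d[q] != 0) q++;
        return q + 1; // past the terminating zero (may run past the end if unterminated)
    }

    private static long readLE32(byte[] d, int p) {
        return (d[p] & 0xFFL) | (d[p + 1] & 0xFFL) << 8 | (d[p + 2] & 0xFFL) << 16 | (d[p + 3] & 0xFFL) << 24;
    }
}
```

Working on a byte array keeps the member boundary simple: `Inflater.getRemaining()` says exactly where the DEFLATE data ended. A streaming version would have to push unconsumed input back into the source stream, which is where buffering and digesting wrappers such as the ones in the stack trace come into play.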