Re: [COMPRESS] Gracefully handling multiple compressor streams with garbage at end

2023-08-15 Thread Gary Gregory
Hm, I see where it is throwing the exception. Would you create a Jira
ticket for this feature request and attach at least one example gz file and
a failing JUnit test?

TY,
Gary
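
A minimal sketch of how an example input for such a ticket could be generated with only the JDK: two complete gzip members followed by a few non-gzip bytes. The class name `GarbageGzFixture`, the payload strings, and the garbage bytes are all hypothetical and are not taken from the files discussed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical fixture builder; names and payloads are illustrative only.
public class GarbageGzFixture {

    /** Compresses one payload into a complete, standalone gzip member. */
    public static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    /** Returns member("first") + member("second") + the given trailing bytes. */
    public static byte[] build(byte[] garbage) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(gzipMember("first"));
        out.write(gzipMember("second"));
        out.write(garbage); // bytes that do not start a new gzip member
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = build(new byte[] {'j', 'u', 'n', 'k', '!'});
        // Write `data` to disk (for example with java.nio.file.Files.write)
        // to get a .gz file that can be attached to a ticket.
        System.out.println(data.length + " bytes");
    }
}
```

A failing test could then pass `new ByteArrayInputStream(data)` to `new GzipCompressorInputStream(in, true)` and read to the end. Going by the stack trace in this thread, that read is expected to throw the "Garbage after a valid .gz stream" IOException once both members have been consumed.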

On Tue, Aug 15, 2023, 12:31 PM Tim Allison  wrote:

> Gary,
>
> I'm sorry for my delay.  I'm just back to the keyboard from some time away.
>
> This is an example from a gz stream.  We saw similar messages from some
> bzip2 and xz streams.
>
> Caused by: java.io.IOException: Garbage after a valid .gz stream
>     at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.init(GzipCompressorInputStream.java:240)
>     at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:391)
>     at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)
>     at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
>     at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
>     at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>     at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)
>
> Thank you!
>
> On 2023/07/29 14:49:23 Gary Gregory wrote:
> > Hi Tim,
> >
> > Do you have a stack trace? Maybe this is an option we can add...
> >
> > Gary
> >
> > On Wed, Jul 26, 2023, 3:22 PM Tim Allison  wrote:
> >
> > > We recently had a request to change our default behavior to turn on
> > > processing multiple/concatenated compressor streams for gzip, bzip2, etc.
> > > When we made this change and compared the updated results with our previous
> > > results, we lost quite a few attachments because of the "garbage after a
> > > valid x" exception and because of how we're buffering/digesting the stream.
> > >
> > > Is there any way to turn on extraction of concatenated compressor streams,
> > > but have it silently stop reading instead of throwing a garbage exception?
> > >
> > > Thank you!
> > >
> > > Best,
> > >
> > > Tim
> > >
> > >
> > > [0] https://issues.apache.org/jira/browse/TIKA-4048
> > >
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
>


Re: [COMPRESS] Gracefully handling multiple compressor streams with garbage at end

2023-08-15 Thread Tim Allison
Gary,

I'm sorry for my delay.  I'm just back to the keyboard from some time away.

This is an example from a gz stream.  We saw similar messages from some bzip2
and xz streams.

Caused by: java.io.IOException: Garbage after a valid .gz stream
    at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.init(GzipCompressorInputStream.java:240)
    at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:391)
    at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
    at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
    at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)

Thank you!

On 2023/07/29 14:49:23 Gary Gregory wrote:
> Hi Tim,
> 
> Do you have a stack trace? Maybe this is an option we can add...
> 
> Gary
> 
> On Wed, Jul 26, 2023, 3:22 PM Tim Allison  wrote:
> 
> > We recently had a request to change our default behavior to turn on
> > processing multiple/concatenated compressor streams for gzip, bzip2, etc.
> > When we made this change and compared the updated results with our previous
> > results, we lost quite a few attachments because of the "garbage after a
> > valid x" exception and because of how we're buffering/digesting the stream.
> >
> > Is there any way to turn on extraction of concatenated compressor streams,
> > but have it silently stop reading instead of throwing a garbage exception?
> >
> > Thank you!
> >
> > Best,
> >
> > Tim
> >
> >
> > [0] https://issues.apache.org/jira/browse/TIKA-4048
> >
> 




Re: [COMPRESS] Gracefully handling multiple compressor streams with garbage at end

2023-07-29 Thread Gary Gregory
Hi Tim,

Do you have a stack trace? Maybe this is an option we can add...

Gary

On Wed, Jul 26, 2023, 3:22 PM Tim Allison  wrote:

> We recently had a request to change our default behavior to turn on
> processing multiple/concatenated compressor streams for gzip, bzip2, etc.
> When we made this change and compared the updated results with our previous
> results, we lost quite a few attachments because of the "garbage after a
> valid x" exception and because of how we're buffering/digesting the stream.
>
> Is there any way to turn on extraction of concatenated compressor streams,
> but have it silently stop reading instead of throwing a garbage exception?
>
> Thank you!
>
> Best,
>
> Tim
>
>
> [0] https://issues.apache.org/jira/browse/TIKA-4048
>
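
Until such an option exists in the library, one caller-side workaround for the question above could look like the sketch below: a wrapper that turns the "garbage" exception into a normal end of stream. The class name `GarbageTolerantInputStream` is hypothetical, and matching on the exception message is an assumption based only on the "garbage after a valid x" wording quoted in this thread; it is brittle and not an official Commons Compress API.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical wrapper: treats "Garbage after a valid ..." as end of stream.
public class GarbageTolerantInputStream extends FilterInputStream {

    private boolean eof;

    public GarbageTolerantInputStream(InputStream in) {
        super(in);
    }

    // Assumption: the exception can be recognised by its message prefix.
    private static boolean isGarbage(IOException e) {
        String m = e.getMessage();
        return m != null
                && m.toLowerCase(Locale.ROOT).startsWith("garbage after a valid");
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (eof) {
            return -1;
        }
        try {
            return in.read(b, off, len);
        } catch (IOException e) {
            if (!isGarbage(e)) {
                throw e; // unrelated I/O failures still propagate
            }
            eof = true; // stop quietly once trailing garbage is reported
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a decompressor: yields "payload", then fails like the trace.
        InputStream failing = new InputStream() {
            private final byte[] data = "payload".getBytes(StandardCharsets.US_ASCII);
            private int pos;

            @Override
            public int read() throws IOException {
                if (pos < data.length) {
                    return data[pos++] & 0xff;
                }
                throw new IOException("Garbage after a valid .gz stream");
            }
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GarbageTolerantInputStream(failing)) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        }
        System.out.println(out.toString("US-ASCII"));
    }
}
```

In real use the wrapped stream would be something like `new GzipCompressorInputStream(raw, true)`, placed inside any BufferedInputStream so the exception is caught before it reaches the buffer. One caveat: bytes decoded in the same `read` call that hits the garbage may not be reported to the caller, so the tail of the last valid stream could still be lost; this sketch does not address that.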