Re: Strange 'gzip error' running Beam on Dataflow

2018-10-12 Thread Randal Moore
The files have no content-encoding set. They are no big query exports but rather crafted by a service of mine. Note that my doFunc gets called for each line of the file, something that I don't think would happen - wouldn't it apply gunzip to the whole content? On Fri, Oct 12, 2018, 5:04 PM Jose

Re: Strange 'gzip error' running Beam on Dataflow

2018-10-12 Thread Jose Ignacio Honrado
Hi Randal, You might be experiencing the automatic decompressive transcoding from GCS. Take a look at this to see if it helps: https://cloud.google.com/storage/docs/transcoding It seems like a compressed file is expected (as for the gz extension), but the file is returned decompressed by GCS.

Strange 'gzip error' running Beam on Dataflow

2018-10-12 Thread Randal Moore
Using Beam Java SDK 2.6. I have a batch pipeline that has run successfully in its current several times. Suddenly I am getting strange errors complaining about the format of the input. As far as I know, the pipeline didn't change at all since the last successful run. The error: