Re: gzipped json files not named .json.gz

Scott Kinney Fri, 01 Jul 2016 09:23:30 -0700

Hi Jason, 
Thanks for getting back to me. We were able to get the Spark job to append
.json.gz, so we are OK for now.
I tried working with local JSON files: Drill will not query them if they're
not named .json. I didn't try gzipped files, but since we got them renamed in
S3, I'm out of the woods.


thanks!


________________________________
Scott Kinney | DevOps
stem   |   m  510.282.1299
100 Rollins Road, Millbrae, California 94030

This e-mail and/or any attachments contain Stem, Inc. confidential and 
proprietary information and material for the sole use of the intended 
recipient(s). Any review, use or distribution that has not been expressly 
authorized by Stem, Inc. is strictly prohibited. If you are not the intended 
recipient, please contact the sender and delete all copies. Thank you.

________________________________________
From: Jason Altekruse <[email protected]>
Sent: Tuesday, June 28, 2016 3:05 PM
To: user
Subject: Re: gzipped json files not named .json.gz

Hi Scott,

From some quick testing, setting the defaultInputFormat to "json" appears
to be working as it was designed. It is true that we have the limitation of
relying entirely on extensions for detecting compression of text and json
files.
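To make the limitation concrete, here is a minimal Python sketch of suffix-only codec detection. It is loosely modeled on what the Javadoc for Hadoop's CompressionCodecFactory.getCodec(Path) describes (finding a codec from the file name's suffix), but it is not Drill or Hadoop code. The suffix table and the `infer_codec` name are hypothetical; in Hadoop the real table depends on which codecs are registered (for example via io.compression.codecs).

```python
# Hypothetical, illustrative subset of suffix -> codec mappings. In Hadoop the
# actual table depends on the registered codecs (e.g. io.compression.codecs).
CODEC_SUFFIXES = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".deflate": "deflate",
}


def infer_codec(file_name):
    """Guess a codec from the file name alone; the file's bytes are never read."""
    # Check longer suffixes first so the most specific match wins.
    for suffix in sorted(CODEC_SUFFIXES, key=len, reverse=True):
        if file_name.endswith(suffix):
            return CODEC_SUFFIXES[suffix]
    return None  # no recognized suffix: the file is treated as uncompressed


for name in ["a.json.gz", "a.gz", "a.json", "part-00000"]:
    print(name, "->", infer_codec(name))
```

The loop reports gzip for the first two names and None for the last two. A gzipped file with a name like part-00000 (a hypothetical Spark-style output name) is therefore treated as uncompressed, no matter what its bytes contain.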

I am able to read all of these files in a workspace with JSON set as the
default format. Were you not seeing this behavior?

a        a.gz        a.json        a.json.gz
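The listing above can be explained with a simplified Python model of the lookup. It assumes that Drill first strips a recognized compression suffix, then matches the remaining name against the format's extension, and otherwise falls back to the workspace's defaultInputFormat. Only the .gz suffix and the json format are modeled, and `resolve` is a hypothetical name, not a Drill API.

```python
def resolve(file_name, default_format="json"):
    """Return (format, codec) for a file name under a simplified model.

    Hypothetical sketch: only ".gz" compression and the "json" format are
    modeled, and default_format stands in for the workspace's
    defaultInputFormat setting (None means no default is configured).
    """
    codec = "gzip" if file_name.endswith(".gz") else None
    # Drop the compression suffix before looking at the format extension.
    stem = file_name[: -len(".gz")] if codec else file_name
    fmt = "json" if stem.endswith(".json") else default_format
    return fmt, codec


for name in ["a", "a.gz", "a.json", "a.json.gz"]:
    print(name, resolve(name))
```

With default_format=None, the names `a` and `a.gz` resolve to no format at all. The model also shows the remaining gap: a gzipped file named `a` resolves to ("json", None), so the reader would try to parse compressed bytes as plain JSON.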

We could consider adding default compression as an option in a workspace,
but are you really unable to move the files? It seems like the best option
might be to just rename, as I would think other tools would have trouble
reading these as well.
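If renaming is the way forward, gzip data can be recognized by content rather than by name: every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952). The following Python sketch, built around the hypothetical helper `add_json_gz_suffix`, renames such files on a local filesystem so they end in .json.gz. It assumes local files; S3 has no in-place rename, so objects there would need a copy followed by a delete through an S3 client.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def is_gzip(path):
    """Check the file's leading bytes instead of trusting its name."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def target_name(name):
    """Map a name such as 'part-00000' or 'x.gz' to one ending in '.json.gz'."""
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    if not name.endswith(".json"):
        name += ".json"
    return name + ".gz"


def add_json_gz_suffix(directory):
    """Rename gzipped files in `directory` so the suffix says JSON + gzip.

    Hypothetical helper for local files only. Returns (old, new) name pairs.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        src = os.path.join(directory, name)
        if not os.path.isfile(src) or name.endswith(".json.gz"):
            continue  # skip directories and files that are already named correctly
        if not is_gzip(src):
            continue  # leave uncompressed files alone
        new_name = target_name(name)
        dst = os.path.join(directory, new_name)
        if os.path.exists(dst):
            continue  # never overwrite an existing file
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed
```

For a directory holding a single gzipped part file named part-00000, the call returns [("part-00000", "part-00000.json.gz")] and leaves plain-text files untouched.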

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, Jun 28, 2016 at 2:48 PM, Parth Chandra <[email protected]>
wrote:

> Yes, I believe that would work if the file is not compressed.
>
> On Tue, Jun 28, 2016 at 12:01 PM, Scott Kinney <[email protected]>
> wrote:
>
> > Well, that's a bummer, but setting "defaultInputFormat": "json" doesn't
> > seem to have any effect.
> >
> >
> > ________________________________________
> > From: Parth Chandra <[email protected]>
> > Sent: Tuesday, June 28, 2016 11:36 AM
> > To: [email protected]
> > Subject: Re: gzipped json files not named .json.gz
> >
> > Hi Scott,
> >
> >   It is unlikely that this will work without the extension. Drill uses
> > Hadoop's CompressionCodecFactory class [1], which infers the compression
> > type from the file extension.
> >
> > Parth
> >
> > [1] https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/io/compress/CompressionCodecFactory.html#getCodec(org.apache.hadoop.fs.Path)
> >
> > On Tue, Jun 28, 2016 at 8:47 AM, Scott Kinney <[email protected]>
> > wrote:
> >
> > > Can I have Drill open gzipped JSON files whose names do not end in
> > > .json.gz?
> > >
> > > We have a Spark job generating these files, and it just doesn't want to
> > > change the name or append the .json.gz.
> > >
> > > ?
> > >
> > >
> > >
> >
>
