Hi Jason,

Thanks for getting back to me. We were able to get the Spark job to append .json.gz to the file names, so we are OK for now. I also tried working with local JSON files: Drill will not query them unless they are named .json. I didn't try gzipped files, but since we got them renamed in S3 I'm out of the woods.
thanks!

________________________________
Scott Kinney | DevOps
stem | m 510.282.1299
100 Rollins Road, Millbrae, California 94030

This e-mail and/or any attachments contain Stem, Inc. confidential and proprietary information and material for the sole use of the intended recipient(s). Any review, use or distribution that has not been expressly authorized by Stem, Inc. is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. Thank you.

________________________________________
From: Jason Altekruse <[email protected]>
Sent: Tuesday, June 28, 2016 3:05 PM
To: user
Subject: Re: gzipped json files not named .json.gz

Hi Scott,

From some quick testing, setting the defaultInputFormat to "json" appears to be working as it was designed. It is true that we have the limitation of relying entirely on extensions for detecting compression of text and json files.

I am able to read all of these files in a workspace with JSON set as the default format. Were you not seeing this behavior?

a
a.gz
a.json
a.json.gz

We could consider adding default compression as an option in a workspace, but are you really unable to move the files? It seems like the best option might be to just rename, as I would think other tools would have trouble reading these as well.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, Jun 28, 2016 at 2:48 PM, Parth Chandra <[email protected]> wrote:

> Yes, I believe that would work if the file is not compressed.
>
> On Tue, Jun 28, 2016 at 12:01 PM, Scott Kinney <[email protected]> wrote:
>
> > Well, that's a bummer, but I believe it. Setting "defaultInputFormat": "json"
> > doesn't seem to have any effect.
> >
> > ________________________________________
> > From: Parth Chandra <[email protected]>
> > Sent: Tuesday, June 28, 2016 11:36 AM
> > To: [email protected]
> > Subject: Re: gzipped json files not named .json.gz
> >
> > Hi Scott,
> >
> > Unlikely that this will work without the extension. Drill uses Hadoop's
> > CompressionCodecFactory class [1] that infers the compression type from the
> > extension.
> >
> > Parth
> >
> > [1] https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/io/compress/CompressionCodecFactory.html#getCodec(org.apache.hadoop.fs.Path)
> >
> > On Tue, Jun 28, 2016 at 8:47 AM, Scott Kinney <[email protected]> wrote:
> >
> > > Can I have drill open gzipped json files whose names do not end in
> > > .json.gz?
> > >
> > > We have a spark job generating these files and it just doesn't want to
> > > change the name or append the .json.gz.
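
For anyone who lands on this thread with the same problem, here is a rough Python sketch of the idea discussed above. It is not Drill or Hadoop code: `codec_from_extension` only imitates name-based detection as Parth describes it, and `is_gzip_content` and `suggested_name` are hypothetical helpers that check the gzip magic bytes to find files that would need the rename Jason suggests.

```python
import gzip
import os
import tempfile

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def codec_from_extension(name):
    """Imitate extension-based detection: only the file name is consulted."""
    return "gzip" if name.endswith(".gz") else None


def is_gzip_content(path):
    """Look at the actual bytes instead of the name."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def suggested_name(path):
    """Return a .json.gz name for gzip data whose name hides it, else None."""
    name = os.path.basename(path)
    if is_gzip_content(path) and codec_from_extension(name) is None:
        # Avoid producing "x.json.json.gz" when the name already ends in .json.
        base = name[: -len(".json")] if name.endswith(".json") else name
        return base + ".json.gz"
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Simulate a job output file: gzipped JSON with no extension.
        path = os.path.join(d, "part-00000")
        with gzip.open(path, "wt") as f:
            f.write('{"id": 1}\n')
        print(codec_from_extension("part-00000"))  # name alone reveals nothing
        print(suggested_name(path))                # rename target for this file
```

The sketch only works on local files; applying the same check to objects in S3 would need a client library and is left out here.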
