Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-15 Thread Dave Challis
One gotcha with JSON vs parquet is that there's a longstanding bug that causes errors when trying to read from Parquet files containing 0 rows. For cases where we're converting from datasets that might be empty, we use JSON, and for everything else, Parquet.
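The rule of thumb above can be sketched as a small helper. This is only an illustration: the function name `choose_output_format` and the idea of checking a row count before writing are assumptions made for the example, not part of Drill or of the workflow described in the message.

```python
def choose_output_format(row_count: int) -> str:
    """Pick a storage format for a dataset that might be empty.

    Hypothetical helper: falls back to JSON when there are no rows, to
    sidestep the errors reported when reading zero-row Parquet files.
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return "json" if row_count == 0 else "parquet"


if __name__ == "__main__":
    for n in (0, 1, 10_000):
        print(n, "->", choose_output_format(n))
```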

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-14 Thread Uwe L. Korn
Message-
> From: Divya Gehlot [mailto:divya.htco...@gmail.com]
> Sent: Tuesday, June 12, 2018 5:25 AM
> To: user@drill.apache.org
> Subject: Re: Which perform better JSON or convert JSON to parquet format ?
>
> [EXTERNAL EMAIL]
>
>
> Hi David,
> How to create the schema

RE: Which perform better JSON or convert JSON to parquet format ?

2018-06-12 Thread Lee, David
Message-
From: Divya Gehlot [mailto:divya.htco...@gmail.com]
Sent: Tuesday, June 12, 2018 5:25 AM
To: user@drill.apache.org
Subject: Re: Which perform better JSON or convert JSON to parquet format ?
[EXTERNAL EMAIL]
Hi David,
How to create the schema first using parquet library ? Can you please

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-12 Thread Divya Gehlot
ng to query parquet.
>
> -Original Message-
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Monday, June 11, 2018 4:47 AM
> To: user
> Subject: Re: Which perform better JSON or convert JSON to parquet format ?
>
> [EXTERNAL EMAIL]
>
>
> Yes. Drill is good

RE: Which perform better JSON or convert JSON to parquet format ?

2018-06-11 Thread Lee, David
json which always ends in index out of bound (server crashing) errors when trying to query parquet.
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Monday, June 11, 2018 4:47 AM
To: user
Subject: Re: Which perform better JSON or convert JSON to parquet format ?

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-11 Thread Ted Dunning
Yes. Drill is good at JSON. But Parquet will be faster during a scan. Faster may be better. Or other things may be more important. You have to decide what is important to you. The great virtue of drill is that you have the choice.
On Mon, Jun 11, 2018 at 11:06 AM Divya Gehlot wrote:
> Thank

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-11 Thread Divya Gehlot
Thanks to all for your opinions! Since Drill has been popularised as a reader for complex JSON compared to other tools in this space, I was wondering whether Drill works better with JSON than with Parquet.

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-11 Thread Ted Dunning
I am going to play the contrarian here. Parquet is not *always* faster than JSON. The (almost unique) case where it is better to leave data as JSON (or whatever) is when the average number of times that a file is read is equal to or less than roughly 1. The point is that to convert read the file
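One way to read the break-even argument above is as a simple cost model. The sketch below is an interpretation, not something stated in the message: it assumes that converting costs one full JSON scan plus a Parquet write, and that every later read costs one Parquet scan. The function names and cost parameters are hypothetical and use arbitrary units.

```python
def total_cost_json(reads: float, json_scan: float) -> float:
    """Cost of leaving the data as JSON: every read is a full JSON scan."""
    return reads * json_scan


def total_cost_parquet(reads: float, json_scan: float,
                       parquet_write: float, parquet_scan: float) -> float:
    """Cost of converting once (JSON scan + Parquet write), then reading Parquet."""
    return json_scan + parquet_write + reads * parquet_scan


def break_even_reads(json_scan: float, parquet_write: float,
                     parquet_scan: float) -> float:
    """Number of reads at which converting costs the same as not converting."""
    if parquet_scan >= json_scan:
        # Under this model, conversion never pays off.
        return float("inf")
    return (json_scan + parquet_write) / (json_scan - parquet_scan)


if __name__ == "__main__":
    # Assumed costs, for illustration only.
    n = break_even_reads(json_scan=10.0, parquet_write=2.0, parquet_scan=1.0)
    print(f"break-even after about {n:.2f} reads")
```

Under these assumptions the break-even point tends toward one read as the Parquet write and scan costs shrink relative to the JSON scan, which matches the "roughly 1" in the message.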

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-10 Thread Padma Penumarthy
Yes, parquet is always better for multiple reasons. With JSON, we have to read the whole file from a single reader thread and have to parse it to read individual columns. Parquet compresses and encodes data on disk. So, we read much less data from disk. Drill can read individual columns within eac
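The difference between row-oriented and column-oriented reads can be illustrated with a toy layout built from the Python standard library. This is not Parquet's actual file format and not how Drill reads data; the helper names and the per-column `zlib` blobs are assumptions chosen only to show why fetching one column touches fewer bytes in a columnar layout.

```python
import json
import zlib


def encode_rows(records):
    """Row-oriented: one JSON document per line, as in a JSON-lines file."""
    return "\n".join(json.dumps(r) for r in records).encode("utf-8")


def encode_columns(records, columns):
    """Toy column-oriented layout: each column is compressed on its own."""
    return {
        c: zlib.compress(json.dumps([r.get(c) for r in records]).encode("utf-8"))
        for c in columns
    }


def read_column_from_rows(blob, column):
    # Every record is parsed in full, even though only one field is needed.
    return [json.loads(line).get(column)
            for line in blob.decode("utf-8").splitlines()]


def read_column_from_columns(blobs, column):
    # Only the requested column's bytes are decompressed and parsed.
    return json.loads(zlib.decompress(blobs[column]).decode("utf-8"))


if __name__ == "__main__":
    records = [{"id": i, "city": "Leeds", "note": "x" * 50} for i in range(1000)]
    row_blob = encode_rows(records)
    col_blobs = encode_columns(records, ["id", "city", "note"])
    print("bytes scanned for 'id' (rows):   ", len(row_blob))
    print("bytes scanned for 'id' (columns):", len(col_blobs["id"]))
```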

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-10 Thread Abhishek Girish
I would suggest converting the JSON files to parquet for better performance. JSON supports a more free form data model, so that's a trade-off you need to consider, in my opinion.
On Sun, Jun 10, 2018 at 8:08 PM Divya Gehlot wrote:
> Hi,
> I am looking for the advise regarding the performance for

Which perform better JSON or convert JSON to parquet format ?

2018-06-10 Thread Divya Gehlot
Hi,
I am looking for advice regarding the performance of the two options below:
1. Keep the JSON as is
2. Convert the JSON files to Parquet files
My JSON data is not in a fixed format, and file sizes vary from 10 KB to 1 MB.
I would appreciate the community's advice on the above!
Thanks,
Divya