The slowness you saw with Parquet can depend heavily on how your CTAS was written. Did you cast columns to the types you needed? Drill could be making some fast-and-loose assumptions about your data and typing it incorrectly. When I was in a similar scenario, I added stronger typing and saw quite a bit of improvement with the Parquet files. This can be difficult if everything is nested, though, so your mileage may vary.
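As a minimal sketch of what I mean by stronger typing, a CTAS with explicit casts might look like this (table and column names here are hypothetical, not from your data):

```sql
-- Hypothetical example: cast each column explicitly so Drill writes
-- well-typed Parquet instead of inferring types from the JSON.
CREATE TABLE dfs.tmp.`trades_parquet` AS
SELECT
  CAST(symbol AS VARCHAR) AS symbol,
  CAST(price  AS DOUBLE)  AS price,
  CAST(qty    AS INT)     AS qty
FROM dfs.root.`/data/trades.json`;
```

Without the casts, Drill infers types per file, which can produce wider or inconsistent Parquet schemas and slower scans.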
As to Jacques' comment, the profile.json is important: if these are large files and the planner is going through lots of them, planning may account for the bulk of your query time. A partitioning strategy, should you be able to find one, can help here, but some issues can still crop up. I think the planning inefficiencies are being worked on.

John

On Mon, Mar 7, 2016 at 9:58 PM, Ted Dunning <[email protected]> wrote:

> On Mon, Mar 7, 2016 at 3:02 PM, Eric Pederson <[email protected]> wrote:
>
> > I also tried converting the JSON files to Parquet using CTAS. The Parquet
> > queries took much longer than the JSON queries. Is that expected as well?
>
> No. That is not expected.
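For reference, the partitioning strategy mentioned above could be sketched like this (column and path names are hypothetical):

```sql
-- Hypothetical example: write the Parquet partitioned by a column so
-- the planner can prune files rather than examining every one.
CREATE TABLE dfs.tmp.`trades_by_day`
PARTITION BY (trade_date) AS
SELECT
  CAST(trade_date AS DATE)    AS trade_date,
  CAST(symbol     AS VARCHAR) AS symbol,
  CAST(price      AS DOUBLE)  AS price
FROM dfs.root.`/data/trades.json`;
```

Queries that filter on trade_date should then touch far fewer files, which cuts both planning and scan time.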
