Re: Querying parquet files

2015-07-07 Thread Ted Dunning
How many columns do you have? Are you familiar with columnar data stores and how selecting only a single column means that much less data needs to be read? If your data consists, say, of integers, then Drill only needs to read 160 MB to satisfy your query, which is quite reasonable to be read in
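As a rough back-of-the-envelope check of the 160 MB figure, here is a minimal sketch. It assumes 4-byte integers and about 40 million rows; the row count and the 100-column table width are illustrative assumptions, not figures stated in the thread, and compression and encoding are ignored.

```python
def column_scan_bytes(num_rows: int, bytes_per_value: int) -> int:
    """Uncompressed bytes read when scanning one fixed-width column."""
    return num_rows * bytes_per_value


def full_row_scan_bytes(num_rows: int, num_columns: int, bytes_per_value: int) -> int:
    """Uncompressed bytes read when every column of every row must be read."""
    return num_rows * num_columns * bytes_per_value


# Assumed figures (hypothetical, chosen to match the 160 MB in the message).
ROWS = 40_000_000
INT_BYTES = 4
COLUMNS = 100  # hypothetical table width

one_column = column_scan_bytes(ROWS, INT_BYTES)
all_columns = full_row_scan_bytes(ROWS, COLUMNS, INT_BYTES)

print(f"single column: {one_column / 1_000_000:.0f} MB")
print(f"all columns:   {all_columns / 1_000_000_000:.0f} GB")
```

Under these assumptions the single-column scan touches 1/100 of the data that a full-row scan would, which is the point of the columnar layout.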

Re: Querying parquet files

2015-07-07 Thread Yousef Lasi
We are currently running (testing) with Veritas CFS (attached to EMC SAN storage), which is visible across 6 servers. We also have a single test MapR node, but that's a small sandbox. The production implementation will be with a 10-node HDFS cluster. The data files are 20 GB to 40 GB in size.

Hive version

2015-07-07 Thread Paul Mogren
I see that Drill 1.1.0 declares support for Hive 1.0, which is not yet provided by Amazon EMR. Any chance Hive 0.13 will still work? Can you characterize when 0.13 would or would not work? In general I think users will want to upgrade Drill much more frequently than they are able to upgrade Hive.

Re: Drill 1.1 and partition by

2015-07-07 Thread Steven Phillips
The feature was added late in the release cycle, and it wasn't tested as thoroughly as the default option. I think it should be perfectly ok to use; just be aware that it may lead to decreased performance when running CTAS operations. On the other hand, this could drastically reduce the number of

Re: Querying parquet files

2015-07-07 Thread Ted Dunning
No. A very simple model like that breaks down on many levels. The most important place where reality intrudes is the fact that your I/O probably can't really be threaded so widely. What kind of storage are you using? How big is your data?

Re: Querying parquet files

2015-07-07 Thread Christopher Matta
You might also want to check out the new partitioned Parquet creation that was launched with 1.1.0: https://drill.apache.org/docs/partition-by-clause/ This would increase your read speed if your queries tend to use predicates.
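To illustrate why partitioned files help queries with predicates, here is a toy model of partition pruning. It is a conceptual sketch only, not Drill's implementation; the partition values and file names are hypothetical.

```python
from typing import Dict, List


def files_to_read(layout: Dict[str, List[str]], wanted_value: str) -> List[str]:
    """Return only the files whose partition value matches the predicate.

    `layout` maps a partition-column value to the files holding rows
    with that value, so non-matching files are never opened.
    """
    return layout.get(wanted_value, [])


# Hypothetical layout: one file per distinct value of the partition column.
layout = {
    "2015-07-05": ["part_a.parquet"],
    "2015-07-06": ["part_b.parquet"],
    "2015-07-07": ["part_c.parquet"],
}

selected = files_to_read(layout, "2015-07-07")
total = sum(len(files) for files in layout.values())
print(f"reading {len(selected)} of {total} files: {selected}")
```

In this model the benefit appears only when the predicate is on the column the data was partitioned by; a filter on any other column would still have to scan every file.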