Hi all,

We are considering using Drill for large-scale analytics on top of Parquet files stored on HDFS. We would like to add data to this dataset in real time, as it arrives in our system. One proposed solution was to use Drill to perform both the inserts and the selects on our dataset.
Some questions that arose:

From what I understand, Drill enables concurrency by queuing requests. If we are performing many reads, will writes to the same file be queued until the reads complete? This could create a bottleneck.

How does Drill manage Parquet file partitioning when using CTAS? Can we control horizontal/vertical partitioning in some way by configuring the Drillbit?

Are there any alternatives to the approach above that you would suggest?

In terms of read performance for columnar data, would this approach perform better than something like HBase?

Thanks,
Yadid
