concurrency questions

Yadid Ayzenberg Thu, 10 Mar 2016 10:27:03 -0800

Hi All,

We are considering using Drill for large-scale analytics on top of Parquet
files stored on HDFS.
We would like to add data to this data set in real time, as it arrives in our
system. One proposed solution was to use Drill to perform both the inserts and
the selects on our data set.


Some questions that arose:

1. From what I understand, Drill enables concurrency by queuing requests. If
   we are performing many reads, will writes to the same file be queued until
   the reads complete? This could potentially create a bottleneck.
2. How does Drill manage Parquet file partitioning when using CTAS? Can we
   control horizontal / vertical partitioning in some way by configuring the
   Drillbit?
3. Are there any alternative suggestions to the approach above? In terms of
   read performance, would this approach perform better (for columnar data)
   than using something like HBase?

Thanks,

Yadid
