[HANGOUT] Topics for 9/5/2017

2017-09-04 Thread Boaz Ben-Zvi
We shall have a Drill hangout tomorrow (Tuesday Sept 5) at 10 AM Pacific. Please suggest any topics by replying to this thread or bring them up during the hangout. Hangout link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc Thanks, Boaz

Re: Best way to partition the data

2017-09-04 Thread Divya Gehlot
Hi, I also faced an issue similar to JinFeng's when querying the data on the columns year, month and day, which were also the partitioning columns. It created lots of small files, and querying took almost 20x longer than reading non-partitioned data. Another issue I faced when I query the data by just

Re: Best way to partition the data

2017-09-04 Thread Damien Profeta
Hello, The metadata cache is indeed being used. The issue is that most of the time is spent in planning. It is not a huge amount of time (around 10s), but that seems like a lot to handle 50k files. The cardinality for key1 and key2 is around 300, so key1*key2 puts the number of files in the tens of thousands.
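
As a rough back-of-the-envelope check of the figures quoted above, the sketch below computes the upper bound on partition combinations and the average planning cost per file. It assumes both keys have a cardinality of exactly 300, that planning takes exactly 10 s, and that there are exactly 50,000 files; the message only gives these as approximate values. The function names are hypothetical and are not part of Drill.

```python
def max_partition_count(*cardinalities):
    """Upper bound on distinct partition combinations: the product of the cardinalities."""
    total = 1
    for cardinality in cardinalities:
        total *= cardinality
    return total


def planning_ms_per_file(planning_seconds, file_count):
    """Average planning time per file, in milliseconds."""
    return planning_seconds * 1000 / file_count


# Assumed round figures taken from the message ("around 300", "around 10s", "50k files").
KEY1_CARDINALITY = 300
KEY2_CARDINALITY = 300
PLANNING_SECONDS = 10.0
FILE_COUNT = 50_000

if __name__ == "__main__":
    upper_bound = max_partition_count(KEY1_CARDINALITY, KEY2_CARDINALITY)
    per_file = planning_ms_per_file(PLANNING_SECONDS, FILE_COUNT)
    print(f"Upper bound on key1*key2 combinations: {upper_bound}")
    print(f"Average planning time per file: {per_file:.2f} ms")
```

Under these assumptions the product is 90,000 combinations at most, which is consistent with the reported 50k files if not every key1/key2 pair actually occurs in the data.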