RE: sqlline parquet to tsv filesize imbalance causing slow sqoop export to MS SQL Server

2017-11-16 Thread Kunal Khatua
It might be that your parallelization is causing it to generate 4 files where 3 or fewer would be sufficient. Try experimenting with setting planner.width.max_per_query to a value of 3 ... that might help. https://drill.apache.org/docs/configuration-options-introduction/
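For reference, this is a session option set from within sqlline; a minimal sketch, assuming the value of 3 from the suggestion above:

    -- Cap per-query parallelization at 3 minor fragments, so the writer
    -- produces at most 3 output files (applies to the current session only).
    ALTER SESSION SET `planner.width.max_per_query` = 3;

    -- Inspect the current setting via the system options table.
    SELECT * FROM sys.options WHERE name = 'planner.width.max_per_query';

Using ALTER SYSTEM instead of ALTER SESSION persists the setting across sessions.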

sqlline parquet to tsv filesize imbalance causing slow sqoop export to MS SQL Server

2017-11-16 Thread Reed Villanueva
I am new to using Drill and am trying to convert a table stored on HDFS as .parquet to .tsv format using the sqlline that came with the Drill package. The problem is that when doing this, the tsv files are poorly 'balanced'. When checking the sizes of the converted files, I see:
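For context, a parquet-to-TSV conversion in sqlline is typically a CTAS after switching the writer format; a minimal sketch, with hypothetical workspace and path names:

    -- Make CREATE TABLE AS write TSV instead of the default parquet.
    ALTER SESSION SET `store.format` = 'tsv';

    -- dfs.tmp and the source path are placeholders; substitute your own.
    CREATE TABLE dfs.tmp.`my_table_tsv` AS
    SELECT * FROM dfs.`/path/to/my_table_parquet`;

The file-size imbalance comes from how rows are distributed across the parallel writer fragments, which is what the planner.width.max_per_query suggestion above addresses.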

Re: Apache Arrow Integration

2017-11-16 Thread Sonny Heer
Where does Dremio fit in with this? I believe they are using both Drill and Arrow...?

Re: Apache Arrow Integration

2017-11-16 Thread Paul Rogers
Hi Saurabh, Here are my two cents, FWIW. Arrow integration is not about speed; Arrow’s memory layout and operations are very much like Drill’s (not surprising, since they evolved from Drill’s value vectors). Rather, the value of integration is the integration itself. Arrow allows Drill to get out of

Apache Arrow Integration

2017-11-16 Thread Saurabh Mahapatra
Hi all, I wanted to get some thoughts on leveraging Apache Arrow for improving Drill speed. I believe this was discussed at the Drill hackathon in September. So what was decided? Any thoughts are more than welcome. Am I right when I say that leveraging an in-memory representation like Arrow is