It might be that your parallelization is causing Drill to generate 4 files
where 3 or fewer would be sufficient.
Try setting the `planner.width.max_per_query` option to 3 ...
that might help.
https://drill.apache.org/docs/configuration-options-introduction/
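A sketch of changing that option from sqlline (the option name is from the Drill configuration docs linked above; scope and value here are just an example):

```sql
-- Limit the maximum number of minor fragments per query for this session,
-- which also caps the number of output files a CTAS can produce.
ALTER SESSION SET `planner.width.max_per_query` = 3;
```

Using ALTER SESSION keeps the change local to the current connection; ALTER SYSTEM would apply it cluster-wide.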
-----Original Message-----
I am new to Drill and am trying to convert a table stored on Hadoop
DFS as Parquet to TSV format, using the sqlline client that came with the
Drill package. The problem is that when doing this, the TSV files are poorly
'balanced'. When checking the sizes of the converted files, I see:
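For context, a Parquet-to-TSV conversion like the one described is typically done in Drill by switching the output format and running a CTAS; a sketch with hypothetical paths (the `dfs.tmp` workspace and the `/data/events_parquet` source directory are assumptions, not from the original message):

```sql
-- Write query output as TSV instead of the default Parquet.
ALTER SESSION SET `store.format` = 'tsv';

-- CTAS: read the Parquet directory and materialize it as TSV files.
-- Paths below are placeholders for illustration only.
CREATE TABLE dfs.tmp.`events_tsv` AS
SELECT * FROM dfs.`/data/events_parquet`;
```

Because the CTAS runs in parallel, Drill writes one output file per minor fragment, which is why the file count and balance depend on the parallelization settings.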
Where does Dremio fit in with this? I believe they are using both Drill
and Arrow...?
On Thu, Nov 16, 2017 at 10:52 AM, Saurabh Mahapatra <
saurabhmahapatr...@gmail.com> wrote:
> Hi all,
>
> I wanted to get some thoughts on leveraging Apache Arrow for improving
> Drill speed. I believe this was
Hi Saurabh,
Here are my two cents, FWIW.
Arrow integration is not about speed: Arrow's memory layout and operations are
very much like Drill's (not surprisingly, since they evolved from Drill's value
vectors). Rather, the value of integration is the integration itself.
Arrow allows Drill to get out of
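The memory-layout similarity mentioned above can be made concrete with a toy sketch: both Arrow and Drill's value vectors store each column in a contiguous buffer rather than row by row. This is plain Python for illustration, not actual Arrow or Drill code:

```python
import array

# Row-oriented storage: each record is a tuple; the values of any one
# column are scattered across the records.
rows = [(1, "a"), (2, "b"), (3, "c")]

# Column-oriented storage (Arrow / value-vector style): each column is a
# single contiguous buffer, which enables vectorized scans and, when two
# systems share the layout, zero-copy handoff between them.
ids = array.array("q", [1, 2, 3])   # fixed-width 64-bit ints
labels = ["a", "b", "c"]

# Scanning one column touches only that column's buffer.
total = sum(ids)
```

The "integration" argument is that once engines agree on such a layout, data can move between them without serialization.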
Hi all,
I wanted to get some thoughts on leveraging Apache Arrow for improving
Drill speed. I believe this was discussed in the Drill hackathon in
September.
So what was decided? Any thoughts are more than welcome.
Am I right when I say that leveraging an in-memory representation like
Arrow is