Consider adjusting the following config options [1]:

planner.slice_target
planner.width.max_per_node
planner.width.max_per_query

[1] https://drill.apache.org/docs/configuration-options-introduction/
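As a rough sketch of how these options might be applied to the CTAS quoted below: this assumes each writer fragment produces its own output file, so lowering the degree of parallelism should lower the file count. The values shown are illustrative rather than recommended settings, and a single-threaded write will likely be slower on large tables.

```sql
-- Assumption: one output file per writer fragment, so capping
-- parallelism caps the number of TSV files written.
-- The values below are illustrative only.
ALTER SESSION SET `planner.width.max_per_node` = 1;
ALTER SESSION SET `planner.width.max_per_query` = 1;

-- Same CTAS as in the original question, run in the same session.
ALTER SESSION SET `store.format` = 'tsv';
CREATE TABLE dfs.ucera_internal.`/my/workspace/path/tablename/tsv` AS
SELECT col1, col2, from_unixtime(extract_date/1000) AS etl_date
FROM dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;
```

ALTER SESSION only affects the current session, so other queries on the cluster keep their default parallelism.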

On Wed, Sep 26, 2018 at 4:58 AM Divya Gehlot <[email protected]>
wrote:

> I looked for this too and couldn't find it.
> The only workaround for now, which I also implemented, is to merge the files
> using hadoop commands (note that -getmerge writes the merged file to the
> local filesystem):
> hadoop fs -getmerge /path/to/files /path/to/mergedfile
>
> If your data contains headers, then you might have to look at this:
> https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv/41785085#41785085
>
> Hope this helps!
>
> Thanks,
> Divya
>
> On Wed, 26 Sep 2018 at 06:10, Reed Villanueva <[email protected]>
> wrote:
>
> > Is it possible to limit the number of files used to create / represent a
> > table when using Apache Drill's CREATE TABLE statement?
> >
> > I currently have sets of Parquet files stored in HDFS and am converting them
> > to TSVs via Drill CREATE TABLE, e.g.:
> >
> >     alter session set `store.format`='tsv';
> >     create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
> >     select col1, col2, from_unixtime(extract_date/1000) as etl_date
> >     from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;
> >
> > The problem is that this process can turn ~12 Parquet files into ~30
> > TSV files, which causes problems for downstream operations. Is
> > there a way to limit how many files are used in the creation of this
> > TSV version of the table?
> >
> > I could not find any such info in the docs (here
> > https://drill.apache.org/docs/create-table-as-ctas/ or here
> > https://drill.apache.org/docs/configuration-options-introduction/),
> > though the PARTITION BY clause appears to come close
> > (https://drill.apache.org/docs/partition-by-clause/#creating-a-partitioned-table-of-ngram-data),
> > but not all the tables have nice partitionable fields.
> >
>
