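Consider adjusting the following config options [1]. CTAS writes one output file per writer fragment, so lowering the query's parallelism should reduce the number of files it produces:

planner.slice_target
planner.width.max_per_node
planner.width.max_per_query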
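
As a rough sketch (untested against your cluster), something like the following could be run in the same session before the CTAS. The value 1 is only an illustrative assumption, not a recommendation, and the table paths are copied from the original question quoted below:

```sql
-- Assumption: limit each drillbit to one minor fragment per major fragment,
-- so each node runs at most one writer for this query.
alter session set `planner.width.max_per_node` = 1;

alter session set `store.format` = 'tsv';

create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
select col1, col2, from_unixtime(extract_date/1000) as etl_date
from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;

-- Restore the default parallelism for later queries in this session.
alter session reset `planner.width.max_per_node`;
```

With this setting you would expect at most one output file per drillbit that takes part in the query, at the cost of a slower CTAS. If that is still too many files, planner.width.max_per_query caps the total across the cluster in the same way.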

[1] https://drill.apache.org/docs/configuration-options-introduction/

On Wed, Sep 26, 2018 at 4:58 AM Divya Gehlot wrote:

> Even I looked for it and I couldn't find it .
> Only workaround for now which even I implemented is to merge the files
> using hadoop commands
> hadoop fs -merge /path/to/files /path/to/mergedfile
>
> and if your data contain headers then you might have to look to this
> <https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv/41785085#41785085>.
>
> Hope this helps !
>
> Thanks,
> Divya
>
> On Wed, 26 Sep 2018 at 06:10, Reed Villanueva wrote:
>
> > Is it possible to limit the number of files use to create / represent a
> > table when using apache drill's create table statement?
> >
> > Currently have sets of parquet files stored in HDFS and am converting them
> > to TSVs via drill CREATE TABLE, eg.
> >
> > alter session set `store.format`='tsv';
> > create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
> > select col1, col2, from_unixtime(extract_date/1000) as etl_date
> > from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;
> >
> > The problem is that doing this process can turn ~12 parquet files into ~30
> > TSV files, which is causing other problems for downstream operations. Is
> > there a way to limit how many files are used in the creation of this
> > TSV-version of the table?
> >
> > Could not find any such info in the docs (here
> > https://drill.apache.org/docs/create-table-as-ctas/ or here
> > https://drill.apache.org/docs/configuration-options-introduction/), though
> > the PARTITION BY clause appears to come close
> > (https://drill.apache.org/docs/partition-by-clause/#creating-a-partitioned-table-of-ngram-data)
> > (but not all the tables have nice partitionable fields).
