I looked for it too and couldn't find it.
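The only workaround I know of for now (and the one I implemented myself) is to
merge the files using Hadoop commands:

    hadoop fs -getmerge /path/to/files /path/to/mergedfile

Note that -getmerge writes the merged file to the local filesystem, not to HDFS.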
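A minimal sketch of that workaround as a shell function, assuming you want the single merged file back on HDFS: it runs hadoop fs -getmerge into a local temporary file and then uploads the result with hadoop fs -put -f. The function name merge_hdfs_dir and the paths are hypothetical, not part of Drill or Hadoop.

```shell
# Sketch of the merge workaround: collapse the part files under an HDFS
# directory into a single HDFS file by merging to local disk and
# uploading the result. merge_hdfs_dir is a hypothetical helper name.
merge_hdfs_dir() {
  local src_dir="$1" dest_file="$2" tmp_dir merged status
  # src_dir: HDFS directory holding the part files Drill wrote
  # dest_file: HDFS path for the single merged file
  tmp_dir="$(mktemp -d)" || return 1
  merged="$tmp_dir/merged"

  # -getmerge concatenates every file under src_dir into one LOCAL file
  if ! hadoop fs -getmerge "$src_dir" "$merged"; then
    rm -rf "$tmp_dir"
    return 1
  fi

  # -put -f uploads the local file to HDFS, overwriting dest_file if present
  hadoop fs -put -f "$merged" "$dest_file"
  status=$?
  rm -rf "$tmp_dir"
  return "$status"
}
```

For example: merge_hdfs_dir /hdfs/path/of/tablename/tsv /hdfs/path/of/tablename_merged.tsv (both hypothetical; use the HDFS location that the Drill workspace maps to). -getmerge concatenates the files as-is, so if they carry header rows you still have to deal with the duplicates.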

If your data contains headers, you may also need to look at this:
<https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv/41785085#41785085>
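When every part file starts with the same header row, one common way to merge them is to keep only the first file's header. Below is a sketch using awk, assuming the parts have already been copied to local disk (for example with hadoop fs -get) and that each file has exactly one header line; merge_keep_one_header is a hypothetical helper name.

```shell
# Sketch: concatenate delimited files that each begin with the same
# header row, keeping only the header from the first file.
# Assumes exactly one header line per input file.
merge_keep_one_header() {
  local out="$1"
  shift
  # FNR restarts at 1 for each file, NR counts lines across all files:
  # skip line 1 of every file except the very first line overall.
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}
```

Usage: merge_keep_one_header merged.tsv parts/*.tsv writes one header followed by the data rows of every part, in the order the shell expands the glob.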

Hope this helps !

Thanks,
Divya

On Wed, 26 Sep 2018 at 06:10, Reed Villanueva <[email protected]> wrote:

> Is it possible to limit the number of files used to create / represent a
> table when using Apache Drill's CREATE TABLE statement?
>
> I currently have sets of Parquet files stored in HDFS and am converting them
> to TSVs via Drill's CREATE TABLE, e.g.:
>
>     alter session set `store.format`='tsv';
>     create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
>     select col1, col2, from_unixtime(extract_date/1000) as etl_date
>     from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;
>
> The problem is that this process can turn ~12 Parquet files into ~30
> TSV files, which is causing problems for downstream operations. Is
> there a way to limit how many files are used in the creation of this
> TSV version of the table?
>
> I could not find any such information in the docs
> (https://drill.apache.org/docs/create-table-as-ctas/ or
> https://drill.apache.org/docs/configuration-options-introduction/), though
> the PARTITION BY clause appears to come close
> (https://drill.apache.org/docs/partition-by-clause/#creating-a-partitioned-table-of-ngram-data),
> but not all the tables have nice partitionable fields.
>
