I looked for this too and couldn't find it. The only workaround for now, which I have implemented myself, is to merge the files with a Hadoop command (note that getmerge writes the merged file to the local filesystem):

hadoop fs -getmerge /path/to/files /path/to/mergedfile
If your data contains headers, you might also have to look at this:
<https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv/41785085#41785085>

Hope this helps!

Thanks,
Divya

On Wed, 26 Sep 2018 at 06:10, Reed Villanueva wrote:

> Is it possible to limit the number of files used to create / represent a
> table when using Apache Drill's CREATE TABLE statement?
>
> I currently have sets of parquet files stored in HDFS and am converting
> them to TSVs via Drill CREATE TABLE, e.g.
>
> alter session set `store.format`='tsv';
> create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
> select col1, col2, from_unixtime(extract_date/1000) as etl_date
> from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;
>
> The problem is that this process can turn ~12 parquet files into ~30
> TSV files, which is causing other problems for downstream operations. Is
> there a way to limit how many files are used in the creation of this
> TSV version of the table?
>
> I could not find any such info in the docs (here
> https://drill.apache.org/docs/create-table-as-ctas/ or here
> https://drill.apache.org/docs/configuration-options-introduction/), though
> the PARTITION BY clause appears to come close
> (https://drill.apache.org/docs/partition-by-clause/#creating-a-partitioned-table-of-ngram-data)
> (but not all the tables have nice partitionable fields).
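
For the header case mentioned above, here is a minimal sketch of merging the part files while keeping only one header line. It assumes the part files have already been copied to local disk (for example with hadoop fs -get), that every part file starts with the same single header line, and that every file ends with a newline. The function name merge_with_header is hypothetical and is not part of Hadoop or Drill.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop or Drill command):
#   merge_with_header OUTPUT PART [PART...]
# Assumes each PART begins with the same one-line header
# and ends with a trailing newline.
merge_with_header() {
    out=$1
    shift
    # Take the header from the first part file only.
    head -n 1 "$1" > "$out"
    # Append the data rows of every part, skipping each part's header.
    for part in "$@"; do
        tail -n +2 "$part" >> "$out"
    done
}
```

Usage would look like: merge_with_header merged.tsv /local/path/to/files/*.tsv. If the files have no header line, the plain getmerge approach is enough and this step is unnecessary.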
