Is it possible to limit the number of files used to create / represent a
table when using Apache Drill's CREATE TABLE statement?

I currently have sets of Parquet files stored in HDFS and am converting them
to TSVs via Drill's CREATE TABLE statement, e.g.:

    alter session set `store.format`='tsv';
    create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
    select col1, col2, from_unixtime(extract_date/1000) as etl_date
    from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;

The problem is that this process can turn ~12 Parquet files into ~30
TSV files, which causes other problems for downstream operations. Is
there a way to limit how many files are used to create this
TSV version of the table?

I could not find any such information in the docs
(https://drill.apache.org/docs/create-table-as-ctas/ or
https://drill.apache.org/docs/configuration-options-introduction/), though
the PARTITION BY clause appears to come close
(https://drill.apache.org/docs/partition-by-clause/#creating-a-partitioned-table-of-ngram-data),
but not all of the tables have nice partitionable fields.
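
One setting that may be relevant, offered here as an untested assumption
rather than a confirmed fix, is Drill's per-node parallelization width.
The assumption is that the number of output files roughly tracks the number
of parallel writer fragments, and that capping `planner.width.max_per_node`
(a standard Drill session option) therefore caps the number of writers. On
a multi-node cluster this would still give about one file per node rather
than exactly one file. A minimal sketch, reusing the query above:

    -- Assumption: fewer parallel writer fragments per node means fewer
    -- output files; the resulting count still depends on cluster size.
    alter session set `planner.width.max_per_node` = 1;
    alter session set `store.format`='tsv';
    create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
    select col1, col2, from_unixtime(extract_date/1000) as etl_date
    from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;

The option stays in effect for the rest of the session, so later queries in
the same session would also run with reduced parallelism, and the CTAS
itself may take longer.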