Is it possible to limit the number of files used to create / represent a
table when using Apache Drill's CREATE TABLE statement?
I currently have sets of Parquet files stored in HDFS and am converting them
to TSVs via Drill's CREATE TABLE AS (CTAS), e.g.:
alter session set `store.format`='tsv';
create table dfs.ucera_internal.`/my/workspace/path/tablename/tsv` as
select col1, col2, from_unixtime(extract_date/1000) as etl_date
from dfs.ucera_internal.`/my/workspace/path/tablename/parquet`;
The problem is that this process can turn ~12 Parquet files into ~30
TSV files, which causes problems for downstream operations. Is there
a way to limit how many files are written when creating this TSV
version of the table?
I could not find any such information in the docs (neither
https://drill.apache.org/docs/create-table-as-ctas/ nor
https://drill.apache.org/docs/configuration-options-introduction/), though
the PARTITION BY clause appears to come close
(https://drill.apache.org/docs/partition-by-clause/#creating-a-partitioned-table-of-ngram-data),
but not all of the tables have fields that partition nicely.