Error while reading from Parquet during CTAS : DATA_READ ERROR

sreeparna bhabani Sat, 09 May 2020 01:11:11 -0700

Hi Team,

I am reaching out about an issue with Apache Drill that occurs while
creating a Parquet file from another Parquet file (generated by another
tool). Please find the details below. I have created the following Jira
ticket with more details:
https://issues.apache.org/jira/browse/DRILL-7736


*Summary-*

I am rewriting one Parquet file from another Parquet file using CTAS with
PARTITION BY (). The source Parquet file was generated with Python. When I
try to rewrite the Parquet file in Drill, I get a DATA_READ ERROR. The
details are given below.

*Version of Apache Drill* -

1.17

*Memory config-*

DRILL_HEAP=16 G
DRILL_MAX_DIRECT_MEMORY=32G

*Configuration options I tried-*

exec.sort.disable_managed=true

store.parquet.reader.pagereader.async=true;

store.parquet.reader.pagereader.bufferedread=false;

planner.memory.max_query_memory_per_node=31147483648

drill.exec.memory.operator.output_batch_size=4194304
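
For reference, a minimal sketch of how options like these are typically
applied in Drill. It assumes they were set system-wide with ALTER SYSTEM SET;
the scope actually used is not stated here. ALTER SESSION SET takes the same
form for options that support session scope.

```sql
-- Assumption: system-wide scope; the original scope is not stated.
-- Option names contain dots, so they are quoted with backticks.
ALTER SYSTEM SET `exec.sort.disable_managed` = true;
ALTER SYSTEM SET `store.parquet.reader.pagereader.async` = true;
ALTER SYSTEM SET `store.parquet.reader.pagereader.bufferedread` = false;

-- Value is in bytes: 31147483648 bytes is roughly 29 GiB.
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 31147483648;

-- Value is in bytes: 4194304 bytes = 4 MiB.
ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = 4194304;
```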

*Details of volume of data-*

The CTAS covers 25245241 rows and 145 columns.

FYI - CTAS succeeds when I run it on a smaller number of rows.

*CTAS script-*

CREATE TABLE dfs.root.<Table_name>
PARTITION BY (<Column1>,<Column2>,<Column3>)
AS SELECT *
FROM dfs.root.<source_parquet>;

Please suggest how we can fix this.

Thanks and regards,
*Sreeparna Bhabani*
