I wanted to find the optimal Parquet file size. It looks like no matter
what value I set for the block size, Hive always produced Parquet files of
the same size.

For the experiment I was copying everything from one table into an
identical dummy table. There are a lot of small files. Here are the table
properties. Can anyone help me? Thanks in advance!



SET hive.exec.dynamic.partition.mode=nonstrict;

SET parquet.column.index.access=true;

SET hive.merge.mapredfiles=true;

SET hive.exec.compress.output=true;

SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

SET mapred.output.compression.type=BLOCK;

SET parquet.compression=SNAPPY;

SET dfs.block.size=445644800;

SET parquet.block.size=445644800;
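
Below is a rough sketch of the kind of copy described above, run with the
settings listed. The table names, partition column, and warehouse path are
placeholders, not the actual ones used:

-- Copy every row from the source table into an identically defined dummy
-- table, letting Hive create the partitions dynamically (placeholder names).
INSERT OVERWRITE TABLE dummy_table PARTITION (dt)
SELECT * FROM source_table;

-- From the Hive CLI, check the sizes of the files that were written
-- (placeholder warehouse path).
dfs -du -h /user/hive/warehouse/dummy_table;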
