I wanted to find the optimal Parquet file size. No matter what value I set for the block size, Hive always produced Parquet files of the same size.
For the experiment, I was copying everything from one table into an identical dummy table. There are a lot of small files. Here are the session properties I set. Can anyone help me? Thanks in advance!

SET hive.exec.dynamic.partition.mode=nonstrict;
SET parquet.column.index.access=true;
SET hive.merge.mapredfiles=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET parquet.compression=SNAPPY;
SET dfs.block.size=445644800;
SET parquet.block.size=445644800;
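For reference, the copy itself was a plain insert-select. This is a minimal sketch of what I ran (the table names source_table and dummy_table are placeholders, not the real names):

```sql
-- Both tables have the same schema and are STORED AS PARQUET.
-- All the SET statements above were issued in the same session first.
INSERT OVERWRITE TABLE dummy_table
SELECT * FROM source_table;
```

The expectation was that parquet.block.size (and dfs.block.size) would control the size of the output Parquet files, but the resulting files came out the same size regardless of the value.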