Hi All, I have the following code, which produces a single 600 MB Parquet file as expected; however, within this Parquet file there are 42 row groups! I would expect it to create at most 6 row groups. Could someone please shed some light on this? Is there any config setting I can enable when submitting the application with spark-submit?
df = spark.read.parquet(INPUT_PATH)
df.coalesce(1).write.parquet(OUT_PATH)

I did try --conf spark.parquet.block.size and spark.dfs.blocksize, but neither made any difference.

--
Regards,
Rishi Shah
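[Editor's note] A hedged sketch of one thing worth trying: in parquet-mr the row-group size is governed by the Hadoop property parquet.block.size (in bytes), not a spark.parquet.* conf, and Spark forwards any spark.hadoop.* conf to the underlying Hadoop Configuration. So one option is submitting with --conf spark.hadoop.parquet.block.size=134217728, or setting it programmatically before the write. This assumes a standard SparkSession named spark and is not a guaranteed fix; Parquet writers can also flush a row group early under memory pressure, which would still yield more, smaller row groups.

```python
# Sketch, assuming parquet.block.size is the parquet-mr knob for row-group size.
# Equivalent submit-time form (assumption, not verified on this job):
#   spark-submit --conf spark.hadoop.parquet.block.size=134217728 app.py

# Set the target row-group size (128 MB here) on the Hadoop configuration
# that Spark's Parquet writer reads, then rewrite the data.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "parquet.block.size", str(128 * 1024 * 1024))

df = spark.read.parquet(INPUT_PATH)
df.coalesce(1).write.parquet(OUT_PATH)  # fewer, larger row groups expected
```

Afterwards, inspecting the file with parquet-tools (or similar) should show whether the row-group count dropped toward the expected ~5-6 for a 600 MB file.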