I am doing some performance testing and, per the Impala documentation, I am
trying to use a block size of 1024m in both Drill and MapR FS.  When I set
the MFS block size to 512m and used the Drill (default) block size, I saw
some performance improvements and wanted to try 1024m to see how it worked.
However, my query hung and I got into that "bad state" where the nodes stop
responding properly and I have to restart my whole cluster. (It really
bothers me that a query can make the cluster unresponsive.)

That said, what memory settings can I tweak to help the query work? This is
quite a bit of data: a CTAS from Parquet to Parquet, 100-130G of data per
day (I am doing a day at a time), 103 columns.  I have to use the
"use_new_reader" option due to my other issues, but other than that I am
just setting the block size on MFS and then updating the block size in
Drill, and the query is dying. Since this is a simple CTAS (no sort), which
settings would help with what is happening here?
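
For anyone trying to reproduce this, here is a rough sketch of what the
block size changes described above amount to. It assumes the 512 and 1024
figures are MiB, that the Drill options involved are
`store.parquet.block-size` and `store.parquet.use_new_reader`, and that the
MFS side is set with `hadoop mfs -setchunksize`. The path is a hypothetical
placeholder and the helper names are made up for illustration; the script
only builds the strings and does not run anything against a cluster.

```python
from typing import List

MIB = 1024 * 1024  # bytes per MiB


def block_size_bytes(mib: int) -> int:
    """Convert a block size in MiB to bytes (assumes '1024m' means MiB)."""
    return mib * MIB


def drill_session_statements(mib: int, use_new_reader: bool = True) -> List[str]:
    """Build the Drill session settings for the given Parquet block size.

    Option names are assumed to be Drill's store.parquet.* options.
    """
    return [
        "ALTER SESSION SET `store.parquet.block-size` = %d;" % block_size_bytes(mib),
        "ALTER SESSION SET `store.parquet.use_new_reader` = %s;"
        % str(use_new_reader).lower(),
    ]


def mfs_chunksize_command(mib: int, path: str) -> str:
    """Build the MapR FS chunk size command for a directory (path is a placeholder)."""
    return "hadoop mfs -setchunksize %d %s" % (block_size_bytes(mib), path)


if __name__ == "__main__":
    # Hypothetical target directory for the CTAS output.
    print(mfs_chunksize_command(1024, "/path/to/ctas/output"))
    for statement in drill_session_statements(1024):
        print(statement)
```

The point of matching the two values is that each Parquet row group written
by the CTAS then fits in a single MFS chunk.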

Thanks

John
