I am doing some performance testing and, per the Impala documentation, I am trying to use a block size of 1024m in both Drill and MapR FS. With the MFS block size set to 512 and Drill at its (default) block size, I saw some performance improvements, so I wanted to try 1024 to see how it worked. However, my query hung and I got into that "bad state" where the nodes stop responding properly and I have to restart my whole cluster. (It really bothers me that a query can make the cluster unresponsive.)
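For reference, this is roughly how the block size is being set on each side. Treat it as a sketch: the volume path is a placeholder, and the byte value assumes "1024m" means 1024 * 1024 * 1024 bytes.

```sql
-- MapR FS side (run from a shell, not from Drill); the path is a placeholder:
--   hadoop mfs -setchunksize 1073741824 /path/to/ctas/target

-- Drill side: Parquet writer block size in bytes (1024 MB), for this session only.
ALTER SESSION SET `store.parquet.block-size` = 1073741824;
```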
That said, what memory settings can I tweak to help the query work? This is quite a bit of data: a CTAS from Parquet to Parquet, 100-130G of data per day (I am doing a day at a time), 103 columns. I have to use the "use_new_reader" option due to my other issues, but other than that I am just setting the block size on MFS and then updating the block size in Drill, and it's dying. Since this is a simple CTAS (no sort), which settings would be beneficial for what is happening here?

Thanks,
John
