I have a gzipped JSON sample file (~2GB) from which I can create a Parquet table perfectly fine on my laptop.
I have spun up a new EMR cluster running MapR M5, using the bootstrap script from https://www.mapr.com/blog/bootstrap-apache-drill-amazon-emr

Running the same CTAS on the cluster, I can see it gets through reading around 1/5th of the rows from the JSON and then crashes with the following:

Error: SYSTEM ERROR: RuntimeException: Error closing operators

What is the best way to diagnose what is actually going on? I wouldn't have thought it would be a memory issue, as these EMR nodes have far more memory than my laptop.

Any advice is appreciated. Thanks
