Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - the answer was spark.cassandra.input.split.sizeInMB. The default value is 512 MB. Setting this to 50 resulted in many more splits, and the job ran in under 11 minutes with no timeout errors. In this case the job was a simple count: 10 minutes 48 seconds for over 8.2 billion rows.
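
For anyone hitting the same thing, here is a minimal sketch of how that setting can be applied through the Spark Cassandra Connector's DataFrame API, e.g. in a spark-shell session. The contact point, keyspace, and table names below are placeholders, not the ones from this job:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cassandra-count")
      // Smaller splits mean more Spark partitions, so each token-range
      // query stays small enough to finish before the read timeout.
      .config("spark.cassandra.input.split.sizeInMB", "50")
      .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
      .getOrCreate()

    // Keyspace and table names here are placeholders for illustration.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()

    println(s"row count: ${df.count()}")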

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - I believe that for large tables, spark.cassandra.read.timeoutMS needs to be very long - on the order of 4 hours or more. The job now runs much longer, but still doesn't complete. I'm now facing this all-too-familiar error: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException:
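
For reference, a hedged sketch of raising that timeout (the 4-hour figure is just the value I tried here, not a verified recommendation); as the later update shows, shrinking the input splits turned out to matter more than the timeout alone:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cassandra-long-scan")
      // 4 hours expressed in milliseconds: 4 * 60 * 60 * 1000 = 14,400,000.
      .config("spark.cassandra.read.timeoutMS", "14400000")
      .getOrCreate()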