Greetings! SPARK-1476 says that there is a 2G limit for "blocks". Is this the same as a 2G limit for partitions (or approximately so)?
What I had been attempting to do is the following.

1) Start with a moderately large data set (currently about 100GB, but growing).
2) Create about 1,000 files (yes, files), each representing a subset of the data.

The current attempt I am working on is something like this:

1) Do a "map" whose output key indicates which of the 1,000 files the record will go into, and whose value is what I will want to stick into the file.
2) Partition the data by that key and, in the body of mapPartitions, open a file and save the data.

My apologies, this is actually embedded in a bigger mess, so I won't post it. However, I get errors telling me that there is an "IllegalArgumentException: Size exceeds Integer.MAX_VALUE", with sun.nio.ch.FileChannelImpl.map at the top of the stack. This leads me to think that I have hit the limit on partition and/or block size.

Perhaps this is not a good way to do it? I suppose I could run 1,000 passes over the data, each time collecting the output for one of my 1,000 final files, but that seems likely to be painfully slow to run.

Am I missing something? Admittedly, this is an odd use case...

Thanks!

Sincerely,
Mike Albert
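For what it's worth, the shape of what I'm attempting is roughly the following (a minimal sketch, not my real code; `bucketOf`, `render`, and the output path are placeholder names I'm using here for illustration):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
import java.io.PrintWriter

object BucketWriter {
  val numFiles = 1000

  // Placeholder: decide which of the 1,000 files this record belongs in.
  def bucketOf(rec: String): Int = rec.hashCode.abs % numFiles

  // Placeholder: produce the line to write for this record.
  def render(rec: String): String = rec

  def writeBuckets(input: RDD[String]): Unit = {
    input
      .map(rec => (bucketOf(rec), render(rec)))       // key = target file index
      .partitionBy(new HashPartitioner(numFiles))     // one partition per output file
      .mapPartitionsWithIndex { (idx, iter) =>
        // Each partition corresponds to one output file; the path is
        // illustrative only.
        val out = new PrintWriter(s"/tmp/bucket-$idx.txt")
        try iter.foreach { case (_, line) => out.println(line) }
        finally out.close()
        Iterator.empty
      }
      .count()                                        // force evaluation of the lazy RDD
  }
}
```

So any single bucket that grows past 2GB would, if I understand it, blow past the block-size limit during the shuffle in partitionBy.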