The limit is on blocks, not partitions; partitions are made up of many blocks. It sounds like you are creating very large values in memory, but I'm not sure, given your description. Of course, you will run into problems if a single object is more than 2GB. More of the stack trace might show what is mapping that much memory.
If you simply want the data split into 1,000 files, it's a lot simpler: just repartition into 1,000 partitions and save the data. If you need more control over what goes into which partition, then yes, use a custom Partitioner. (A rough sketch follows below the quoted message.)

On Mon, Feb 2, 2015 at 8:40 PM, Michael Albert <m_albert...@yahoo.com.invalid> wrote:

> Greetings!
>
> SPARK-1476 says that there is a 2G limit for "blocks".
> Is this the same as a 2G limit for partitions (or approximately so)?
>
> What I had been attempting to do is the following.
> 1) Start with a moderately large data set (currently about 100GB, but growing).
> 2) Create about 1,000 files (yes, files), each representing a subset of the data.
>
> The current attempt I am working on is something like this.
> 1) Do a "map" whose output key indicates which of the 1,000 files it will go into and whose value is what I will want to stick into the file.
> 2) Partition the data and use the body of mapPartitions to open a file and save the data.
>
> My apologies, this is actually embedded in a bigger mess, so I won't post it.
>
> However, I get errors telling me that there is an "IllegalArgumentException: Size exceeds Integer.MAX_VALUE", with sun.nio.ch.FileChannelImpl.map at the top of the stack. This leads me to think that I have hit the limit on partition and/or block size.
>
> Perhaps this is not a good way to do it?
>
> I suppose I could run 1,000 passes over the data, each time collecting the output for one of my 1,000 final files, but that seems likely to be painfully slow to run.
>
> Am I missing something?
>
> Admittedly, this is an odd use case....
>
> Thanks!
>
> Sincerely,
> Mike Albert
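For reference, here is a minimal sketch of the "repartition into 1,000 partitions and save" route. It assumes the data has already been keyed as an RDD[(Int, String)], where the key (0-999) names the target file and the value is the record text; the FileIdPartitioner class, the SplitIntoFiles object, the synthetic input, and the output path are placeholders, not code from the original thread.

import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD operations on Spark 1.2 and earlier
import org.apache.spark.rdd.RDD

// Hypothetical partitioner: routes each record to the partition named by its
// integer key, which is assumed to lie in [0, numParts).
class FileIdPartitioner(numParts: Int) extends Partitioner {
  override def numPartitions: Int = numParts
  override def getPartition(key: Any): Int = key.asInstanceOf[Int] % numParts
}

object SplitIntoFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-into-files"))

    // Stand-in for the real keyed data: (file id, record text).
    val records: RDD[(Int, String)] =
      sc.parallelize(0 until 1000000).map(i => (i % 1000, s"record-$i"))

    // One partition per file id; each partition is written as a separate
    // part-NNNNN file under the output directory.
    records
      .partitionBy(new FileIdPartitioner(1000))
      .values
      .saveAsTextFile("hdfs:///path/to/output")

    sc.stop()
  }
}

With keys drawn from 0-999, new HashPartitioner(1000) would route records the same way, so the custom Partitioner only earns its keep when you need routing logic that a plain hash of the key doesn't give you.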