For uniform partitioning, you can try a custom Partitioner.
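A minimal sketch of what such a partitioner could look like (the integer key type and the modulo scheme are assumptions for illustration, not from this thread):

import org.apache.spark.Partitioner

// Spreads integer keys uniformly across partitions via modulo;
// falls back to hashCode for any other key type.
class UniformPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case i: Int => ((i % numPartitions) + numPartitions) % numPartitions
    case other  => (other.hashCode() & Integer.MAX_VALUE) % numPartitions
  }
}

// Usage: pairRdd.partitionBy(new UniformPartitioner(100))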
--
Are you having this issue with Spark 1.5 as well? We had a similar OOM issue
and were told by Databricks to upgrade to 1.5 to resolve it. I guess they
are trying to sell Tachyon :)
--
"I don't think you could avoid this
in general, right, in any system? "
Really? nosql databases do efficient lookups(and scan) based on key and
partition. look at cassandra, hbase
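Spark has an analogous hook once an RDD carries a partitioner: lookup() computes the owning partition from the key and scans only that partition. A small sketch (assumes sc is an existing SparkContext):

import org.apache.spark.HashPartitioner

// partitionBy installs a partitioner, so lookup() can route to one partition
val pairs = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
  .partitionBy(new HashPartitioner(8))
  .cache()  // keep it resident so repeated lookups stay cheap

val hits: Seq[String] = pairs.lookup(2)  // runs a job on a single partition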
--
Looks like this has been supported since the 1.4 release :)
https://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions
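Presumably the method in question is filterByRange, which, given a RangePartitioner, only scans partitions that can contain keys in the requested range. A quick sketch (assumes sc is an existing SparkContext; the pair-RDD implicits resolve automatically in Spark 1.3+):

val sorted = sc.parallelize(Seq("b" -> 2, "a" -> 1, "d" -> 4, "c" -> 3))
  .sortByKey()  // sortByKey installs a RangePartitioner

// Only partitions whose key range intersects ["a", "c"] are scanned
val slice = sorted.filterByRange("a", "c")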
--
Hi DB Tsai-2,
I am trying to run a singleton SparkContext in my container (a Spring Boot
Tomcat container). When my application bootstraps, I create the
SparkContext and keep the reference for future job submissions. I got it
working with standalone Spark perfectly, but I am having trouble with YARN.
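A minimal sketch of the singleton pattern described above (the object name, app name, and config values are illustrative, not from this thread):

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextHolder {
  // One lazily created SparkContext, reused for every job submission
  // for the lifetime of the web container.
  lazy val sc: SparkContext = {
    val conf = new SparkConf()
      .setAppName("embedded-spark")  // hypothetical app name
      .setMaster("yarn-client")      // Spark 1.x master string for YARN client mode
    new SparkContext(conf)
  }
}

Note that yarn-client mode also needs HADOOP_CONF_DIR (or YARN_CONF_DIR) visible to the container JVM so the context can reach the ResourceManager.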
--
I updated the code sample so people can better understand what my inputs
and outputs are.
--
Have you found an answer to this? I am also looking for exactly the same solution.