Is it because countByValue or toArray puts too much stress on the driver if
there are many unique words?
To me this is a typical word count problem, so you could solve it as follows
(correct me if I am wrong):

val textFile = sc.textFile("file")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
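Since the question above is whether countByValue or toArray is stressing the
driver, it may also help to avoid pulling the whole result back at once; a
minimal follow-up sketch (the output path is illustrative):

// reduceByKey keeps the counts distributed across the executors, so
// bring back only a bounded sample, or write the result out to disk.
counts.take(10)                        // small sample on the driver
counts.saveAsTextFile("counts-out")    // or save without collecting at all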
Hi Zhan,
Thanks for looking into this. I'm actually using the hash map as an example
of the simplest snippet of code that fails for me; I know it is just word
count. In my actual problem I'm using a Trie data structure to find
substring matches.
Not sure exactly how you use it. My understanding is that in Spark it is
better to keep the overhead on the driver as low as possible. Is it possible
to broadcast the trie to the executors, do the computation there, and then
aggregate the counters in the reduce phase?
Thanks.
Zhan Zhang
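A minimal sketch of that suggestion, assuming a hypothetical Trie class with
a matches(line: String): Seq[String] method (the names here are illustrative,
not from the thread):

// Build the trie once on the driver and ship it to the executors as a
// broadcast variable, so each executor gets one read-only copy.
val trie: Trie = buildTrie(patterns)           // hypothetical constructor
val trieBc = sc.broadcast(trie)

val counts = sc.textFile("file")
  .flatMap(line => trieBc.value.matches(line)) // match on the executors
  .map(m => (m, 1L))
  .reduceByKey(_ + _)                          // aggregate in the reduce phase

This keeps the heavy matching work on the executors, and only the aggregated
counters flow through the shuffle rather than the whole structure sitting on
the driver.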
Setting spark.driver.memory has no effect. It's still hanging trying to
compute result.count when I'm sampling greater than 35%, regardless of what
value of spark.driver.memory I set.
Here are my settings:
export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g
in
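One thing worth noting (an assumption about the setup, not something stated
in the thread): spark.driver.memory only takes effect if it is set before
the driver JVM starts, so setting it in SparkConf at runtime is too late,
and SPARK_MEM is deprecated in favor of the per-role settings. A hedged
sketch of passing it at launch instead (the class and jar names are
illustrative):

# Pass the driver heap size on the spark-submit command line, or put
# spark.driver.memory in conf/spark-defaults.conf, so it is applied
# before the driver JVM is launched.
./bin/spark-submit --driver-memory 10g --class MyApp my-app.jar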