Is it because countByValue or toArray puts too much stress on the driver
when there are many unique words?
To me this is a typical word-count problem, so you could solve it as follows
(correct me if I am wrong):
val textFile = sc.textFile("file")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
Hi Zhan,
Thanks for looking into this. I'm actually using the hash map as an example
of the simplest snippet of code that is failing for me. I know that this is
just the word count. In my actual problem I'm using a Trie data structure
to find substring matches.
On Sun, Aug 17, 2014 at 11:35 PM,
I'm not sure exactly how you use it. My understanding is that in Spark it is
better to keep the driver's overhead as low as possible. Would it be possible to
broadcast the trie to the executors, do the computation there, and then
aggregate the counters in the reduce phase?
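A minimal sketch of that suggestion, assuming the trie can be wrapped in a serializable type with a `matches(line)` method (`PatternTrie` and its API here are hypothetical stand-ins, not the original poster's actual trie):

```scala
// Hedged sketch: broadcast a serializable matcher to the executors and
// aggregate counts with reduceByKey, so only the final (match, count)
// pairs are shuffled and the driver does no per-record work.
case class PatternTrie(patterns: Set[String]) extends Serializable {
  // Return every pattern that occurs as a substring of the line.
  // A real trie would do this more efficiently than a linear scan.
  def matches(line: String): Seq[String] =
    patterns.toSeq.filter(line.contains)
}

val trie = PatternTrie(Set("spark", "shuffle"))
val bcTrie = sc.broadcast(trie)                // shipped once per executor

val counts = sc.textFile("file")
  .flatMap(line => bcTrie.value.matches(line)) // match on the executors
  .map(m => (m, 1))
  .reduceByKey(_ + _)                          // aggregate in the reduce phase
```

Because `reduceByKey` combines partial counts on the map side before shuffling, the driver only sees results when an action (e.g. `collect`) is called, unlike `countByValue`, which materializes the whole map on the driver.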
Thanks.
Zhan Zhang
On Aug 18,
Hi all,
I'm reading the implementation of the shuffle in Spark.
My understanding is that it does not overlap with the upstream stage.
Would it be helpful to overlap the computation of the upstream stage with the
shuffle (I mean the network copy, as in Hadoop)? If so, is there any plan to
implement it in Spark?
I think there's some discussion of this at
https://issues.apache.org/jira/browse/SPARK-2387 and
https://github.com/apache/spark/pull/1328.
- Josh
On Mon, Aug 18, 2014 at 9:46 AM, zycodefish opensourcecodef...@gmail.com
wrote:
The exception indicates that the forked process didn't execute as
expected, so the test case *should* fail.
Instead of replacing the exception with a logWarning, capturing and
printing the stdout/stderr of the forked process would be helpful for diagnosis.
Currently the only information we have at
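The capture-and-print idea can be sketched with `scala.sys.process` (the command here is a placeholder, not the actual forked test process):

```scala
import scala.sys.process._

// Hedged sketch: run a child process and collect its stdout and stderr
// separately, so they can be printed when the process exits abnormally
// instead of being swallowed by a logWarning.
val out = new StringBuilder
val err = new StringBuilder
val logger = ProcessLogger(
  line => out.append(line).append('\n'),  // child's stdout
  line => err.append(line).append('\n'))  // child's stderr

// Placeholder command; in the real test this would be the forked JVM.
val exitCode = Seq("echo", "hello").!(logger)

if (exitCode != 0) {
  // Surface the child's output in the failure report for diagnosis.
  println(s"forked process failed (exit code $exitCode)")
  println(s"stdout:\n$out")
  println(s"stderr:\n$err")
}
```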
Hi,
We are running the snapshots (new Spark features) on YARN, and I was
wondering whether the web UI is available in YARN mode.
The deployment documentation does not mention the web UI in YARN mode.
Is it available?
Thanks.
Deb