I proposed a fix: https://github.com/apache/spark/pull/2524
Glad to receive feedback.
--
Nan Zhu
On Tuesday, September 23, 2014 at 9:06 PM, Sandy Ryza wrote:
Filed https://issues.apache.org/jira/browse/SPARK-3642 for documenting these
nuances.
-Sandy
On Mon, Sep 22, 2014 at
If you think it is necessary to fix, I would like to resubmit that PR (it seems
to have some conflicts with the current DAGScheduler).
My suggestion is to make it an option on the accumulator, e.g. some algorithms
that use an accumulator for result calculation need a deterministic
accumulator,
MapReduce counters do not count duplications. In MapReduce, if a task
needs to be re-run, the value of the counter from the second task
overwrites the value from the first task.
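The contrast described above can be sketched in plain Python (hypothetical helper functions, not a MapReduce or Spark API; the task IDs and counter values are made up for illustration):

```python
# Two retry semantics for counters, assuming a task may run more than once.

def mapreduce_style(counter_per_attempt):
    # MapReduce-style: the latest attempt's value for a task *overwrites*
    # earlier attempts, so a re-run does not inflate the total.
    latest = {}
    for task_id, value in counter_per_attempt:
        latest[task_id] = value
    return sum(latest.values())

def naive_accumulator(counter_per_attempt):
    # Naive add-on-every-attempt semantics: a re-run task contributes twice.
    return sum(value for _, value in counter_per_attempt)

attempts = [("task-0", 5), ("task-1", 3), ("task-0", 5)]  # task-0 was re-run
print(mapreduce_style(attempts))    # 8
print(naive_accumulator(attempts))  # 13
```

The gap between the two totals is exactly the duplicated contribution from the re-run task, which is the nondeterminism being discussed in this thread.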
-Sandy
On Mon, Sep 22, 2014 at 4:55 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
If you think it is necessary to fix,
I see, thanks for pointing this out
--
Nan Zhu
On Monday, September 22, 2014 at 12:08 PM, Sandy Ryza wrote:
MapReduce counters do not count duplications. In MapReduce, if a task needs
to be re-run, the value of the counter from the second task overwrites the
value from the first task.
Hmm, good point, this seems to have been broken by refactorings of the
scheduler, but it worked in the past. Basically the solution is simple -- in a
result stage, we should not apply the update for each task ID more than once --
the same way we don't call job.listener.taskSucceeded more than once.
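A minimal sketch of that deduplication idea (a hypothetical class for illustration, not the actual DAGScheduler code): track which task IDs have already contributed, and ignore later attempts.

```python
# Apply each task's accumulator update at most once per task ID,
# so speculative or re-run attempts in a result stage don't double-count.

class ResultStageAccumulator:
    def __init__(self):
        self.value = 0
        self._applied = set()  # task IDs whose update was already applied

    def apply_update(self, task_id, update):
        if task_id in self._applied:
            return  # duplicate attempt: skip, like taskSucceeded dedup
        self._applied.add(task_id)
        self.value += update

acc = ResultStageAccumulator()
acc.apply_update(0, 10)
acc.apply_update(1, 7)
acc.apply_update(0, 10)  # re-run of task 0 is ignored
print(acc.value)  # 17
```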
Hey Sandy,
On September 20, 2014 at 8:50:54 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:
Hey All,
A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.
*Broadcast variables*
Now that tasks data