Very abstract. EC2 is an unlikely culprit. What are you trying to do?
Spark is typically not inconsistent like that, but huge intermediate
data or reduce-side sizing issues could be involved. It's hard to help
without more detail about what you are trying to achieve.
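
If the job aggregates after groupByKey(), one thing worth trying is
reduceByKey(), which combines values on the map side so far less
intermediate data is shuffled. A minimal sketch with made-up data (not
your actual pipeline):

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._  // implicits for pair-RDD operations

  val sc = new SparkContext("local[2]", "shuffle-demo")

  // Hypothetical (key, value) records standing in for the real workload.
  val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

  // groupByKey ships every value across the network before aggregating,
  // so shuffle size and executor memory grow with the raw data.
  val viaGroup = pairs.groupByKey().mapValues(_.sum)

  // reduceByKey computes partial sums within each partition first, so
  // only one partial result per key per partition is shuffled.
  val viaReduce = pairs.reduceByKey(_ + _)

  println(viaGroup.collect().toSeq)   // same results...
  println(viaReduce.collect().toSeq)  // ...with a much smaller shuffle

  sc.stop()

If you really do need all values per key, at least check the key
distribution: a few hot keys can make shuffle behavior vary a lot
between runs.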

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Tue, Apr 22, 2014 at 7:30 PM, Aureliano Buendia <buendia...@gmail.com> wrote:

> Hi,
>
> Sometimes the very same Spark application binary behaves
> differently with every execution.
>
>    - The Ganglia profile is different with every execution: sometimes it
>    takes 0.5 TB of memory, the next time it takes 1 TB of memory, the next
>    time it is 0.75 TB...
>    - The Spark UI shows more succeeded tasks than the total number of
>    tasks, e.g. 3500/3000. There are no failed tasks. At this stage the
>    computation keeps running for a long time without returning an answer.
>    - The only way to get an answer from the application is to hopelessly
>    keep running it multiple times until, by some luck, it converges.
>
> I was not able to reproduce this with a minimal example, as it seems
> some random factors affect this behavior. I suspect, but am not sure,
> that the use of one or more groupByKey() calls intensifies this problem.
>
> Another source of suspicion is the unpredictable latency and I/O
> performance of EC2 clusters.
>
> Is this a known issue with Spark?
>
