Yep, and it works fine for operations that do not involve a shuffle
(like foreach, count, etc.), but operations that do involve a shuffle end
up in an infinite loop. Spark should somehow surface an error here instead
of looping forever.
Thanks
Best Regards
On Thu, Aug 13, 2015 at 11:37
What I understood from Imran's mail (and what was referenced in his
mail) is that the RDD mentioned seems to be violating some basic contracts
on how partitions are used in Spark [1].
They cannot be arbitrarily numbered, cannot have duplicates, etc.
Extending RDD to add functionality is typically for niche
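In other words, Spark's scheduler assumes that each Partition's index field equals its position in the array returned by getPartitions(), with no gaps or duplicates. Here is a minimal sketch of that invariant as an illustrative Python mock (not the actual Spark API; custom RDDs are normally written against the Scala/Java RDD class, and the names below are invented for illustration):

```python
# Illustrative mock, NOT Spark API: the invariant Spark expects of
# RDD.getPartitions() is that partition i reports index == i.

class MockPartition:
    def __init__(self, index):
        self.index = index

def validate_partitions(partitions):
    """True iff each partition's index equals its slot in the list."""
    return all(p.index == i for i, p in enumerate(partitions))

good = [MockPartition(0), MockPartition(1), MockPartition(2)]
bad = [MockPartition(0), MockPartition(2), MockPartition(2)]  # gap + duplicate

print(validate_partitions(good))  # True
print(validate_partitions(bad))   # False
```

An RDD that breaks this check can appear to work for narrow operations (foreach, count) but confuses the shuffle machinery, which uses the index to route records between stages.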
Thanks for the clarifications, Mridul.
Thanks
Best Regards
On Fri, Aug 14, 2015 at 1:04 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Oh I see, you are defining your own RDD Partition types, and you had a
bug where partition.index did not line up with the partition's slot in
rdd.getPartitions. Is that correct?
On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
I figured that out, and these are my
Yikes.
Was this a one-time thing? Or does it happen consistently? Can you turn
on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...)
On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Hi
My Spark job (running in local[*] with Spark 1.4.1) reads data from a
thrift server. (I created a custom RDD; it computes the partitions in the
getPartitions() call, and in compute() hasNext returns records from these
partitions.) count() and foreach() are working fine and return the correct
number of
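To make the shape of the custom RDD concrete, here is a hedged sketch as a plain-Python mock (the real subclass would extend org.apache.spark.rdd.RDD in Scala or Java; names like MockThriftRDD are invented for illustration, and the thrift fetch is replaced by in-memory batches). The key point, per the rest of the thread, is that each partition's index is derived from its position in the list:

```python
# Illustrative mock, NOT Spark API: a custom-RDD-like class whose
# getPartitions() assigns indices from enumerate(), so the contract
# partitions[i].index == i always holds.

class MockPartition:
    def __init__(self, index, records):
        self.index = index        # must equal the slot in get_partitions()
        self.records = records

class MockThriftRDD:
    def __init__(self, record_batches):
        # stand-in for data fetched from the thrift server
        self._batches = record_batches

    def get_partitions(self):
        # indices come from the position, never from external ids
        return [MockPartition(i, batch)
                for i, batch in enumerate(self._batches)]

    def compute(self, partition):
        # stand-in for compute()'s iterator (hasNext/next in Scala)
        yield from partition.records

rdd = MockThriftRDD([[1, 2], [3, 4], [5]])
parts = rdd.get_partitions()
print([p.index for p in parts])                        # [0, 1, 2]
print(sum(len(list(rdd.compute(p))) for p in parts))   # 5, like count()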