Re: Kafka Streaming and partitioning

2017-02-26 Thread tonyye
Hi Dave, I had the same question and was wondering if you had found a way to do the join without causing a shuffle? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Streaming-and-partitioning-tp25955p28425.html Sent from the Apache Spark User List

Re: Kafka Streaming and partitioning

2016-01-13 Thread David D
>>>> 1. Explicitly call partitionBy(CustomPartitioner) on the input stream RDD
>>>> followed by a join. This results in a shuffle of the input stream RDD,
>>>> after which the co-partitioned join takes place.
>>>> 2. Call join on the refer

Re: Kafka Streaming and partitioning

2016-01-13 Thread Cody Koeninger
>>> Spark will do a shuffle under the hood in this case and the join will take
>>> place. The join will do its best to run on a node that has local access to
>>> the reference data RDD.
>>>
>>> Is there any difference between the 2 methods above
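
To make the distinction in the quoted question concrete, here is a minimal plain-Python sketch of the idea, not Spark code. It assumes integer keys and a hypothetical modulo partitioner (`custom_partitioner`, `NUM_PARTITIONS`, and the sample records are all invented for illustration). Redistributing records with `partition_by` stands in for the shuffle; once both sides use the same partitioner, the join only ever compares partition i with partition i.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def custom_partitioner(key):
    """Hypothetical partitioner: maps an integer key to a partition index."""
    return key % NUM_PARTITIONS


def partition_by(records, partitioner, num_partitions):
    """Redistribute (key, value) records into partitions (the 'shuffle' step)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partitioner(key)].append((key, value))
    return partitions


def co_partitioned_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right side."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a per-partition lookup of the right side.
        lookup = defaultdict(list)
        for key, value in right:
            lookup[key].append(value)
        # Probe it with the left side; no record crosses a partition boundary.
        for key, value in left:
            for other in lookup.get(key, []):
                joined.append((key, (value, other)))
    return joined


reference = [(1, "ref-a"), (2, "ref-b"), (5, "ref-c")]
stream_batch = [(5, "evt-1"), (2, "evt-2"), (7, "evt-3")]

reference_parts = partition_by(reference, custom_partitioner, NUM_PARTITIONS)
stream_parts = partition_by(stream_batch, custom_partitioner, NUM_PARTITIONS)
result = sorted(co_partitioned_join(stream_parts, reference_parts))
```

If the stream batch already arrived laid out according to `custom_partitioner`, the second `partition_by` call would move nothing, which is the cost the original question is trying to avoid.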

Re: Kafka Streaming and partitioning

2016-01-13 Thread Dave
for any help on this issue.
Dave.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Streaming-and-partitioning-tp25955.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Kafka Streaming and partitioning

2016-01-13 Thread Cody Koeninger
>> RDD and not to do a shuffle.
>> Spark in this case trusts that the data is set up correctly (as in the use
>> case above) and simply fills in the necessary metadata on the RDD
>> partitions, i.e. checks the first entry in each partition to determine the
>> partition number of t
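
The "check the first entry in each partition" idea from the quoted text can be sketched in plain Python as follows. This is only an illustration of the proposal, not an existing Spark API; the modulo partitioner, the helper names, and the sample data are assumptions made up for the example.

```python
def custom_partitioner(key, num_partitions):
    """Hypothetical partitioner assumed to match the one used by the producer."""
    return key % num_partitions


def infer_partition_index(partition, partitioner, num_partitions):
    """Return the partition index implied by the first record, or None if empty."""
    if not partition:
        return None
    first_key, _ = partition[0]
    return partitioner(first_key, num_partitions)


def is_consistent(partition, partitioner, num_partitions):
    """Verify that every record agrees with the index inferred from the first one."""
    expected = infer_partition_index(partition, partitioner, num_partitions)
    return all(partitioner(key, num_partitions) == expected for key, _ in partition)


# Data assumed to be laid out by the producer with the same partitioner.
kafka_partitions = [
    [(4, "a"), (8, "b")],
    [(1, "c"), (5, "d")],
    [],
    [(3, "e")],
]

inferred = [infer_partition_index(p, custom_partitioner, 4) for p in kafka_partitions]
consistent = all(is_consistent(p, custom_partitioner, 4) for p in kafka_partitions)
```

Trusting only the first entry is cheap but unverified; `is_consistent` shows what a full check would cost, namely one pass over every record.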

Re: Kafka Streaming and partitioning

2016-01-13 Thread Dave

Re: Kafka Streaming and partitioning

2016-01-13 Thread Cody Koeninger
> partitions, i.e. checks the first entry in each partition to determine the
> partition number of the data.
>
> Thank you in advance for any help on this issue.
> Dave.

Kafka Streaming and partitioning

2016-01-13 Thread ddav
on the RDD partitions, i.e. check the first entry in each partition to determine the partition number of the data.

Thank you in advance for any help on this issue.
Dave.