Thank you very much. I had overlooked the differences between the two.
The public API part is understandable.
Coming to the second part: I see that `SparkContext.union` creates a single
instance of UnionRDD with all the RDDs as parents, thereby preventing a long lineage chain.
Is my understanding correct?
On 5 February 2018 at
First, the public API cannot be changed except when there is a major
version change, and there is no way that we are going to do Spark 3.0.0
just for this change.
Second, the change would be a mistake, since the two union methods
are quite different. The method in RDD only ever works on t
There is one on RDD, but `SparkContext.union` prevents the lineage from growing.
Check https://stackoverflow.com/q/34461804
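
To make the lineage point concrete, here is a toy model in plain Python. It is not Spark code: `Node`, `binary_union`, `nary_union` and `depth` are hypothetical names made up for illustration. The sketch assumes that chaining the two-argument union nests one union node per call, while the many-argument union produces one node with every input as a direct parent.

```python
# Toy model of RDD lineage. Node, binary_union, nary_union and depth are
# hypothetical names for illustration, not Spark classes or methods.
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    parents: Tuple["Node", ...] = ()


def depth(node: Node) -> int:
    """Number of nodes on the longest path from this node down to a source."""
    if not node.parents:
        return 1
    return 1 + max(depth(p) for p in node.parents)


def binary_union(left: Node, right: Node) -> Node:
    # Stands in for rdd.union(other): the result has exactly two parents.
    return Node("union", (left, right))


def nary_union(nodes: Iterable[Node]) -> Node:
    # Stands in for SparkContext.union(rdds): one result node with
    # every input as a direct parent.
    return Node("union", tuple(nodes))


sources = [Node(f"rdd{i}") for i in range(100)]

chained = reduce(binary_union, sources)  # ((rdd0 U rdd1) U rdd2) U ...
flat = nary_union(sources)

print(depth(chained))  # 100: the chain grows by one level per union call
print(depth(flat))     # 2: one union node directly above all sources
```

In this model the chained form gets one level deeper with every extra input, while the flat form stays at two levels no matter how many inputs are combined.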
Original Message
On February 5, 2018 5:04 PM, Suchith J N wrote:
Hi,
This seems like a simple cleanup: why do we have union() for RDDs in
SparkContext? Shouldn't it reside in RDD? There is one in RDD, but it seems
like a wrapper around this one.
Regards,
Suchith