Re: Union of RDDs without the overhead of Union
Well, the "Hadoop" way is to save to a/b and a/c and read from a/* :)

On Tue, Feb 2, 2016 at 11:05 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Spark users and developers,
>
> anyone knows how to union two RDDs without the overhead of it?
>
> say rdd1.union(rdd2).saveAsTextFile(..)
> This requires a stage to union the 2 rdds before saveAsTextFile (2
> stages). Is there a way to skip the union step but have the contents of the
> two rdds save to the same output text file?
>
> Thank you!
>
> Jerry
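The save-to-subdirectories idea from this reply could look roughly like the PySpark sketch below. It is only an illustration under assumptions: the helper names `save_side_by_side` and `read_back` are made up here, and the subdirectory names `b` and `c` simply follow the a/b, a/c example. `saveAsTextFile` and `textFile` are the standard RDD and SparkContext calls.

```python
def save_side_by_side(rdd1, rdd2, base_dir):
    """Save two RDDs under one parent directory without calling union().

    rdd1 and rdd2 are assumed to be pyspark RDDs; base_dir is a hypothetical
    output location such as "a". Each saveAsTextFile call runs as its own job.
    """
    path1 = base_dir + "/b"
    path2 = base_dir + "/c"
    rdd1.saveAsTextFile(path1)
    rdd2.saveAsTextFile(path2)
    return [path1, path2]


def read_back(sc, base_dir):
    """Read the part files of every subdirectory of base_dir through a glob.

    sc is assumed to be an existing SparkContext.
    """
    return sc.textFile(base_dir + "/*")
```

Note that this trades the union for two separate save jobs, and the result is a directory tree of part files rather than a single text file.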
Re: Union of RDDs without the overhead of Union
I am surprised that union introduces a stage. UnionRDD should have only narrow dependencies.

On Tue, Feb 2, 2016 at 11:25 PM, Koert Kuipers <ko...@tresata.com> wrote:
> well the "hadoop" way is to save to a/b and a/c and read from a/* :)
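One way to check the narrow-dependency claim on an actual job is to look at the lineage of the unioned RDD. The PySpark sketch below is an assumption-laden illustration, not something from the thread: `inspect_union` is a made-up helper name, while `union`, `getNumPartitions` and `toDebugString` are regular RDD methods. If union adds no shuffle, the lineage text should show the union sitting directly on top of its two parents.

```python
def inspect_union(rdd1, rdd2):
    """Return (partition count, lineage text) for rdd1.union(rdd2).

    rdd1 and rdd2 are assumed to be pyspark RDDs. Nothing is computed here:
    union() is lazy and the two inspection calls only read metadata.
    """
    unioned = rdd1.union(rdd2)
    lineage = unioned.toDebugString()
    # PySpark may hand the lineage back as bytes; normalise to str.
    if isinstance(lineage, bytes):
        lineage = lineage.decode("utf-8")
    return unioned.getNumPartitions(), lineage
```

Printing the second element of the result for the RDDs in question is a quick way to see whether an extra stage really comes from the union or from something earlier in the lineage.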
Re: Union of RDDs without the overhead of Union
I agree with Koert that UnionRDD should have only narrow dependencies. A union of two RDDs does, however, increase the number of tasks to be executed (rdd1.partitions + rdd2.partitions). If your two RDDs have the same number of partitions, you can also use zipPartitions, which results in fewer tasks and hence less overhead.

On Wed, Feb 3, 2016 at 9:58 AM, Koert Kuipers <ko...@tresata.com> wrote:
> i am surprised union introduces a stage. UnionRDD should have only narrow
> dependencies.

--
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
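The difference in task counts can be illustrated with a plain-Python model in which an RDD is just a list of partitions and each partition is a list of records. This is not Spark code: `union_partitions` and `zip_partitions` are hypothetical stand-ins for what `union` and `zipPartitions` (a method of the Scala/Java RDD API) do to the partition layout, assuming the two RDDs do not share a partitioner.

```python
def union_partitions(parts1, parts2):
    """Model of union: the result keeps every input partition as-is."""
    return list(parts1) + list(parts2)


def zip_partitions(parts1, parts2):
    """Model of zipPartitions with a function that chains both iterators."""
    if len(parts1) != len(parts2):
        raise ValueError("zipPartitions needs equal partition counts")
    # Partition i of the result holds partition i of both inputs.
    return [list(p1) + list(p2) for p1, p2 in zip(parts1, parts2)]


rdd1 = [["a1", "a2"], ["a3"]]   # modelled RDD with 2 partitions
rdd2 = [["b1"], ["b2", "b3"]]   # modelled RDD with 2 partitions

print(len(union_partitions(rdd1, rdd2)))  # 4 partitions, so 4 save tasks
print(len(zip_partitions(rdd1, rdd2)))    # 2 partitions, so 2 save tasks
```

In Scala the corresponding call would be something along the lines of `rdd1.zipPartitions(rdd2)((it1, it2) => it1 ++ it2)`. The records come out in a different order than with union, because each output partition holds records from both inputs.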
Union of RDDs without the overhead of Union
Hi Spark users and developers,

Does anyone know how to union two RDDs without the overhead of it?

Say rdd1.union(rdd2).saveAsTextFile(..)
This requires a stage to union the 2 RDDs before saveAsTextFile (2 stages). Is there a way to skip the union step but have the contents of the two RDDs saved to the same output text file?

Thank you!

Jerry