Re: using multiple dstreams together (spark streaming)
@TD: Doesn't transformWith need both of the DStreams to be of same slideDuration. [Spark Version: 1.3.1] -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/using-multiple-dstreams-together-spark-streaming-tp9947p24839.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: using multiple dstreams together (spark streaming)
Thanks! On Wed, Jul 16, 2014 at 6:34 PM, Tathagata Das wrote: > Have you taken a look at DStream.transformWith( ... ) . That allows you > apply arbitrary transformation between RDDs (of the same timestamp) of two > different streams. > > So you can do something like this. > > 2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2: > RDD[...]) => { > ... > // return a new RDD > }) > > > And streamingContext.transform() extends it to N DStreams. :) > > Hope this helps! > > TD > > > > > On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat > wrote: > >> hey at least it's something (thanks!) ... not sure what i'm going to do >> if i can't find a solution (other than not use spark) as i really need >> these capabilities. anyone got anything else? >> >> >> On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez < >> langel.gro...@gmail.com> wrote: >> >>> hum... maybe consuming all streams at the same time with an actor that >>> would act as a new DStream source... but this is just a random idea... I >>> don't really know if that would be a good idea or even possible. >>> >>> >>> 2014-07-16 18:30 GMT+01:00 Walrus theCat : >>> >>> Yeah -- I tried the .union operation and it didn't work for that reason. Surely there has to be a way to do this, as I imagine this is a commonly desired goal in streaming applications? On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < langel.gro...@gmail.com> wrote: > I'm joining several kafka dstreams using the join operation but you > have the limitation that the duration of the batch has to be same,i.e. 1 > second window for all dstreams... so it would not work for you. > > > 2014-07-16 18:08 GMT+01:00 Walrus theCat : > > Hi, >> >> My application has multiple dstreams on the same inputstream: >> >> dstream1 // 1 second window >> dstream2 // 2 second window >> dstream3 // 5 minute window >> >> >> I want to write logic that deals with all three windows (e.g. when >> the 1 second window differs from the 2 second window by some delta ...) >> >> I've found some examples online (there's not much out there!), and I >> can only see people transforming a single dstream. In conventional >> spark, >> we'd do this sort of thing with a cartesian on RDDs. >> >> How can I deal with multiple Dstreams at once? >> >> Thanks >> > > >>> >> >
Re: using multiple dstreams together (spark streaming)
Have you taken a look at DStream.transformWith( ... ) . That allows you apply arbitrary transformation between RDDs (of the same timestamp) of two different streams. So you can do something like this. 2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2: RDD[...]) => { ... // return a new RDD }) And streamingContext.transform() extends it to N DStreams. :) Hope this helps! TD On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat wrote: > hey at least it's something (thanks!) ... not sure what i'm going to do if > i can't find a solution (other than not use spark) as i really need these > capabilities. anyone got anything else? > > > On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez < > langel.gro...@gmail.com> wrote: > >> hum... maybe consuming all streams at the same time with an actor that >> would act as a new DStream source... but this is just a random idea... I >> don't really know if that would be a good idea or even possible. >> >> >> 2014-07-16 18:30 GMT+01:00 Walrus theCat : >> >> Yeah -- I tried the .union operation and it didn't work for that reason. >>> Surely there has to be a way to do this, as I imagine this is a commonly >>> desired goal in streaming applications? >>> >>> >>> On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < >>> langel.gro...@gmail.com> wrote: >>> I'm joining several kafka dstreams using the join operation but you have the limitation that the duration of the batch has to be same,i.e. 1 second window for all dstreams... so it would not work for you. 2014-07-16 18:08 GMT+01:00 Walrus theCat : Hi, > > My application has multiple dstreams on the same inputstream: > > dstream1 // 1 second window > dstream2 // 2 second window > dstream3 // 5 minute window > > > I want to write logic that deals with all three windows (e.g. when the > 1 second window differs from the 2 second window by some delta ...) > > I've found some examples online (there's not much out there!), and I > can only see people transforming a single dstream. In conventional spark, > we'd do this sort of thing with a cartesian on RDDs. > > How can I deal with multiple Dstreams at once? > > Thanks > >>> >> >
Re: using multiple dstreams together (spark streaming)
hey at least it's something (thanks!) ... not sure what i'm going to do if i can't find a solution (other than not use spark) as i really need these capabilities. anyone got anything else? On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez < langel.gro...@gmail.com> wrote: > hum... maybe consuming all streams at the same time with an actor that > would act as a new DStream source... but this is just a random idea... I > don't really know if that would be a good idea or even possible. > > > 2014-07-16 18:30 GMT+01:00 Walrus theCat : > > Yeah -- I tried the .union operation and it didn't work for that reason. >> Surely there has to be a way to do this, as I imagine this is a commonly >> desired goal in streaming applications? >> >> >> On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < >> langel.gro...@gmail.com> wrote: >> >>> I'm joining several kafka dstreams using the join operation but you have >>> the limitation that the duration of the batch has to be same,i.e. 1 second >>> window for all dstreams... so it would not work for you. >>> >>> >>> 2014-07-16 18:08 GMT+01:00 Walrus theCat : >>> >>> Hi, My application has multiple dstreams on the same inputstream: dstream1 // 1 second window dstream2 // 2 second window dstream3 // 5 minute window I want to write logic that deals with all three windows (e.g. when the 1 second window differs from the 2 second window by some delta ...) I've found some examples online (there's not much out there!), and I can only see people transforming a single dstream. In conventional spark, we'd do this sort of thing with a cartesian on RDDs. How can I deal with multiple Dstreams at once? Thanks >>> >>> >> >
Re: using multiple dstreams together (spark streaming)
Or, if not, is there a way to do this in terms of a single dstream? Keep in mind that dstream1, dstream2, and dstream3 have already had transformations applied. I tried creating the dstreams by calling .window on the first one, but that ends up with me having ... 3 dstreams... which is the same problem. On Wed, Jul 16, 2014 at 10:30 AM, Walrus theCat wrote: > Yeah -- I tried the .union operation and it didn't work for that reason. > Surely there has to be a way to do this, as I imagine this is a commonly > desired goal in streaming applications? > > > On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < > langel.gro...@gmail.com> wrote: > >> I'm joining several kafka dstreams using the join operation but you have >> the limitation that the duration of the batch has to be same,i.e. 1 second >> window for all dstreams... so it would not work for you. >> >> >> 2014-07-16 18:08 GMT+01:00 Walrus theCat : >> >> Hi, >>> >>> My application has multiple dstreams on the same inputstream: >>> >>> dstream1 // 1 second window >>> dstream2 // 2 second window >>> dstream3 // 5 minute window >>> >>> >>> I want to write logic that deals with all three windows (e.g. when the 1 >>> second window differs from the 2 second window by some delta ...) >>> >>> I've found some examples online (there's not much out there!), and I can >>> only see people transforming a single dstream. In conventional spark, we'd >>> do this sort of thing with a cartesian on RDDs. >>> >>> How can I deal with multiple Dstreams at once? >>> >>> Thanks >>> >> >> >
Re: using multiple dstreams together (spark streaming)
hum... maybe consuming all streams at the same time with an actor that would act as a new DStream source... but this is just a random idea... I don't really know if that would be a good idea or even possible. 2014-07-16 18:30 GMT+01:00 Walrus theCat : > Yeah -- I tried the .union operation and it didn't work for that reason. > Surely there has to be a way to do this, as I imagine this is a commonly > desired goal in streaming applications? > > > On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < > langel.gro...@gmail.com> wrote: > >> I'm joining several kafka dstreams using the join operation but you have >> the limitation that the duration of the batch has to be same,i.e. 1 second >> window for all dstreams... so it would not work for you. >> >> >> 2014-07-16 18:08 GMT+01:00 Walrus theCat : >> >> Hi, >>> >>> My application has multiple dstreams on the same inputstream: >>> >>> dstream1 // 1 second window >>> dstream2 // 2 second window >>> dstream3 // 5 minute window >>> >>> >>> I want to write logic that deals with all three windows (e.g. when the 1 >>> second window differs from the 2 second window by some delta ...) >>> >>> I've found some examples online (there's not much out there!), and I can >>> only see people transforming a single dstream. In conventional spark, we'd >>> do this sort of thing with a cartesian on RDDs. >>> >>> How can I deal with multiple Dstreams at once? >>> >>> Thanks >>> >> >> >
Re: using multiple dstreams together (spark streaming)
Yeah -- I tried the .union operation and it didn't work for that reason. Surely there has to be a way to do this, as I imagine this is a commonly desired goal in streaming applications? On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < langel.gro...@gmail.com> wrote: > I'm joining several kafka dstreams using the join operation but you have > the limitation that the duration of the batch has to be same,i.e. 1 second > window for all dstreams... so it would not work for you. > > > 2014-07-16 18:08 GMT+01:00 Walrus theCat : > > Hi, >> >> My application has multiple dstreams on the same inputstream: >> >> dstream1 // 1 second window >> dstream2 // 2 second window >> dstream3 // 5 minute window >> >> >> I want to write logic that deals with all three windows (e.g. when the 1 >> second window differs from the 2 second window by some delta ...) >> >> I've found some examples online (there's not much out there!), and I can >> only see people transforming a single dstream. In conventional spark, we'd >> do this sort of thing with a cartesian on RDDs. >> >> How can I deal with multiple Dstreams at once? >> >> Thanks >> > >
Re: using multiple dstreams together (spark streaming)
I'm joining several kafka dstreams using the join operation but you have the limitation that the duration of the batch has to be same,i.e. 1 second window for all dstreams... so it would not work for you. 2014-07-16 18:08 GMT+01:00 Walrus theCat : > Hi, > > My application has multiple dstreams on the same inputstream: > > dstream1 // 1 second window > dstream2 // 2 second window > dstream3 // 5 minute window > > > I want to write logic that deals with all three windows (e.g. when the 1 > second window differs from the 2 second window by some delta ...) > > I've found some examples online (there's not much out there!), and I can > only see people transforming a single dstream. In conventional spark, we'd > do this sort of thing with a cartesian on RDDs. > > How can I deal with multiple Dstreams at once? > > Thanks >