Re: SPARK MLLib - How to tie back Model.predict output to original data?
There is a spark-ts package, developed by Sandy, that has an RDD-based version. I am not sure about its DataFrame roadmap.

http://sryza.github.io/spark-timeseries/0.3.0/index.html

On Aug 18, 2016 12:42 AM, "ayan guha" wrote:
> Thanks a lot. I resolved it using a UDF.
>
> Question: does Spark support any time series models? Is there a roadmap
> that shows roughly when a feature will become available?
Re: SPARK MLLib - How to tie back Model.predict output to original data?
Thanks a lot. I resolved it using a UDF.

Question: does Spark support any time series models? Is there a roadmap that shows roughly when a feature will become available?

On 18 Aug 2016 16:46, "Yanbo Liang" wrote:
> If you want to tie them to other data, I think the best way is to use a
> DataFrame join, provided that both sides share an identity column.
Re: SPARK MLLib - How to tie back Model.predict output to original data?
If you want to tie them to other data, I think the best way is to use a DataFrame join, provided that both sides share an identity column.

Thanks
Yanbo

2016-08-16 20:39 GMT-07:00 ayan guha:
> Hi
>
> Thank you for your reply. Yes, I can get the prediction and the original
> features together. My question is how to tie them back to other parts of
> the data that were not in the LabeledPoint (LP).
>
> For example, I have a bunch of other dimensions which are not part of
> the features or the label.
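A minimal sketch of the join suggested above, assuming PySpark DataFrames and assuming that meter_number plus date_read uniquely identifies a row. The function name and the id_cols parameter are hypothetical, not from the thread; "prediction" is the default output column name of spark.ml models.

```python
def attach_other_dimensions(scored_df, other_df,
                            id_cols=("meter_number", "date_read")):
    """Join predictions back onto a DataFrame holding the other dimensions.

    scored_df: output of model.transform, containing id_cols and "prediction".
    other_df:  DataFrame with id_cols plus the columns that were never
               part of the features or the label.
    """
    keys = list(id_cols)
    # Keep only the identity columns and the prediction, so the join does
    # not produce duplicate column names.
    predictions = scored_df.select(*keys, "prediction")
    # A left join keeps every row of other_df, even if it has no prediction.
    return other_df.join(predictions, on=keys, how="left")
```

If no natural identity column exists, one option is to add a surrogate id column to the source DataFrame before the features are built, and join on that instead.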
Re: SPARK MLLib - How to tie back Model.predict output to original data?
Hi

Thank you for your reply. Yes, I can get the prediction and the original features together. My question is how to tie them back to other parts of the data that were not in the LabeledPoint (LP).

For example, I have a bunch of other dimensions which are not part of the features or the label.

Sorry if this is a stupid question.

On Wed, Aug 17, 2016 at 12:57 PM, Yanbo Liang wrote:
> MLlib keeps the original dataset during transformation; it just appends
> new columns to the existing DataFrame. That is, you can get both the
> prediction value and the original features from the output DataFrame of
> model.transform.

--
Best Regards,
Ayan Guha
Re: SPARK MLLib - How to tie back Model.predict output to original data?
MLlib keeps the original dataset during transformation; it just appends new columns to the existing DataFrame. That is, you can get both the prediction value and the original features from the output DataFrame of model.transform.

Thanks
Yanbo

2016-08-16 17:48 GMT-07:00 ayan guha:
> Now, I am trying to tie this prediction value back to meter_number and
> date_read from the original DF.
>
> One way is to assume that the order of records in DF and in the
> Model.predict output is exactly the same and zip the two RDDs. But is
> there any other (possibly better) solution?
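A minimal sketch of that behaviour, assuming a fitted spark.ml model (the DataFrame-based API) and an input DataFrame that already carries the features column next to meter_number, date_read and amount. The function name and the id_cols parameter are hypothetical; "prediction" is the default output column name and may differ if predictionCol was changed.

```python
def score_with_identifiers(model, assembled_df,
                           id_cols=("meter_number", "date_read")):
    """Score a DataFrame and return identifiers next to the prediction.

    model:        any fitted spark.ml model exposing transform().
    assembled_df: DataFrame holding the features column plus id_cols
                  and "amount".
    """
    missing = [c for c in id_cols if c not in assembled_df.columns]
    if missing:
        raise ValueError("input DataFrame lacks columns: %s" % missing)
    # transform() appends the prediction column; the input columns stay.
    scored = model.transform(assembled_df)
    return scored.select(*id_cols, "amount", "prediction")
```

Because the identifiers travel through transform() untouched, no zip or join is needed as long as they are present in the DataFrame that is scored.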
SPARK MLLib - How to tie back Model.predict output to original data?
Hi

I have a dataset as follows:

DF:
amount:float
date_read:date
meter_number:string

I am trying to predict the future amount based on the past 3 weeks of consumption (and a heap of weather data related to the date).

My LabeledPoint looks like:

label (populated from DF.amount)
features (populated from a bunch of other stuff)

Model.predict output:
label
prediction

Now, I am trying to tie this prediction value back to meter_number and date_read from the original DF.

One way is to assume that the order of records in DF and in the Model.predict output is exactly the same and zip the two RDDs. But is there any other (possibly better) solution?

--
Best Regards,
Ayan Guha
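A minimal sketch of the zip approach mentioned above, assuming the RDD-based MLlib API in PySpark and a fitted model whose predict() accepts an RDD of feature vectors. The function name and the shape of keyed_points (pairs of an identifier tuple and a LabeledPoint) are hypothetical, not from the thread. The pairing is only valid when both RDDs derive from the same parent through map-style steps, so that partitioning and element order match.

```python
def zip_keys_with_predictions(model, keyed_points):
    """Pair each record's identifier with its prediction.

    keyed_points: RDD of ((meter_number, date_read), LabeledPoint) pairs.
    Returns an RDD of ((meter_number, date_read), prediction) pairs.
    """
    keys = keyed_points.keys()
    # Strip the label so predict() only sees the feature vectors.
    features = keyed_points.values().map(lambda lp: lp.features)
    predictions = model.predict(features)
    # zip() requires the same number of partitions and the same number of
    # elements per partition on both sides.
    return keys.zip(predictions)
```

Carrying the identifier alongside each LabeledPoint avoids assuming anything about the row order of the original DF.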