Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-18 Thread janardhan shetty
There is a spark-ts package, developed by Sandy, which has an RDD version.
I am not sure about the DataFrame roadmap.

http://sryza.github.io/spark-timeseries/0.3.0/index.html


Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-18 Thread ayan guha
Thanks a lot. I resolved it using a UDF.

Question: does Spark support any time series models? Is there a roadmap
showing roughly when a feature will become available?


Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-18 Thread Yanbo Liang
If you want to tie them to other data, I think the best way is to use a
DataFrame join, provided that both sides share an identity column.
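
As a rough illustration of the join idea, here is a minimal plain-Python
sketch, not Spark code. The column name row_id, the helper join_on_key and
the sample values are all hypothetical, and the sketch assumes the key is
unique on the right-hand side. With Spark DataFrames the same step would be
a join on the shared column.

```python
def join_on_key(left, right, key):
    # Inner join of two lists of dict rows on a shared identity column.
    # Assumes `key` is unique within `right`.
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the columns of both rows into one output row.
            joined.append({**match, **row})
    return joined


# Hypothetical model output: only label and prediction, plus the identity column.
predictions = [
    {"row_id": 2, "label": 7.0, "prediction": 7.4},
    {"row_id": 1, "label": 12.0, "prediction": 11.5},
]

# Hypothetical "other dimensions" that were never part of the features or label.
other_dimensions = [
    {"row_id": 1, "meter_number": "M-001", "date_read": "2016-08-01"},
    {"row_id": 2, "meter_number": "M-002", "date_read": "2016-08-01"},
]

result = join_on_key(predictions, other_dimensions, "row_id")
for row in result:
    print(row["meter_number"], row["date_read"], row["prediction"])
```

Because rows are matched by key, the order in which the predictions arrive
does not matter.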

Thanks
Yanbo



Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-16 Thread ayan guha
Hi

Thank you for your reply. Yes, I can get the prediction and the original
features together. My question is how to tie them back to other parts of the
data that were not in the LabeledPoint (LP).

For example, I have a bunch of other dimensions which are not part of
features or label.

Sorry if this is a stupid question.



-- 
Best Regards,
Ayan Guha


Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-16 Thread Yanbo Liang
MLlib keeps the original dataset during transformation; it just appends
new columns to the existing DataFrame. That is, you can get both the
prediction value and the original features from the output DataFrame of
model.transform.
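
To picture the "append new columns" behaviour, here is a minimal
plain-Python sketch, not Spark code. The functions predict_amount and
transform are hypothetical stand-ins for a fitted model, and the sample rows
are made up. The point is only that columns the model does not use (here
meter_number and date_read) are still present in the output.

```python
from datetime import date


def predict_amount(features):
    # Hypothetical stand-in for a fitted model: averages the feature values.
    return sum(features) / len(features)


def transform(rows):
    # Append a "prediction" column to every row and keep all existing columns.
    return [dict(row, prediction=predict_amount(row["features"])) for row in rows]


rows = [
    {"meter_number": "M-001", "date_read": date(2016, 8, 1),
     "amount": 12.0, "features": [10.0, 12.0, 14.0]},
    {"meter_number": "M-002", "date_read": date(2016, 8, 1),
     "amount": 7.0, "features": [6.0, 7.0, 8.0]},
]

scored = transform(rows)
for row in scored:
    print(row["meter_number"], row["date_read"], row["prediction"])
```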

Thanks
Yanbo



SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-16 Thread ayan guha
Hi

I have a dataset as follows:

DF:
amount:float
date_read:date
meter_number:string

I am trying to predict the future amount based on the past 3 weeks of
consumption (and a heap of weather data related to the date).

My LabeledPoint looks like:

label (populated from DF.amount)
features (populated from a bunch of other stuff)

Model.predict output:
label
prediction

Now, I am trying to tie this prediction value back to the meter_number
and date_read from the original DF.

One way is to assume that the order of records in the DF and in the
Model.predict output is exactly the same and zip the two RDDs. But is there
any other (possibly better) solution?
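
To make the zip idea concrete, here is a minimal plain-Python sketch, not
Spark code. The predict function is a hypothetical stand-in for
Model.predict and the records are made up. It shows that zipping gives the
right pairing only while both sequences keep the same order.

```python
def predict(features):
    # Hypothetical stand-in for Model.predict: sums the feature values.
    return sum(features)


# Each record: (meter_number, date_read, features)
records = [
    ("M-001", "2016-08-01", [1.0, 2.0]),
    ("M-002", "2016-08-01", [3.0, 4.0]),
]

predictions = [predict(features) for _, _, features in records]

# Pair by position: correct only because both lists share the same order.
zipped = [
    (meter, day, p)
    for (meter, day, _), p in zip(records, predictions)
]

# If the predictions arrive in a different order, zip still succeeds
# but silently attaches each prediction to the wrong meter.
misaligned = [
    (meter, day, p)
    for (meter, day, _), p in zip(records, reversed(predictions))
]

print(zipped)
print(misaligned)
```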

-- 
Best Regards,
Ayan Guha