Re: Apply ML to grouped dataframe

2016-08-23 Thread Wen Pei Yu

Thank you, Ayan.

For example, I have the dataframe below. I want to use the column "group" as
the key to split this dataframe into three parts, then apply k-means to each
part to get each group's k-means result.

+-----------+-----+------------+
|     userID|group|    features|
+-----------+-----+------------+
|12462563356|    1|  [5.0,43.0]|
|12462563701|    2|   [1.0,8.0]|
|12462563701|    1|  [2.0,12.0]|
|12462564356|    1|   [1.0,1.0]|
|12462565487|    3|   [2.0,3.0]|
|12462565698|    2|   [1.0,1.0]|
|12462565698|    1|   [1.0,1.0]|
|12462566081|    2|   [1.0,2.0]|
|12462566081|    1|  [1.0,15.0]|
|12462566225|    2|   [1.0,1.0]|
|12462566225|    1|  [9.0,85.0]|
|12462566526|    2|   [1.0,1.0]|
|12462566526|    1|  [3.0,79.0]|
|12462567006|    2| [11.0,15.0]|
|12462567006|    1| [10.0,15.0]|
|12462567006|    3| [10.0,15.0]|
|12462586595|    2|  [2.0,42.0]|
|12462586595|    3|  [2.0,16.0]|
|12462589343|    3|   [1.0,1.0]|
+-----------+-----+------------+
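
A minimal local sketch of the toy problem above, in plain Python rather than
Spark: the rows are bucketed by "group", and a small Lloyd's-style k-means runs
on each bucket. The `kmeans` helper, `k=2`, the seeding rule, and the iteration
count are all illustrative assumptions, not Spark MLlib. It only shows the
shape of the result being asked for (one set of centers per group).

```python
from collections import defaultdict

# (userID, group, features) rows copied from the table above.
rows = [
    (12462563356, 1, [5.0, 43.0]),
    (12462563701, 2, [1.0, 8.0]),
    (12462563701, 1, [2.0, 12.0]),
    (12462564356, 1, [1.0, 1.0]),
    (12462565487, 3, [2.0, 3.0]),
    (12462565698, 2, [1.0, 1.0]),
    (12462565698, 1, [1.0, 1.0]),
    (12462566081, 2, [1.0, 2.0]),
    (12462566081, 1, [1.0, 15.0]),
    (12462566225, 2, [1.0, 1.0]),
    (12462566225, 1, [9.0, 85.0]),
    (12462566526, 2, [1.0, 1.0]),
    (12462566526, 1, [3.0, 79.0]),
    (12462567006, 2, [11.0, 15.0]),
    (12462567006, 1, [10.0, 15.0]),
    (12462567006, 3, [10.0, 15.0]),
    (12462586595, 2, [2.0, 42.0]),
    (12462586595, 3, [2.0, 16.0]),
    (12462589343, 3, [1.0, 1.0]),
]


def kmeans(points, k=2, iterations=10):
    """Tiny Lloyd's k-means (hypothetical helper, not the MLlib implementation).

    Seeds the centers with the first k distinct points, which is an assumption
    made to keep the sketch deterministic.
    """
    centers = []
    for p in points:
        if p not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign the point to the nearest center (squared Euclidean distance).
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Split the rows by the "group" key.
by_group = defaultdict(list)
for _user_id, group, features in rows:
    by_group[group].append(features)

# One independent k-means result per group.
centers_by_group = {g: kmeans(pts) for g, pts in by_group.items()}
```

Each group is clustered independently, so the per-group calls do not share any
state and could in principle run in parallel.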




Re: Apply ML to grouped dataframe

2016-08-23 Thread ayan guha
I would suggest you construct a toy problem and post it for a solution. At
the moment it's a little unclear what your intentions are.

Generally speaking, a group by on a data frame creates another data frame,
not multiple ones.

Re: Apply ML to grouped dataframe

2016-08-23 Thread Wen Pei Yu

Hi Nirmal

Filter works fine if I want to handle one of the grouped dataframes. But I
have multiple grouped dataframes, and I would like to apply the ML algorithm
to all of them in one job rather than in a for loop.

Wenpei.




Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:

> We can group a dataframe by one column like
>
> df.groupBy(df.col("gender"))
>

On top of this DF, use a filter to extract each group as a separate DF.
Then you can apply ML on top of each DF.

eg: xyzDF.filter(col("x").equalTo(x))
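
A minimal sketch of the filter-per-key idea, written as a plain-Python stand-in
rather than against the Spark API. The sample rows, the key column "x", and
the `fit_model` placeholder are all made up for illustration. On a real
dataframe each list comprehension below would be a filter on the key column,
and `fit_model` would be whatever estimator is being fitted.

```python
# Hypothetical sample data: each row has a key column "x" and a feature vector.
rows = [
    {"x": "a", "features": [5.0, 43.0]},
    {"x": "b", "features": [1.0, 8.0]},
    {"x": "c", "features": [2.0, 3.0]},
    {"x": "a", "features": [2.0, 12.0]},
]


def fit_model(subset):
    """Placeholder for an ML fit: returns the per-dimension mean of the features."""
    columns = zip(*(r["features"] for r in subset))
    return [sum(c) / len(subset) for c in columns]


# Step 1: collect the distinct key values.
keys = sorted({r["x"] for r in rows})

# Step 2: one filter and one fit per key. This is an explicit loop over keys.
models = {}
for key in keys:
    subset = [r for r in rows if r["x"] == key]
    models[key] = fit_model(subset)
```

Note that this approach filters the full data once per key, so its cost grows
with the number of distinct keys.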



-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/


Re: Apply ML to grouped dataframe

2016-08-22 Thread Wen Pei Yu

We can group a dataframe by one column, like

df.groupBy(df.col("gender"))

It is like splitting a dataframe into multiple dataframes. Currently, we can
only apply simple SQL functions such as agg and max to this GroupedData.

What we want is to apply one ML algorithm to each group.

Regards.




Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
Hi Wen,

AFAIK Spark MLlib implements its machine learning algorithms on top of the
Spark DataFrame API. What do you mean by a grouped dataframe?





Re: Apply ML to grouped dataframe

2016-08-22 Thread Wen Pei Yu

Hi Nirmal

I didn't get your point.
Can you tell me more about how to apply MLlib to a grouped dataframe?

Regards.
Wenpei.




Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
You can use Spark MLlib
http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api






Apply ML to grouped dataframe

2016-08-22 Thread Wen Pei Yu

Hi

We have a dataframe, and want to group it and apply an ML algorithm or a
statistic (say, a t-test) to each group. Is there an efficient way to do
this?

Currently, we move to PySpark, use groupByKey, and apply a NumPy function
to each array. But this isn't an efficient way, right?

Regards.
Wenpei.
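
A minimal sketch of the group-then-apply pattern described above, in plain
Python with only the standard library. The (key, value) pairs are made up, and
the choice of a one-sample t statistic against a hypothesized mean of 0 is an
assumption, since the message does not say which t-test is meant. The grouping
step plays the role of groupByKey, and the dictionary comprehension plays the
role of mapping a function over each group's values.

```python
import math
import statistics
from collections import defaultdict

# Made-up (key, value) pairs standing in for rows of the dataframe.
pairs = [
    ("a", 2.0), ("a", 4.0), ("a", 6.0),
    ("b", 1.0), ("b", 1.5), ("b", 2.0), ("b", 2.5),
]


def one_sample_t(values, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # needs at least two values
    return (mean - mu0) / (s / math.sqrt(n))


# Gather each key's values into one list (the role groupByKey plays).
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Apply the statistic to each group independently.
t_by_key = {key: one_sample_t(values) for key, values in grouped.items()}
```

Every group's values must fit in memory in one place for this pattern to work,
which is one reason it can be slow when a few groups are very large.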