From: ayan guha <guha.a...@gmail.com>
To: Wen Pei Yu/China/IBM@IBMCN
Cc: user <user@spark.apache.org>, Nirmal Fernando <nir...@wso2.com>
Date: 08/23/2016 05:13 PM
Subject: Re: Apply ML to grouped dataframe

From: Nirmal Fernando <nir...@wso2.com>
To: Wen Pei Yu/China/IBM@IBMCN
Cc: User <user@spark.apache.org>
Date: 08/23/2016 01:55 PM
Subject: Re: Apply ML to grouped dataframe

On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:
We can group a dataframe by one column like
df.groupBy(df.col("gender"))
On top of this, can we apply an ML algorithm or a statistical test to each group?
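What group-then-apply computes can be sketched without Spark at all; a minimal pure-Python illustration (the "gender" column and the scores are made-up example data, and the per-group function here is just the mean):

```python
from collections import defaultdict

# Rows as dicts; "gender" plays the role of the groupBy column.
rows = [
    {"gender": "F", "score": 10.0},
    {"gender": "M", "score": 4.0},
    {"gender": "F", "score": 12.0},
    {"gender": "M", "score": 6.0},
]

# Step 1: group rows by the key column (what df.groupBy("gender") expresses).
groups = defaultdict(list)
for row in rows:
    groups[row["gender"]].append(row["score"])

# Step 2: apply a function to each group independently (here, the mean).
means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(means)  # {'F': 11.0, 'M': 5.0}
```

In Spark the second step is the hard part: built-in aggregates are easy, but running an arbitrary ML routine per group is what the thread is asking about.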

From: Nirmal Fernando <nir...@wso2.com>
To: Wen Pei Yu/China/IBM@IBMCN
Cc: User <user@spark.apache.org>
Date: 08/23/2016 01:14 PM
Subject: Re: Apply ML to grouped dataframe

Hi Wen,

AFAIK Spark MLlib implements its machine learning algorithms on top of
Spark dataframes.
http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api

Hi Nirmal
I didn't get your point.
Can you tell me more about how to apply MLlib to a grouped dataframe?
Regards.
Wenpei.
From: Nirmal Fernando <nir...@wso2.com>
To: Wen Pei Yu/China/IBM@IBMCN
Cc: User <user@spark.apache.org>
Date: 08/23/2016 10:26 AM
Subject: Re: Apply ML to grouped dataframe
You can use Spark MLlib
http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api
On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu wrote:
Hi

We have a dataframe, and want to group it and then apply an ML algorithm or a
statistic (say, a t-test) to each group. Is there any efficient way to handle
this situation?

Currently, we transfer to pyspark, use groupByKey and apply a numpy function
to each group's array. But this isn't an efficient way, right?

Regards.
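The groupByKey-plus-numpy pattern described above, reduced to a pure-Python sketch (no Spark or numpy needed to see the shape of it; the one-sample t statistic stands in for whatever per-group statistic is wanted, and all data is made up):

```python
import math
from collections import defaultdict

def one_sample_t(values, mu0=0.0):
    """One-sample t statistic: t = (mean - mu0) / (s / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

# (key, value) pairs, as an RDD would hold them before groupByKey.
pairs = [("F", 9.0), ("F", 11.0), ("F", 10.0),
         ("M", 4.0), ("M", 6.0), ("M", 5.0)]

# groupByKey: collect all values for each key into one list.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Apply the statistic to each group's array.
t_stats = {key: one_sample_t(vals) for key, vals in grouped.items()}
```

The inefficiency the poster worries about is real: groupByKey shuffles every value for a key to one executor and materializes the whole group in memory before the per-group function runs.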