Re: MatrixUDT and VectorUDT in Spark ML

2018-05-31 Thread Li Jin
Please see https://issues.apache.org/jira/browse/SPARK-24258

On Wed, May 30, 2018 at 10:40 PM Dongjin Lee wrote:

> How is this issue going? Is there any Jira ticket about this?
>
> Thanks,
> Dongjin


Re: MatrixUDT and VectorUDT in Spark ML

2018-05-30 Thread Dongjin Lee
How is this issue going? Is there any Jira ticket about this?

Thanks,
Dongjin



-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

github: github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare: www.slideshare.net/dongjinleekr


RE: MatrixUDT and VectorUDT in Spark ML

2018-03-23 Thread Himanshu Mohan
I agree



Thanks
Himanshu

From: Li Jin [mailto:ice.xell...@gmail.com]
Sent: Friday, March 23, 2018 8:24 PM
To: dev 
Subject: MatrixUDT and VectorUDT in Spark ML

Hi All,

I came across these two types, MatrixUDT and VectorUDT, in Spark ML when doing
feature extraction and preprocessing with PySpark. However, when trying to do
some basic operations, such as vector multiplication and matrix multiplication,
I had to fall back to a Python UDF.
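
A minimal sketch of what such a Python UDF workaround might look like is shown
here. It is not taken from the thread: the names mat_vec_mul and
make_mat_vec_udf are hypothetical, and the sketch assumes both columns hold
non-null pyspark.ml.linalg values whose dimensions match.

```python
from typing import List, Sequence


def mat_vec_mul(rows: Sequence[Sequence[float]], vec: Sequence[float]) -> List[float]:
    """Plain-Python matrix-vector product; `rows` is a sequence of matrix rows."""
    if any(len(row) != len(vec) for row in rows):
        raise ValueError("matrix column count must match vector length")
    return [sum(a * b for a, b in zip(row, vec)) for row in rows]


def make_mat_vec_udf():
    """Wrap mat_vec_mul (hypothetical helper) as a Python UDF returning a vector column."""
    # Imported lazily so the kernel above can be exercised without Spark installed.
    from pyspark.ml.linalg import Vectors, VectorUDT
    from pyspark.sql.functions import udf

    def _apply(m, v):
        # toArray() yields NumPy arrays for both matrices and vectors;
        # tolist() turns them into nested Python lists of floats.
        return Vectors.dense(mat_vec_mul(m.toArray().tolist(), v.toArray().tolist()))

    return udf(_apply, VectorUDT())
```

With a running SparkSession, this could be applied as
df.withColumn('v', make_mat_vec_udf()(df.matrix_column, df.vector_column)).
Every row is serialized to a Python worker and back, which is the overhead that
built-in operators on these types would avoid.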

It seems to me that it would be very useful to have built-in operators on these
types, just like first-class Spark SQL types, e.g.,

df.withColumn('v', df.matrix_column * df.vector_column)

I wonder what other people's thoughts are on this?

Li

