[jira] [Commented] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2018-01-03 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310582#comment-16310582
 ] 

Joseph K. Bradley commented on SPARK-21926:
---

This will be mostly done in Spark 2.3, but there will be a few items left for 
Spark 2.4.  I'll retarget this.

> Compatibility between ML Transformers and Structured Streaming
> --
>
> Key: SPARK-21926
> URL: https://issues.apache.org/jira/browse/SPARK-21926
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>
> We've run into a few cases where ML components don't play nice with streaming 
> dataframes (for prediction). This ticket is meant to help aggregate these 
> known cases in one place and provide a place to discuss possible fixes.
> Failing cases:
> 1) VectorAssembler where one of the inputs is a VectorUDT column with no 
> metadata.
> Possible fixes:
> More details here SPARK-22346.
> 2) OneHotEncoder where the input is a column with no metadata.
> Possible fixes:
> a) Make OneHotEncoder an estimator (SPARK-13030).
> -b) Allow user to set the cardinality of OneHotEncoder.-



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2017-10-20 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212962#comment-16212962
 ] 

Joseph K. Bradley commented on SPARK-21926:
---

One more report from the dev list: HashingTF and IDFModel fail with Structured 
Streaming: 
http://apache-spark-developers-list.1001551.n3.nabble.com/HashingTFModel-IDFModel-in-Structured-Streaming-td22680.html

> Compatibility between ML Transformers and Structured Streaming
> --
>
> Key: SPARK-21926
> URL: https://issues.apache.org/jira/browse/SPARK-21926
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>
> We've run into a few cases where ML components don't play nice with streaming 
> dataframes (for prediction). This ticket is meant to help aggregate these 
> known cases in one place and provide a place to discuss possible fixes.
> Failing cases:
> 1) VectorAssembler where one of the inputs is a VectorUDT column with no 
> metadata.
> Possible fixes:
> a) Re-design vectorUDT metadata to support missing metadata for some 
> elements. (This might be a good thing to do anyways SPARK-19141)
> b) drop metadata in streaming context.
> 2) OneHotEncoder where the input is a column with no metadata.
> Possible fixes:
> a) Make OneHotEncoder an estimator (SPARK-13030).
> b) Allow user to set the cardinality of OneHotEncoder.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org