[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-02-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153587#comment-15153587
 ] 

Apache Spark commented on SPARK-12746:
--

User 'Earthson' has created a pull request for this issue:
https://github.com/apache/spark/pull/11237

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>Assignee: Earthson Lu
> Fix For: 2.0.0
>
>
> I see that CountVectorizer has a schema check that requires 
> ArrayType(StringType, true). 
> ArrayType(StringType, false) is just a special case of ArrayType(StringType, true), 
> but it does not pass this type check.
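
To illustrate the report, here is a minimal standalone sketch of why an exact-equality schema check rejects the non-nullable array, and what a nullability-tolerant check could look like. The classes and the functions `strict_check` and `nullability_tolerant_check` are hypothetical stand-ins written for this example; they are not Spark's classes or the code from the pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Stand-in for a string element type (hypothetical, not Spark's class)."""


@dataclass(frozen=True)
class ArrayType:
    """Stand-in for an array type carrying an element-nullability flag."""
    element_type: object
    contains_null: bool


def strict_check(actual, expected):
    # Exact equality: the nullability flag has to match as well.
    return actual == expected


def nullability_tolerant_check(actual, expected):
    # Accept a non-nullable array where a nullable one is expected,
    # but never a nullable array where a non-nullable one is expected.
    if isinstance(actual, ArrayType) and isinstance(expected, ArrayType):
        same_elements = actual.element_type == expected.element_type
        nullability_ok = expected.contains_null or not actual.contains_null
        return same_elements and nullability_ok
    return actual == expected


expected = ArrayType(StringType(), contains_null=True)
non_nullable = ArrayType(StringType(), contains_null=False)

print(strict_check(non_nullable, expected))                # False
print(nullability_tolerant_check(non_nullable, expected))  # True
print(nullability_tolerant_check(expected, non_nullable))  # False
```

The tolerant check is deliberately one-directional: data that can never contain nulls is safe to hand to code that tolerates nulls, while the reverse is not.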



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122870#comment-15122870
 ] 

Earthson Lu commented on SPARK-12746:
-

Hi Joseph, what is the status of nullability now?

It seems someone has already added a multi-DataType check, so I've merged upstream 
to use their implementation.

I was just wondering whether you could accept this PR?




[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097724#comment-15097724
 ] 

Earthson Lu commented on SPARK-12746:
-

OK, I see :)

If there is no nullability in ML, how could we implement a Transformer that fills 
missing values (which are always represented as NULL)? I think we need to support 
nullability for preprocessing, so that we can get clean data for further operations. 
I can't imagine a situation where we can do nothing when the data contains NULL.
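
As a rough illustration of the preprocessing step described above, the sketch below fills missing entries in a column so that later stages see no nulls. The function `fill_missing` and the sample data are assumptions made for this example, with Python's `None` standing in for NULL; this is not a Spark Transformer.

```python
def fill_missing(values, default):
    """Return a copy of the column with every None replaced by `default`."""
    return [default if v is None else v for v in values]


raw = [3.0, None, 4.5, None]    # nullable input column
clean = fill_missing(raw, 0.0)  # non-nullable output column

print(clean)                          # [3.0, 0.0, 4.5, 0.0]
print(any(v is None for v in clean))  # False
```

The input has to be allowed to contain nulls for such a step to exist at all, which is the point of the argument for supporting nullability at the preprocessing stage.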

- - -

I think the type-checking API is independent of nullability in ML. It is a 
common case that one transformer accepts both BooleanType and IntegerType. Maybe it 
is a good idea to implement the test condition and the assertion separately.
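
One way to read the suggestion above is sketched here: a pure test condition that only answers yes or no, and a separate assertion that builds on it and raises on mismatch. The names `is_accepted` and `require_type`, and the use of plain strings for type names, are assumptions made for this example rather than an existing Spark API.

```python
def is_accepted(actual, accepted):
    """Test condition only: True when the column type is one of the accepted types."""
    return actual in accepted


def require_type(column, actual, accepted):
    """Assertion built on the condition: raise a readable error on mismatch."""
    if not is_accepted(actual, accepted):
        names = ", ".join(sorted(accepted))
        raise TypeError(f"Column {column} must be one of [{names}] but was {actual}")


ACCEPTED = {"BooleanType", "IntegerType"}  # hypothetical accepted set

print(is_accepted("IntegerType", ACCEPTED))  # True
print(is_accepted("StringType", ACCEPTED))   # False
require_type("label", "BooleanType", ACCEPTED)  # passes silently
```

Keeping the condition separate lets callers branch on it without catching exceptions, while the assertion stays a thin wrapper that only adds the error message.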




[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097404#comment-15097404
 ] 

Joseph K. Bradley commented on SPARK-12746:
---

I may take a bit to think about this.  I think the deeper question is whether 
and how we should support nullability in ML, and there may need to be a little 
design discussion around that.  I'll try to get back soon.

Btw, please don't set the shepherd field.  Committers use it to indicate that 
they have the time and intent to merge a PR for a particular release.




[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096120#comment-15096120
 ] 

Earthson Lu commented on SPARK-12746:
-

I was just wondering if you could do a review :)




[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093155#comment-15093155
 ] 

Apache Spark commented on SPARK-12746:
--

User 'Earthson' has created a pull request for this issue:
https://github.com/apache/spark/pull/10697




[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091487#comment-15091487
 ] 

Earthson Lu commented on SPARK-12746:
-

I could work on this :)

I have some ideas:

1. We could implement a more powerful type-check API.
2. We could check manually for all the cases.

I will choose the latter.
