[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153587#comment-15153587 ]

Apache Spark commented on SPARK-12746:
--------------------------------------

User 'Earthson' has created a pull request for this issue:
https://github.com/apache/spark/pull/11237

> ArrayType(_, true) should also accept ArrayType(_, false)
> ---------------------------------------------------------
>
>                 Key: SPARK-12746
>                 URL: https://issues.apache.org/jira/browse/SPARK-12746
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, SQL
>    Affects Versions: 1.6.0
>            Reporter: Earthson Lu
>            Assignee: Earthson Lu
>             Fix For: 2.0.0
>
>
> I see CountVectorizer has schema check for ArrayType which has
> ArrayType(StringType, true).
> ArrayType(String, false) is just a special case of ArrayType(String, true),
> but it will not pass this type check.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122870#comment-15122870 ]

Earthson Lu commented on SPARK-12746:
-------------------------------------

Hi Joseph, what is the status of nullability now?

It seems someone has already added a multi-DataType check, and I've merged upstream to use their implementation.

I was wondering whether you could accept this PR?

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097724#comment-15097724 ]

Earthson Lu commented on SPARK-12746:
-------------------------------------

OK, I see :)

If there is no nullability in ML, how could we implement a Transformer that fills in missing values (which are always represented as NULL)? I think we need to support nullability for preprocessing, so that we can get clean data for further operations. I can't imagine a situation where we can do nothing when the data contains NULL.

- - -

I think the type-checking API is independent of nullability in ML. It is common for one transformer to accept either BooleanType or IntegerType. It may be a good idea to implement the test condition and the assertion separately.

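The separation suggested above, a test condition that only answers yes or no and an assertion built on top of it, could be sketched in Scala roughly as follows. This is only an illustration under assumptions: it needs spark-sql on the classpath, and `ColumnTypeCheck`, `hasAnyType`, `requireAnyType` and `exampleSchema` are hypothetical names, not the API that exists in Spark ML or the one used in the pull request.

```scala
import org.apache.spark.sql.types.{BooleanType, DataType, IntegerType, StringType, StructField, StructType}

// Hypothetical helper object; the names are illustrative, not Spark's own API.
object ColumnTypeCheck {
  // Test condition: a pure predicate that reports whether the column's
  // type is one of the accepted types. It never throws for a type mismatch.
  def hasAnyType(schema: StructType, colName: String, accepted: Seq[DataType]): Boolean =
    accepted.contains(schema(colName).dataType)

  // Assertion: built on the predicate, fails with IllegalArgumentException
  // (via require) when the column type is not accepted.
  def requireAnyType(schema: StructType, colName: String, accepted: Seq[DataType]): Unit =
    require(
      hasAnyType(schema, colName, accepted),
      s"Column $colName must be one of ${accepted.mkString(", ")} " +
        s"but was ${schema(colName).dataType}")

  // Small schema used only to demonstrate the two functions.
  val exampleSchema: StructType = StructType(Seq(
    StructField("flag", BooleanType),
    StructField("text", StringType)))
}
```

With this split, a transformer that accepts either BooleanType or IntegerType can call `requireAnyType(schema, "flag", Seq(BooleanType, IntegerType))` when validating a schema, while other code can reuse `hasAnyType` to branch without catching an exception.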
[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097404#comment-15097404 ]

Joseph K. Bradley commented on SPARK-12746:
-------------------------------------------

I may take a bit to think about this. I think the deeper question is whether and how we should support nullability in ML, and there may need to be a little design discussion around that. I'll try to get back soon.

Btw, please don't set the shepherd field. Committers use it to indicate that they have the time and intent to merge a PR for a particular release.

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096120#comment-15096120 ]

Earthson Lu commented on SPARK-12746:
-------------------------------------

I was just wondering if you could do a review :)

On Tue, Jan 12, 2016 at 10:14 AM, Apache Spark (JIRA)

--
~ Perfection is achieved not when there is nothing more to add but when there is nothing left to take away

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093155#comment-15093155 ]

Apache Spark commented on SPARK-12746:
--------------------------------------

User 'Earthson' has created a pull request for this issue:
https://github.com/apache/spark/pull/10697

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091487#comment-15091487 ]

Earthson Lu commented on SPARK-12746:
-------------------------------------

I could work on this :)

I have some ideas:

1. Implement a more powerful type-check API.
2. Check manually in every case.

I will choose the latter.

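For the CountVectorizer case reported in this issue, the second option (checking each case by hand) might look like the Scala sketch below. It assumes spark-sql is on the classpath; `StringArrayCheck`, `strictMatch` and `isStringArray` are hypothetical names chosen for illustration, and this is not the code from either pull request. The sketch first shows why the strict comparison rejects ArrayType(StringType, false), then matches on the element type while ignoring the containsNull flag.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Hypothetical helper object; not part of Spark ML.
object StringArrayCheck {
  // ArrayType is a case class, so == also compares the containsNull flag.
  // A non-nullable string array therefore does not equal the nullable one.
  val strictMatch: Boolean =
    ArrayType(StringType, false) == ArrayType(StringType, true)

  // Manual check: match on the element type and ignore containsNull,
  // so both ArrayType(StringType, true) and ArrayType(StringType, false) pass.
  def isStringArray(dataType: DataType): Boolean = dataType match {
    case ArrayType(StringType, _) => true
    case _                        => false
  }
}
```

The pattern match keeps the check local to the one transformer that needs it, at the cost of repeating similar code wherever nullability should be ignored.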