[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3150 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-64673009 [Test build #23899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23899/consoleFull) for PR 3150 at commit [`e935939`](https://github.com/apache/spark/commit/e935939ac829ecaa887df4bcbb6c65027876a210). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-64682672 [Test build #23899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23899/consoleFull) for PR 3150 at commit [`e935939`](https://github.com/apache/spark/commit/e935939ac829ecaa887df4bcbb6c65027876a210). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-64682678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23899/ Test PASSed.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-63158911 [Test build #23408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23408/consoleFull) for PR 3150 at commit [`ba14003`](https://github.com/apache/spark/commit/ba14003fedbc13db8b40b1712070ae1ed44972f8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-63160799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23408/ Test PASSed.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-63160796 [Test build #23408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23408/consoleFull) for PR 3150 at commit [`ba14003`](https://github.com/apache/spark/commit/ba14003fedbc13db8b40b1712070ae1ed44972f8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62251997 [Test build #23096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23096/consoleFull) for PR 3150 at commit [`8999868`](https://github.com/apache/spark/commit/89998684122af72482e8c1c2d22198dfc66aa4d4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62253734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23096/ Test PASSed.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62253732 [Test build #23096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23096/consoleFull) for PR 3150 at commit [`8999868`](https://github.com/apache/spark/commit/89998684122af72482e8c1c2d22198dfc66aa4d4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62115143 It looks good to me in general, and I like the idea of summarizing the convertible data type checks in one place, but I am a little afraid it might be error-prone for future maintenance or when new data types are added. Or could we remove the `resolve` method?
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001109
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) &&
+          resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) &&
+          (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size &&
+          fromFields.zip(toFields).forall {
+            case (fromField, toField) =>
+              resolve(fromField.dataType, toField.dataType) &&
+                resolvableNullability(
+                  fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                  toField.nullable)
+          }
+
+      case _ => false
--- End diff --
Hmm, I think the resolve check belongs in logical plan analysis.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001249
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) &&
+          resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) &&
+          (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size &&
+          fromFields.zip(toFields).forall {
+            case (fromField, toField) =>
+              resolve(fromField.dataType, toField.dataType) &&
+                resolvableNullability(
+                  fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                  toField.nullable)
+          }
+
+      case _ => false
--- End diff --
Some expressions do check `resolved` in the `dataType` method, though.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001270
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -323,28 +371,53 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
       buildCast[Date](_, d => dateToDouble(d))
     case TimestampType =>
       buildCast[Timestamp](_, t => timestampToDouble(t).toFloat)
-    case DecimalType() =>
-      buildCast[Decimal](_, _.toFloat)
     case x: NumericType =>
       b => x.numeric.asInstanceOf[Numeric[Any]].toFloat(b)
   }
-  private[this] lazy val cast: Any => Any = dataType match {
+  private[this] def castArray(from: ArrayType, to: ArrayType): Any => Any = {
+    val elementCast = cast(from.elementType, to.elementType)
+    buildCast[Seq[Any]](_, _.map(v => if (v == null) null else elementCast(v)))
--- End diff --
I don't think we need to handle that case specially, just as in other expressions. Elements of a type with `ArrayType.containsNull == false` are never `null`, so `elementCast(v)` will always be called.
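The null guard discussed in this comment can be illustrated in isolation. The following is a minimal sketch, not the Spark code: `CastArraySketch`, `castElements`, and the `Int`-to-`String` `elementCast` are hypothetical stand-ins for `castArray` and `cast(from.elementType, to.elementType)`.

```scala
object CastArraySketch {
  // Hypothetical element cast (Int -> String); calling toString on null would throw.
  val elementCast: Any => Any = v => v.toString

  // Same shape as the lambda in the quoted `castArray`: nulls pass through untouched,
  // every other element goes through the element cast.
  def castElements(xs: Seq[Any]): Seq[Any] =
    xs.map(v => if (v == null) null else elementCast(v))

  def main(args: Array[String]): Unit = {
    // containsNull == true: the guard keeps the null element as null.
    println(castElements(Seq[Any](1, null, 3)))
    // containsNull == false: no element is null, so elementCast runs on every element.
    println(castElements(Seq[Any](1, 2)))
  }
}
```

With no null elements the guard never takes its first branch, which is the point made above: the same code path serves both nullable and non-nullable arrays.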
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62118596 @chenghao-intel, Thank you for your comments. If the `resolve` method is removed, the nullability check is removed with it (e.g. a cast from `ArrayType(IntegerType, containsNull = true)` to `ArrayType(IntegerType, containsNull = false)` is clearly invalid), which will cause unexpected errors. If there is a better way to enforce the nullability check, we can remove the method.
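The nullability check mentioned here is the one-line predicate `resolvableNullability` from the quoted diff. As a rough standalone sketch (the wrapper object `NullabilityCheckSketch` and its `main` are assumptions added for illustration), it rejects exactly one of the four combinations:

```scala
object NullabilityCheckSketch {
  // Same predicate as `resolvableNullability` in the quoted diff:
  // a source that may hold nulls needs a target that allows them.
  def resolvableNullability(from: Boolean, to: Boolean): Boolean = !from || to

  def main(args: Array[String]): Unit = {
    // Enumerate all four (source nullable, target nullable) combinations.
    for (from <- Seq(false, true); to <- Seq(false, true))
      println(s"from nullable=$from, to nullable=$to -> allowed=${resolvableNullability(from, to)}")
  }
}
```

Only the nullable-to-non-nullable case is refused, which corresponds to the `containsNull = true` to `containsNull = false` example in the comment.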
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62239484 [Test build #23081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23081/consoleFull) for PR 3150 at commit [`f677c30`](https://github.com/apache/spark/commit/f677c303115a0065589535e1053bd1e803aeb4fc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62242628 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23081/ Test PASSed.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62242627 [Test build #23081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23081/consoleFull) for PR 3150 at commit [`f677c30`](https://github.com/apache/spark/commit/f677c303115a0065589535e1053bd1e803aeb4fc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/3150

[SPARK-4293][SQL] Make Cast be able to handle complex types.

Inserting data whose type includes `ArrayType.containsNull == false`, `MapType.valueContainsNull == false`, or `StructType.fields.exists(_.nullable == false)` into a Hive table fails because the `Cast` inserted by the `HiveMetastoreCatalog.PreInsertionCasts` rule of `Analyzer` can't handle these types correctly.

Complex type cast rule proposal:

- Cast for non-complex types should be able to cast the same as before.
- Cast for `ArrayType` can evaluate if
  - Element type can cast
  - Nullability rule doesn't break
- Cast for `MapType` can evaluate if
  - Key type can cast
  - Nullability of the cast key type is `false`
  - Value type can cast
  - Nullability rule for value type doesn't break
- Cast for `StructType` can evaluate if
  - The field size is the same
  - Each field can cast
  - Nullability rule for each field doesn't break
- The nested structure should be the same.

Nullability rule:

- If the cast type is `nullable == true`, the target nullability should be `true`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-4293

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3150.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3150

commit 4f71bb8e4fa83f160f0f131c9b60bca911acef99
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-11-07T05:13:11Z
Make Cast be able to handle complex types.

commit 287f410329edf375c8d6142ea2400aa75537da5f
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-11-07T05:13:38Z
Add tests to insert data of types ArrayType / MapType / StructType with nullability is false into Hive table.
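The cast rule proposal in this pull request description can be sketched as a recursive check over a toy type model. This is an illustrative sketch only: the `DataType` hierarchy below is a reduced stand-in defined for this example, not the Catalyst classes, and `forceNullable` covers just one case (string to numeric) that is assumed here to be able to yield null.

```scala
object CastRuleSketch {
  // Reduced stand-in for Catalyst's type hierarchy (an assumption for this sketch).
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object StringType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
  final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType
  final case class StructField(name: String, dataType: DataType, nullable: Boolean)
  final case class StructType(fields: Seq[StructField]) extends DataType

  // Casts that may turn a non-null input into null (e.g. an unparsable string).
  def forceNullable(from: DataType, to: DataType): Boolean = (from, to) match {
    case (StringType, IntegerType | LongType) => true
    case _ => false
  }

  // Nullability rule: a nullable source requires a nullable target.
  def resolvableNullability(from: Boolean, to: Boolean): Boolean = !from || to

  def resolve(from: DataType, to: DataType): Boolean = (from, to) match {
    case (f, t) if f == t => true
    case (_, StringType) => true
    case (StringType, IntegerType | LongType) => true
    case (IntegerType, LongType) | (LongType, IntegerType) => true
    // ArrayType: element type can cast and the nullability rule holds.
    case (ArrayType(f, fn), ArrayType(t, tn)) =>
      resolve(f, t) && resolvableNullability(fn || forceNullable(f, t), tn)
    // MapType: key cast must never produce null; values follow the array rule.
    case (MapType(fk, fv, fn), MapType(tk, tv, tn)) =>
      resolve(fk, tk) && !forceNullable(fk, tk) &&
        resolve(fv, tv) && resolvableNullability(fn || forceNullable(fv, tv), tn)
    // StructType: same number of fields, each field checked pairwise.
    case (StructType(ff), StructType(tf)) =>
      ff.size == tf.size && ff.zip(tf).forall { case (a, b) =>
        resolve(a.dataType, b.dataType) &&
          resolvableNullability(a.nullable || forceNullable(a.dataType, b.dataType), b.nullable)
      }
    case _ => false
  }
}
```

For instance, under this model `resolve(ArrayType(IntegerType, true), ArrayType(IntegerType, false))` is rejected because the source may contain nulls, while widening `ArrayType(IntegerType, false)` to `ArrayType(LongType, true)` is accepted.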
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62103342 [Test build #23041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23041/consoleFull) for PR 3150 at commit [`287f410`](https://github.com/apache/spark/commit/287f410329edf375c8d6142ea2400aa75537da5f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62107846 [Test build #23041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23041/consoleFull) for PR 3150 at commit [`287f410`](https://github.com/apache/spark/commit/287f410329edf375c8d6142ea2400aa75537da5f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62107850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23041/ Test PASSed.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r19997693
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) &&
+          resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) &&
+          (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size &&
+          fromFields.zip(toFields).forall {
+            case (fromField, toField) =>
+              resolve(fromField.dataType, toField.dataType) &&
+                resolvableNullability(
+                  fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                  toField.nullable)
+          }
+
+      case _ => false
--- End diff --
I am wondering whether throwing an exception here would be more informative than the plain `UnresolvedException` thrown during logical plan analysis.
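One way to picture the more informative failure suggested in this comment is a check that carries a reason instead of a bare `false`. The sketch below is purely hypothetical and is not what the pull request implements: the reduced type model, `check`, and `castOrThrow` are all names invented for this illustration.

```scala
object CastErrorSketch {
  // Reduced stand-in type model (an assumption for this sketch, not Catalyst).
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Returns Left(reason) instead of `false`, so the caller can report why a cast is invalid.
  def check(from: DataType, to: DataType): Either[String, Unit] = (from, to) match {
    case (f, t) if f == t => Right(())
    case (_, StringType) => Right(())
    case (ArrayType(f, fn), ArrayType(t, tn)) =>
      check(f, t) match {
        case Left(reason) => Left(s"array element: $reason")
        case Right(_) if fn && !tn =>
          Left(s"cannot cast $from to $to: source may contain nulls but target does not allow them")
        case Right(_) => Right(())
      }
    case _ => Left(s"cannot cast $from to $to")
  }

  // Hypothetical eager variant: fail with the reason as the exception message.
  def castOrThrow(from: DataType, to: DataType): Unit =
    check(from, to) match {
      case Left(msg) => throw new IllegalArgumentException(msg)
      case Right(_) => ()
    }
}
```

Whether such a check should throw or only mark the expression as unresolved is the design question raised in this thread; the sketch only shows what carrying a message could look like.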
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r19997794
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -323,28 +371,53 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
       buildCast[Date](_, d => dateToDouble(d))
     case TimestampType =>
       buildCast[Timestamp](_, t => timestampToDouble(t).toFloat)
-    case DecimalType() =>
-      buildCast[Decimal](_, _.toFloat)
     case x: NumericType =>
       b => x.numeric.asInstanceOf[Numeric[Any]].toFloat(b)
   }
-  private[this] lazy val cast: Any => Any = dataType match {
+  private[this] def castArray(from: ArrayType, to: ArrayType): Any => Any = {
+    val elementCast = cast(from.elementType, to.elementType)
+    buildCast[Seq[Any]](_, _.map(v => if (v == null) null else elementCast(v)))
--- End diff --
Semantically, how do we handle the case where `ArrayType.nullable=false`?