[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r00750

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

not sure. maybe just `findCompatibleTypeForCSV`

---
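For context on the `numericPrecedence` lookup being removed above, here is a minimal standalone sketch of the promotion rule it implements. The precedence list is spelled out explicitly for illustration and assumed to mirror `TypeCoercion.numericPrecedence` (narrowest to widest); the object and method names are hypothetical.

import org.apache.spark.sql.types._

object NumericPromotionSketch {
  // Illustrative precedence list, narrowest to widest, assumed to mirror
  // what TypeCoercion.numericPrecedence provides inside Spark.
  val numericPrecedence: IndexedSeq[DataType] =
    IndexedSeq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType)

  // "Promote numeric types to the highest of the two": the type appearing
  // later in the precedence list wins.
  def promote(t1: DataType, t2: DataType): Option[DataType] =
    if (Seq(t1, t2).forall(numericPrecedence.contains)) {
      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
      Some(numericPrecedence(index))
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    assert(promote(IntegerType, DoubleType) == Some(DoubleType))
    assert(promote(ByteType, ShortType) == Some(ShortType))
    assert(promote(IntegerType, StringType) == None) // non-numeric: no promotion
  }
}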
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r00361

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --

`compatibleType` is also fine if it is consistent with `JsonInferSchema`.

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222199535

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

@HyukjinKwon Do you have a preference or suggestion for the name of the val? `findCommonTypeExtended`?

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198784

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --

@viirya I kept the same name used in `JsonInferSchema`. Should I change that one as well, or only this one?

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198591

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
--- End diff --

@viirya Yeah, we should keep them. Sorry, they got dropped inadvertently.

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198383

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --

nit: `findCompatibleType`?

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198257

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
--- End diff --

Some comments here are ignored in the change. Shall we keep them?

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222197363

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

BTW, let's keep the comments in the original place.

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222197242

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

Can we keep this as a private val like the previous one, with a comment noting that this pattern matching is CSV-specific? That will reduce the diff and make the review easier.

---
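A minimal sketch of what this suggestion could look like, assuming the generic cases stay delegated to `TypeCoercion.findTightestCommonType`. The val name `csvSpecificWidening` and the wrapping object are hypothetical, and the code assumes it lives inside Spark's own source tree (like the PR code), where internal types such as `IntegralType` and `TypeCoercion` are accessible.

import org.apache.spark.sql.catalyst.analysis.TypeCoercion
import org.apache.spark.sql.types._

object CompatibleTypeSketch {
  // Hypothetical name: the CSV-specific widening rules kept as a separate val,
  // so compatibleType itself stays a one-liner and the diff stays small.
  // This pattern matching is CSV specific.
  private val csvSpecificWidening: (DataType, DataType) => Option[DataType] = {
    case (StringType, _) | (_, StringType) => Some(StringType)
    case (t1: IntegralType, t2: DecimalType) =>
      compatibleType(DecimalType.forType(t1), t2)
    case (t1: DecimalType, t2: IntegralType) =>
      compatibleType(t1, DecimalType.forType(t2))
    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
      Some(DoubleType)
    case (t1: DecimalType, t2: DecimalType) =>
      val scale = math.max(t1.scale, t2.scale)
      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
      if (range + scale > 38) {
        // DecimalType can't support precision > 38
        Some(DoubleType)
      } else {
        Some(DecimalType(range + scale, scale))
      }
    case _ => None
  }

  def compatibleType(t1: DataType, t2: DataType): Option[DataType] =
    TypeCoercion.findTightestCommonType(t1, t2).orElse(csvSpecificWidening(t1, t2))
}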
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222195632

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
+        case (StringType, t2) => Some(StringType)
+        case (t1, StringType) => Some(StringType)
+
+        case (t1: IntegralType, t2: DecimalType) =>
+          compatibleType(DecimalType.forType(t1), t2)
+        case (t1: DecimalType, t2: IntegralType) =>
+          compatibleType(t1, DecimalType.forType(t2))
+
+        case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
+          Some(DoubleType)
+
+        case (t1: DecimalType, t2: DecimalType) =>
+          val scale = math.max(t1.scale, t2.scale)
+          val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
+          if (range + scale > 38) {
+            // DecimalType can't support precision > 38
+            Some(DoubleType)
+          } else {
+            Some(DecimalType(range + scale, scale))
+          }
+        case (_, _) => None
--- End diff --

@HyukjinKwon Thanks. Will change.

---
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222194957

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
+        case (StringType, t2) => Some(StringType)
+        case (t1, StringType) => Some(StringType)
+
+        case (t1: IntegralType, t2: DecimalType) =>
+          compatibleType(DecimalType.forType(t1), t2)
+        case (t1: DecimalType, t2: IntegralType) =>
+          compatibleType(t1, DecimalType.forType(t2))
+
+        case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
+          Some(DoubleType)
+
+        case (t1: DecimalType, t2: DecimalType) =>
+          val scale = math.max(t1.scale, t2.scale)
+          val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
+          if (range + scale > 38) {
+            // DecimalType can't support precision > 38
+            Some(DoubleType)
+          } else {
+            Some(DecimalType(range + scale, scale))
+          }
+        case (_, _) => None
--- End diff --

`case _ => None`

---
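For reference, the decimal-merging arithmetic in the last `case` above works out as follows. This is a standalone restatement of the same rule on bare precision/scale pairs (the `Dec` class and `mergeDecimals` name are illustrative), so it runs without Spark on the classpath.

// Standalone restatement of the decimal-widening rule from the diff above:
// keep the larger scale and the larger integer range; fall back to Double
// when the combined precision would exceed Decimal's maximum of 38 digits.
case class Dec(precision: Int, scale: Int)

def mergeDecimals(t1: Dec, t2: Dec): Either[String, Dec] = {
  val scale = math.max(t1.scale, t2.scale)
  val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
  if (range + scale > 38) Left("DoubleType") // DecimalType can't support precision > 38
  else Right(Dec(range + scale, scale))
}

// decimal(5, 2) with decimal(7, 4): scale = max(2, 4) = 4,
// range = max(5 - 2, 7 - 4) = 3, so the merged type is decimal(7, 4).
assert(mergeDecimals(Dec(5, 2), Dec(7, 4)) == Right(Dec(7, 4)))

// decimal(38, 0) with decimal(38, 38) would need 38 + 38 = 76 digits,
// which overflows the 38-digit limit, so the rule falls back to Double.
assert(mergeDecimals(Dec(38, 0), Dec(38, 38)) == Left("DoubleType"))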
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/22619

[SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema.

## What changes were proposed in this pull request?
Currently, the CSV schema inference code inlines a copy of `TypeCoercion.findTightestCommonType`. This is a minor refactor to use the common type-coercion code where applicable, so CSV inference automatically picks up any future improvement to the base method.

Thanks to @MaxGekk for finding this while reviewing another PR.

## How was this patch tested?
This is a minor refactor. Existing tests are used to verify the change.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark csv_minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22619.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #22619

commit d4e0bdb5a06f59ff1df3933cb43804e71cde8259
Author: Dilip Biswal
Date: 2018-10-03T00:47:42Z

Make use of TypeCoercion.findTightestCommonType while inferring CSV schema

---
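In short, the refactored helper asks `TypeCoercion.findTightestCommonType` for a common type first and only then falls back to the CSV-specific rules shown in the review diff. Below is a condensed sketch of that shape, plus a hypothetical illustration of how such a helper merges the types inferred for one column across rows. Only the String fallback is shown here, the `InferSketch` wrapper is not from the PR, and the code assumes Spark's internal `TypeCoercion` is accessible (i.e., it lives inside Spark's own source tree, like the PR code).

import org.apache.spark.sql.catalyst.analysis.TypeCoercion
import org.apache.spark.sql.types._

object InferSketch {
  // Condensed form of the refactor: generic coercion first, CSV-specific
  // fallbacks second (only the String fallback is shown; see the diff for
  // the full decimal/double rules).
  def compatibleType(t1: DataType, t2: DataType): Option[DataType] =
    TypeCoercion.findTightestCommonType(t1, t2).orElse {
      (t1, t2) match {
        case (StringType, _) | (_, StringType) => Some(StringType)
        case _ => None
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical usage: merge the types inferred for one CSV column across
    // rows, widening as needed and defaulting to StringType when nothing fits.
    val perRow: Seq[DataType] = Seq(IntegerType, LongType, DoubleType)
    val merged = perRow.reduce((a, b) => compatibleType(a, b).getOrElse(StringType))
    assert(merged == DoubleType) // Integer -> Long -> Double via numeric precedence
  }
}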