[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r00750
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

Not sure. Maybe just `findCompatibleTypeForCSV`.


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r00361
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --

`compatibleType` is also fine if it is consistent with `JsonInferSchema`.


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222199535
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

@HyukjinKwon Did you have any preference or suggestion on the name of the val? `findCommonTypeExtended`?


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198784
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --

@viirya I kept the same name used in `JsonInferSchema`. Change that as well, or only change this one?


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198591
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
  }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
--- End diff --

@viirya Yeah, we should keep them. Sorry, they got dropped inadvertently.


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198383
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --

nit: `findCompatibleType`?


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222198257
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
--- End diff --

Some of the comments here were dropped in the change. Shall we keep them?


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222197363
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

BTW, let's keep the comments in the original place.


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222197242
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
--- End diff --

Can we leave this as a private val like the previous one, and add a comment that this pattern matching is CSV-specific? That will reduce the diff and make the review easier.
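
For illustration only, a minimal sketch of that suggestion, reusing the `findCompatibleTypeForCSV` name floated elsewhere in this thread. This is an assumption about the shape, not the code that ultimately landed, and it presumes it lives inside Spark's sql module (like `CSVInferSchema`), since `TypeCoercion` is internal:

    import org.apache.spark.sql.catalyst.analysis.TypeCoercion
    import org.apache.spark.sql.types._

    object CompatibleTypeSketch {

      // Public entry point: try the shared coercion rule first, then the CSV fallback.
      def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
        TypeCoercion.findTightestCommonType(t1, t2).orElse(findCompatibleTypeForCSV(t1, t2))
      }

      // The pattern matching below is specific to CSV schema inference.
      private def findCompatibleTypeForCSV(t1: DataType, t2: DataType): Option[DataType] =
        (t1, t2) match {
          case (StringType, _) | (_, StringType) => Some(StringType)
          case (i: IntegralType, d: DecimalType) => compatibleType(DecimalType.forType(i), d)
          case (d: DecimalType, i: IntegralType) => compatibleType(d, DecimalType.forType(i))
          case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) => Some(DoubleType)
          case (d1: DecimalType, d2: DecimalType) =>
            val scale = math.max(d1.scale, d2.scale)
            val range = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
            // DecimalType can't support precision > 38
            if (range + scale > 38) Some(DoubleType) else Some(DecimalType(range + scale, scale))
          case _ => None
        }
    }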


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222195632
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
+        case (StringType, t2) => Some(StringType)
+        case (t1, StringType) => Some(StringType)
+
+        case (t1: IntegralType, t2: DecimalType) =>
+          compatibleType(DecimalType.forType(t1), t2)
+        case (t1: DecimalType, t2: IntegralType) =>
+          compatibleType(t1, DecimalType.forType(t2))
+
+        case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
+          Some(DoubleType)
+
+        case (t1: DecimalType, t2: DecimalType) =>
+          val scale = math.max(t1.scale, t2.scale)
+          val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
+          if (range + scale > 38) {
+            // DecimalType can't support precision > 38
+            Some(DoubleType)
+          } else {
+            Some(DecimalType(range + scale, scale))
+          }
+        case (_, _) => None
--- End diff --

@HyukjinKwon Thanks. Will change.


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22619#discussion_r222194957
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
     StringType
   }

-  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
   /**
-   * Copied from internal Spark api
-   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+   * Returns the common data type given two input data types so that the return type
+   * is compatible with both input data types.
    */
-  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
-    case (t1, t2) if t1 == t2 => Some(t1)
-    case (NullType, t1) => Some(t1)
-    case (t1, NullType) => Some(t1)
-    case (StringType, t2) => Some(StringType)
-    case (t1, StringType) => Some(StringType)
-
-    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
-    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
-      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
-      Some(numericPrecedence(index))
-
-    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
-    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
-      Some(t2)
-    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
-      Some(t1)
-
-    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
-    case (t1: IntegralType, t2: DecimalType) =>
-      findTightestCommonType(DecimalType.forType(t1), t2)
-    case (t1: DecimalType, t2: IntegralType) =>
-      findTightestCommonType(t1, DecimalType.forType(t2))
-
-    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
-    // in most case, also have better precision.
-    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
-      Some(DoubleType)
-
-    case (t1: DecimalType, t2: DecimalType) =>
-      val scale = math.max(t1.scale, t2.scale)
-      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
-      if (range + scale > 38) {
-        // DecimalType can't support precision > 38
-        Some(DoubleType)
-      } else {
-        Some(DecimalType(range + scale, scale))
+  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+    TypeCoercion.findTightestCommonType(t1, t2).orElse {
+      (t1, t2) match {
+        case (StringType, t2) => Some(StringType)
+        case (t1, StringType) => Some(StringType)
+
+        case (t1: IntegralType, t2: DecimalType) =>
+          compatibleType(DecimalType.forType(t1), t2)
+        case (t1: DecimalType, t2: IntegralType) =>
+          compatibleType(t1, DecimalType.forType(t2))
+
+        case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
+          Some(DoubleType)
+
+        case (t1: DecimalType, t2: DecimalType) =>
+          val scale = math.max(t1.scale, t2.scale)
+          val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
+          if (range + scale > 38) {
+            // DecimalType can't support precision > 38
+            Some(DoubleType)
+          } else {
+            Some(DecimalType(range + scale, scale))
+          }
+        case (_, _) => None
--- End diff --

`case _ => None`
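
(Aside, for illustration: since the scrutinee is already the pair `(t1, t2)`, a bare wildcard matches exactly what `(_, _)` does, so the shorter form is equivalent. A trivial REPL-style check, with throwaway example values:

    // Both catch-all patterns behave the same for a two-element tuple.
    val pair: (Int, String) = (1, "a")
    val a = pair match { case (_, _) => "tuple wildcard" }
    val b = pair match { case _ => "bare wildcard" }
    assert(a == "tuple wildcard" && b == "bare wildcard")
)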


---




[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-02 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/22619

[SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema.

## What changes were proposed in this pull request?
Currently the CSV schema inference code inlines `TypeCoercion.findTightestCommonType`. This is a minor refactor to make use of the common type coercion code where applicable. This way we can take advantage of any improvement to the base method.

Thanks to @MaxGekk for finding this while reviewing another PR.
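
For context, the CSV-specific fallback that remains after delegating to `TypeCoercion.findTightestCommonType` merges two decimal types by keeping the larger scale and the larger integer-digit range, and falls back to `DoubleType` once the combined precision would exceed 38. A small illustrative sketch of that rule, with made-up example values (paste into spark-shell; the helper name `mergeDecimals` is only for this sketch):

    import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType}

    // Illustrative only: the decimal-merge rule from the diff, pulled out as a helper.
    def mergeDecimals(t1: DecimalType, t2: DecimalType): DataType = {
      val scale = math.max(t1.scale, t2.scale)                               // keep the larger scale
      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale) // keep the larger integer part
      if (range + scale > 38) DoubleType                                     // 38 is DecimalType's max precision
      else DecimalType(range + scale, scale)
    }

    // mergeDecimals(DecimalType(10, 2), DecimalType(7, 5))  => DecimalType(13, 5)
    // mergeDecimals(DecimalType(38, 0), DecimalType(5, 4))  => DoubleType (42 > 38)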

## How was this patch tested?
This is a minor refactor.  Existing tests are used to verify the change.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark csv_minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22619


commit d4e0bdb5a06f59ff1df3933cb43804e71cde8259
Author: Dilip Biswal 
Date:   2018-10-03T00:47:42Z

Make use of TypeCoercion.findTightestCommonType while inferring CSV schema




---
