date:20190630

[GitHub] [spark] gengliangwang commented on issue #24849: [SPARK-28018][SQL] Allow upcasting decimal to double/float

2019-06-30 Thread GitBox

gengliangwang commented on issue #24849: [SPARK-28018][SQL] Allow upcasting 
decimal to double/float
URL: https://github.com/apache/spark/pull/24849#issuecomment-507140862
 
 
   > - Add a decimal type for SQL literals that can be cast to float because 
the intended type of the literal is not known, or use some analysis rule that 
matches literals for the same purpose 
   > - Parse literals as floats and insert an implicit cast from float to 
decimal
   
   Sorry, I meant that the sync did not reach a conclusion. I came up with 
these proposals during the sync, but I don't think they are good enough.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] turboFei edited a comment on issue #24992: [SPARK-28194][SQL] Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei edited a comment on issue #24992:  [SPARK-28194][SQL] Judge whether to 
reorder joinKeys to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507135803
 
 
   @maropu 
   It is hard to decide this by checking whether Set(currentOrderOfKeys) == 
Set(expectedOrderOfKeys), because the same key may be used more than once.
   For example, these key sequences all map to the same set:
   ```
   Seq(1, 2, 2, 2)
   Seq(1, 1, 2, 2)
   Seq(1, 1, 1, 2)
   ```
   So I added a check in `reorder`: when we cannot reorder the join keys, it 
returns (leftKeys, rightKeys) directly.
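
   A minimal sketch of the fallback described above, not the actual patch. 
It assumes plain values compared with `==` in place of Catalyst expressions 
and `semanticEquals`; the names `ReorderSketch` and `allMatched` are 
hypothetical, and the size comparison is an extra guard added here for 
illustration only.

```scala
import scala.collection.mutable

object ReorderSketch {
  // Simplified stand-in for `reorder`: K replaces Catalyst's Expression.
  def reorder[K](
      leftKeys: IndexedSeq[K],
      rightKeys: IndexedSeq[K],
      expectedOrderOfKeys: Seq[K],
      currentOrderOfKeys: Seq[K]): (Seq[K], Seq[K]) = {
    val pickedIndexes = mutable.Set.empty[Int]
    val keysAndIndexes = currentOrderOfKeys.zipWithIndex
    val leftBuf = mutable.ArrayBuffer.empty[K]
    val rightBuf = mutable.ArrayBuffer.empty[K]

    // Stop at the first expected key that has no unused match.
    val allMatched = expectedOrderOfKeys.forall { expected =>
      keysAndIndexes.find { case (e, idx) =>
        e == expected && !pickedIndexes.contains(idx)
      } match {
        case Some((_, idx)) =>
          pickedIndexes += idx
          leftBuf += leftKeys(idx)
          rightBuf += rightKeys(idx)
          true
        case None => false
      }
    }

    // Fall back to the original keys instead of calling `.get` on a None.
    if (allMatched && expectedOrderOfKeys.size == currentOrderOfKeys.size) {
      (leftBuf.toSeq, rightBuf.toSeq)
    } else {
      (leftKeys, rightKeys)
    }
  }
}
```

   With keys `Seq(1, 2, 2, 2)` and an expected order of `Seq(1, 1, 2, 2)`, the 
second `1` finds no unused match, so the sketch returns the keys unchanged.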



[GitHub] [spark] turboFei edited a comment on issue #24992: [SPARK-28194][SQL] Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei edited a comment on issue #24992:  [SPARK-28194][SQL] Judge whether to 
reorder joinKeys to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507135803
 
 
   @maropu 
   It is hard to decide this by checking whether Set(currentOrderOfKeys) == 
Set(expectedOrderOfKeys), because the same key may be used more than once.
   For example, these key sequences all map to the same set:
   ```
   Seq(1, 2, 2, 2)
   Seq(1, 1, 2, 2)
   Seq(1, 1, 1, 2)
   ```
   So I added a check in `reorder`: when we cannot reorder the keys, it 
returns (leftKeys, rightKeys) directly.



[GitHub] [spark] turboFei edited a comment on issue #24992: [SPARK-28194][SQL] Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei edited a comment on issue #24992:  [SPARK-28194][SQL] Judge whether to 
reorder joinKeys to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507135803
 
 
   @maropu 
   It is hard to decide this by checking whether Set(currentOrderOfKeys) == 
Set(expectedOrderOfKeys), because the same key may be used more than once.
   For example, these key sequences all map to the same set:
   ```
   Seq(1, 2, 2, 2)
   Seq(1, 1, 2, 2)
   Seq(1, 1, 1, 2)
   ```
   So I added a check in `reorder`: when we cannot reorder the keys, it 
returns (leftKeys, rightKeys) directly.



[GitHub] [spark] turboFei commented on issue #24992: [SPARK-28194][SQL] Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei commented on issue #24992:  [SPARK-28194][SQL] Judge whether to 
reorder joinKeys to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507135803
 
 
   It is hard to decide this by checking whether Set(currentOrderOfKeys) == 
Set(expectedOrderOfKeys), because the same key may be used more than once.
   For example, these key sequences all map to the same set:
   ```
   Seq(1, 2, 2, 2)
   Seq(1, 1, 2, 2)
   Seq(1, 1, 1, 2)
   ```
   So I added a check in `reorder`: when we cannot reorder the keys, it 
returns (leftKeys, rightKeys) directly.



[GitHub] [spark] SparkQA commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

SparkQA commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of 
system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507134687
 
 
   **[Test build #107063 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107063/testReport)**
 for PR 24998 at commit 
[`4e8507c`](https://github.com/apache/spark/commit/4e8507c2b89854e2096ca98b2a0c03296620cbea).



[GitHub] [spark] AmplabJenkins removed a comment on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24998: [SPARK-28202] [Core] [Test] 
Avoid noises of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507134249
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12256/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24998: [SPARK-28202] [Core] [Test] 
Avoid noises of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507134241
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid 
noises of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507134241
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] jerryshao commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

jerryshao commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises 
of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507134239
 
 
   I'm OK with the changes.



[GitHub] [spark] AmplabJenkins commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid 
noises of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507134249
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12256/
   Test PASSed.



[GitHub] [spark] jerryshao commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

jerryshao commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises 
of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507133562
 
 
   Jenkins, test this please.



[GitHub] [spark] SparkQA commented on issue #25016: [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder

2019-06-30 Thread GitBox

SparkQA commented on issue #25016: [SPARK-28200][SQL] Decimal overflow handling 
in ExpressionEncoder
URL: https://github.com/apache/spark/pull/25016#issuecomment-507132873
 
 
   **[Test build #107062 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107062/testReport)**
 for PR 25016 at commit 
[`2513ea7`](https://github.com/apache/spark/commit/2513ea779767b4afbaa979a25ec402a7e2d9aa4c).



[GitHub] [spark] AmplabJenkins removed a comment on issue #25016: [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25016: [SPARK-28200][SQL] Decimal 
overflow handling in ExpressionEncoder
URL: https://github.com/apache/spark/pull/25016#issuecomment-507132340
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25016: [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25016: [SPARK-28200][SQL] Decimal 
overflow handling in ExpressionEncoder
URL: https://github.com/apache/spark/pull/25016#issuecomment-507132345
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12255/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25016: [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25016: [SPARK-28200][SQL] Decimal overflow 
handling in ExpressionEncoder
URL: https://github.com/apache/spark/pull/25016#issuecomment-507132345
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12255/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25016: [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25016: [SPARK-28200][SQL] Decimal overflow 
handling in ExpressionEncoder
URL: https://github.com/apache/spark/pull/25016#issuecomment-507132340
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei commented on a change in pull request #24992:  [SPARK-28194][SQL] 
Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#discussion_r298889535
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -231,14 +231,15 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 val keysAndIndexes = currentOrderOfKeys.zipWithIndex
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
+  keysAndIndexes.find { case (e, idx) =>
 // As we may have the same key used many times, we need to filter out 
its occurrence we
 // have already used.
 e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  }.map(_._2).map(index => {
 
 Review comment:
   We can add a check in `reorder`.



[GitHub] [spark] maropu commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

maropu commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan 
is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507130730
 
 
   sure.
   
   @Dooyoung-Hwang are you there?



[GitHub] [spark] cloud-fan commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

cloud-fan commented on issue #22347: [SPARK-25353][SQL] executeTake in 
SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507129518
 
 
   LGTM, @maropu can you take it over if @Dooyoung-Hwang is not active?



[GitHub] [spark] SparkQA commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

SparkQA commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan 
is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507129391
 
 
   **[Test build #107061 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107061/testReport)**
 for PR 22347 at commit 
[`8666272`](https://github.com/apache/spark/commit/86662722e53bfcae2c75e61d170c983abd599b3a).



[GitHub] [spark] cloud-fan commented on a change in pull request #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

cloud-fan commented on a change in pull request #22347: [SPARK-25353][SQL] 
executeTake in SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#discussion_r298891683
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
 ##
 @@ -348,30 +349,30 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 // Otherwise, interpolate the number of partitions we need to try, but 
overestimate
 // it by 50%. We also cap the estimation in the end.
 val limitScaleUpFactor = Math.max(sqlContext.conf.limitScaleUpFactor, 
2)
-if (buf.isEmpty) {
+if (scannedRowCount == 0) {
   numPartsToTry = partsScanned * limitScaleUpFactor
 } else {
-  val left = n - buf.size
+  val left = n - scannedRowCount
   // As left > 0, numPartsToTry is always >= 1
-  numPartsToTry = Math.ceil(1.5 * left * partsScanned / buf.size).toInt
+  numPartsToTry = Math.ceil(1.5 * left * partsScanned / 
scannedRowCount).toInt
   numPartsToTry = Math.min(numPartsToTry, partsScanned * 
limitScaleUpFactor)
 }
   }
 
   val p = partsScanned.until(math.min(partsScanned + numPartsToTry, 
totalParts).toInt)
   val sc = sqlContext.sparkContext
-  val res = sc.runJob(childRDD,
-(it: Iterator[Array[Byte]]) => if (it.hasNext) it.next() else 
Array.empty[Byte], p)
-
-  buf ++= res.flatMap(decodeUnsafeRows)
+  val res = sc.runJob(childRDD, (it: Iterator[(Long, Array[Byte])]) =>
+if (it.hasNext) it.next() else (0L, Array.empty[Byte]), p)
 
+  buf ++= res.map(_._2)
+  scannedRowCount += res.map(_._1).sum
   partsScanned += p.size
 }
 
-if (buf.size > n) {
-  buf.take(n).toArray
+if (scannedRowCount > n) {
+  buf.iterator.flatMap(decodeUnsafeRows).take(n).toArray
 } else {
-  buf.toArray
+  buf.flatMap(decodeUnsafeRows).toArray
 
 Review comment:
   ditto
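
   For readers following the quoted hunk, the partition-count estimate it 
contains can be sketched in isolation. This is only an illustration under 
assumptions: `PartsToTrySketch`, `nextNumPartsToTry` and 
`configuredScaleUpFactor` are hypothetical names, and the surrounding loop and 
Spark types are left out.

```scala
object PartsToTrySketch {
  // Hypothetical helper; the other names mirror variables in the quoted hunk.
  def nextNumPartsToTry(
      n: Int,
      scannedRowCount: Long,
      partsScanned: Int,
      configuredScaleUpFactor: Int): Long = {
    // The hunk never lets the scale-up factor drop below 2.
    val limitScaleUpFactor = Math.max(configuredScaleUpFactor, 2)
    val cap = partsScanned.toLong * limitScaleUpFactor
    if (scannedRowCount == 0) {
      // No rows seen so far: grow by the full scale-up factor.
      cap
    } else {
      // Interpolate from the rows seen so far, overestimate by 50%, then cap.
      val left = n - scannedRowCount
      val estimate = Math.ceil(1.5 * left * partsScanned / scannedRowCount).toLong
      Math.min(estimate, cap)
    }
  }
}
```

   Using `scannedRowCount` here rather than `buf.size` matters because, after 
the change, `buf` holds undecoded byte arrays, so its size no longer equals 
the number of rows.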



[GitHub] [spark] cloud-fan commented on a change in pull request #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

cloud-fan commented on a change in pull request #22347: [SPARK-25353][SQL] 
executeTake in SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#discussion_r298891651
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
 ##
 @@ -348,30 +349,30 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 // Otherwise, interpolate the number of partitions we need to try, but 
overestimate
 // it by 50%. We also cap the estimation in the end.
 val limitScaleUpFactor = Math.max(sqlContext.conf.limitScaleUpFactor, 
2)
-if (buf.isEmpty) {
+if (scannedRowCount == 0) {
   numPartsToTry = partsScanned * limitScaleUpFactor
 } else {
-  val left = n - buf.size
+  val left = n - scannedRowCount
   // As left > 0, numPartsToTry is always >= 1
-  numPartsToTry = Math.ceil(1.5 * left * partsScanned / buf.size).toInt
+  numPartsToTry = Math.ceil(1.5 * left * partsScanned / 
scannedRowCount).toInt
   numPartsToTry = Math.min(numPartsToTry, partsScanned * 
limitScaleUpFactor)
 }
   }
 
   val p = partsScanned.until(math.min(partsScanned + numPartsToTry, 
totalParts).toInt)
   val sc = sqlContext.sparkContext
-  val res = sc.runJob(childRDD,
-(it: Iterator[Array[Byte]]) => if (it.hasNext) it.next() else 
Array.empty[Byte], p)
-
-  buf ++= res.flatMap(decodeUnsafeRows)
+  val res = sc.runJob(childRDD, (it: Iterator[(Long, Array[Byte])]) =>
+if (it.hasNext) it.next() else (0L, Array.empty[Byte]), p)
 
+  buf ++= res.map(_._2)
+  scannedRowCount += res.map(_._1).sum
   partsScanned += p.size
 }
 
-if (buf.size > n) {
-  buf.take(n).toArray
+if (scannedRowCount > n) {
+  buf.iterator.flatMap(decodeUnsafeRows).take(n).toArray
 
 Review comment:
   nit: since this is a perf-critical code path, I think we can optimize it 
further, because we know the length of the result array.
   ```
   val result = new Array[InternalRow](n)
   var i = 0
   while (i < n) {
     // decode the next row into result(i)
     i += 1
   }
   result
   ```
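
   One way the suggestion above could look in full, as a sketch only. It 
assumes generic element types in place of `Array[Byte]` and `InternalRow`, and 
takes the decoder as a function argument; `TakeSketch` and `takeDecoded` are 
hypothetical names, not part of Spark.

```scala
import scala.reflect.ClassTag

object TakeSketch {
  // Fill a preallocated array with at most n decoded rows, decoding lazily.
  def takeDecoded[A, B: ClassTag](chunks: Iterator[A], n: Int)(
      decode: A => Iterator[B]): Array[B] = {
    val result = new Array[B](n)
    var filled = 0
    // Decode one chunk at a time and stop as soon as n rows are collected.
    while (filled < n && chunks.hasNext) {
      val rows = decode(chunks.next())
      while (filled < n && rows.hasNext) {
        result(filled) = rows.next()
        filled += 1
      }
    }
    // Fewer than n rows available: return only the filled prefix.
    if (filled == n) result else result.take(filled)
  }
}
```

   Tracking a separate `filled` counter is what makes the loop terminate, since 
the length of a preallocated array is always `n`.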



[GitHub] [spark] AmplabJenkins removed a comment on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #22347: [SPARK-25353][SQL] executeTake 
in SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507128897
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12254/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #22347: [SPARK-25353][SQL] executeTake in 
SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507128897
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12254/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #22347: [SPARK-25353][SQL] executeTake 
in SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507128893
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #22347: [SPARK-25353][SQL] executeTake in 
SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507128893
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #22347: [SPARK-25353][SQL] executeTake 
in SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-502206062
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified to avoid unnecessary decoding.

2019-06-30 Thread GitBox

cloud-fan commented on issue #22347: [SPARK-25353][SQL] executeTake in 
SparkPlan is modified to avoid unnecessary decoding.
URL: https://github.com/apache/spark/pull/22347#issuecomment-507128498
 
 
   retest this please



[GitHub] [spark] LiShuMing commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

LiShuMing commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises 
of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507128537
 
 
   In the `core/test` module, only this unit test failed. Maybe I need to check 
other modules?



[GitHub] [spark] cloud-fan closed pull request #25004: [SPARK-28205][SQL] useV1SourceList configuration should be for all data sources

2019-06-30 Thread GitBox

cloud-fan closed pull request #25004: [SPARK-28205][SQL] useV1SourceList 
configuration should be for all data sources
URL: https://github.com/apache/spark/pull/25004
 
 
   



[GitHub] [spark] jerryshao commented on a change in pull request #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-06-30 Thread GitBox

jerryshao commented on a change in pull request #25007: 
[SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#discussion_r298889988
 
 

 ##
 File path: core/src/main/java/org/apache/spark/api/shuffle/ShuffleDataIO.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.shuffle;
 
 Review comment:
   Not sure whether it is proper to add the interfaces here under `o.a.s.api`. 
It looks like most of the things under the api package are related to RDD 
functions. How about the package `o.a.s.shuffle.api` instead?



[GitHub] [spark] cloud-fan commented on issue #25004: [SPARK-28205][SQL] useV1SourceList configuration should be for all data sources

2019-06-30 Thread GitBox

cloud-fan commented on issue #25004: [SPARK-28205][SQL] useV1SourceList 
configuration should be for all data sources
URL: https://github.com/apache/spark/pull/25004#issuecomment-507127467
 
 
   thanks, merging to master!



[GitHub] [spark] cloud-fan commented on issue #24960: [SPARK-28156][SQL] Self-join should not miss cached view

2019-06-30 Thread GitBox

cloud-fan commented on issue #24960: [SPARK-28156][SQL] Self-join should not 
miss cached view
URL: https://github.com/apache/spark/pull/24960#issuecomment-507127363
 
 
   Another idea: shall we apply `AliasViewChild` right before `EliminateView`?



[GitHub] [spark] turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] Refactor code to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] 
Refactor code to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#discussion_r298889535
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -231,14 +231,15 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 val keysAndIndexes = currentOrderOfKeys.zipWithIndex
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
+  keysAndIndexes.find { case (e, idx) =>
 // As we may have the same key used many times, we need to filter out 
its occurrence we
 // have already used.
 e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  }.map(_._2).map(index => {
 
 Review comment:
   We can add a check in `reorderJoinKeys`.
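
The fallback idea in this thread (return the original `(leftKeys, rightKeys)` when the keys cannot be reordered) can be sketched as follows. This is a minimal illustration, not the code from the pull request: `reorderOrKeep` is a hypothetical name, plain values stand in for Catalyst expressions, `==` stands in for `semanticEquals`, and all four sequences are assumed to have the same length.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the reorder step discussed above.
// Assumes left, right, current and expected all have the same length.
def reorderOrKeep[K, V](
    left: Seq[V],
    right: Seq[V],
    current: Seq[K],
    expected: Seq[K]): (Seq[V], Seq[V]) = {
  val picked = mutable.ArrayBuffer.empty[Int]
  val indexed = current.zipWithIndex
  // For each expected key, take the first matching index not used yet.
  val allFound = expected.forall { key =>
    indexed.collectFirst { case (k, i) if k == key && !picked.contains(i) => i } match {
      case Some(i) =>
        picked += i
        true
      case None => false
    }
  }
  if (allFound) (picked.map(i => left(i)).toSeq, picked.map(i => right(i)).toSeq)
  else (left, right) // cannot reorder: keep the original keys instead of calling .get
}
```

With `current = Seq("a", "b")` and `expected = Seq("b", "a")` both key lists are swapped; with `expected = Seq("a", "c")` the original lists come back unchanged.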



[GitHub] [spark] jerryshao commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-06-30 Thread GitBox

jerryshao commented on issue #24998: [SPARK-28202] [Core] [Test] Avoid noises 
of system props in SparkConfSuite
URL: https://github.com/apache/spark/pull/24998#issuecomment-507126499
 
 
   I would guess that other tests may also fail if you have locally modified 
properties. Have you hit this issue in other test suites?



[GitHub] [spark] turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] Refactor code to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] 
Refactor code to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#discussion_r298889300
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -231,14 +231,15 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 val keysAndIndexes = currentOrderOfKeys.zipWithIndex
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
+  keysAndIndexes.find { case (e, idx) =>
 // As we may have the same key used many times, we need to filter out 
its occurrence we
 // have already used.
 e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  }.map(_._2).map(index => {
 
 Review comment:
   > If `Set(currentOrderOfKeys) == Set(expectedOrderOfKeys)`, I think we 
cannot hit the exception...
   
   Yes, you are right.
   



[GitHub] [spark] maropu commented on a change in pull request #24992: [SPARK-28194][SQL] Refactor code to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

maropu commented on a change in pull request #24992: [SPARK-28194][SQL] 
Refactor code to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#discussion_r29620
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -231,14 +231,15 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 val keysAndIndexes = currentOrderOfKeys.zipWithIndex
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
+  keysAndIndexes.find { case (e, idx) =>
 // As we may have the same key used many times, we need to filter out 
its occurrence we
 // have already used.
 e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  }.map(_._2).map(index => {
 
 Review comment:
   If `Set(currentOrderOfKeys) == Set(expectedOrderOfKeys)`, I think we cannot 
hit the exception...
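
A small illustration of why a set comparison alone may not settle this when a key occurs more than once, which is the concern raised in the edited comment on this issue. It reads `Set(x)` above as shorthand for `x.toSet`, and it uses plain integers in place of join-key expressions; both are assumptions made only for this sketch, and `counts` is a hypothetical helper.

```scala
// Plain integers stand in for join-key expressions (illustrative assumption).
val currentOrderOfKeys = Seq(1, 2, 2, 2)
val expectedOrderOfKeys = Seq(1, 1, 2, 2)

// Both sequences collapse to the same set {1, 2}.
val sameSet = currentOrderOfKeys.toSet == expectedOrderOfKeys.toSet

// Hypothetical helper: number of occurrences of each key.
def counts[A](keys: Seq[A]): Map[A, Int] =
  keys.groupBy(identity).map { case (k, v) => k -> v.size }

// Comparing occurrence counts is stricter than comparing sets.
val sameMultiset = counts(currentOrderOfKeys) == counts(expectedOrderOfKeys)
```

Here the sets are equal but the occurrence counts differ, so the second `1` in the expected order has no unused match in the current order.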



[GitHub] [spark] cloud-fan commented on a change in pull request #24738: [SPARK-23098][SQL] Migrate Kafka Batch source to v2.

2019-06-30 Thread GitBox

cloud-fan commented on a change in pull request #24738: [SPARK-23098][SQL] 
Migrate Kafka Batch source to v2.
URL: https://github.com/apache/spark/pull/24738#discussion_r298887969
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala
 ##
 @@ -299,7 +299,7 @@ private[kafka010] class KafkaMicroBatchStream(
   if (content(0) == 'v') {
 val indexOfNewLine = content.indexOf("\n")
 if (indexOfNewLine > 0) {
-  val version = parseVersion(content.substring(0, indexOfNewLine), 
VERSION)
+  parseVersion(content.substring(0, indexOfNewLine), VERSION)
 
 Review comment:
   maybe we can rename it to `validateVersion`.
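
In the diff above the result of `parseVersion` is no longer assigned, so the call appears to be used only as a check. A validate-style helper could make that intent explicit. The sketch below is hypothetical and is not Spark's implementation: the name follows the suggestion, while the `v<N>` header format and the error messages are assumptions.

```scala
// Hypothetical helper, not Spark's implementation: checks a "v<N>" header
// and throws on bad input instead of returning the parsed number.
def validateVersion(text: String, maxSupportedVersion: Int): Unit = {
  require(text.nonEmpty && text.head == 'v', s"Malformed version header: $text")
  val version = text.drop(1).toInt // throws NumberFormatException if N is not a number
  require(
    version > 0 && version <= maxSupportedVersion,
    s"Unsupported version: $version (max supported: $maxSupportedVersion)")
}
```

Returning `Unit` signals to callers that only the side effect (an exception on invalid input) matters.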



[GitHub] [spark] SparkQA commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

SparkQA commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic 
Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507124499
 
 
   **[Test build #107060 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107060/testReport)**
 for PR 24983 at commit 
[`0860e4e`](https://github.com/apache/spark/commit/0860e4ebf424ff819609dd45db8587be7b5565ec).



[GitHub] [spark] AngersZhuuuu commented on issue #25015: [SPARK-28217][SQL] Allow a pluggable statistics plan visitor for a logical plan.

2019-06-30 Thread GitBox

AngersZhuuuu commented on issue #25015: [SPARK-28217][SQL] Allow a pluggable 
statistics plan visitor for a logical plan.
URL: https://github.com/apache/spark/pull/25015#issuecomment-507124404
 
 
   [SPARK-27602](https://issues.apache.org/jira/browse/SPARK-27602)
   
   I've thought about doing this.  You can make it more extensible.
   



[GitHub] [spark] turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] Refactor code to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei commented on a change in pull request #24992: [SPARK-28194][SQL] 
Refactor code to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#discussion_r298887634
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -231,14 +231,15 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 val keysAndIndexes = currentOrderOfKeys.zipWithIndex
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
+  keysAndIndexes.find { case (e, idx) =>
 // As we may have the same key used many times, we need to filter out 
its occurrence we
 // have already used.
 e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  }.map(_._2).map(index => {
 
 Review comment:
   I'll check it.



[GitHub] [spark] turboFei removed a comment on issue #24992: [SPARK-28194][SQL] Refactor code to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei removed a comment on issue #24992: [SPARK-28194][SQL] Refactor code to 
prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507123648
 
 
   Yes, it is a bug.
   
![image](https://user-images.githubusercontent.com/6757692/60413014-fde62a00-9c05-11e9-95b0-ee963cffda65.png)
   The keys in `currentOrderOfKeys` do not match those in 
`expectedOrderOfKeys`.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] 
Support Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507124116
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] 
Support Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507124120
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12253/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support 
Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507124116
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support 
Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507124120
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12253/
   Test PASSed.



[GitHub] [spark] turboFei commented on issue #24992: [SPARK-28194][SQL] Refactor code to prevent None.get in EnsureRequirements

2019-06-30 Thread GitBox

turboFei commented on issue #24992: [SPARK-28194][SQL] Refactor code to prevent 
None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507123648
 
 
   Yes, it is a bug.
   
![image](https://user-images.githubusercontent.com/6757692/60413014-fde62a00-9c05-11e9-95b0-ee963cffda65.png)
   The keys in `currentOrderOfKeys` do not match those in 
`expectedOrderOfKeys`.



[GitHub] [spark] cloud-fan commented on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution

2019-06-30 Thread GitBox

cloud-fan commented on issue #24978: [SPARK-28177][SQL] Adjust post shuffle 
partition number in adaptive execution
URL: https://github.com/apache/spark/pull/24978#issuecomment-507123632
 
 
   LGTM



[GitHub] [spark] xianyinxin commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

xianyinxin commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic 
Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507123598
 
 
   retest this please



[GitHub] [spark] cloud-fan commented on a change in pull request #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution

2019-06-30 Thread GitBox

cloud-fan commented on a change in pull request #24978: [SPARK-28177][SQL] 
Adjust post shuffle partition number in adaptive execution
URL: https://github.com/apache/spark/pull/24978#discussion_r298886464
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -36,107 +36,12 @@ import org.apache.spark.sql.internal.SQLConf
  * the input partition ordering requirements are met.
  */
 case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
-  private def defaultNumPreShufflePartitions: Int = conf.numShufflePartitions
-
-  private def targetPostShuffleInputSize: Long = 
conf.targetPostShuffleInputSize
-
-  private def adaptiveExecutionEnabled: Boolean = conf.adaptiveExecutionEnabled
-
-  private def minNumPostShufflePartitions: Option[Int] = {
-val minNumPostShufflePartitions = conf.minNumPostShufflePartitions
-if (minNumPostShufflePartitions > 0) Some(minNumPostShufflePartitions) 
else None
-  }
-
-  /**
-   * Adds [[ExchangeCoordinator]] to [[ShuffleExchangeExec]]s if adaptive 
query execution is enabled
-   * and partitioning schemes of these [[ShuffleExchangeExec]]s support 
[[ExchangeCoordinator]].
-   */
-  private def withExchangeCoordinator(
-  children: Seq[SparkPlan],
-  requiredChildDistributions: Seq[Distribution]): Seq[SparkPlan] = {
-val supportsCoordinator =
-  if (children.exists(_.isInstanceOf[ShuffleExchangeExec])) {
-// Right now, ExchangeCoordinator only support HashPartitionings.
-children.forall {
-  case e @ ShuffleExchangeExec(hash: HashPartitioning, _, _) => true
-  case child =>
-child.outputPartitioning match {
-  case hash: HashPartitioning => true
-  case collection: PartitioningCollection =>
-
collection.partitionings.forall(_.isInstanceOf[HashPartitioning])
-  case _ => false
-}
-}
-  } else {
-// In this case, although we do not have Exchange operators, we may 
still need to
-// shuffle data when we have more than one children because data 
generated by
-// these children may not be partitioned in the same way.
-// Please see the comment in withCoordinator for more details.
-val supportsDistribution = requiredChildDistributions.forall { dist =>
-  dist.isInstanceOf[ClusteredDistribution] || 
dist.isInstanceOf[HashClusteredDistribution]
-}
-children.length > 1 && supportsDistribution
-  }
-
-val withCoordinator =
-  if (adaptiveExecutionEnabled && supportsCoordinator) {
-val coordinator =
-  new ExchangeCoordinator(
-targetPostShuffleInputSize,
-minNumPostShufflePartitions)
-children.zip(requiredChildDistributions).map {
-  case (e: ShuffleExchangeExec, _) =>
-// This child is an Exchange, we need to add the coordinator.
-e.copy(coordinator = Some(coordinator))
-  case (child, distribution) =>
-// If this child is not an Exchange, we need to add an Exchange 
for now.
-// Ideally, we can try to avoid this Exchange. However, when we 
reach here,
-// there are at least two children operators (because if there is 
a single child
-// and we can avoid Exchange, supportsCoordinator will be false 
and we
-// will not reach here.). Although we can make two children have 
the same number of
-// post-shuffle partitions. Their numbers of pre-shuffle 
partitions may be different.
-// For example, let's say we have the following plan
-// Join
-// /  \
-//   Agg  Exchange
-//   /  \
-//Exchange  t2
-//  /
-// t1
-// In this case, because a post-shuffle partition can include 
multiple pre-shuffle
-// partitions, a HashPartitioning will not be strictly partitioned 
by the hashcodes
-// after shuffle. So, even we can use the child Exchange operator 
of the Join to
-// have a number of post-shuffle partitions that matches the 
number of partitions of
-// Agg, we cannot say these two children are partitioned in the 
same way.
-// Here is another case
-// Join
-// /  \
-//   Agg1  Agg2
-//   /  \
-//   Exchange1  Exchange2
-//   /   \
-//  t1   t2
-// In this case, two Aggs shuffle data with the same column of the 
join condition.
-// After we use ExchangeCoordinator, these two Aggs may not be 
partitioned in the same
-// way. Let's say that Agg1 and Agg2 both have 5 pre-shuffle 
partitions and 2
-// post-shuffle partitions. It is possible that Agg1 fetches t

[GitHub] [spark] AngersZhuuuu edited a comment on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.

2019-06-30 Thread GitBox

AngersZhuuuu edited a comment on issue #24909: [SPARK-28106][SQL] When Spark 
SQL use  "add jar" ,  before add to SparkContext, check  jar path exist first.
URL: https://github.com/apache/spark/pull/24909#issuecomment-507122951
 
 
   @gatorsmile @GregOwen 
   Could you review this again? Thanks



[GitHub] [spark] AngersZhuuuu commented on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.

2019-06-30 Thread GitBox

AngersZhuuuu commented on issue #24909: [SPARK-28106][SQL] When Spark SQL use  
"add jar" ,  before add to SparkContext, check  jar path exist first.
URL: https://github.com/apache/spark/pull/24909#issuecomment-507122951
 
 
   @gatorsmile @GregOwen 
   Could you review this again?



[GitHub] [spark] AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the 
transform natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507117221
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the 
transform natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507117224
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107058/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507117224
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107058/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507117221
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

SparkQA removed a comment on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507107652
 
 
   **[Test build #107058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107058/testReport)**
 for PR 24963 at commit 
[`bd813db`](https://github.com/apache/spark/commit/bd813dbd4e91437a2172c86c9e14f3941b3edd14).



[GitHub] [spark] SparkQA commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

SparkQA commented on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507117059
 
 
   **[Test build #107058 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107058/testReport)**
 for PR 24963 at commit 
[`bd813db`](https://github.com/apache/spark/commit/bd813dbd4e91437a2172c86c9e14f3941b3edd14).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

SparkQA commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111962
 
 
   **[Test build #107059 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107059/testReport)**
 for PR 25019 at commit 
[`9a32d8e`](https://github.com/apache/spark/commit/9a32d8e482c245d8fb1c7aeafd48ac4381829e10).



[GitHub] [spark] AmplabJenkins removed a comment on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25019: [SPARK-28195][SQL] Fix 
CheckAnalysis not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111666
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12252/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25019: [SPARK-28195][SQL] Fix 
CheckAnalysis not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111665
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis 
not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111666
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12252/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis 
not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111665
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] maropu commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

maropu commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111650
 
 
   Does this issue only exist for the INSERT command? If so, it would be 
better to make the title and description more concrete.



[GitHub] [spark] maropu commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

maropu commented on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507111332
 
 
   ok to test



[GitHub] [spark] AmplabJenkins removed a comment on issue #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25019: [SPARK-28195][SQL] Fix 
CheckAnalysis not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507110740
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] HyukjinKwon closed pull request #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

HyukjinKwon closed pull request #25012: [SPARK-28215][SQL][R] as_tibble was 
removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012
 
 
   



[GitHub] [spark] cloud-fan commented on a change in pull request #25016: [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder

2019-06-30 Thread GitBox

cloud-fan commented on a change in pull request #25016: [SPARK-28200][SQL] 
Decimal overflow handling in ExpressionEncoder
URL: https://github.com/apache/spark/pull/25016#discussion_r298877783
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
 ##
 @@ -379,6 +380,78 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
 assert(e.getMessage.contains("tuple with more than 22 elements are not supported"))
   }
 
+  // Scala / Java big decimals --
+
+  encodeDecodeTest(BigDecimal(("9" * 20) + "." + "9" * 18),
+"scala decimal within precision/scale limit")
+  encodeDecodeTest(new java.math.BigDecimal(("9" * 20) + "." + "9" * 18),
+"java decimal within precision/scale limit")
+
+  encodeDecodeTest(BigDecimal(("9" * 20) + "." + "9" * 18).unary_-,
 
 Review comment:
   Shall we use `-BigDecimal(...)` instead of `.unary_-`?
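
The suggestion above can be illustrated with a small sketch. It assumes only the Scala standard library; the object and value names (`NegationExample`, `limit`, `viaMethod`, `viaPrefix`) are made up for illustration and are not part of the PR. In Scala, the prefix form `-x` is syntactic sugar for `x.unary_-`, so both spellings should produce the same value:

```scala
object NegationExample {
  // Same literal as in the test under review: 20 integer digits, 18 fractional digits.
  val limit: BigDecimal = BigDecimal(("9" * 20) + "." + ("9" * 18))

  // Explicit method call, as written in the diff.
  val viaMethod: BigDecimal = limit.unary_-

  // Prefix operator, as proposed in the review comment.
  val viaPrefix: BigDecimal = -limit
}
```

Since the two forms are equivalent, the choice is mainly about readability; the prefix form reads closer to ordinary arithmetic notation.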



[GitHub] [spark] AmplabJenkins removed a comment on issue #25019: [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25019: [SPARK-28195]Fix CheckAnalysis 
not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507110404
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #25019: [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25019: [SPARK-28195]Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507110740
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #25019: [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25019: [SPARK-28195]Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507110326
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #25019: [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25019: [SPARK-28195]Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507110404
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on issue #25019: [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25019: [SPARK-28195]Fix CheckAnalysis 
not working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019#issuecomment-507110326
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] liupc opened a new pull request #25019: [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message

2019-06-30 Thread GitBox

liupc opened a new pull request #25019: [SPARK-28195]Fix CheckAnalysis not 
working for command and report misleading error message
URL: https://github.com/apache/spark/pull/25019
 
 
   
   ## What changes were proposed in this pull request?
   
   This PR tries to fix the issue that the sub-plan of a command is not 
checked during analysis, which can lead to a misleading error message.
   For example, consider the following SQL:
   `insert overwrite directory '/path' using parquet select * from table1`
   
   When `table1` does not exist, we end up with a misleading error 
message:
   ```
   Caused by: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'kr.objective_id
   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
   at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
   at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
   at scala.collection.immutable.List.foreach(List.scala:381)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
   at scala.collection.immutable.List.map(List.scala:285)
   at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:440)
   at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:159)
   at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:159)
   at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:544)
   at org.apache.spark.sql.execution.command.InsertIntoDataSourceDirCommand.run(InsertIntoDataSourceDirCommand.scala:70)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
   at org.apache.spark.sql.execution.adaptive.QueryStage.executeCollect(QueryStage.scala:246)
   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
   at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3277)
   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3276)
   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:277)
   ... 11 more
   ```
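
To make the failure mode concrete, here is a toy sketch of how an analysis check that only walks `children` can miss a query held inside a command. Everything in it is hypothetical: `Plan`, `Relation`, `Project`, `InsertDirCommand`, and `ToyCheckAnalysis` are simplified stand-ins invented for illustration, not Spark's actual classes, and the sketch assumes the command stores its query as a plain field rather than as a child node.

```scala
// Toy plan model; these types are illustrative and are not Spark's classes.
sealed trait Plan {
  def children: Seq[Plan]
  // Visit this node, then every node reachable through `children`.
  def foreach(f: Plan => Unit): Unit = {
    f(this)
    children.foreach(_.foreach(f))
  }
}

case class Relation(name: String, resolved: Boolean) extends Plan {
  val children: Seq[Plan] = Nil
}

case class Project(child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

// Assumption: the command keeps its query as a field but exposes no children,
// so a plain traversal never reaches the query.
case class InsertDirCommand(path: String, query: Plan) extends Plan {
  val children: Seq[Plan] = Nil
}

object ToyCheckAnalysis {
  // Traversal-only check: reports unresolved relations reachable via `children`.
  def unresolvedShallow(plan: Plan): Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    plan.foreach {
      case Relation(n, false) => buf += n
      case _ =>
    }
    buf.toSeq
  }

  // Check that also descends into the query held by a command.
  def unresolvedDeep(plan: Plan): Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    plan.foreach {
      case Relation(n, false) => buf += n
      case InsertDirCommand(_, q) => buf ++= unresolvedDeep(q)
      case _ =>
    }
    buf.toSeq
  }

  // Mirrors the example SQL: the inner query reads a table that does not exist.
  val plan: Plan =
    InsertDirCommand("/path", Project(Relation("table1", resolved = false)))
}
```

In this toy model the shallow check reports nothing for `plan`, so the unresolved relation would only surface later, at execution time, while the deep check reports `table1` up front.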
   
   ## How was this patch tested?
   
   Existing UT (AnalysisSuite)
   



[GitHub] [spark] HyukjinKwon commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

HyukjinKwon commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was 
removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507110218
 
 
   Merged to master.



[GitHub] [spark] itsvikramagr commented on a change in pull request #24922: [SPARK-28120][SS] Rocksdb state storage implementation

2019-06-30 Thread GitBox

itsvikramagr commented on a change in pull request #24922: [SPARK-28120][SS]  
Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#discussion_r298875972
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/WALUtils.scala
 ##
 @@ -0,0 +1,280 @@
+/*
 
 Review comment:
   Good point on self-review. 
   
   I have abstracted out a lot of code from the HDFS state store to create 
WALUtils. To limit the scope of this PR, I didn't make any changes to the HDFS 
state store provider. I can either start a new PR for the refactoring or do it 
once the rest of the code is reviewed. 



[GitHub] [spark] AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the 
transform natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507108414
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12251/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the 
transform natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507108411
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507108414
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12251/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507108411
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation

2019-06-30 Thread GitBox

itsvikramagr commented on issue #24922: [SPARK-28120][SS]  Rocksdb state 
storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-507108250
 
 
   Thanks for the review, @HeartSaVioR. Let me work on your comments. I am 
also looking into generating performance numbers for various scenarios and will 
get back with those soon. 



[GitHub] [spark] SparkQA commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion

2019-06-30 Thread GitBox

SparkQA commented on issue #24963: [SPARK-28159][ML] Make the transform 
natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#issuecomment-507107652
 
 
   **[Test build #107058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107058/testReport)**
 for PR 24963 at commit 
[`bd813db`](https://github.com/apache/spark/commit/bd813dbd4e91437a2172c86c9e14f3941b3edd14).



[GitHub] [spark] cloud-fan closed pull request #25010: [SPARK-28201][SQL] Revisit MakeDecimal behavior on overflow

2019-06-30 Thread GitBox

cloud-fan closed pull request #25010: [SPARK-28201][SQL] Revisit MakeDecimal 
behavior on overflow
URL: https://github.com/apache/spark/pull/25010
 
 
   



[GitHub] [spark] cloud-fan commented on issue #25010: [SPARK-28201][SQL] Revisit MakeDecimal behavior on overflow

2019-06-30 Thread GitBox

cloud-fan commented on issue #25010: [SPARK-28201][SQL] Revisit MakeDecimal 
behavior on overflow
URL: https://github.com/apache/spark/pull/25010#issuecomment-507106833
 
 
   thanks, merging to master!



[GitHub] [spark] AmplabJenkins removed a comment on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25012: [SPARK-28215][SQL][R] 
as_tibble was removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507106133
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #25012: [SPARK-28215][SQL][R] 
as_tibble was removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507106139
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107056/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

SparkQA removed a comment on issue #25012: [SPARK-28215][SQL][R] as_tibble was 
removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507099969
 
 
   **[Test build #107056 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107056/testReport)**
 for PR 25012 at commit 
[`bd89000`](https://github.com/apache/spark/commit/bd89000c1d08147b37957954edc09cf7f3ac469e).



[GitHub] [spark] AmplabJenkins commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was 
removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507106133
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was 
removed from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507106139
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107056/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2019-06-30 Thread GitBox

SparkQA commented on issue #25012: [SPARK-28215][SQL][R] as_tibble was removed 
from Arrow R API
URL: https://github.com/apache/spark/pull/25012#issuecomment-507106090
 
 
   **[Test build #107056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107056/testReport)** for PR 25012 at commit [`bd89000`](https://github.com/apache/spark/commit/bd89000c1d08147b37957954edc09cf7f3ac469e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] cloud-fan commented on a change in pull request #24795: [SPARK-27945][SQL] Minimal changes to support columnar processing

2019-06-30 Thread GitBox

cloud-fan commented on a change in pull request #24795: [SPARK-27945][SQL] 
Minimal changes to support columnar processing
URL: https://github.com/apache/spark/pull/24795#discussion_r298873668
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -251,6 +286,371 @@ object MyExtensions {
 (_: Seq[Expression]) => Literal(5, IntegerType))
 }
 
+case class CloseableColumnBatchIterator(itr: Iterator[ColumnarBatch],
+    f: ColumnarBatch => ColumnarBatch) extends Iterator[ColumnarBatch] {
+  var cb: ColumnarBatch = null
+
+  private def closeCurrentBatch(): Unit = {
+    if (cb != null) {
+      cb.close
+      cb = null
+    }
+  }
+
+  TaskContext.get().addTaskCompletionListener[Unit]((tc: TaskContext) => {
+    closeCurrentBatch()
+  })
+
+  override def hasNext: Boolean = {
+    closeCurrentBatch()
+    itr.hasNext
+  }
+
+  override def next(): ColumnarBatch = {
+    closeCurrentBatch()
+    cb = f(itr.next())
+    cb
+  }
+}
+
+object NoCloseColumnVector extends Logging {
+  def wrapIfNeeded(cv: ColumnVector): NoCloseColumnVector = cv match {
+    case ref: NoCloseColumnVector =>
+      ref
+    case vec => NoCloseColumnVector(vec)
+  }
+}
+
+/**
+ * Provide a ColumnVector so ColumnarExpression can close temporary values without
+ * having to guess what type it really is.
+ */
+case class NoCloseColumnVector(wrapped: ColumnVector) extends ColumnVector(wrapped.dataType) {
+  private var refCount = 1
+
+  /**
+   * Don't actually close the ColumnVector this wraps.  The producer of the vector will take
+   * care of that.
+   */
+  override def close(): Unit = {
+    // Empty
+  }
+
+  override def hasNull: Boolean = wrapped.hasNull
+
+  override def numNulls(): Int = wrapped.numNulls
+
+  override def isNullAt(rowId: Int): Boolean = wrapped.isNullAt(rowId)
+
+  override def getBoolean(rowId: Int): Boolean = wrapped.getBoolean(rowId)
+
+  override def getByte(rowId: Int): Byte = wrapped.getByte(rowId)
+
+  override def getShort(rowId: Int): Short = wrapped.getShort(rowId)
+
+  override def getInt(rowId: Int): Int = wrapped.getInt(rowId)
+
+  override def getLong(rowId: Int): Long = wrapped.getLong(rowId)
+
+  override def getFloat(rowId: Int): Float = wrapped.getFloat(rowId)
+
+  override def getDouble(rowId: Int): Double = wrapped.getDouble(rowId)
+
+  override def getArray(rowId: Int): ColumnarArray = wrapped.getArray(rowId)
+
+  override def getMap(ordinal: Int): ColumnarMap = wrapped.getMap(ordinal)
+
+  override def getDecimal(rowId: Int, precision: Int, scale: Int): Decimal =
+    wrapped.getDecimal(rowId, precision, scale)
+
+  override def getUTF8String(rowId: Int): UTF8String = wrapped.getUTF8String(rowId)
+
+  override def getBinary(rowId: Int): Array[Byte] = wrapped.getBinary(rowId)
+
+  override protected def getChild(ordinal: Int): ColumnVector = wrapped.getChild(ordinal)
+}
+
+trait ColumnarExpression extends Expression with Serializable {
+  /**
+   * Returns true if this expression supports columnar processing through [[columnarEval]].
+   */
+  def supportsColumnar: Boolean = true
+
+  /**
+   * Returns the result of evaluating this expression on the entire
+   * [[org.apache.spark.sql.vectorized.ColumnarBatch]]. The result of
+   * calling this may be a single [[org.apache.spark.sql.vectorized.ColumnVector]] or a scalar
+   * value. Scalar values typically happen if they are a part of the expression i.e. col("a") + 100.
 
 Review comment:
   I understand that this is test code, not an API, but other people may look at 
this test to learn how to implement a columnar operator, and I feel the current 
example is not that good.
   
   IIUC, the goal is:
   1. Users can write a rule to replace an arbitrary SQL operator with a custom 
optimized columnar version.
   2. Spark automatically inserts column-to-row and row-to-column operators 
around the columnar operator.
   
   For 1, I think a pretty simple approach is to take in an expression tree and 
compile it to a columnar processor that can execute the expression tree in a 
columnar fashion. We don't need to create a `ColumnarExpression`, which seems 
overcomplicated to me as a column processor.
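   
   A minimal sketch of the "compile an expression tree to a columnar processor" 
idea follows. It is deliberately Spark-free and is not the PR's code: `Expr`, 
`Col`, `Lit`, `Add`, `Batch` and `ColumnarCompiler` are hypothetical names 
invented for illustration, and a batch is assumed to be a plain array of 
equally sized `Int` columns rather than a real `ColumnarBatch`.

```scala
// Hypothetical, Spark-free illustration; none of these types exist in Spark.
sealed trait Expr
final case class Col(index: Int) extends Expr              // reference to an input column
final case class Lit(value: Int) extends Expr              // scalar literal
final case class Add(left: Expr, right: Expr) extends Expr // element-wise addition

// Assumption: a batch is just an array of equally sized Int columns.
final case class Batch(columns: Array[Array[Int]]) {
  def numRows: Int = if (columns.isEmpty) 0 else columns(0).length
}

object ColumnarCompiler {
  // Walk the tree once and return a function that evaluates a whole batch,
  // producing one output column. The expression nodes themselves carry no
  // columnar-specific methods.
  def compile(expr: Expr): Batch => Array[Int] = expr match {
    case Col(i) => batch => batch.columns(i)
    case Lit(v) => batch => Array.fill(batch.numRows)(v)
    case Add(l, r) =>
      val evalLeft = compile(l)
      val evalRight = compile(r)
      batch => {
        val a = evalLeft(batch)
        val b = evalRight(batch)
        val out = new Array[Int](a.length)
        var i = 0
        while (i < out.length) {
          out(i) = a(i) + b(i)
          i += 1
        }
        out
      }
  }
}
```

   For example, `ColumnarCompiler.compile(Add(Col(0), Lit(100)))` applied to 
`Batch(Array(Array(1, 2, 3)))` produces a column holding 101, 102, 103, which 
mirrors the `col("a") + 100` case mentioned in the Scaladoc above. A real 
version would also have to handle nulls, other data types, and closing of 
intermediate vectors, all of which this sketch leaves out.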





[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] 
Support Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507101280
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107057/
   Test FAILed.



[GitHub] [spark] HyukjinKwon commented on issue #24946: [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in EpochTracker (to support Python UDFs)

2019-06-30 Thread GitBox

HyukjinKwon commented on issue #24946: [SPARK-27234][SS][PYTHON] Use 
InheritableThreadLocal for current epoch in EpochTracker (to support Python 
UDFs)
URL: https://github.com/apache/spark/pull/24946#issuecomment-507101455
 
 
   gentle ping .. :-) ..



[GitHub] [spark] HyukjinKwon commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-06-30 Thread GitBox

HyukjinKwon commented on issue #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-507101426
 
 
   gentle ping .. :-) ..



[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] 
Support Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507101274
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

SparkQA removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support 
Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507101082
 
 
   **[Test build #107057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107057/testReport)** for PR 24983 at commit [`4c3e21d`](https://github.com/apache/spark/commit/4c3e21d5c739bd52b1c46675fcb06ef670cee6a9).



[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support 
Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507101274
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2019-06-30 Thread GitBox

AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support 
Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983#issuecomment-507101280
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107057/
   Test FAILed.


