[GitHub] [spark] viirya commented on a change in pull request #24675: [SPARK-27803][SQL][PYTHON] Fix column pruning for Python UDF

2019-05-22 Thread GitBox
viirya commented on a change in pull request #24675: [SPARK-27803][SQL][PYTHON] 
Fix column pruning for Python UDF
URL: https://github.com/apache/spark/pull/24675#discussion_r286786053
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala
 ##
 @@ -38,3 +38,30 @@ case class FlatMapGroupsInPandas(
*/
   override val producedAttributes = AttributeSet(output)
 }
+
+trait BaseEvalPython extends UnaryNode {
 
 Review comment:
   Is `producedAttributes` missing from this trait? Previously, `BatchEvalPython` and `ArrowEvalPython` both had it defined.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #24675: [SPARK-27803][SQL][PYTHON] Fix column pruning for Python UDF

2019-05-22 Thread GitBox
viirya commented on a change in pull request #24675: [SPARK-27803][SQL][PYTHON] 
Fix column pruning for Python UDF
URL: https://github.com/apache/spark/pull/24675#discussion_r286785463
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala
 ##
 @@ -38,3 +38,30 @@ case class FlatMapGroupsInPandas(
*/
   override val producedAttributes = AttributeSet(output)
 }
+
+trait BaseEvalPython extends UnaryNode {
+
+  def udfs: Seq[PythonUDF]
+
+  def resultAttrs: Seq[Attribute]
+
+  override def output: Seq[Attribute] = child.output ++ resultAttrs
+
+  override def references: AttributeSet = AttributeSet(udfs.flatMap(_.references))
 
 Review comment:
   If `references` only covers the references in `udfs`, will some output attributes from the child that aren't referenced by `udfs` be pruned from `BaseEvalPython`?
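The concern above can be illustrated with a toy Python model of column pruning (illustrative only, not Catalyst code; all names here are made up): the optimizer keeps, below a node, only the child columns that node claims to reference, while the node still advertises `child.output ++ resultAttrs` as its own output.

```python
# Toy model (not actual Catalyst code) of how column pruning uses an
# operator's `references` to drop unused child columns.

def prune_child_columns(child_output, node_references):
    """Keep only the child columns the node says it references."""
    return [col for col in child_output if col in node_references]

child_output = ["a", "b", "c"]   # columns produced by the child plan
udf_inputs = {"a"}               # the Python UDF only reads column `a`

# If `references` covers only the UDF inputs, the unused columns
# `b` and `c` can be pruned away below the eval node:
pruned = prune_child_columns(child_output, udf_inputs)
print(pruned)  # ['a']

# Yet the node's advertised output is still child.output ++ resultAttrs,
# which is exactly the tension the review comment asks about.
result_attrs = ["udf_result"]
output = child_output + result_attrs
print(output)  # ['a', 'b', 'c', 'udf_result']
```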





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24680: [SPARK-26045][BUILD] Leave avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages

2019-05-22 Thread GitBox
dongjoon-hyun edited a comment on issue #24680: [SPARK-26045][BUILD] Leave 
avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages
URL: https://github.com/apache/spark/pull/24680#issuecomment-495068392
 
 
   I'll leave this PR here since @vanzin's review is requested. We need this in the `master` and `2.4` branches.





[GitHub] [spark] dongjoon-hyun commented on issue #24640: [SPARK-27770] [SQL] [TEST] Port AGGREGATES.sql [Part 1]

2019-05-22 Thread GitBox
dongjoon-hyun commented on issue #24640: [SPARK-27770] [SQL] [TEST] Port 
AGGREGATES.sql [Part 1]
URL: https://github.com/apache/spark/pull/24640#issuecomment-495074398
 
 
   Could you fix the UT failure?
   ```
   [info] - aggregates_part1.sql *** FAILED *** (3 seconds, 720 milliseconds)
   ```





[GitHub] [spark] pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495049729
 
 
   retest this please





[GitHub] [spark] pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495073870
 
 
   retest this please





[GitHub] [spark] HyukjinKwon commented on issue #24675: [SPARK-27803][SQL][PYTHON] Fix column pruning for Python UDF

2019-05-22 Thread GitBox
HyukjinKwon commented on issue #24675: [SPARK-27803][SQL][PYTHON] Fix column 
pruning for Python UDF
URL: https://github.com/apache/spark/pull/24675#issuecomment-495073865
 
 
   makes sense to me.





[GitHub] [spark] cloud-fan commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery

2019-05-22 Thread GitBox
cloud-fan commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated 
predicate subquery
URL: https://github.com/apache/spark/pull/24344#issuecomment-495070324
 
 
   I think @dilipbiswal has a good point here. For non-correlated EXISTS/IN, it's a bad idea to collect all the data of a table to the driver side and do the calculation there. So we should not have a physical version of EXISTS/IN; they always need to be converted to joins (sorry for the back and forth!).
   
   But we do still have a chance to optimize non-correlated EXISTS/IN. More generally, if a left semi/anti join has a condition that only refers to attributes from the right side, we can probably turn that join into a filter operator.
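Why that rewrite is possible can be sketched in plain Python (a simplified model, not Spark's implementation): when the join condition ignores the left row, every left row matches or none does, so the semi/anti join collapses into an all-or-nothing filter on the left side.

```python
def left_semi_join(left, right, cond):
    """Rows of `left` that have at least one match in `right`."""
    return [l for l in left if any(cond(l, r) for r in right)]

def left_anti_join(left, right, cond):
    """Rows of `left` that have no match in `right`."""
    return [l for l in left if not any(cond(l, r) for r in right)]

left = [1, 2, 3]
right = [10, 20]

# A condition that only looks at the right side.
cond = lambda l, r: r > 15

# Because the condition ignores the left row, the join degenerates into
# keeping either all left rows or none of them:
assert left_semi_join(left, right, cond) == left   # some right row matches
assert left_anti_join(left, right, cond) == []     # so anti keeps nothing
```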





[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add 
FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495068453
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105709/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add 
FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495068451
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] dongjoon-hyun commented on issue #24680: [SPARK-26045][BUILD] Leave avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages

2019-05-22 Thread GitBox
dongjoon-hyun commented on issue #24680: [SPARK-26045][BUILD] Leave avro, 
avro-ipc dependendencies as compile scope even for hadoop-provided usages
URL: https://github.com/apache/spark/pull/24680#issuecomment-495068392
 
 
   I'll leave this PR here since @vanzin's review is requested. We need this in the `master` and `2.4` branches.





[GitHub] [spark] AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog 
API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495068451
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog 
API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495068453
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105709/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
SparkQA removed a comment on issue #24559: [SPARK-27658][SQL] Add 
FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495039716
 
 
   **[Test build #105709 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105709/testReport)**
 for PR 24559 at commit 
[`21a5f07`](https://github.com/apache/spark/commit/21a5f074e3b564a353da28901c8d6cb107ec04c2).





[GitHub] [spark] SparkQA commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
SparkQA commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495068179
 
 
   **[Test build #105709 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105709/testReport)**
 for PR 24559 at commit 
[`21a5f07`](https://github.com/apache/spark/commit/21a5f074e3b564a353da28901c8d6cb107ec04c2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495067333
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105708/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495067331
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495067331
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495067333
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105708/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
SparkQA commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495067035
 
 
   **[Test build #105708 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105708/testReport)**
 for PR 24617 at commit 
[`47d89d3`](https://github.com/apache/spark/commit/47d89d37a196e75173996adc6feb475a5c8ce87b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
SparkQA removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495038346
 
 
   **[Test build #105708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105708/testReport)**
 for PR 24617 at commit 
[`47d89d3`](https://github.com/apache/spark/commit/47d89d37a196e75173996adc6feb475a5c8ce87b).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24671: 
[SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and 
spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495066701
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105710/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24671: 
[SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and 
spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495066698
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs 
about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495066698
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs 
about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495066701
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105710/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
SparkQA removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve 
docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495045176
 
 
   **[Test build #105710 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105710/testReport)**
 for PR 24671 at commit 
[`3f79e89`](https://github.com/apache/spark/commit/3f79e89e00f920af959a6b979e736af5a43a93c7).





[GitHub] [spark] SparkQA commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
SparkQA commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about 
spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495066402
 
 
   **[Test build #105710 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105710/testReport)**
 for PR 24671 at commit 
[`3f79e89`](https://github.com/apache/spark/commit/3f79e89e00f920af959a6b979e736af5a43a93c7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] viirya commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
viirya commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286774379
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
 
 Review comment:
   Have we documented this behavior for `avroSchema`?





[GitHub] [spark] viirya commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
viirya commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286774235
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
+"for nullable catalyst schema without any null record") {
 
 Review comment:
   Sounds good to have warning messages for this case, so users know they're actually writing from a nullable catalyst schema into a non-nullable avro schema.





[GitHub] [spark] zhengruifeng edited a comment on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve

2019-05-22 Thread GitBox
zhengruifeng edited a comment on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-495060029
 
 
   @srowen  Oh, not a pass; my earlier wording was not accurate.
   Sliding needs a separate job to collect the head rows of each partition, which can be eliminated.
   When the number of points is small, e.g. 1000, the difference is tiny.
   As shown in the first figure, only 0.8 sec is saved.
   
![image](https://user-images.githubusercontent.com/7322292/58225023-ac1eca00-7d52-11e9-997e-76821b2594fd.png)
   
   Several reasons can result in more points in the curve:
   1. when we want a more accurate score;
   2. when we evaluate on a big dataset, the number of points easily exceeds 1000 even if we set `numBins`=1000, since the grouping in the curve is limited to within partitions, so each partition contributes at least one point. In many practical cases there are tens of thousands of partitions, and thus tens of thousands of points.
   As shown in the second figure, we set `numBins` to its default value and repartition the input data into 2000 partitions. Then the sliding job cannot be ignored.
   
![image](https://user-images.githubusercontent.com/7322292/58225172-6f070780-7d53-11e9-96f0-5b773b3e5a28.png)
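The computation that `sliding` feeds can be sketched in plain Python (a local simplification of the trapezoid rule `AreaUnderCurve` applies over consecutive curve points; not the distributed MLlib code):

```python
def area_under_curve(points):
    """Trapezoid rule over consecutive (x, y) points, assuming they are
    already sorted by x. Pairing each point with its successor is what
    the RDD `sliding` job provides in the distributed setting."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += (x2 - x1) * (y1 + y2) / 2.0
    return total

# A perfectly diagonal ROC curve has AUC 0.5, regardless of how many
# intermediate points the curve contains.
print(area_under_curve([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))  # 0.5
```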
   





[GitHub] [spark] zhengruifeng edited a comment on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve

2019-05-22 Thread GitBox
zhengruifeng edited a comment on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-495060029
 
 
   @srowen  Oh, not a pass; my earlier wording was not accurate.
   Sliding needs a separate job to collect the head rows of each partition, which can be eliminated.
   When the number of points is small, e.g. 1000, the difference is tiny.
   As shown in the first figure, only 0.8 sec is saved.
   
![image](https://user-images.githubusercontent.com/7322292/58225023-ac1eca00-7d52-11e9-997e-76821b2594fd.png)
   
   Several reasons can result in more points in the curve:
   1. when we want a more accurate score;
   2. when we evaluate on a big dataset, the number of points easily exceeds 1000 even if we set `numBins`=1000, since the grouping in the curve is limited to within partitions, so each partition contributes at least one point. In many practical cases there are tens of thousands of partitions, and thus tens of thousands of points.
   As shown in the second figure, we set `numBins` to its default value and repartition the input data into 2000 partitions. Then the sliding job takes 12 sec, which is much longer than the AUC computation itself (2 sec).
   
![image](https://user-images.githubusercontent.com/7322292/58225172-6f070780-7d53-11e9-96f0-5b773b3e5a28.png)
   





[GitHub] [spark] gengliangwang edited a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
gengliangwang edited a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] 
Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495059043
 
 
   This should not be a big concern. The file writing job is almost 
transactional, since Spark follows the `FileCommitProtocol`. 
   If a failure happens during the write, the intermediate output files won't 
show up in the target path.
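The write-to-staging-then-commit idea behind `FileCommitProtocol` can be sketched in a few lines; this is a simplified single-file analogy, not Spark's actual protocol:

```python
import os
import tempfile

def transactional_write(path, data):
    """Stage the output in a temp file and publish it atomically.

    If the process dies mid-write, only the staging file is left
    behind; the target path never shows a partial result. This
    mirrors the guarantee discussed above for FileCommitProtocol.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, staging = tempfile.mkstemp(dir=directory, suffix=".staging")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(staging, path)  # the "commit": atomic rename
    except BaseException:
        os.remove(staging)         # the "abort": drop staged output
        raise
```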





[GitHub] [spark] zhengruifeng commented on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve

2019-05-22 Thread GitBox
zhengruifeng commented on issue #24648: [SPARK-27777][ML] Eliminate uncessary 
sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-495060029
 
 
   @srowen  Oh, not a pass. My expression was not correct.
   Sliding needs a separate job to collect the head rows of each partition, 
which can be eliminated.
   When the number of points is small, e.g. 1000, the difference is tiny. 
   As shown in the first fig, only 0.8 sec is saved.
   
![image](https://user-images.githubusercontent.com/7322292/58225023-ac1eca00-7d52-11e9-997e-76821b2594fd.png)
   
   
   
   Several reasons can result in more points in the curve:
   1. when I want a more accurate score
   2. if we evaluate on a big dataset, the points easily exceed 1000 even if 
we set `numBins`=1000, since the grouping in the curve is limited to 
partitions, so each partition will contain at least one point. In many 
practical cases there are tens of thousands of partitions, and therefore tens 
of thousands of points. 
   As shown in the second fig, we set `numBins` to the default value and 
repartition the input data to 2000 partitions. Then the sliding job takes 
12 sec, which is much longer than the computation time of the AUC (2s).
   
![image](https://user-images.githubusercontent.com/7322292/58225172-6f070780-7d53-11e9-96f0-5b773b3e5a28.png)
   





[GitHub] [spark] gengliangwang commented on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
gengliangwang commented on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add 
behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495059043
 
 
   This should not be a big concern. The file writing job is almost 
transactional, since Spark follows the `FileCommitProtocol`. 
   If a failure happens during the write, the intermediate output files won't 
show up in the target path.





[GitHub] [spark] gengliangwang commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
gengliangwang commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286765012
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
+"for nullable catalyst schema without any null record") {
 
 Review comment:
   @cloud-fan I think this is fine. Otherwise, there is no way for users to 
write with a non-nullable schema.
   But should we show a warning message for such cases, so that users can be 
aware of the risk?





[GitHub] [spark] gengliangwang commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
gengliangwang commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286765012
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
+"for nullable catalyst schema without any null record") {
 
 Review comment:
   @cloud-fan I think this is fine. Otherwise, there is no way for users to 
write with a non-nullable schema.
   But should we show a warning message for such cases, so that users can be 
aware of the risk?





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
dongjoon-hyun edited a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] 
Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495054805
 
 
   I understand the concern about the difference from our default `.schema` 
option. I believe this is the main reason why we add `.option("avroSchema", 
...)`.
   
   For Avro, `nullable` column type is `"type": ["int", "null"]` and 
non-nullable column type is `"type": "int"` explicitly.
   
   For ORC/Parquet (DSv1/v2), everything is always nullable by default when 
reading. So, please don't worry about `.schema` use cases. This is a different 
option for different use cases.
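A small illustration of that encoding, operating on Avro's JSON schema form (plain Python; the field name is made up for the example):

```python
# Avro expresses nullability in the type itself: a union type that
# includes "null" is nullable, a bare type is not.
nullable_field = {"name": "id", "type": ["int", "null"]}
required_field = {"name": "id", "type": "int"}

def is_nullable(field):
    """True when an Avro field's type is a union containing 'null'."""
    t = field["type"]
    return isinstance(t, list) and "null" in t

print(is_nullable(nullable_field))  # True
print(is_nullable(required_field))  # False
```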





[GitHub] [spark] JkSelf commented on issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during broadcast join

2019-05-22 Thread GitBox
JkSelf commented on issue #21899: [SPARK-24912][SQL] Don't obscure source of 
OOM during broadcast join
URL: https://github.com/apache/spark/pull/21899#issuecomment-495055518
 
 
   @beliefer Thanks for your work. Before we allocate the new page in `val 
newPage = new Array[Long](newNumWords.toInt)`, we already check the available 
memory via `ensureAcquireMemory(newNumWords * 8L)`, and we only create 
`newPage` when there is enough memory. So if the memory is enough, why would 
`val newPage = new Array[Long](newNumWords.toInt)` throw an OOM exception? 
Thanks for your help.
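The reserve-before-allocate pattern being discussed looks roughly like this (a hypothetical `MemoryPool` stand-in, not Spark's actual memory-manager API):

```python
class MemoryPool:
    """Tracks a budget that must be reserved before allocating."""

    def __init__(self, capacity):
        self.free = capacity

    def ensure_acquire(self, nbytes):
        # Analogous in spirit to ensureAcquireMemory: fail fast if
        # the tracked budget cannot cover the requested allocation.
        if nbytes > self.free:
            raise MemoryError(f"cannot reserve {nbytes} bytes")
        self.free -= nbytes

pool = MemoryPool(capacity=64)
pool.ensure_acquire(8 * 4)  # reserve 4 words of 8 bytes each
new_page = [0] * 4          # the allocation the reservation covers
# Note: the reservation only guards the *tracked* budget; the runtime
# heap can still be exhausted at allocation time, which is the
# situation the discussion above is about.
```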





[GitHub] [spark] francis0407 commented on a change in pull request #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery

2019-05-22 Thread GitBox
francis0407 commented on a change in pull request #24344: [SPARK-27440][SQL] 
Optimize uncorrelated predicate subquery
URL: https://github.com/apache/spark/pull/24344#discussion_r286764851
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
 ##
 @@ -55,6 +55,112 @@ object ExecSubqueryExpression {
   }
 }
 
+/**
+ * Exists is used to test for the existence of any record in a subquery.
+ *
+ * This is the physical copy of Exists to be used inside SparkPlan.
+ */
+case class Exists(
+    plan: BaseSubqueryExec,
+    exprId: ExprId)
+  extends ExecSubqueryExpression {
+
+  override def dataType: DataType = BooleanType
+  override def children: Seq[Expression] = Nil
+  override def nullable: Boolean = false
+  override def toString: String = plan.simpleString(SQLConf.get.maxToStringFields)
+  override def withNewPlan(plan: BaseSubqueryExec): Exists = copy(plan = plan)
+
+  // Whether the subquery returns one or more records
+  @volatile private var result: Boolean = _
+  @volatile private var updated: Boolean = false
+
+  def updateResult(): Unit = {
+    val rows = plan.executeCollect()
+    result = rows.nonEmpty
+    updated = true
+  }
+
+  override def eval(input: InternalRow): Boolean = {
+    require(updated, s"$this has not finished")
+    result
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    require(updated, s"$this has not finished")
+    Literal.create(result, BooleanType).doGenCode(ctx, ev)
+  }
+}
+
+/**
+ * Evaluates to `true` if `values` are returned in the subquery's result set.
+ * If `values` are not found in the subquery's result set, and there are nulls in
+ * `values` or the result set, it should return null.
+ * This is the physical copy of InSubquery to be used inside SparkPlan.
+ */
+case class InSubquery(
+    values: Seq[Literal],
+    plan: BaseSubqueryExec,
+    exprId: ExprId)
+  extends ExecSubqueryExpression {
+  override def dataType: DataType = BooleanType
+  override def children: Seq[Expression] = Nil
+  override def nullable: Boolean = true
+  override def toString: String = plan.simpleString(SQLConf.get.maxToStringFields)
+  override def withNewPlan(plan: BaseSubqueryExec): InSubquery = copy(plan = plan)
+
+  @volatile private var result: Boolean = _
+  @volatile private var isNull: Boolean = false
+  @volatile private var updated: Boolean = false
+
+  def updateResult(): Unit = {
+    val rows = plan.executeCollect()
+    // The semantic of '(a,b) in ((x1, y1), (x2, y2), ...)' is
+    // '(a = x1 and b = y1) or (a = x2 and b = y2) or ...'
+    val expression = rows.map(row => {
 
 Review comment:
   I have updated this, could you please help check it? cc @dilipbiswal  
@cloud-fan 
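The three-valued semantics described in the `InSubquery` docstring (true on a match, SQL NULL when there is no match but NULLs are involved) can be sketched as follows, with Python's `None` standing in for SQL NULL:

```python
def eq3(a, b):
    """Three-valued equality: None (unknown) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_in(values, rows):
    """'(a, b) IN ((x1, y1), (x2, y2), ...)' expands to
    '(a = x1 AND b = y1) OR (a = x2 AND b = y2) OR ...',
    evaluated in three-valued logic."""
    result = False
    for row in rows:
        cmps = [eq3(a, b) for a, b in zip(values, row)]
        if all(c is True for c in cmps):
            return True            # a definite match wins outright
        if result is False and all(c is not False for c in cmps):
            result = None          # no match yet, but UNKNOWN seen
    return result

print(sql_in((1, 2), [(1, 2), (3, 4)]))  # True
print(sql_in((1, None), [(3, 4)]))       # False
print(sql_in((1, None), [(1, 4)]))       # None (SQL NULL)
```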





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
dongjoon-hyun edited a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] 
Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495054805
 
 
   I understand the concern about the difference from our default `.schema` 
option. I believe this is the main reason why we add `.option("avroSchema", 
...)`.
   
   For Avro, `nullable` column type is `"type": ["int", "null"]` and 
non-nullable column type is `"type": "int"` explicitly.
   
   For ORC/Parquet (DSv1/v2), everything is always nullable by default when 
reading. So, please don't worry about `.schema` use cases. This is a different 
use case.





[GitHub] [spark] dongjoon-hyun commented on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
dongjoon-hyun commented on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add 
behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495054805
 
 
   I understand the concern about the difference from our default `.schema` 
option. I believe this is the main reason why we add `.option("avroSchema", 
...)`.
   
   For Avro, `nullable` column type is `"type": ["int", "null"]` and 
non-nullable column type is `"type": "int"` explicitly.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
dongjoon-hyun commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286763767
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
 
 Review comment:
   `catalyst` schema is always nullable when we read from the file. This is a 
special support for `.option("avroSchema", ...)` for Avro.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
dongjoon-hyun commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286763619
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
+"for nullable catalyst schema without any null record") {
 
 Review comment:
   For `.schema()` option, we always enforce `nullable` by using 
`dataSchema.asNullable` in `FileTable`. For me, this is a special support for 
`.option("avroSchema", "")`.





[GitHub] [spark] AmplabJenkins commented on issue #23791: [SPARK-20597][SQL][SS][WIP] KafkaSourceProvider falls back on path as synonym for topic

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #23791: [SPARK-20597][SQL][SS][WIP] 
KafkaSourceProvider falls back on path as synonym for topic
URL: https://github.com/apache/spark/pull/23791#issuecomment-495053513
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] HyukjinKwon commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
HyukjinKwon commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286763265
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
 
 Review comment:
   BTW, note that we don't currently support non-nullable schemas in file 
format sources, because they are forced to nullable in the SQL batch code path.
   Non-nullable schemas can be set in SS, though. Both code paths should be 
consistent - it's a long-standing issue.





[GitHub] [spark] AmplabJenkins removed a comment on issue #23791: [SPARK-20597][SQL][SS][WIP] KafkaSourceProvider falls back on path as synonym for topic

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #23791: [SPARK-20597][SQL][SS][WIP] 
KafkaSourceProvider falls back on path as synonym for topic
URL: https://github.com/apache/spark/pull/23791#issuecomment-463663060
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] HyukjinKwon commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
HyukjinKwon commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286762917
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
 
 Review comment:
   This doesn't quite make sense to me. It looks like, if the catalyst schema 
is nullable, we should reject a non-nullable Avro schema.





[GitHub] [spark] wangyum commented on issue #24672: [SPARK-27801] Improve performance of InMemoryFileIndex.listLeafFiles for HDFS directories with many files

2019-05-22 Thread GitBox
wangyum commented on issue #24672: [SPARK-27801] Improve performance of 
InMemoryFileIndex.listLeafFiles for HDFS directories with many files
URL: https://github.com/apache/spark/pull/24672#issuecomment-495052170
 
 
   Thank you @rrusso2007 @JoshRosen I did a simple benchmark in our production 
environment (Hadoop version 2.7.1):
   ```
   19/05/22 19:53:18 WARN InMemoryFileIndex: Elements: 10. default time taken: 
41, SPARK-27801 time taken: 18, SPARK-27807 time taken: 30
   19/05/22 19:53:29 WARN InMemoryFileIndex: Elements: 20. default time taken: 
22, SPARK-27801 time taken: 10, SPARK-27807 time taken: 24
   19/05/22 19:53:30 WARN InMemoryFileIndex: Elements: 50. default time taken: 
47, SPARK-27801 time taken: 13, SPARK-27807 time taken: 25
   19/05/22 19:53:33 WARN InMemoryFileIndex: Elements: 100. default time taken: 
54, SPARK-27801 time taken: 10, SPARK-27807 time taken: 30
   19/05/22 19:53:42 WARN InMemoryFileIndex: Elements: 200. default time taken: 
86, SPARK-27801 time taken: 19, SPARK-27807 time taken: 40
   19/05/22 19:53:52 WARN InMemoryFileIndex: Elements: 500. default time taken: 
254, SPARK-27801 time taken: 30, SPARK-27807 time taken: 90
   19/05/22 19:54:06 WARN InMemoryFileIndex: Elements: 1000. default time 
taken: 507, SPARK-27801 time taken: 165, SPARK-27807 time taken: 117
   19/05/22 19:54:18 WARN InMemoryFileIndex: Elements: 2000. default time 
taken: 1193, SPARK-27801 time taken: 114, SPARK-27807 time taken: 216
   19/05/22 19:54:34 WARN InMemoryFileIndex: Elements: 5000. default time 
taken: 2401, SPARK-27801 time taken: 430, SPARK-27807 time taken: 565
   19/05/22 19:54:56 WARN InMemoryFileIndex: Elements: 10000. default time 
taken: 4831, SPARK-27801 time taken: 646, SPARK-27807 time taken: 1202
   19/05/22 19:55:40 WARN InMemoryFileIndex: Elements: 20000. default time 
taken: 9121, SPARK-27801 time taken: 1535, SPARK-27807 time taken: 1920
   19/05/22 19:56:45 WARN InMemoryFileIndex: Elements: 40000. default time 
taken: 18873, SPARK-27801 time taken: 2784, SPARK-27807 time taken: 3997
   19/05/22 19:58:18 WARN InMemoryFileIndex: Elements: 80000. default time 
taken: 33658, SPARK-27801 time taken: 6476, SPARK-27807 time taken: 8326
   ```
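The gist of parallelizing the leaf-file listing - fanning per-directory listing calls out over a pool instead of walking serially - can be sketched like this (plain Python with a thread pool, not the actual `InMemoryFileIndex` code):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def list_leaf_files(root, parallelism=8):
    """Recursively list files, handing subdirectories to a thread
    pool so slow listing calls (e.g. against HDFS) can overlap."""
    entries = [os.path.join(root, e) for e in os.listdir(root)]
    files = [e for e in entries if os.path.isfile(e)]
    subdirs = [e for e in entries if os.path.isdir(e)]
    if subdirs:
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            results = pool.map(lambda d: list_leaf_files(d, parallelism),
                               subdirs)
            for sub in results:
                files.extend(sub)
    return files
```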





[GitHub] [spark] wangyum closed pull request #24679: [SPARK-27807][SQL] Parallel resolve leaf statuses InMemoryFileIndex

2019-05-22 Thread GitBox
wangyum closed pull request #24679: [SPARK-27807][SQL] Parallel resolve leaf 
statuses InMemoryFileIndex
URL: https://github.com/apache/spark/pull/24679
 
 
   





[GitHub] [spark] pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495044115
 
 
   Retest this please





[GitHub] [spark] pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495049729
 
 
   retest this please





[GitHub] [spark] pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495042381
 
 
   retest this please





[GitHub] [spark] AmplabJenkins removed a comment on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24628: 
[SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495046736
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24628: 
[SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495046742
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105707/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] 
hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495046736
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] 
hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495046742
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105707/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
SparkQA removed a comment on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] 
hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495024200
 
 
   **[Test build #105707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105707/testReport)** for PR 24628 at commit [`a0e52aa`](https://github.com/apache/spark/commit/a0e52aae93fd8c1b3a3b1931b2102943cb0202a4).





[GitHub] [spark] SparkQA commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
SparkQA commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] 
hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495046418
 
 
   **[Test build #105707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105707/testReport)** for PR 24628 at commit [`a0e52aa`](https://github.com/apache/spark/commit/a0e52aae93fd8c1b3a3b1931b2102943cb0202a4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on issue #24671: [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
SparkQA commented on issue #24671: [MINOR][DOCS]Improve docs about 
spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495045176
 
 
   **[Test build #105710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105710/testReport)** for PR 24671 at commit [`3f79e89`](https://github.com/apache/spark/commit/3f79e89e00f920af959a6b979e736af5a43a93c7).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24671: [MINOR][DOCS]Improve docs 
about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495044831
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24671: [MINOR][DOCS]Improve docs 
about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495044835
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10966/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24671: [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24671: [MINOR][DOCS]Improve docs about 
spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495044835
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10966/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24671: [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24671: [MINOR][DOCS]Improve docs about 
spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495044831
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] beliefer commented on issue #24671: [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-22 Thread GitBox
beliefer commented on issue #24671: [MINOR][DOCS]Improve docs about 
spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-495044178
 
 
   Retest this please.





[GitHub] [spark] pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495044115
 
 
   Retest this please





[GitHub] [spark] beliefer commented on a change in pull request #24647: [SPARK-27776][SQL]Avoid duplicate Java reflection in DataSource.

2019-05-22 Thread GitBox
beliefer commented on a change in pull request #24647: [SPARK-27776][SQL]Avoid 
duplicate Java reflection in DataSource.
URL: https://github.com/apache/spark/pull/24647#discussion_r286753702
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -105,6 +105,8 @@ case class DataSource(
   case _ => cls
 }
   }
+  private def providingInstance = providingClass.getConstructor().newInstance()
 
 Review comment:
   If we add a return type, only `Any` could be used here. Since this method is private, can the explicit return type be omitted?
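   For illustration only (this is not the Spark code under review, and `ReflectionSketch` is a hypothetical stand-in): a minimal Scala sketch of the trade-off being discussed. Because the class is only known as a `Class[_]`, the inferred result of `newInstance()` is effectively `Any`, so the explicit annotation adds no information:

   ```scala
   // Hypothetical sketch; names mirror the reviewed snippet but the class
   // loaded here (java.lang.StringBuilder) is just a convenient example.
   object ReflectionSketch {
     private val providingClass: Class[_] = Class.forName("java.lang.StringBuilder")

     // No annotation: the compiler infers the result type, which for a
     // Class[_] reflective construction is effectively Any.
     private def providingInstance = providingClass.getConstructor().newInstance()

     // The only accurate explicit annotation here would be Any.
     private def providingInstanceTyped: Any = providingClass.getConstructor().newInstance()

     def main(args: Array[String]): Unit = {
       // Both forms construct the same kind of object.
       assert(providingInstance.isInstanceOf[java.lang.StringBuilder])
       assert(providingInstanceTyped.isInstanceOf[java.lang.StringBuilder])
       println("ok")
     }
   }
   ```

   Either way the value must be pattern-matched or cast before use, which is presumably why the annotation feels optional for a private helper.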





[GitHub] [spark] habren commented on issue #24663: [SPARK-27792][SQL] SkewJoin--handle only skewed keys with broadcastjoin

2019-05-22 Thread GitBox
habren commented on issue #24663: [SPARK-27792][SQL] SkewJoin--handle only 
skewed keys with broadcastjoin
URL: https://github.com/apache/spark/pull/24663#issuecomment-495043133
 
 
   @viirya  Could you please review this pull request ?





[GitHub] [spark] pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo removed a comment on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-494819839
 
 
   retest this please





[GitHub] [spark] pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page

2019-05-22 Thread GitBox
pengbo commented on issue #24666: [SPARK-27482][SQL][WEBUI] Show 
BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page
URL: https://github.com/apache/spark/pull/24666#issuecomment-495042381
 
 
   retest this please






[GitHub] [spark] sjrand commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler

2019-05-22 Thread GitBox
sjrand commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number 
of exceptions caught in ExternalShuffleBlockHandler
URL: https://github.com/apache/spark/pull/24645#issuecomment-495040379
 
 
   On the client (executor) side we were seeing lots of timeouts, e.g.:
   
   ```
   ERROR [2019-05-16T18:34:57.782Z] org.apache.spark.storage.BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
   java.io.IOException: Failed to connect to /:7337
       at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:250)
       at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:206)
       at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:142)
       at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:300)
       at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
       at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:297)
       at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:271)
       at org.apache.spark.executor.Executor.<init>(Executor.scala:121)
       at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:92)
       at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
       at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
       at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
       at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:222)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: /:7337
       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
       at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
       at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
       at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.net.ConnectException: Connection timed out
       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
       at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
       at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
       at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       at java.lang.Thread.run(Thread.java:748)
   ```
   
   And in the NodeManager logs we were seeing lots of `ClosedChannelException` 
errors from netty, along with the occasional `java.io.IOException: Broken pipe` 
error. 
   
   We confirmed that the `shuffle-server` threads were still alive in the NM 
and took thread dumps, but we weren't able to determine what the issue was. In 
the end we just restarted the NodeManagers and this fixed the problem.
   
   I didn't create a JIRA for this just because I don't think the information I 
have so far is enough to be actionable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [spark] SparkQA commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
SparkQA commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495039716
 
 
   **[Test build #105709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105709/testReport)** for PR 24559 at commit [`21a5f07`](https://github.com/apache/spark/commit/21a5f074e3b564a353da28901c8d6cb107ec04c2).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add 
FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495039354
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add 
FunctionCatalog API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495039363
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10965/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog 
API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495039363
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10965/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog 
API
URL: https://github.com/apache/spark/pull/24559#issuecomment-495039354
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
SparkQA commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495038346
 
 
   **[Test build #105708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105708/testReport)** for PR 24617 at commit [`47d89d3`](https://github.com/apache/spark/commit/47d89d37a196e75173996adc6feb475a5c8ce87b).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495038020
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495038025
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10964/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495038020
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
AmplabJenkins commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495038025
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10964/
   Test PASSed.





[GitHub] [spark] rdblue commented on issue #24233: [SPARK-26356][SQL] remove SaveMode from data source v2

2019-05-22 Thread GitBox
rdblue commented on issue #24233: [SPARK-26356][SQL] remove SaveMode from data 
source v2
URL: https://github.com/apache/spark/pull/24233#issuecomment-495037807
 
 
   @cloud-fan, I don't recall that conclusion from a sync. Can you quote from 
the notes that you're talking about?
   
   I'm fine fixing this in a follow-up, as long as there's a blocker filed so 
that this doesn't go into the 3.0 release.





[GitHub] [spark] rdblue commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
rdblue commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495037408
 
 
   @mccheah, I made the changes you requested. Should be good to go when tests 
pass.





[GitHub] [spark] jiangxb1987 commented on issue #24605: [SPARK-27711][CORE] Unset InputFileBlockHolder at the end of tasks

2019-05-22 Thread GitBox
jiangxb1987 commented on issue #24605: [SPARK-27711][CORE] Unset 
InputFileBlockHolder at the end of tasks
URL: https://github.com/apache/spark/pull/24605#issuecomment-495034840
 
 
   Thanks! Merged to master, please manually backport to 2.4!





[GitHub] [spark] jiangxb1987 closed pull request #24605: [SPARK-27711][CORE] Unset InputFileBlockHolder at the end of tasks

2019-05-22 Thread GitBox
jiangxb1987 closed pull request #24605: [SPARK-27711][CORE] Unset 
InputFileBlockHolder at the end of tasks
URL: https://github.com/apache/spark/pull/24605
 
 
   





[GitHub] [spark] jiangxb1987 commented on a change in pull request #24615: [SPARK-27488][CORE] Driver interface to support GPU resources

2019-05-22 Thread GitBox
jiangxb1987 commented on a change in pull request #24615: [SPARK-27488][CORE] 
Driver interface to support GPU resources
URL: https://github.com/apache/spark/pull/24615#discussion_r286741051
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -187,6 +187,25 @@ of the most common options to set are:
 This option is currently supported on YARN, Mesos and Kubernetes.
   
 
+
+ spark.driver.resource.{resourceName}.count
+  0
+  
+The number of a particular resource type to use on the driver.
+If this is used, you must also specify the
+spark.driver.resource.{resourceName}.discoveryScript
 
 Review comment:
   Do we want to mention `spark.driver.resource.{resourceName}.addresses` here?
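   For readers following along: the configs under discussion pair a per-resource count with a discovery script that reports which resource addresses are actually available on the host (the `spark.driver.resource.{resourceName}.addresses` values mentioned above). A hypothetical `spark-defaults.conf` fragment — resource name and script path are placeholders, not taken from this thread:

   ```
   # Hypothetical example only: request 2 of the "gpu" resource for the driver.
   spark.driver.resource.gpu.count            2
   # Required alongside the count; the script must print the addresses of the
   # GPUs available on the driver host.
   spark.driver.resource.gpu.discoveryScript  /opt/spark/bin/getGpus.sh
   ```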





[GitHub] [spark] mccheah edited a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
mccheah edited a comment on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495027809
 
 
   Looks good, about what I would expect apart from some small changes.





[GitHub] [spark] mccheah commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
mccheah commented on issue #24617: [SPARK-27732][SQL] Add v2 CreateTable 
implementation.
URL: https://github.com/apache/spark/pull/24617#issuecomment-495027809
 
 
   Looks good, about what we would expect apart from some small changes.





[GitHub] [spark] mccheah commented on a change in pull request #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
mccheah commented on a change in pull request #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#discussion_r286740813
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateTableExec.scala
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalog.v2.{Identifier, TableCatalog}
+import org.apache.spark.sql.catalog.v2.expressions.Transform
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.LeafExecNode
+import org.apache.spark.sql.types.StructType
+
+case class CreateTableExec(
+catalog: TableCatalog,
+identifier: Identifier,
+tableSchema: StructType,
+partitioning: Seq[Transform],
+tableProperties: Map[String, String],
+ignoreIfExists: Boolean) extends LeafExecNode {
+
+  override protected def doExecute(): RDD[InternalRow] = {
+def create(): Unit = {
+  catalog.createTable(identifier, tableSchema, partitioning.toArray, 
tableProperties.asJava)
+}
+
+if (!catalog.tableExists(identifier)) {
+  if (ignoreIfExists) {
 
 Review comment:
   I think this can be simplified a bit:
   
   ```
   try {
     create()
   } catch {
     case e: TableAlreadyExistsException if ignoreIfExists =>
       logInfo("Table was created concurrently. Ignoring.", e)
   }
   ```
   
   This removes the need to have two branches both calling `create()` and only 
differing by one having a try-catch clause.





[GitHub] [spark] mccheah commented on a change in pull request #24617: [SPARK-27732][SQL] Add v2 CreateTable implementation.

2019-05-22 Thread GitBox
mccheah commented on a change in pull request #24617: [SPARK-27732][SQL] Add v2 
CreateTable implementation.
URL: https://github.com/apache/spark/pull/24617#discussion_r286739814
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateTableExec.scala
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalog.v2.{Identifier, TableCatalog}
+import org.apache.spark.sql.catalog.v2.expressions.Transform
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.LeafExecNode
+import org.apache.spark.sql.types.StructType
+
+case class CreateTableExec(
+catalog: TableCatalog,
+identifier: Identifier,
+tableSchema: StructType,
+partitioning: Seq[Transform],
+tableProperties: Map[String, String],
+ignoreIfExists: Boolean) extends LeafExecNode {
+
+  override protected def doExecute(): RDD[InternalRow] = {
+def create(): Unit = {
+  catalog.createTable(identifier, tableSchema, partitioning.toArray, 
tableProperties.asJava)
+}
+
+if (!catalog.tableExists(identifier)) {
+  if (ignoreIfExists) {
+try {
+  create()
+} catch {
+  case _: TableAlreadyExistsException =>
+// ignore the table that was created after checking existence
 
 Review comment:
   Might be worth adding a simple log at INFO level indicating there was a concurrent create, along with the exception.
   
   I'm naturally wary of swallowing exceptions without logging them, though.





[GitHub] [spark] cloud-fan commented on a change in pull request #24675: [SPARK-27803][SQL] fix column pruning for python UDF

2019-05-22 Thread GitBox
cloud-fan commented on a change in pull request #24675: [SPARK-27803][SQL] fix 
column pruning for python UDF
URL: https://github.com/apache/spark/pull/24675#discussion_r286738983
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ##
 @@ -226,22 +214,4 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with 
PredicateHelper {
   }
 }
   }
-
-  // Split the original FilterExec to two FilterExecs. Only push down the 
first few predicates
-  // that are all deterministic.
-  private def trySplitFilter(plan: LogicalPlan): LogicalPlan = {
 
 Review comment:
   Quote from the PR description:
   > There are some hacks in the ExtractPythonUDFs rule to duplicate the column pruning and filter pushdown logic. However, it has some bugs, as demonstrated in the new test case (only column pruning is broken). This PR removes the hacks and re-applies the column pruning and filter pushdown rules explicitly.





[GitHub] [spark] cloud-fan commented on a change in pull request #24675: [SPARK-27803][SQL] fix column pruning for python UDF

2019-05-22 Thread GitBox
cloud-fan commented on a change in pull request #24675: [SPARK-27803][SQL] fix 
column pruning for python UDF
URL: https://github.com/apache/spark/pull/24675#discussion_r286738848
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##
 @@ -1143,6 +1143,8 @@ object PushDownPredicate extends Rule[LogicalPlan] with 
PredicateHelper {
 case _: Repartition => true
 case _: ScriptTransformation => true
 case _: Sort => true
+case _: BatchEvalPython => true
 
 Review comment:
   This defines the nodes that we can push filters through.
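   The allow-list idea can be sketched with a toy plan model. This is a simplified illustration, not the actual Catalyst rule: the real `PushDownPredicate` also checks that the pushed predicates are deterministic and reference only the child's output. All names below model the concept only.

   ```scala
   // Toy logical-plan ADT standing in for Catalyst's LogicalPlan hierarchy.
   sealed trait Plan
   case class Scan(table: String) extends Plan
   case class Sort(child: Plan) extends Plan
   case class BatchEvalPython(child: Plan) extends Plan
   case class Filter(condition: String, child: Plan) extends Plan

   // The allow-list: node types a Filter may be pushed through,
   // mirroring the case list in the quoted diff.
   def canPushThrough(p: Plan): Boolean = p match {
     case _: Sort => true
     case _: BatchEvalPython => true // the case this PR adds
     case _ => false
   }

   // Rewrite Filter(node(child)) as node(Filter(child)) when allowed.
   def pushDownPredicate(plan: Plan): Plan = plan match {
     case Filter(cond, child @ Sort(grandchild)) if canPushThrough(child) =>
       Sort(Filter(cond, grandchild))
     case Filter(cond, child @ BatchEvalPython(grandchild)) if canPushThrough(child) =>
       BatchEvalPython(Filter(cond, grandchild))
     case other => other
   }
   ```

   With this sketch, a filter above a `BatchEvalPython` node moves below it, which is what lets the data-source scan see the predicate.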





[GitHub] [spark] cloud-fan commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
cloud-fan commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286738681
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
+"for nullable catalyst schema without any null record") {
 
 Review comment:
   Does parquet/orc have the same behavior? It seems better to forbid this at the beginning; otherwise we need to do a null check at runtime, which may fail a long-running query midway.





[GitHub] [spark] cloud-fan commented on a change in pull request #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
cloud-fan commented on a change in pull request #24682: [SPARK-27762][SQL] 
[FOLLOWUP] Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#discussion_r286738727
 
 

 ##
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -930,6 +930,33 @@ class AvroSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("support user provided non-nullable avro schema " +
+"for nullable catalyst schema without any null record") {
 
 Review comment:
   cc @dongjoon-hyun @gengliangwang 





[GitHub] [spark] SparkQA commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] hadoop-3.2 support hive-thriftserver

2019-05-22 Thread GitBox
SparkQA commented on issue #24628: [SPARK-27749][SQL][test-hadoop3.2] 
hadoop-3.2 support hive-thriftserver
URL: https://github.com/apache/spark/pull/24628#issuecomment-495024200
 
 
   **[Test build #105707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105707/testReport)** for PR 24628 at commit [`a0e52aa`](https://github.com/apache/spark/commit/a0e52aae93fd8c1b3a3b1931b2102943cb0202a4).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] 
Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495023973
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105706/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] Add behavior change for Avro writer in migration guide

2019-05-22 Thread GitBox
AmplabJenkins removed a comment on issue #24682: [SPARK-27762][SQL] [FOLLOWUP] 
Add behavior change for Avro writer in migration guide
URL: https://github.com/apache/spark/pull/24682#issuecomment-495023966
 
 
   Merged build finished. Test PASSed.




