[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521126149
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521126158
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14156/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25448: [SPARK-28697] Invalidate Database/Table names starting with underscore

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25448: [SPARK-28697] Invalidate 
Database/Table names starting with underscore
URL: https://github.com/apache/spark/pull/25448#issuecomment-521125826
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521126158
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14156/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25408: [SPARK-28687][SQL] Support `epoch`, 
`isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#issuecomment-521126128
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521126149
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25408: [SPARK-28687][SQL] Support 
`epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#issuecomment-521126133
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14157/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25408: [SPARK-28687][SQL] Support 
`epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#issuecomment-521126128
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25408: [SPARK-28687][SQL] Support `epoch`, 
`isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#issuecomment-521126133
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14157/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25448: [SPARK-28697] Invalidate Database/Table names starting with underscore

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25448: [SPARK-28697] Invalidate 
Database/Table names starting with underscore
URL: https://github.com/apache/spark/pull/25448#issuecomment-521125989
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
dongjoon-hyun commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521125981
 
 
   BTW, unfortunately, the ongoing Jenkins tests will be kill in 5 minutes 
because it's already midnight in PST timezone. I'll visit this PR tomorrow 
again. Thanks for testing, @wangyum and @HyukjinKwon .


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25448: [SPARK-28697] Invalidate names starting with _ to avoid unexpected behaviour

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25448: [SPARK-28697] Invalidate names 
starting with _ to avoid unexpected behaviour
URL: https://github.com/apache/spark/pull/25448#issuecomment-521125826
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ajithme commented on issue #25448: [SPARK-28697] Invalidate names starting with _ to avoid unexpected behaviour

2019-08-13 Thread GitBox
ajithme commented on issue #25448: [SPARK-28697] Invalidate names starting with 
_ to avoid unexpected behaviour
URL: https://github.com/apache/spark/pull/25448#issuecomment-521125734
 
 
   @dongjoon-hyun @cloud-fan @HyukjinKwon  please review and let me know your 
opinion on the fix


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
dongjoon-hyun commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521125536
 
 
   @wangyum . If we change this PR's `builtinHiveVersion` version to `2.3.6`, 
`HiveThriftServer2Suites` and `HiveMetastoreLazyInitializationSuite` seems to 
fail. 
   > val builtinHiveVersion: String = if (isHive23) "2.3.5" else "1.2.1"
   
   Is that the reason we need to do this later?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ajithme opened a new pull request #25448: [SPARK-28697] Invalidate names starting with _ to avoid unexpected behaviour

2019-08-13 Thread GitBox
ajithme opened a new pull request #25448: [SPARK-28697] Invalidate names 
starting with _ to avoid unexpected behaviour
URL: https://github.com/apache/spark/pull/25448
 
 
   ## What changes were proposed in this pull request?
   
   I think we should disallow if a identifier starts with _ for create database 
and create table
   Partially we can see its effect in SPARK-28697 where as the table name 
starts with _ (like _sampleTable) , the FileFormat assumes it to be a hidden 
folder and do not list it which causes unusual behavior
   
   ## How was this patch tested?
   
   Avoiding creating tables and databases with names starting from underscore. 
Added test case for same


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] 
Investigate JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521124655
 
 
   **[Test build #109090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109090/testReport)**
 for PR 25447 at commit 
[`f7d40b0`](https://github.com/apache/spark/commit/f7d40b075680d90b141c888524eb64545ce2081c).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521124064
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] hddong commented on a change in pull request #25412: [SPARK-28691][EXAMPLES] Add Java/Scala DirectKerberizedKafkaWordCount examples

2019-08-13 Thread GitBox
hddong commented on a change in pull request #25412: [SPARK-28691][EXAMPLES] 
Add Java/Scala DirectKerberizedKafkaWordCount examples
URL: https://github.com/apache/spark/pull/25412#discussion_r313726230
 
 

 ##
 File path: 
examples/src/main/scala/org/apache/spark/examples/streaming/DirectKerberizedKafkaWordCount.scala
 ##
 @@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.streaming
+
+import org.apache.kafka.clients.CommonClientConfigs
+import org.apache.kafka.clients.consumer.ConsumerConfig
+import org.apache.kafka.common.security.auth.SecurityProtocol
+import org.apache.kafka.common.serialization.StringDeserializer
+
+import org.apache.spark.SparkConf
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.kafka010._
+
+/**
+ * Consumes messages from one or more topics in Kafka and does wordcount.
+ * Usage: DirectKerberizedKafkaWordCount  
+ *is a list of one or more Kafka brokers
+ *is a consumer group name to consume from topics
+ *is a list of one or more kafka topics to consume from
+ *
+ * Example:
+ *$ bin/run-example --files ${path}/kafka_jaas.conf \
 
 Review comment:
   > Where is `kafka_jaas.conf` file? Can we describe how to execute this 
example from the very first bash command?
   
   `kafka_jaas.conf` can manually create, I will add a template and describe it 
in this file.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521124067
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14155/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521124067
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14155/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521124064
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #25447: [DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
HyukjinKwon commented on issue #25447: 
[DO-NOT-MERGE][test-hadoop3.2][test-maven] Investigate JAVA_HOME not being set 
properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521123890
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sandeep-katta commented on issue #24566: [SPARK-27667][SQL] Get the current database from spark catalog instead of querying the Hive

2019-08-13 Thread GitBox
sandeep-katta commented on issue #24566: [SPARK-27667][SQL] Get the current 
database from spark catalog instead of querying the Hive
URL: https://github.com/apache/spark/pull/24566#issuecomment-521121995
 
 
   @wangyum can you please review this, I have added the SQLConf


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] PavithraRamachandran commented on a change in pull request #25394: [SPARK-28671][SQL]when a non exsistent permanent function is dropped, NoSuchPermanentFunctionException is thrown

2019-08-13 Thread GitBox
PavithraRamachandran commented on a change in pull request #25394: 
[SPARK-28671][SQL]when a non exsistent permanent function is 
dropped,NoSuchPermanentFunctionException is thrown
URL: https://github.com/apache/spark/pull/25394#discussion_r313722698
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ##
 @@ -1114,7 +1114,7 @@ class SessionCatalog(
   }
   externalCatalog.dropFunction(db, name.funcName)
 } else if (!ignoreIfNotExists) {
-  throw new NoSuchFunctionException(db = db, func = identifier.toString)
+  throw new NoSuchPermanentFunctionException(db = db, func = 
identifier.toString)
 
 Review comment:
   cc @maropu 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25439: [SPARK-28709][DSTREAMS] - Fix 
StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#issuecomment-521120214
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14153/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
SparkQA commented on issue #25439: [SPARK-28709][DSTREAMS] - Fix 
StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#issuecomment-521120721
 
 
   **[Test build #109088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109088/testReport)**
 for PR 25439 at commit 
[`4d5965e`](https://github.com/apache/spark/commit/4d5965ecb48685faed63a751100433a273695e5b).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25115: [SPARK-28351][SQL] Support 
DELETE in DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521120283
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14154/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
SparkQA commented on issue #25115: [SPARK-28351][SQL] Support DELETE in 
DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521120746
 
 
   **[Test build #109089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109089/testReport)**
 for PR 25115 at commit 
[`bbf5156`](https://github.com/apache/spark/commit/bbf515666495cbf5f12731b3cdab4a23960f3d77).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25439: [SPARK-28709][DSTREAMS] - Fix 
StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#issuecomment-521120210
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25115: [SPARK-28351][SQL] Support 
DELETE in DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521120280
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25115: [SPARK-28351][SQL] Support DELETE in 
DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521120283
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14154/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25115: [SPARK-28351][SQL] Support DELETE in 
DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521120280
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25439: [SPARK-28709][DSTREAMS] - Fix 
StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#issuecomment-521120210
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25439: [SPARK-28709][DSTREAMS] - Fix 
StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#issuecomment-521120214
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14153/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
cloud-fan commented on issue #25115: [SPARK-28351][SQL] Support DELETE in 
DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521119576
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #25206: [SPARK-28265][SQL] Add renameTable to TableCatalog API

2019-08-13 Thread GitBox
cloud-fan closed pull request #25206: [SPARK-28265][SQL] Add renameTable to 
TableCatalog API
URL: https://github.com/apache/spark/pull/25206
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] 
- Fix StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#discussion_r313720790
 
 

 ##
 File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
 ##
 @@ -575,6 +577,8 @@ class StreamingContext private[streaming] (
   try {
 validate()
 
+registerProgressListener()
 
 Review comment:
   I think so. I believe``StreamingTab`` shouldn't be responsible for 
registering/unregistering the listener as it could be and even already used in 
other place (metrics). Moreover seems there is also a bug that if ui is 
disabled, listener isn't registered and metrics aren't reported. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sandeep-katta commented on a change in pull request #25431: [SPARK-28711][DOCS] Update migration guide to add note about Hive upgrade

2019-08-13 Thread GitBox
sandeep-katta commented on a change in pull request #25431: [SPARK-28711][DOCS] 
Update migration guide to add note about Hive upgrade
URL: https://github.com/apache/spark/pull/25431#discussion_r313720598
 
 

 ##
 File path: docs/sql-migration-guide-upgrade.md
 ##
 @@ -23,6 +23,9 @@ license: |
 {:toc}
 
 ## Upgrading From Spark SQL 2.4 to 3.0
+  - Since Spark 3.0, hive is upgraded to 2.3.x, so it is required to update 
the Hive
 
 Review comment:
   okay understood, this should be in the scope of Hive upgrade. Thank you 
@wangyum . I will close this PR as invalid 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sandeep-katta closed pull request #25431: [SPARK-28711][DOCS] Update migration guide to add note about Hive upgrade

2019-08-13 Thread GitBox
sandeep-katta closed pull request #25431: [SPARK-28711][DOCS] Update migration 
guide to add note about Hive upgrade
URL: https://github.com/apache/spark/pull/25431
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #25442: [SPARK-28722][ML] Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread GitBox
viirya commented on a change in pull request #25442: [SPARK-28722][ML] Change 
sequential label sorting in StringIndexer fit to parallel
URL: https://github.com/apache/spark/pull/25442#discussion_r313719799
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala
 ##
 @@ -213,32 +221,36 @@ class StringIndexer @Since("1.4.0") (
 val labelsArray = $(stringOrderType) match {
   case StringIndexer.frequencyDesc =>
 val sortFunc = StringIndexer.getSortFunc(ascending = false)
-countByValue(dataset, inputCols).map { counts =>
+val orgStrings = countByValue(dataset, inputCols).toSeq
+ThreadUtils.parmap(orgStrings, "sortingStringLabels", 8) { counts =>
   counts.toSeq.sortWith(sortFunc).map(_._1).toArray
-}
+}.toArray
   case StringIndexer.frequencyAsc =>
 val sortFunc = StringIndexer.getSortFunc(ascending = true)
-countByValue(dataset, inputCols).map { counts =>
+val orgStrings = countByValue(dataset, inputCols).toSeq
+ThreadUtils.parmap(orgStrings, "sortingStringLabels", 8) { counts =>
   counts.toSeq.sortWith(sortFunc).map(_._1).toArray
-}
+}.toArray
   case StringIndexer.alphabetDesc =>
-import dataset.sparkSession.implicits._
 dataset.persist()
-val labels = inputCols.map { inputCol =>
-  
dataset.select(inputCol).na.drop().distinct().sort(dataset(s"$inputCol").desc)
-.as[String].collect()
-}
+val selectedCols = getSelectedCols(dataset, 
inputCols).map(collect_set(_))
+val allLabels = dataset.select(selectedCols: _*)
+  .collect().toSeq.flatMap(_.toSeq).asInstanceOf[Seq[Seq[String]]]
 
 Review comment:
   distinct is done at executors by `collect_set` expression. Yes, sorting is 
done at the driver.
   
   This is a good point. I think it depends on the cardinality of input labels. 
For StringIndexer, the cardinality should not be very high, as suggested 
billion level.
   
   Actually, for frequency-based string order, sorting is also done at the 
driver, previously.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on issue #25206: [SPARK-28265][SQL] Add renameTable to TableCatalog API

2019-08-13 Thread GitBox
cloud-fan commented on issue #25206: [SPARK-28265][SQL] Add renameTable to 
TableCatalog API
URL: https://github.com/apache/spark/pull/25206#issuecomment-521117919
 
 
   thanks, merging to master!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] 
- Fix StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#discussion_r313719160
 
 

 ##
 File path: 
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
 ##
 @@ -52,8 +52,6 @@ class InputStreamsSuite extends TestSuiteBase with 
BeforeAndAfter {
 
   // Set up the streaming context and input streams
   withStreamingContext(new StreamingContext(conf, batchDuration)) { ssc =>
-ssc.addStreamingListener(ssc.progressListener)
-
 
 Review comment:
   Correct.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] 
- Fix StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#discussion_r313719072
 
 

 ##
 File path: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
 ##
 @@ -373,7 +374,7 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with TimeL
 Thread.sleep(100)
   }
 
-  test ("registering and de-registering of streamingSource") {
+  test("registering and de-registering of streamingSource") {
 
 Review comment:
   Got it, reverted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] 
- Fix StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#discussion_r313718989
 
 

 ##
 File path: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
 ##
 @@ -392,6 +393,29 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with TimeL
 assert(!sourcesAfterStop.contains(streamingSourceAfterStop))
   }
 
+  test("registering and de-registering of progressListener") {
 
 Review comment:
   Sure, updated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] - Fix StreamingContext leak through Streaming…

2019-08-13 Thread GitBox
choojoyq commented on a change in pull request #25439: [SPARK-28709][DSTREAMS] 
- Fix StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439#discussion_r313718918
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ui/SparkUI.scala
 ##
 @@ -138,6 +138,10 @@ private[spark] class SparkUI private (
 streamingJobProgressListener = Option(sparkListener)
   }
 
+  def clearStreamingJobProgressListener(): Unit = {
+streamingJobProgressListener = None
+  }
+
 
 Review comment:
   Removed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #25431: [SPARK-28711][DOCS] Update migration guide to add note about Hive upgrade

2019-08-13 Thread GitBox
wangyum commented on a change in pull request #25431: [SPARK-28711][DOCS] 
Update migration guide to add note about Hive upgrade
URL: https://github.com/apache/spark/pull/25431#discussion_r313718805
 
 

 ##
 File path: docs/sql-migration-guide-upgrade.md
 ##
 @@ -23,6 +23,9 @@ license: |
 {:toc}
 
 ## Upgrading From Spark SQL 2.4 to 3.0
+  - Since Spark 3.0, hive is upgraded to 2.3.x, so it is required to update 
the Hive
 
 Review comment:
   There are two things here:
   1. If you want to improve the performance of the Hive Metastore Server. The 
correct way is to upgrade your Hive Metastore Server to latest version. 
`SCHEMA_VERSION` should be updated by [Hive 
itself](https://github.com/apache/hive/blob/c57a59611fa168ee38c6ee0ee60b1d6c4994f9f8/metastore/scripts/upgrade/mysql/upgrade-1.2.0-to-1.3.0.mysql.sql).
   2. Upgrade built-in Hive to 2.3.x still can get benefits if you Hive 
Metastore Server is 1.2.x, such as 
[SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014), 
[SPARK-27500](https://issues.apache.org/jira/browse/SPARK-27500) and 
[SPARK-26321](https://issues.apache.org/jira/browse/SPARK-26321).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115705
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109085/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521112095
 
 
   **[Test build #109085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109085/testReport)**
 for PR 25447 at commit 
[`0c89766`](https://github.com/apache/spark/commit/0c897661afb5f716c404d6892b550b04140be153).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115698
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115698
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not 
being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115681
 
 
   **[Test build #109085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109085/testReport)**
 for PR 25447 at commit 
[`0c89766`](https://github.com/apache/spark/commit/0c897661afb5f716c404d6892b550b04140be153).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115705
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109085/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115415
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109087/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115392
 
 
   **[Test build #109087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109087/testReport)**
 for PR 25447 at commit 
[`130bef4`](https://github.com/apache/spark/commit/130bef4c51373554afc3427cd09a6af52a63ef86).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not 
being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115392
 
 
   **[Test build #109087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109087/testReport)**
 for PR 25447 at commit 
[`130bef4`](https://github.com/apache/spark/commit/130bef4c51373554afc3427cd09a6af52a63ef86).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115410
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not 
being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115404
 
 
   **[Test build #109087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109087/testReport)**
 for PR 25447 at commit 
[`130bef4`](https://github.com/apache/spark/commit/130bef4c51373554afc3427cd09a6af52a63ef86).
* This patch **fails due to an unknown error code, 125**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115415
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109087/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521115410
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #25442: [SPARK-28722][ML] Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread GitBox
viirya commented on a change in pull request #25442: [SPARK-28722][ML] Change 
sequential label sorting in StringIndexer fit to parallel
URL: https://github.com/apache/spark/pull/25442#discussion_r313716725
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala
 ##
 @@ -213,32 +221,36 @@ class StringIndexer @Since("1.4.0") (
 val labelsArray = $(stringOrderType) match {
   case StringIndexer.frequencyDesc =>
 val sortFunc = StringIndexer.getSortFunc(ascending = false)
-countByValue(dataset, inputCols).map { counts =>
+val orgStrings = countByValue(dataset, inputCols).toSeq
+ThreadUtils.parmap(orgStrings, "sortingStringLabels", 8) { counts =>
 
 Review comment:
   Picked this number as following other places in Spark using 
`ThreadUtils.parmap`. I'm not sure we can use driver core config 
(`spark.driver.cores`) as it is not for this purpose. From the document, this 
config is only in cluster mode. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521114985
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14152/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521114985
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14152/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521114983
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521114983
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-13 Thread GitBox
brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#discussion_r313715521
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/CatalogManager.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * A thread-safe manager for [[CatalogPlugin]]s. It tracks all the registered 
catalogs, and allow
+ * the caller to look up a catalog by name.
+ */
+class CatalogManager(conf: SQLConf) extends Logging {
+
+  private val catalogs = mutable.HashMap.empty[String, CatalogPlugin]
+
+  def catalog(name: String): CatalogPlugin = synchronized {
+catalogs.getOrElseUpdate(name, Catalogs.load(name, conf))
+  }
+
+  def defaultCatalog: Option[CatalogPlugin] = {
+conf.defaultV2Catalog.flatMap { catalogName =>
+  try {
+Some(catalog(catalogName))
+  } catch {
+case NonFatal(e) =>
+  logError(s"Cannot load default v2 catalog: $catalogName", e)
+  None
+  }
+}
+  }
+
+  def v2SessionCatalog: Option[CatalogPlugin] = {
+try {
+  Some(catalog(CatalogManager.SESSION_CATALOG_NAME))
+} catch {
+  case NonFatal(e) =>
+logError("Cannot load v2 session catalog", e)
+None
+}
+  }
+
+  private def getDefaultNamespace(c: CatalogPlugin) = c match {
+case c: SupportsNamespaces => c.defaultNamespace()
+case _ => Array.empty[String]
+  }
+
+  private var _currentNamespace = {
+// The builtin catalog use "default" as the default database.
 
 Review comment:
   I think we're saying the same things. I totally agree that:
   for a catalog `c1` ```sql
   SELECT ... FROM c1.t
   ```, this should be a fully qualified identifier. I'm saying that we should 
push the namespace configuration into the catalog that supports it. It 
shouldn't be part of the CatalogManager.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-13 Thread GitBox
brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#discussion_r313715521
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/CatalogManager.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * A thread-safe manager for [[CatalogPlugin]]s. It tracks all the registered 
catalogs, and allow
+ * the caller to look up a catalog by name.
+ */
+class CatalogManager(conf: SQLConf) extends Logging {
+
+  private val catalogs = mutable.HashMap.empty[String, CatalogPlugin]
+
+  def catalog(name: String): CatalogPlugin = synchronized {
+catalogs.getOrElseUpdate(name, Catalogs.load(name, conf))
+  }
+
+  def defaultCatalog: Option[CatalogPlugin] = {
+conf.defaultV2Catalog.flatMap { catalogName =>
+  try {
+Some(catalog(catalogName))
+  } catch {
+case NonFatal(e) =>
+  logError(s"Cannot load default v2 catalog: $catalogName", e)
+  None
+  }
+}
+  }
+
+  def v2SessionCatalog: Option[CatalogPlugin] = {
+try {
+  Some(catalog(CatalogManager.SESSION_CATALOG_NAME))
+} catch {
+  case NonFatal(e) =>
+logError("Cannot load v2 session catalog", e)
+None
+}
+  }
+
+  private def getDefaultNamespace(c: CatalogPlugin) = c match {
+case c: SupportsNamespaces => c.defaultNamespace()
+case _ => Array.empty[String]
+  }
+
+  private var _currentNamespace = {
+// The builtin catalog use "default" as the default database.
 
 Review comment:
   I think we're saying the same things. I totally agree that:
   for a catalog `c1` 
   ```sql
   SELECT ... FROM c1.t
   ```, this should be a fully qualified identifier. I'm saying that we should 
push the namespace configuration into the catalog that supports it. It 
shouldn't be part of the CatalogManager.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-13 Thread GitBox
brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#discussion_r313715521
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/CatalogManager.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * A thread-safe manager for [[CatalogPlugin]]s. It tracks all the registered 
catalogs, and allow
+ * the caller to look up a catalog by name.
+ */
+class CatalogManager(conf: SQLConf) extends Logging {
+
+  private val catalogs = mutable.HashMap.empty[String, CatalogPlugin]
+
+  def catalog(name: String): CatalogPlugin = synchronized {
+catalogs.getOrElseUpdate(name, Catalogs.load(name, conf))
+  }
+
+  def defaultCatalog: Option[CatalogPlugin] = {
+conf.defaultV2Catalog.flatMap { catalogName =>
+  try {
+Some(catalog(catalogName))
+  } catch {
+case NonFatal(e) =>
+  logError(s"Cannot load default v2 catalog: $catalogName", e)
+  None
+  }
+}
+  }
+
+  def v2SessionCatalog: Option[CatalogPlugin] = {
+try {
+  Some(catalog(CatalogManager.SESSION_CATALOG_NAME))
+} catch {
+  case NonFatal(e) =>
+logError("Cannot load v2 session catalog", e)
+None
+}
+  }
+
+  private def getDefaultNamespace(c: CatalogPlugin) = c match {
+case c: SupportsNamespaces => c.defaultNamespace()
+case _ => Array.empty[String]
+  }
+
+  private var _currentNamespace = {
+// The builtin catalog use "default" as the default database.
 
 Review comment:
   I think we're saying the same things. I totally agree that:
   for a catalog `c1` 
   ```sql
   SELECT ... FROM c1.t
   ```
   , this should be a fully qualified identifier. I'm saying that we should 
push the namespace configuration into the catalog that supports it. It 
shouldn't be part of the CatalogManager.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-13 Thread GitBox
brkyvz commented on a change in pull request #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#discussion_r313715521
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/CatalogManager.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * A thread-safe manager for [[CatalogPlugin]]s. It tracks all the registered 
catalogs, and allow
+ * the caller to look up a catalog by name.
+ */
+class CatalogManager(conf: SQLConf) extends Logging {
+
+  private val catalogs = mutable.HashMap.empty[String, CatalogPlugin]
+
+  def catalog(name: String): CatalogPlugin = synchronized {
+catalogs.getOrElseUpdate(name, Catalogs.load(name, conf))
+  }
+
+  def defaultCatalog: Option[CatalogPlugin] = {
+conf.defaultV2Catalog.flatMap { catalogName =>
+  try {
+Some(catalog(catalogName))
+  } catch {
+case NonFatal(e) =>
+  logError(s"Cannot load default v2 catalog: $catalogName", e)
+  None
+  }
+}
+  }
+
+  def v2SessionCatalog: Option[CatalogPlugin] = {
+try {
+  Some(catalog(CatalogManager.SESSION_CATALOG_NAME))
+} catch {
+  case NonFatal(e) =>
+logError("Cannot load v2 session catalog", e)
+None
+}
+  }
+
+  private def getDefaultNamespace(c: CatalogPlugin) = c match {
+case c: SupportsNamespaces => c.defaultNamespace()
+case _ => Array.empty[String]
+  }
+
+  private var _currentNamespace = {
+// The builtin catalog use "default" as the default database.
 
 Review comment:
   I think we're saying the same things. I totally agree that:
   for a catalog `c1` 
   ```sql
   SELECT ... FROM c1.t
   ```,
this should be a fully qualified identifier. I'm saying that we should push 
the namespace configuration into the catalog that supports it. It shouldn't be 
part of the CatalogManager.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-13 Thread GitBox
cloud-fan commented on a change in pull request #25368: [SPARK-28635][SQL] 
create CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#discussion_r313714736
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -45,8 +45,8 @@ case class DataSourceResolution(
   import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
   import lookup._
 
-  lazy val v2SessionCatalog: CatalogPlugin = lookup.sessionCatalog
-  .getOrElse(throw new AnalysisException("No v2 session catalog 
implementation is available"))
+  def v2SessionCatalog: CatalogPlugin = lookup.sessionCatalog
 
 Review comment:
   The `LookupCatalog` has some convenient utils, e.g. 
`CatalogObjectIdentifier`, `AsTableIdentifier`, etc. I think we should still 
keep it.
   
   BTW good point about making this rule take `CatalogManager` directly. Will 
update it soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-13 Thread GitBox
cloud-fan commented on a change in pull request #25368: [SPARK-28635][SQL] 
create CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#discussion_r313714259
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/CatalogManager.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * A thread-safe manager for [[CatalogPlugin]]s. It tracks all the registered 
catalogs, and allow
+ * the caller to look up a catalog by name.
+ */
+class CatalogManager(conf: SQLConf) extends Logging {
+
+  private val catalogs = mutable.HashMap.empty[String, CatalogPlugin]
+
+  def catalog(name: String): CatalogPlugin = synchronized {
+catalogs.getOrElseUpdate(name, Catalogs.load(name, conf))
+  }
+
+  def defaultCatalog: Option[CatalogPlugin] = {
+conf.defaultV2Catalog.flatMap { catalogName =>
+  try {
+Some(catalog(catalogName))
+  } catch {
+case NonFatal(e) =>
+  logError(s"Cannot load default v2 catalog: $catalogName", e)
+  None
+  }
+}
+  }
+
+  def v2SessionCatalog: Option[CatalogPlugin] = {
+try {
+  Some(catalog(CatalogManager.SESSION_CATALOG_NAME))
+} catch {
+  case NonFatal(e) =>
+logError("Cannot load v2 session catalog", e)
+None
+}
+  }
+
+  private def getDefaultNamespace(c: CatalogPlugin) = c match {
+case c: SupportsNamespaces => c.defaultNamespace()
+case _ => Array.empty[String]
+  }
+
+  private var _currentNamespace = {
+// The builtin catalog use "default" as the default database.
 
 Review comment:
   I think current namespace only make sense to the current catalog, e.g. 
`SELECT ... FROM t`, `t` can be a table in the current catalog's current 
namespace. However, for `SELECT ... FROM c1.t`, it's confusing to say `t` is a 
table in catalog `c1`'s current namespace.
   
   When a table identifier starts with a catalog name, it should be a fully 
qualified identifier, and we can't apply current namespace here.
   
   catalog (including `V2SessionCatalog`) can report its default namespace, 
which will be used as the current namespace when switching to the catalog at 
the first time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-52677
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14150/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-52669
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xianyinxin commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2

2019-08-13 Thread GitBox
xianyinxin commented on issue #25115: [SPARK-28351][SQL] Support DELETE in 
DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-521112238
 
 
   It seems the failure pyspark test has nothing to do with this pr.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
SparkQA commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not 
being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-521112095
 
 
   **[Test build #109085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109085/testReport)**
 for PR 25447 at commit 
[`0c89766`](https://github.com/apache/spark/commit/0c897661afb5f716c404d6892b550b04140be153).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
SparkQA commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521112119
 
 
   **[Test build #109086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109086/testReport)**
 for PR 25443 at commit 
[`77a70ae`](https://github.com/apache/spark/commit/77a70ae1b98b538a315ca7f53e44fd15a49b0ec2).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-52726
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-52730
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14151/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-52677
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14150/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-52730
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14151/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25447: [DO-NOT-MERGE] Investigate JAVA_HOME 
not being set properly
URL: https://github.com/apache/spark/pull/25447#issuecomment-52669
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-52726
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on a change in pull request #25442: [SPARK-28722][ML] Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread GitBox
felixcheung commented on a change in pull request #25442: [SPARK-28722][ML] 
Change sequential label sorting in StringIndexer fit to parallel
URL: https://github.com/apache/spark/pull/25442#discussion_r313712444
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala
 ##
 @@ -213,32 +221,36 @@ class StringIndexer @Since("1.4.0") (
 val labelsArray = $(stringOrderType) match {
   case StringIndexer.frequencyDesc =>
 val sortFunc = StringIndexer.getSortFunc(ascending = false)
-countByValue(dataset, inputCols).map { counts =>
+val orgStrings = countByValue(dataset, inputCols).toSeq
+ThreadUtils.parmap(orgStrings, "sortingStringLabels", 8) { counts =>
 
 Review comment:
   how is 8 picked here? should this be ~= number of driver core or something?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on a change in pull request #25442: [SPARK-28722][ML] Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread GitBox
felixcheung commented on a change in pull request #25442: [SPARK-28722][ML] 
Change sequential label sorting in StringIndexer fit to parallel
URL: https://github.com/apache/spark/pull/25442#discussion_r313713261
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala
 ##
 @@ -213,32 +221,36 @@ class StringIndexer @Since("1.4.0") (
 val labelsArray = $(stringOrderType) match {
   case StringIndexer.frequencyDesc =>
 val sortFunc = StringIndexer.getSortFunc(ascending = false)
-countByValue(dataset, inputCols).map { counts =>
+val orgStrings = countByValue(dataset, inputCols).toSeq
+ThreadUtils.parmap(orgStrings, "sortingStringLabels", 8) { counts =>
   counts.toSeq.sortWith(sortFunc).map(_._1).toArray
-}
+}.toArray
   case StringIndexer.frequencyAsc =>
 val sortFunc = StringIndexer.getSortFunc(ascending = true)
-countByValue(dataset, inputCols).map { counts =>
+val orgStrings = countByValue(dataset, inputCols).toSeq
+ThreadUtils.parmap(orgStrings, "sortingStringLabels", 8) { counts =>
   counts.toSeq.sortWith(sortFunc).map(_._1).toArray
-}
+}.toArray
   case StringIndexer.alphabetDesc =>
-import dataset.sparkSession.implicits._
 dataset.persist()
-val labels = inputCols.map { inputCol =>
-  
dataset.select(inputCol).na.drop().distinct().sort(dataset(s"$inputCol").desc)
-.as[String].collect()
-}
+val selectedCols = getSelectedCols(dataset, 
inputCols).map(collect_set(_))
+val allLabels = dataset.select(selectedCols: _*)
+  .collect().toSeq.flatMap(_.toSeq).asInstanceOf[Seq[Seq[String]]]
 
 Review comment:
   so this can be selecting a large number of columns and collecting it all to 
the driver for distinct/sort? isn't this possibly very slow if we have billions 
of record?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon opened a new pull request #25447: [DO-NOT-MERGE] Investigate JAVA_HOME not being set properly

2019-08-13 Thread GitBox
HyukjinKwon opened a new pull request #25447: [DO-NOT-MERGE] Investigate 
JAVA_HOME not being set properly
URL: https://github.com/apache/spark/pull/25447
 
 
   ## What changes were proposed in this pull request?
   
   Do not merge
   
   ## How was this patch tested?
   
   N/A


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25402: [SPARK-28666] Support saveAsTable for V2 tables through Session Catalog

2019-08-13 Thread GitBox
SparkQA commented on issue #25402: [SPARK-28666] Support saveAsTable for V2 
tables through Session Catalog
URL: https://github.com/apache/spark/pull/25402#issuecomment-521110620
 
 
   **[Test build #109084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109084/testReport)**
 for PR 25402 at commit 
[`673d95a`](https://github.com/apache/spark/commit/673d95a58fb1b80618c9d626acc8d1a64dd61d51).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25348: [SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25348: [SPARK-28554][SQL] Adds a v1 
fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-521110184
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14149/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25402: [SPARK-28666] Support saveAsTable for V2 tables through Session Catalog

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25402: [SPARK-28666] Support 
saveAsTable for V2 tables through Session Catalog
URL: https://github.com/apache/spark/pull/25402#issuecomment-521110185
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25348: [SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25348: [SPARK-28554][SQL] Adds a v1 
fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-521110178
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521110129
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14147/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25402: [SPARK-28666] Support saveAsTable for V2 tables through Session Catalog

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25402: [SPARK-28666] Support 
saveAsTable for V2 tables through Session Catalog
URL: https://github.com/apache/spark/pull/25402#issuecomment-521110189
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14148/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521110127
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
HyukjinKwon commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521110213
 
 
   Let me open a separate PR and proceed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25402: [SPARK-28666] Support saveAsTable for V2 tables through Session Catalog

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25402: [SPARK-28666] Support saveAsTable for 
V2 tables through Session Catalog
URL: https://github.com/apache/spark/pull/25402#issuecomment-521110185
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521110127
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25348: [SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25348: [SPARK-28554][SQL] Adds a v1 fallback 
writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-521110184
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14149/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25348: [SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25348: [SPARK-28554][SQL] Adds a v1 fallback 
writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-521110178
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521110129
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14147/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25402: [SPARK-28666] Support saveAsTable for V2 tables through Session Catalog

2019-08-13 Thread GitBox
AmplabJenkins commented on issue #25402: [SPARK-28666] Support saveAsTable for 
V2 tables through Session Catalog
URL: https://github.com/apache/spark/pull/25402#issuecomment-521110189
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14148/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation

2019-08-13 Thread GitBox
felixcheung commented on issue #24922: [SPARK-28120][SS]  Rocksdb state storage 
implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-521110033
 
 
   cool. is the issue here 
https://github.com/apache/spark/pull/24922#issuecomment-510327508 resolved?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-13 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521109054
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109083/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   9   10   >