[GitHub] [spark] AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11245/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
SparkQA removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-497554557 **[Test build #105987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105987/testReport)** for PR 24700 at commit [`6ad1cd8`](https://github.com/apache/spark/commit/6ad1cd8bf99693675541de2006e9cb006b1b1c95).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585751 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
SparkQA commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-497586002 **[Test build #105987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105987/testReport)** for PR 24700 at commit [`6ad1cd8`](https://github.com/apache/spark/commit/6ad1cd8bf99693675541de2006e9cb006b1b1c95).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497586177 **[Test build #105995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105995/testReport)** for PR 24749 at commit [`5b7a025`](https://github.com/apache/spark/commit/5b7a025e101246b67d312cb7dcd918e379964a9c).
[GitHub] [spark] AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585751 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11245/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497584679 **[Test build #105994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105994/testReport)** for PR 24749 at commit [`8fd8fa9`](https://github.com/apache/spark/commit/8fd8fa933c6644d017212ed83872ab6aa4a71f35).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
URL: https://github.com/apache/spark/pull/24750#discussion_r289264248

## File path: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala

## @@ -278,6 +278,31 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     assert(uris.asScala.forall(_.getCache))
   }

+  test("SPARK-26082 supports setting fetcher cache in the submission") {

Review comment:
   For this case, yep. If you want, you can remove it.
[GitHub] [spark] liancheng commented on a change in pull request #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
liancheng commented on a change in pull request #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
URL: https://github.com/apache/spark/pull/24749#discussion_r289264175

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

## @@ -786,6 +790,16 @@ qualifiedName
     : identifier ('.' identifier)*
     ;

+errorCapturingIdentifier
+    : identifier errorCapturingIdentifierExtra
+    ;
+
+// extrq grammer for left refactoring

Review comment:
   Typo: extrq => extra
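[Editor's note] The `errorCapturingIdentifier` rule above lets the parser accept a hyphen-separated identifier and then report a targeted error. As a rough illustration only (the object and method names below are hypothetical and not Spark's actual parser code), the kind of hint SPARK-27890 aims for could be produced like this:

```scala
// Hypothetical sketch: given the dash-separated parts of an identifier
// (e.g. "test-table" tokenized as Seq("test", "table")), suggest
// back-quoting the identifier when it contains hyphens.
object HyphenIdentifierHint {
  def errorMessage(parts: Seq[String]): Option[String] =
    if (parts.length > 1) {
      val ident = parts.mkString("-")
      Some(s"Possibly unquoted identifier $ident detected. " +
        s"Please consider quoting it with back-quotes as `$ident`")
    } else {
      None // a single-part identifier needs no hint
    }
}

object Demo extends App {
  println(HyphenIdentifierHint.errorMessage(Seq("test", "table")))
  println(HyphenIdentifierHint.errorMessage(Seq("plain")))
}
```

This is only meant to show the shape of the improved message; the real change lives in the grammar and `AstBuilder`.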
[GitHub] [spark] dongjoon-hyun commented on issue #24743: [WIP][SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
dongjoon-hyun commented on issue #24743: [WIP][SPARK-27883][SQL] Port AGGREGATES.sql [Part 2] URL: https://github.com/apache/spark/pull/24743#issuecomment-497584161 Got it, @wangyum !
[GitHub] [spark] mwlon commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
mwlon commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
URL: https://github.com/apache/spark/pull/24750#discussion_r289263830

## File path: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala

## @@ -278,6 +278,31 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     assert(uris.asScala.forall(_.getCache))
   }

+  test("SPARK-26082 supports setting fetcher cache in the submission") {

Review comment:
   Ah, right - actually, can I just get rid of the JIRA tag?
[GitHub] [spark] dongjoon-hyun commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
dongjoon-hyun commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497583959 Wow, looks useful!
[GitHub] [spark] AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582962 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11244/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582958 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
SparkQA commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497583318 **[Test build #105993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105993/testReport)** for PR 24472 at commit [`0ee3bc9`](https://github.com/apache/spark/commit/0ee3bc9e870ca583c85029b1c7e29c4f089365f8).
[GitHub] [spark] AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582962 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11244/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582958 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
dongjoon-hyun commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497583059 Thank you for making the PR again, @wangyum .
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
URL: https://github.com/apache/spark/pull/24327#discussion_r289261872

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala

## @@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2.parquet
+
+import java.net.URI
+import java.util.TimeZone
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.parquet.filter2.compat.FilterCompat
+import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}
+import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
+import org.apache.parquet.hadoop.{ParquetFileReader, ParquetInputFormat, ParquetInputSplit, ParquetRecordReader}
+
+import org.apache.spark.TaskContext
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.execution.datasources.{PartitionedFile, RecordReaderIterator}
+import org.apache.spark.sql.execution.datasources.parquet._
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.{InputPartition, PartitionReader}
+import org.apache.spark.sql.types.{AtomicType, StructType}
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.SerializableConfiguration
+
+/**
+ * A factory used to create Parquet readers.
+ *
+ * @param sqlConf SQL configuration.
+ * @param broadcastedConf Broadcast serializable Hadoop Configuration.
+ * @param dataSchema Schema of Parquet files.
+ * @param readDataSchema Required schema of Parquet files.
+ * @param partitionSchema Schema of partitions.
+ * @param filters Filters of the batch scan.
+ */
+case class ParquetPartitionReaderFactory(
+    sqlConf: SQLConf,
+    broadcastedConf: Broadcast[SerializableConfiguration],
+    dataSchema: StructType,
+    readDataSchema: StructType,
+    partitionSchema: StructType,
+    filters: Array[Filter]) extends FilePartitionReaderFactory with Logging {
+  private val isCaseSensitive = sqlConf.caseSensitiveAnalysis
+  private val resultSchema = StructType(partitionSchema.fields ++ readDataSchema.fields)
+  private val enableOffHeapColumnVector = sqlConf.offHeapColumnVectorEnabled
+  private val enableVectorizedReader: Boolean = sqlConf.parquetVectorizedReaderEnabled &&
+    resultSchema.forall(_.dataType.isInstanceOf[AtomicType])
+  private val enableRecordFilter: Boolean = sqlConf.parquetRecordFilterEnabled
+  private val timestampConversion: Boolean = sqlConf.isParquetINT96TimestampConversion
+  private val capacity = sqlConf.parquetVectorizedReaderBatchSize
+  private val enableParquetFilterPushDown: Boolean = sqlConf.parquetFilterPushDown
+  private val pushDownDate = sqlConf.parquetFilterPushDownDate
+  private val pushDownTimestamp = sqlConf.parquetFilterPushDownTimestamp
+  private val pushDownDecimal = sqlConf.parquetFilterPushDownDecimal
+  private val pushDownStringStartWith = sqlConf.parquetFilterPushDownStringStartWith
+  private val pushDownInFilterThreshold = sqlConf.parquetFilterPushDownInFilterThreshold
+
+  override def supportColumnarReads(partition: InputPartition): Boolean = {
+    sqlConf.parquetVectorizedReaderEnabled && sqlConf.wholeStageEnabled &&
+      resultSchema.length <= sqlConf.wholeStageMaxNumFields &&
+      resultSchema.forall(_.dataType.isInstanceOf[AtomicType])
+  }
+
+  override def buildReader(file: PartitionedFile): PartitionReader[InternalRow] = {
+    val reader = if (enableVectorizedReader) {
+      createVectorizedReader(file)
+    } else {
+      createRowBaseReader(file)
+    }
+
+    val fileReader = new PartitionReader[InternalRow] {
+      override def next(): Boolean =
[GitHub] [spark] wangyum commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
wangyum commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582394 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105989/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581305 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
SparkQA removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497567324 **[Test build #105989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105989/testReport)** for PR 24497 at commit [`da5e156`](https://github.com/apache/spark/commit/da5e15635f8b762d666a5fb5f0eb3a9bc7c13c6d).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289261505 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.datasources.v2.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.parquet.hadoop.ParquetInputFormat + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex +import org.apache.spark.sql.execution.datasources.parquet.{ParquetReadSupport, ParquetWriteSupport} +import org.apache.spark.sql.execution.datasources.v2.FileScan +import org.apache.spark.sql.execution.datasources.v2.orc.OrcScan +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.PartitionReaderFactory +import org.apache.spark.sql.types.StructType +import org.apache.spark.sql.util.CaseInsensitiveStringMap +import org.apache.spark.util.SerializableConfiguration + +case class ParquetScan( +sparkSession: SparkSession, +hadoopConf: Configuration, +fileIndex: PartitioningAwareFileIndex, +dataSchema: StructType, +readDataSchema: StructType, +readPartitionSchema: StructType, +filters: Array[Filter], +pushedFilters: Array[Filter], +options: CaseInsensitiveStringMap) + extends FileScan(sparkSession, fileIndex, readDataSchema, readPartitionSchema) { + override def isSplitable(path: Path): Boolean = true + + override def createReaderFactory(): PartitionReaderFactory = { +hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName) +hadoopConf.set( + ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA, + readDataSchema.json) +hadoopConf.set( + ParquetWriteSupport.SPARK_ROW_SCHEMA, + readDataSchema.json) +hadoopConf.set( + SQLConf.SESSION_LOCAL_TIMEZONE.key, + sparkSession.sessionState.conf.sessionLocalTimeZone) +hadoopConf.setBoolean( + SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key, + sparkSession.sessionState.conf.nestedSchemaPruningEnabled) +hadoopConf.setBoolean( + SQLConf.CASE_SENSITIVE.key, + sparkSession.sessionState.conf.caseSensitiveAnalysis) + 
+ParquetWriteSupport.setSchema(readDataSchema, hadoopConf) + +// Sets flags for `ParquetToSparkSchemaConverter` +hadoopConf.setBoolean( + SQLConf.PARQUET_BINARY_AS_STRING.key, + sparkSession.sessionState.conf.isParquetBinaryAsString) +hadoopConf.setBoolean( + SQLConf.PARQUET_INT96_AS_TIMESTAMP.key, + sparkSession.sessionState.conf.isParquetINT96AsTimestamp) + +val broadcastedConf = sparkSession.sparkContext.broadcast( + new SerializableConfiguration(hadoopConf)) +ParquetPartitionReaderFactory(sparkSession.sessionState.conf, broadcastedConf, + dataSchema, readDataSchema, readPartitionSchema, filters) Review comment: This should be `pushedFilters` instead of `filters` since we already converted them.
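The distinction behind this review comment — a scan keeps both the raw requested filters and the subset actually pushed down to the source, and the reader factory should be built from the pushed (already-converted) subset — can be sketched outside Spark in a few lines of Python. All names below are hypothetical illustrations, not Spark APIs:

```python
# Hypothetical sketch, not Spark code: a scan tracks the requested filters
# and the subset the source could push down; the reader factory is built
# from the pushed subset, while the rest is evaluated after the scan.
def split_filters(filters, supported):
    pushed = [f for f in filters if f in supported]
    post_scan = [f for f in filters if f not in supported]
    return pushed, post_scan

pushed, post_scan = split_filters(
    ["a > 1", "udf(b) = 2"], supported={"a > 1"})
# pushed holds only the source-supported predicate; the UDF predicate
# cannot be pushed and must run after the scan.
```

Passing the raw list by mistake would hand the reader factory predicates it cannot evaluate, which is exactly the bug the reviewer flags.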
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581305 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105989/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581160 **[Test build #105989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105989/testReport)** for PR 24497 at commit [`da5e156`](https://github.com/apache/spark/commit/da5e15635f8b762d666a5fb5f0eb3a9bc7c13c6d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#discussion_r289260986 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -442,3 +519,172 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { } } } + + +/** + * This object targets to integrate various UDF test cases so that Scalar UDF, Python UDF and + * Scalar Pandas UDFs can be tested in SBT & Maven tests. + * + * The available UDFs cast input to strings and take one column as input with a string type + * column as output. + * + * To register Scala UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalaUDF, spark) + * }}} + * + * To register Python UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestPythonUDF, spark) + * }}} + * + * To register Scalar Pandas UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalarPandasUDF, spark) + * }}} + * + * To use it in Scala API and SQL: + * {{{ + * sql("SELECT udf(1)") + * spark.select(expr("udf(1)")) + * }}} + * + * They are currently registered as the name 'udf' in function registry. + */ +object IntegratedUDFTestUtils extends SQLHelper with Logging { + import scala.sys.process._ + + lazy val pythonExec: String = { +val pythonExec = sys.env.getOrElse("PYSPARK_PYTHON", "python3.6") Review comment: `python3.6` is for Jenkins. Just using `python` could be enough. I should see if it runs correctly in Jenkins. We will likely deprecate Python 2 anyway.
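The contract described in the scaladoc above — every test UDF takes a single column and casts its input to string, regardless of whether it is implemented as a Scala, Python, or Scalar Pandas UDF — can be mimicked without Spark. A minimal pure-Python stand-in (the name `cast_to_string_udf` is illustrative, not part of the PR):

```python
# Pure-Python stand-in for the cast-to-string contract shared by the
# Scala, Python, and Scalar Pandas test UDFs (illustrative only).
def cast_to_string_udf(value):
    # NULL input stays NULL; everything else becomes its string form.
    return None if value is None else str(value)

# Usage: wrapping any column expression yields a string-typed result,
# which is what lets one .sql file produce identical output under all
# three UDF implementations.
```

Because all three UDF flavors share this behavior, a single expected-output file per `udf-*.sql` input suffices.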
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580308 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580308 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11243/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11243/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289260080 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.datasources.v2.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.parquet.hadoop.ParquetInputFormat + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex +import org.apache.spark.sql.execution.datasources.parquet.{ParquetReadSupport, ParquetWriteSupport} +import org.apache.spark.sql.execution.datasources.v2.FileScan +import org.apache.spark.sql.execution.datasources.v2.orc.OrcScan +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.PartitionReaderFactory +import org.apache.spark.sql.types.StructType +import org.apache.spark.sql.util.CaseInsensitiveStringMap +import org.apache.spark.util.SerializableConfiguration + +case class ParquetScan( +sparkSession: SparkSession, +hadoopConf: Configuration, +fileIndex: PartitioningAwareFileIndex, +dataSchema: StructType, +readDataSchema: StructType, +readPartitionSchema: StructType, +filters: Array[Filter], +pushedFilters: Array[Filter], +options: CaseInsensitiveStringMap) + extends FileScan(sparkSession, fileIndex, readDataSchema, readPartitionSchema) { + override def isSplitable(path: Path): Boolean = true + + override def createReaderFactory(): PartitionReaderFactory = { +hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName) +hadoopConf.set( + ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA, + readDataSchema.json) +hadoopConf.set( + ParquetWriteSupport.SPARK_ROW_SCHEMA, + readDataSchema.json) Review comment: nit. Since we are making a new class, could you declare a `val` for `readDataSchema.json` and reuse it at lines 52 and 55?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579062 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11242/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579057 Merged build finished. Test PASSed.
[GitHub] [spark] gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289259910 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetLogRedirector.java ## @@ -25,11 +25,11 @@ // Redirects the JUL logging for parquet-mr versions <= 1.8 to SLF4J logging using // SLF4JBridgeHandler. Parquet-mr versions >= 1.9 use SLF4J directly -final class ParquetLogRedirector implements Serializable { +public final class ParquetLogRedirector implements Serializable { Review comment: > Spark uses Parquet >= 1.9. Is this still needed? I am not sure about this. I think we can resolve this in another Jira/PR. > Why was it made public? We need to make it public so that ParquetWriteBuilder can access it. As per the discussion in https://issues.apache.org/jira/browse/SPARK-16964, I think it is fine to do this in the `sql.execution` package.
[GitHub] [spark] SparkQA commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
SparkQA commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579378 **[Test build #105992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105992/testReport)** for PR 24752 at commit [`a377255`](https://github.com/apache/spark/commit/a3772558b5d50b037cf9f7a53c344c6c4aa123bc).
[GitHub] [spark] HyukjinKwon commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579209 cc @BryanCutler, @cloud-fan, @icexelloss, @viirya, @gatorsmile, @ueshin, @wangyum, @dilipbiswal, @dongjoon-hyun
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579062 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11242/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579057 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#discussion_r289259435 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -442,3 +519,172 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { } } } + + +/** + * This object targets to integrate various UDF test cases so that Scalar UDF, Python UDF and + * Scalar Pandas UDFs can be tested in SBT & Maven tests. + * + * The available UDFs cast input to strings and take one column as input with a string type + * column as output. + * + * To register Scala UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalaUDF, spark) + * }}} + * + * To register Python UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestPythonUDF, spark) + * }}} + * + * To register Scalar Pandas UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalarPandasUDF, spark) + * }}} + * + * To use it in Scala API and SQL: + * {{{ + * sql("SELECT udf(1)") + * spark.select(expr("udf(1)")) + * }}} + * + * They are currently registered as the name 'udf' in function registry. + */ +object IntegratedUDFTestUtils extends SQLHelper with Logging { Review comment: Maybe this has to be moved somewhere else later to be used in Scala APIs too.
[GitHub] [spark] HyukjinKwon opened a new pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon opened a new pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752 ## What changes were proposed in this pull request? This PR targets to add an integrated test base for various UDF test cases so that Scalar UDF, Python UDF and Scalar Pandas UDFs can be tested in SBT & Maven tests. ### Problem One of the problems we face is that `ExtractPythonUDF[s|FromAggregate]` has unevaluable expressions that always have to be wrapped with special plans. This special rule seems to produce many issues, for instance, SPARK-27803, SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721. ### Why do we have fewer test cases dedicated to SQL and plans? We don't have such SQL (or plan) dedicated tests in PySpark to catch such issues because: - A developer should know SQL, PySpark, Py4J and the version differences in Python to write such good test cases - To test plans, we have to access plans in the JVM via Py4J, which is tricky, messy and duplicates JVM test cases - Usually we just add end-to-end test cases in PySpark; therefore there are not so many examples to refer to It is non-trivial overhead to switch the test base and method (IMHO). ### How does this PR fix it? This PR adds Python UDF and Scalar Pandas UDF at runtime of SBT / Maven test cases. It generates a Python-pickled instance (consisting of the return type and a Python native function) that is used in a Python or Scalar Pandas UDF and brings it directly into the JVM. After that, we don't interact via Py4J anymore but run the tests directly in the JVM - we can just register and run Python UDFs and Scalar Pandas UDFs in the JVM. Currently, I only integrated this change into SQL file based testing. This is how it works with `udf-*.sql` files: After the test files starting with `udf-*.sql` are detected, it creates three test cases: - Scala UDF test case with a Scalar UDF registered named 'udf'. 
- Python UDF test case with a Python UDF registered named 'udf' iff Python executable and pyspark are available. - Scalar Pandas UDF test case with a Scalar Pandas UDF registered named 'udf' iff Python executable, pandas, pyspark and pyarrow are available. Therefore, UDF test cases should have single input and output files but be executed by three different types of UDFs. For instance, ```sql CREATE TEMPORARY VIEW ta AS SELECT udf(a) AS a, udf('a') AS tag FROM t1 UNION ALL SELECT udf(a) AS a, udf('b') AS tag FROM t2; CREATE TEMPORARY VIEW tb AS SELECT udf(a) AS a, udf('a') AS tag FROM t3 UNION ALL SELECT udf(a) AS a, udf('b') AS tag FROM t4; SELECT tb.* FROM ta INNER JOIN tb ON ta.a = tb.a AND ta.tag = tb.tag; ``` will be run 3 times with a Scalar UDF, Python UDF and Scalar Pandas UDF each. ### Appendix Plus, this PR adds `IntegratedUDFTestUtils` which enables testing and executing Python UDFs and Scalar Pandas UDFs as below: To register Python UDF in SQL: ```scala IntegratedUDFTestUtils.registerTestUDF(new TestPythonUDF, spark) ``` To register Scalar Pandas UDF in SQL: ```scala IntegratedUDFTestUtils.registerTestUDF(new TestScalarPandasUDF, spark) ``` To use it in Scala API: ```scala spark.select(expr("udf(1)")).show() ``` To use it in SQL: ```scala sql("SELECT udf(1)").show() ``` This util could be used in the future for better coverage with Scala API combinations as well. ## How was this patch tested? 
Tested via the command below: ```bash build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-inner-join.sql" ``` ``` [info] SQLQueryTestSuite: [info] - udf/udf-inner-join.sql - Scala UDF (5 seconds, 47 milliseconds) [info] - udf/udf-inner-join.sql - Python UDF (4 seconds, 335 milliseconds) [info] - udf/udf-inner-join.sql - Scalar Pandas UDF (5 seconds, 423 milliseconds) ``` [python] unavailable: ``` [info] SQLQueryTestSuite: [info] - udf/udf-inner-join.sql - Scala UDF (4 seconds, 577 milliseconds) [info] - udf/udf-inner-join.sql - Python UDF is skipped because [pyton] and/or pyspark were not available. !!! IGNORED !!! [info] - udf/udf-inner-join.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [pyton]. !!! IGNORED !!! ``` pyspark unavailable: ``` [info] SQLQueryTestSuite: [info] - udf/udf-inner-join.sql - Scala UDF (4 seconds, 991 milliseconds) [info] - udf/udf-inner-join.sql - Python UDF is skipped because [python] and/or pyspark were not available. !!! IGNORED !!! [info] - udf/udf-inner-join.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python]. !!! IGNORED !!! ``` pandas
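The per-file expansion shown in the test output above — one `udf-*.sql` input generating a Scala UDF, a Python UDF, and a Scalar Pandas UDF test case, with the latter two ignored when their dependencies are missing — can be sketched roughly as follows. This is a simplification for illustration, not the suite's actual code; `expand_test_cases` and its parameters are hypothetical names:

```python
# Rough sketch of the three-way test expansion described above
# (illustrative only; not the SQLQueryTestSuite implementation).
def expand_test_cases(sql_file, have_python=True, have_pyarrow=True):
    cases = [
        (f"{sql_file} - Scala UDF", True),                      # always runnable
        (f"{sql_file} - Python UDF", have_python),              # needs python + pyspark
        (f"{sql_file} - Scalar Pandas UDF",
         have_python and have_pyarrow),                         # also needs pandas/pyarrow
    ]
    return [(name, "run" if ok else "ignored") for name, ok in cases]

# With pyarrow missing, only the Scalar Pandas UDF variant is ignored,
# matching the "!!! IGNORED !!!" lines in the sbt output above.
cases = expand_test_cases("udf/udf-inner-join.sql", have_pyarrow=False)
```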
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289259062 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ## @@ -238,7 +240,8 @@ case class AlterTableAddColumnsCommand( // TextFileFormat only default to one column "value" // Hive type is already considered as hive serde table, so the logic will not // come in here. -case _: JsonFileFormat | _: CSVDataSourceV2 | _: ParquetFileFormat | _: OrcDataSourceV2 => +case _: JsonFileFormat | _: CSVFileFormat | _: ParquetFileFormat => Review comment: Could you make this another small PR, since this is a Parquet migration PR? Also, it would be great if the PR had test coverage for this missing part.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11241/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577720 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11241/
[GitHub] [spark] AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577720 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#discussion_r289258283

## File path: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala

@@ -278,6 +278,31 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     assert(uris.asScala.forall(_.getCache))
   }

+  test("SPARK-26082 supports setting fetcher cache in the submission") {

Review comment:
```
- test("SPARK-26082 supports setting fetcher cache in the submission") {
+ test("SPARK-26192 supports setting fetcher cache in the submission") {
```
[GitHub] [spark] SparkQA commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
SparkQA commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497576728 **[Test build #105991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105991/testReport)** for PR 24751 at commit [`addb908`](https://github.com/apache/spark/commit/addb9087b34bfb83aec9c300f473b88a08b670d9).
[GitHub] [spark] wangyum opened a new pull request #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
wangyum opened a new pull request #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751

## What changes were proposed in this pull request?

This PR moves the Hive test jars (`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to Maven dependencies.

## How was this patch tested?

Existing tests. Please note that this PR needs to be tested with both `maven` and `sbt`.
[GitHub] [spark] cloud-fan commented on a change in pull request #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
cloud-fan commented on a change in pull request #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#discussion_r289256656

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

@@ -0,0 +1,380 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import java.util
+import java.util.concurrent.LinkedBlockingQueue
+
+import scala.collection.JavaConverters._
+import scala.collection.concurrent.TrieMap
+import scala.collection.mutable
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.SparkException
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, ReturnAnswer}
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * A root node to execute the query plan adaptively. It splits the query plan into independent
+ * stages and executes them in order according to their dependencies. The query stage
+ * materializes its output at the end. When one stage completes, the data statistics of the
+ * materialized output will be used to optimize the remainder of the query.
+ *
+ * To create query stages, we traverse the query tree bottom up. When we hit an exchange node,
+ * and if all the child query stages of this exchange node are materialized, we create a new
+ * query stage for this exchange node. The new stage is then materialized asynchronously once it
+ * is created.
+ *
+ * When one query stage finishes materialization, the rest of the query is re-optimized and
+ * planned based on the latest statistics provided by all materialized stages. Then we traverse
+ * the query plan again and create more stages if possible. After all stages have been
+ * materialized, we execute the rest of the plan.
+ */
+case class AdaptiveSparkPlanExec(
+    initialPlan: SparkPlan,
+    session: SparkSession,
+    subqueryMap: Map[Long, ExecSubqueryExpression],
+    stageCache: TrieMap[SparkPlan, QueryStageExec])
+  extends LeafExecNode {
+
+  def executedPlan: SparkPlan = currentPhysicalPlan
+
+  override def conf: SQLConf = session.sessionState.conf
+
+  override def output: Seq[Attribute] = initialPlan.output
+
+  override def doCanonicalize(): SparkPlan = initialPlan.canonicalized
+
+  override def doExecute(): RDD[InternalRow] = lock.synchronized {
+    var currentLogicalPlan = currentPhysicalPlan.logicalLink.get
+    var result = createQueryStages(currentPhysicalPlan)
+    val events = new LinkedBlockingQueue[StageMaterializationEvent]()
+    val errors = new mutable.ArrayBuffer[SparkException]()
+    while (!result.allChildStagesMaterialized) {
+      currentPhysicalPlan = result.newPlan
+      currentLogicalPlan = updateLogicalPlan(currentLogicalPlan, result.newStages)
+      currentPhysicalPlan.setTagValue(SparkPlan.LOGICAL_PLAN_TAG, currentLogicalPlan)
+      onUpdatePlan()
+
+      // Start materialization of all new stages.
+      result.newStages.map(_._2).foreach { stage =>
+        stage.materialize().onComplete { res =>
+          if (res.isSuccess) {
+            events.offer(StageSuccess(stage, res.get))
+          } else {
+            events.offer(StageFailure(stage, res.failed.get))
+          }
+        }(AdaptiveSparkPlanExec.executionContext)
+      }
+
+      // Wait on the next completed stage, which indicates new stats are available and probably
+      // new stages can be created. There might be other stages that finish at around the same
+      // time, so we process those stages too in order to reduce re-planning.
+      val nextMsg = events.take()
+      val rem = new util.ArrayList[StageMaterializationEvent]()
+      events.drainTo(rem)
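The "wait on the next completed stage" step above can be sketched in isolation: block until at least one stage completes, then drain any other completions that arrived at about the same time so a single re-planning pass covers them all. The following is an illustrative standalone sketch, not Spark source; `StageEvent`, `StageDone` and `drainBatch` are hypothetical names.

```scala
import java.util
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.JavaConverters._

// Stand-in for Spark's StageMaterializationEvent hierarchy (illustrative only).
sealed trait StageEvent
case class StageDone(stageId: Int) extends StageEvent

// Block until one stage finishes, then sweep up every other completion that is
// already queued, so one re-planning pass can account for the whole batch.
def drainBatch(events: LinkedBlockingQueue[StageEvent]): Seq[StageEvent] = {
  val first = events.take()             // blocks until at least one event arrives
  val rest = new util.ArrayList[StageEvent]()
  events.drainTo(rest)                  // non-blocking: grab the stragglers
  first +: rest.asScala.toSeq
}
```

This mirrors the `events.take()` / `events.drainTo(rem)` pair in `doExecute` above: the blocking call guarantees progress, while the drain reduces the number of re-planning rounds when several stages finish close together.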
[GitHub] [spark] William1104 commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring
William1104 commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring URL: https://github.com/apache/spark/pull/24747#discussion_r289256350

## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala

@@ -255,59 +255,65 @@ private[sql] trait SQLTestUtilsBase
    * Drops temporary view `viewNames` after calling `f`.
    */
   protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropTempView))
   }

   /**
    * Drops global temporary view `viewNames` after calling `f`.
    */
   protected def withGlobalTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // global temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropGlobalTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropGlobalTempView))
   }

   /**
    * Drops table `tableName` after calling `f`.
    */
   protected def withTable(tableNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      tableNames.foreach { name =>
-        spark.sql(s"DROP TABLE IF EXISTS $name")
-      }
-    }
+    tryWithFinally(f)(tableNames.foreach { name =>
+      spark.sql(s"DROP TABLE IF EXISTS $name")
+    })
   }

   /**
    * Drops view `viewName` after calling `f`.
    */
   protected def withView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
+    tryWithFinally(f)(
       viewNames.foreach { name =>
         spark.sql(s"DROP VIEW IF EXISTS $name")
       }
-    }
+    )
   }

   /**
    * Drops cache `cacheName` after calling `f`.
    */
   protected def withCache(cacheNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      cacheNames.foreach { cacheName =>
-        try uncacheTable(cacheName) catch {
-          case _: AnalysisException =>
+    tryWithFinally(f)(cacheNames.foreach(uncacheTable))
+  }
+
+  /**
+   * Executes the given tryBlock and then the given finallyBlock, no matter whether tryBlock
+   * throws an exception. If both tryBlock and finallyBlock throw exceptions, the exception
+   * thrown from the finallyBlock will be added to the exception thrown from tryBlock as a
+   * suppressed exception. This helps to avoid masking the exception from tryBlock with the
+   * exception from finallyBlock.
+   */
+  private def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {

Review comment: You are right, they look almost exactly the same, and that function has been in Spark for four years already. I didn't do enough research on what we already have in Spark; the helper I created is redundant, and even the name is very similar. I will update the tests to reuse `Utils.tryWithSafeFinally` in `SQLTestUtils`.
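The suppressed-exception pattern this thread settles on can be sketched as follows. This is a hedged, standalone illustration of the same idea as Spark's existing `Utils.tryWithSafeFinally`, not the actual Spark source; the `TryWithFinallyDemo` wrapper is added only to keep the sketch self-contained.

```scala
object TryWithFinallyDemo {
  // Run tryBlock, then finallyBlock. If both throw, keep the tryBlock exception
  // as the primary failure and attach the cleanup failure as a suppressed
  // exception, so the original test failure is never masked by cleanup.
  def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
    var originalThrowable: Throwable = null
    try {
      tryBlock
    } catch {
      case t: Throwable =>
        originalThrowable = t
        throw t
    } finally {
      try {
        finallyBlock
      } catch {
        case t: Throwable if originalThrowable != null && originalThrowable != t =>
          // Don't mask the primary failure: record the cleanup failure instead.
          originalThrowable.addSuppressed(t)
        case t: Throwable =>
          throw t
      }
    }
  }
}
```

If only the cleanup throws, its exception propagates normally; if both throw, the caller sees the primary exception with the cleanup exception available via `getSuppressed`.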
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497573581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497573582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105990/
[GitHub] [spark] SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497573540 **[Test build #105990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105990/testReport)** for PR 24750 at commit [`acbace0`](https://github.com/apache/spark/commit/acbace036f05f554221d45f7138c5a2861c90d5e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#discussion_r289253607

## File path: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala

@@ -430,6 +429,9 @@ private[spark] class MesosClusterScheduler(
   }

   private def getDriverUris(desc: MesosDriverDescription): List[CommandInfo.URI] = {
+    val useFetchCache = conf.getBoolean("spark.mesos.fetcherCache.enable", false) ||
+      desc.conf.getBoolean("spark.mesos.fetcherCache.enable", false)

Review comment: Although this is effectively the same, could you preserve the order of the boolean expressions to be consistent with the master branch?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#discussion_r289253404

## File path: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala

@@ -430,6 +429,9 @@ private[spark] class MesosClusterScheduler(
   }

   private def getDriverUris(desc: MesosDriverDescription): List[CommandInfo.URI] = {
+    val useFetchCache = conf.getBoolean("spark.mesos.fetcherCache.enable", false) ||
+      desc.conf.getBoolean("spark.mesos.fetcherCache.enable", false)

Review comment: Ur, this one looks like the opposite direction. Could you check this again?
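The diff under review resolves the fetcher-cache flag from two places: the cluster scheduler's own conf and the individual driver submission's conf, enabling the cache if either side sets it. A minimal standalone model of that lookup, with hypothetical names (`boolFlag`, `useFetcherCache`) and plain `Map`s standing in for `SparkConf`:

```scala
// The configuration key from the diff above.
val FetcherCacheKey = "spark.mesos.fetcherCache.enable"

// Read a boolean flag from a conf map, falling back to a default when unset
// (a simplified stand-in for SparkConf.getBoolean).
def boolFlag(conf: Map[String, String], key: String, default: Boolean): Boolean =
  conf.get(key).map(_.toBoolean).getOrElse(default)

// The fetcher cache is used if either the scheduler-level conf or the
// per-submission conf enables it.
def useFetcherCache(schedulerConf: Map[String, String],
                    submissionConf: Map[String, String]): Boolean =
  boolFlag(schedulerConf, FetcherCacheKey, default = false) ||
    boolFlag(submissionConf, FetcherCacheKey, default = false)
```

Because `||` is commutative here, swapping the two operands is behaviorally identical, which is why the review asks only for consistency with the master branch rather than a functional change.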
[GitHub] [spark] SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497570801 **[Test build #105990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105990/testReport)** for PR 24750 at commit [`acbace0`](https://github.com/apache/spark/commit/acbace036f05f554221d45f7138c5a2861c90d5e).
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497570445 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497570447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11240/
[GitHub] [spark] dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4]
dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497570054 Thank you for making a new PR, @mwlon.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569432 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring
dongjoon-hyun commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring URL: https://github.com/apache/spark/pull/24747#discussion_r289252522

## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala

@@ -255,59 +255,65 @@ private[sql] trait SQLTestUtilsBase
    * Drops temporary view `viewNames` after calling `f`.
    */
   protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropTempView))
   }

   /**
    * Drops global temporary view `viewNames` after calling `f`.
    */
   protected def withGlobalTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // global temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropGlobalTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropGlobalTempView))
   }

   /**
    * Drops table `tableName` after calling `f`.
    */
   protected def withTable(tableNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      tableNames.foreach { name =>
-        spark.sql(s"DROP TABLE IF EXISTS $name")
-      }
-    }
+    tryWithFinally(f)(tableNames.foreach { name =>
+      spark.sql(s"DROP TABLE IF EXISTS $name")
+    })
   }

   /**
    * Drops view `viewName` after calling `f`.
    */
   protected def withView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
+    tryWithFinally(f)(
       viewNames.foreach { name =>
         spark.sql(s"DROP VIEW IF EXISTS $name")
       }
-    }
+    )
   }

   /**
    * Drops cache `cacheName` after calling `f`.
    */
   protected def withCache(cacheNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      cacheNames.foreach { cacheName =>
-        try uncacheTable(cacheName) catch {
-          case _: AnalysisException =>
+    tryWithFinally(f)(cacheNames.foreach(uncacheTable))
+  }
+
+  /**
+   * Executes the given tryBlock and then the given finallyBlock no matter whether tryBlock throws
+   * an exception. If both tryBlock and finallyBlock throw exceptions, the exception thrown
+   * from the finallyBlock will be added to the exception thrown from tryBlock as a
+   * suppressed exception. It helps to avoid masking the exception from tryBlock with the
+   * exception from the finallyBlock.
+   */
+  private def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {

Review comment: Unfortunately, the proposed one looks like the existing [tryWithSafeFinally](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1347). Does this have a new feature compared to the existing `tryWithSafeFinally`?
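The suppressed-exception behavior discussed in this review (run the cleanup block even when the test body throws, and attach any cleanup failure to the original exception rather than masking it) can be sketched as follows. This is an illustrative, self-contained sketch, not Spark's actual `tryWithFinally` or `Utils.tryWithSafeFinally` implementation:

```scala
// Minimal sketch of the "finally without masking" pattern: if both the
// try block and the finally block throw, the finally-side exception is
// recorded via Throwable.addSuppressed on the try-side exception, so the
// original test failure still propagates.
object TryFinallyDemo {
  def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
    var primary: Throwable = null
    try {
      tryBlock
    } catch {
      case t: Throwable =>
        primary = t
        throw t
    } finally {
      if (primary != null) {
        // The try block already failed: never let cleanup mask that failure.
        try {
          finallyBlock
        } catch {
          case t: Throwable => primary.addSuppressed(t)
        }
      } else {
        // The try block succeeded: a cleanup failure may propagate normally.
        finallyBlock
      }
    }
  }
}
```

When both blocks throw, the caller sees the try-side exception, with the cleanup exception available through `getSuppressed`.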
[GitHub] [spark] dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4]
dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569944 ok to test
[GitHub] [spark] BryanCutler commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
BryanCutler commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-497569777 > BTW, is the buffer size 65536 bytes? So .. the issue is that we should wait until 65536 bytes is full? Why don't we simply add a config to control the buffer size then? Yes, I think this is the right approach if there is too much latency between endpoints. There is already a config for the Scala side, `spark.buffer.size`.
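For reference, the Scala-side buffer size mentioned here is an ordinary Spark property and can be tuned in `spark-defaults.conf`. A hedged sketch (the 131072 value is illustrative; 65536 bytes is the default under discussion):

```
# spark-defaults.conf fragment: raise the I/O buffer size from its
# 65536-byte default, e.g. to trade latency for throughput.
spark.buffer.size  131072
```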
[GitHub] [spark] AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569126 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569432 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569044 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569126 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569044 Can one of the admins verify this patch?
[GitHub] [spark] mwlon opened a new pull request #24750: functional changes for SPARK-26192
mwlon opened a new pull request #24750: functional changes for SPARK-26192 URL: https://github.com/apache/spark/pull/24750 ## What changes were proposed in this pull request? Let the Mesos fetcher cache option come from the submission as well as the dispatcher. ## How was this patch tested? Existing unit tests and a new one.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568162 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568167 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11239/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568162 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568167 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11239/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567799 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105986/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105986/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567799 Merged build finished. Test PASSed.
[GitHub] [spark] felixcheung commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
felixcheung commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-497567724 ok
[GitHub] [spark] SparkQA commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
SparkQA commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567530 **[Test build #105986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105986/testReport)** for PR 24374 at commit [`a3c1db5`](https://github.com/apache/spark/commit/a3c1db51da4e6f426aa978faff9449f331f56dec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
SparkQA removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497548036 **[Test build #105986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105986/testReport)** for PR 24374 at commit [`a3c1db5`](https://github.com/apache/spark/commit/a3c1db51da4e6f426aa978faff9449f331f56dec).
[GitHub] [spark] dongjoon-hyun commented on issue #24695: [SPARK-27831][SQL][TEST][test-hadoop3.2] Move Hive test jars to maven dependency
dongjoon-hyun commented on issue #24695: [SPARK-27831][SQL][TEST][test-hadoop3.2] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24695#issuecomment-497567318 Thank you, @wangyum. Yes, please make a PR with that approach, @wangyum. This time, let's test with all combinations; last time, we tested SBT (hadoop-2.7/hadoop-3.2) only. @srowen, unfortunately, this patch has already been reverted from master, and it has been verified in another PR (here, https://github.com/apache/spark/pull/24745#pullrequestreview-244020056) to pass Maven. - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105971/ The last Jenkins run passed the Hive UT but failed at some YARN UTs. I guess the next run will pass.
[GitHub] [spark] SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497567324 **[Test build #105989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105989/testReport)** for PR 24497 at commit [`da5e156`](https://github.com/apache/spark/commit/da5e15635f8b762d666a5fb5f0eb3a9bc7c13c6d).
[GitHub] [spark] felixcheung commented on issue #24731: [SPARK-27725][EXAMPLES] Add an example discovery Script for GPU resources
felixcheung commented on issue #24731: [SPARK-27725][EXAMPLES] Add an example discovery Script for GPU resources URL: https://github.com/apache/spark/pull/24731#issuecomment-497566753 script sounds good
[GitHub] [spark] turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block
turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block URL: https://github.com/apache/spark/pull/24740#issuecomment-497559772 > You shouldn't have this limit anymore (from spark 2.4 onwards) as long as you're also running a recent shuffle service. this uses fetching shuffle blocks to disk, instead of memory, which should be enabled by default for large blocks (https://issues.apache.org/jira/browse/SPARK-24297). > > But if you're seeing a failure with that, can you share some more details? Thanks. I see this failure with spark-2.3.2. I'm sorry that I did not notice your PR, which sets maxRemoteBlockSizeFetchToMem to a value smaller than 2GB. But when resources are available, it is a good idea to set this value larger than 2GB to reduce the I/O overhead. So, shall we support shuffle data transmission regardless of the shuffle blocks' size when maxRemoteBlockSizeFetchToMem is larger than 2GB?
[GitHub] [spark] turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block
turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block URL: https://github.com/apache/spark/pull/24740#issuecomment-497559772 > You shouldn't have this limit anymore (from spark 2.4 onwards) as long as you're also running a recent shuffle service. this uses fetching shuffle blocks to disk, instead of memory, which should be enabled by default for large blocks (https://issues.apache.org/jira/browse/SPARK-24297). > > But if you're seeing a failure with that, can you share some more details? Thanks. I see this failure with spark-2.3.2. I'm sorry that I did not notice your PR, which sets maxRemoteBlockSizeFetchToMem to a value smaller than 2GB. But when resources are available, it is a good idea to set this value larger than 2GB to reduce the I/O overhead. So, shall we support shuffle data transmission when maxRemoteBlockSizeFetchToMem is larger than 2GB?
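The threshold under discussion is an ordinary Spark property (`spark.maxRemoteBlockSizeFetchToMem`): remote shuffle blocks larger than this value are fetched to disk instead of memory. A hedged `spark-defaults.conf` sketch, with an illustrative value:

```
# spark-defaults.conf fragment: fetch remote blocks larger than 200m
# to disk rather than buffering them in memory.
spark.maxRemoteBlockSizeFetchToMem  200m
```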
[GitHub] [spark] AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497562069 Merged build finished. Test PASSed.