[GitHub] [spark] viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#discussion_r263265265
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSchemaPruningSuite.scala
 ##
 @@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.execution.datasources.SchemaPruningSuite
+import org.apache.spark.sql.internal.SQLConf
+
+class OrcSchemaPruningSuite extends SchemaPruningSuite {
+  override protected val dataSourceName: String = "orc"
+  override protected val vectorizedReaderEnabledKey: String =
+    SQLConf.ORC_VECTORIZED_READER_ENABLED.key
+
+  override protected def sparkConf: SparkConf =
+    super
+      .sparkConf
+      .set(SQLConf.USE_V1_SOURCE_READER_LIST, "orc")
+      .set(SQLConf.USE_V1_SOURCE_WRITER_LIST, "orc")
+
+  testSchemaPruning("select a single complex field") {
+    val query = sql("select name.middle from contacts")
+    checkScan(query, "struct<name:struct<middle:string>>")
+    checkAnswer(query.orderBy("id"), Row("X.") :: Row("Y.") :: Nil)
+  }
+
+  testSchemaPruning("select a single complex field and its parent struct") {
+    val query = sql("select name.middle, name from contacts")
+    checkScan(query, "struct<name:struct<first:string,middle:string,last:string>>")
+    checkAnswer(query.orderBy("id"),
+      Row("X.", Row("Jane", "X.", "Doe")) ::
+      Row("Y.", Row("John", "Y.", "Doe")) ::
 
 Review comment:
   Sure. Automatically edited by the IDE...
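
For context on what these assertions verify: `checkScan` compares the schema actually read by the physical file scan against the expected pruned schema. A minimal sketch of that check (assuming a `SparkSession` named `spark` and a `contacts` view backed by ORC files; the `scannedSchemas` helper is hypothetical, not part of the suite):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.FileSourceScanExec

    // Collect the pruned read schema of every file scan in the physical plan.
    def scannedSchemas(query: DataFrame): Seq[String] =
      query.queryExecution.executedPlan.collect {
        case scan: FileSourceScanExec => scan.requiredSchema.catalogString
      }

    val query = spark.sql("select name.middle from contacts")
    // With nested schema pruning enabled, this should print something like
    // struct<name:struct<middle:string>> rather than the full contacts schema.
    scannedSchemas(query).foreach(println)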





[GitHub] [spark] viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#discussion_r263265164
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala
 ##
 @@ -0,0 +1,350 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+
+import org.scalactic.Equality
+
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
+import org.apache.spark.sql.catalyst.SchemaPruningTest
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.StructType
+
+abstract class SchemaPruningSuite
+  extends QueryTest
+  with FileBasedDataSourceTest
+  with SchemaPruningTest
+  with SharedSQLContext {
+  case class FullName(first: String, middle: String, last: String)
+  case class Company(name: String, address: String)
+  case class Employer(id: Int, company: Company)
+  case class Contact(
+      id: Int,
+      name: FullName,
+      address: String,
+      pets: Int,
+      friends: Array[FullName] = Array.empty,
+      relatives: Map[String, FullName] = Map.empty,
+      employer: Employer = null)
+
+  val janeDoe = FullName("Jane", "X.", "Doe")
+  val johnDoe = FullName("John", "Y.", "Doe")
+  val susanSmith = FullName("Susan", "Z.", "Smith")
+
+  val employer = Employer(0, Company("abc", "123 Business Street"))
+  val employerWithNullCompany = Employer(1, null)
+
+  val contacts =
+    Contact(0, janeDoe, "123 Main Street", 1, friends = Array(susanSmith),
+      relatives = Map("brother" -> johnDoe), employer = employer) ::
+    Contact(1, johnDoe, "321 Wall Street", 3, relatives = Map("sister" -> janeDoe),
+      employer = employerWithNullCompany) :: Nil
+
+  case class Name(first: String, last: String)
+  case class BriefContact(id: Int, name: Name, address: String)
+
+  private val briefContacts =
+    BriefContact(2, Name("Janet", "Jones"), "567 Maple Drive") ::
+    BriefContact(3, Name("Jim", "Jones"), "6242 Ash Street") :: Nil
+
+  case class ContactWithDataPartitionColumn(
+      id: Int,
+      name: FullName,
+      address: String,
+      pets: Int,
+      friends: Array[FullName] = Array(),
+      relatives: Map[String, FullName] = Map(),
+      employer: Employer = null,
+      p: Int)
+
+  case class BriefContactWithDataPartitionColumn(id: Int, name: Name, address: String, p: Int)
+
+  val contactsWithDataPartitionColumn =
+    contacts.map { case Contact(id, name, address, pets, friends, relatives, employer) =>
+      ContactWithDataPartitionColumn(id, name, address, pets, friends, relatives, employer, 1) }
+  val briefContactsWithDataPartitionColumn =
+    briefContacts.map { case BriefContact(id, name, address) =>
+      BriefContactWithDataPartitionColumn(id, name, address, 2) }
+
+  testSchemaPruning("select a single complex field array and its parent struct array") {
 
 Review comment:
   Yes, we should be able to use all the same tests for both Parquet and ORC once user-specified schema works.
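
For context, these fixtures are materialized as data-source files before each test so that `p` behaves as a partition column. A hedged sketch of that setup (the path and the `spark` session wiring are assumptions; the real suite writes each partition directory explicitly):

    import spark.implicits._

    val dir = "/tmp/contacts"  // hypothetical location
    contactsWithDataPartitionColumn.toDF()
      .write.format("orc").partitionBy("p").save(dir)
    spark.read.format("orc").load(dir).createOrReplaceTempView("contacts")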





[GitHub] [spark] AmplabJenkins commented on issue #23998: [MINOR][SQL]Add a conf to control subqueryReuse

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23998: [MINOR][SQL]Add a conf to control 
subqueryReuse
URL: https://github.com/apache/spark/pull/23998#issuecomment-470417898
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103116/
   Test PASSed.





[GitHub] [spark] viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#discussion_r263264514
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSchemaPruningSuite.scala
 ##
 @@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.execution.datasources.SchemaPruningSuite
+import org.apache.spark.sql.internal.SQLConf
+
+class OrcSchemaPruningSuite extends SchemaPruningSuite {
+  override protected val dataSourceName: String = "orc"
+  override protected val vectorizedReaderEnabledKey: String =
+    SQLConf.ORC_VECTORIZED_READER_ENABLED.key
+
+  override protected def sparkConf: SparkConf =
+    super
+      .sparkConf
+      .set(SQLConf.USE_V1_SOURCE_READER_LIST, "orc")
+      .set(SQLConf.USE_V1_SOURCE_WRITER_LIST, "orc")
+
+  testSchemaPruning("select a single complex field") {
+    val query = sql("select name.middle from contacts")
+    checkScan(query, "struct<name:struct<middle:string>>")
+    checkAnswer(query.orderBy("id"), Row("X.") :: Row("Y.") :: Nil)
+  }
+
+  testSchemaPruning("select a single complex field and its parent struct") {
+    val query = sql("select name.middle, name from contacts")
+    checkScan(query, "struct<name:struct<first:string,middle:string,last:string>>")
+    checkAnswer(query.orderBy("id"),
+      Row("X.", Row("Jane", "X.", "Doe")) ::
+      Row("Y.", Row("John", "Y.", "Doe")) ::
+      Nil)
+  }
+
+  testSchemaPruning("select a single complex field and the partition column") {
+    val query = sql("select name.middle, p from contacts")
+    checkScan(query, "struct<name:struct<middle:string>>")
+    checkAnswer(query.orderBy("id"),
+      Row("X.", 1) :: Row("Y.", 1) :: Nil)
+  }
+
+  testSchemaPruning("no unnecessary schema pruning") {
+    val query =
+      sql("select id, name.last, name.middle, name.first, relatives[''].last, " +
+        "relatives[''].middle, relatives[''].first, friends[0].last, friends[0].middle, " +
+        "friends[0].first, pets, address from contacts where p=1")
+    // We've selected every field in the schema. Therefore, no schema pruning should be performed.
+    // We check this by asserting that the scanned schema of the query is identical to the schema
+    // of the contacts relation, even though the fields are selected in different orders.
+    checkScan(query,
+      "struct<id:int,name:struct<first:string,middle:string,last:string>,address:string,pets:int," +
+        "friends:array<struct<first:string,middle:string,last:string>>," +
+        "relatives:map<string,struct<first:string,middle:string,last:string>>>")
+    checkAnswer(query.orderBy("id"),
+      Row(0, "Doe", "X.", "Jane", null, null, null, "Smith", "Z.", "Susan", 1, "123 Main Street") ::
+      Row(1, "Doe", "Y.", "John", null, null, null, null, null, null, 3, "321 Wall Street") ::
+      Nil)
+  }
+
+  testSchemaPruning("select a single complex field and is null expression in project") {
+    val query = sql("select name.first, address is not null from contacts")
+    checkScan(query, "struct<name:struct<first:string>,address:string>")
+    checkAnswer(query.orderBy("id"),
+      Row("Jane", true) :: Row("John", true) :: Nil)
+  }
+
+  /**
+   * Overrides this because ORC datasource doesn't support schema merging currently.
 
 Review comment:
   I haven't tried, but I guess it should work with a user-specified schema.
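
To illustrate the user-specified-schema idea mentioned above (a hedged sketch; the paths and the merged DDL string are hypothetical): since ORC has no Parquet-style mergeSchema support here, a caller can hand the reader the union schema up front:

    import org.apache.spark.sql.types.StructType

    // Supply the merged schema explicitly when reading ORC files that were
    // written with different (but compatible) schemas.
    val merged = StructType.fromDDL(
      "id INT, name STRUCT<first: STRING, middle: STRING, last: STRING>, address STRING")
    val df = spark.read.schema(merged).format("orc")
      .load("/tmp/contacts", "/tmp/briefContacts")  // hypothetical paths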





[GitHub] [spark] AmplabJenkins removed a comment on issue #23998: [MINOR][SQL]Add a conf to control subqueryReuse

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23998: [MINOR][SQL]Add a conf to 
control subqueryReuse
URL: https://github.com/apache/spark/pull/23998#issuecomment-470417887
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #23998: [MINOR][SQL]Add a conf to control subqueryReuse

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23998: [MINOR][SQL]Add a conf to 
control subqueryReuse
URL: https://github.com/apache/spark/pull/23998#issuecomment-470417898
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103116/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #23998: [MINOR][SQL]Add a conf to control subqueryReuse

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23998: [MINOR][SQL]Add a conf to control 
subqueryReuse
URL: https://github.com/apache/spark/pull/23998#issuecomment-470417887
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #23998: [MINOR][SQL]Add a conf to control subqueryReuse

2019-03-06 Thread GitBox
SparkQA removed a comment on issue #23998: [MINOR][SQL]Add a conf to control 
subqueryReuse
URL: https://github.com/apache/spark/pull/23998#issuecomment-470369391
 
 
   **[Test build #103116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103116/testReport)** for PR 23998 at commit [`d468c7f`](https://github.com/apache/spark/commit/d468c7f813f557c01e94e0ae76de395d3bdcd71f).





[GitHub] [spark] SparkQA commented on issue #23998: [MINOR][SQL]Add a conf to control subqueryReuse

2019-03-06 Thread GitBox
SparkQA commented on issue #23998: [MINOR][SQL]Add a conf to control 
subqueryReuse
URL: https://github.com/apache/spark/pull/23998#issuecomment-470417392
 
 
   **[Test build #103116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103116/testReport)** for PR 23998 at commit [`d468c7f`](https://github.com/apache/spark/commit/d468c7f813f557c01e94e0ae76de395d3bdcd71f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] fangshil commented on a change in pull request #20303: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL

2019-03-06 Thread GitBox
fangshil commented on a change in pull request #20303: [SPARK-23128][SQL] A new 
approach to do adaptive execution in Spark SQL
URL: https://github.com/apache/spark/pull/20303#discussion_r263260234
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ##
 @@ -36,107 +35,12 @@ import org.apache.spark.sql.internal.SQLConf
  * the input partition ordering requirements are met.
  */
 case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
-  private def defaultNumPreShufflePartitions: Int = conf.numShufflePartitions
-
-  private def targetPostShuffleInputSize: Long = conf.targetPostShuffleInputSize
-
-  private def adaptiveExecutionEnabled: Boolean = conf.adaptiveExecutionEnabled
-
-  private def minNumPostShufflePartitions: Option[Int] = {
-    val minNumPostShufflePartitions = conf.minNumPostShufflePartitions
-    if (minNumPostShufflePartitions > 0) Some(minNumPostShufflePartitions) else None
-  }
-
-  /**
-   * Adds [[ExchangeCoordinator]] to [[ShuffleExchangeExec]]s if adaptive query execution is enabled
-   * and partitioning schemes of these [[ShuffleExchangeExec]]s support [[ExchangeCoordinator]].
-   */
-  private def withExchangeCoordinator(
-      children: Seq[SparkPlan],
-      requiredChildDistributions: Seq[Distribution]): Seq[SparkPlan] = {
-    val supportsCoordinator =
-      if (children.exists(_.isInstanceOf[ShuffleExchangeExec])) {
-        // Right now, ExchangeCoordinator only support HashPartitionings.
-        children.forall {
-          case e @ ShuffleExchangeExec(hash: HashPartitioning, _, _) => true
-          case child =>
-            child.outputPartitioning match {
-              case hash: HashPartitioning => true
-              case collection: PartitioningCollection =>
-                collection.partitionings.forall(_.isInstanceOf[HashPartitioning])
-              case _ => false
-            }
-        }
-      } else {
-        // In this case, although we do not have Exchange operators, we may still need to
-        // shuffle data when we have more than one children because data generated by
-        // these children may not be partitioned in the same way.
-        // Please see the comment in withCoordinator for more details.
-        val supportsDistribution = requiredChildDistributions.forall { dist =>
-          dist.isInstanceOf[ClusteredDistribution] || dist.isInstanceOf[HashClusteredDistribution]
-        }
-        children.length > 1 && supportsDistribution
-      }
-
-    val withCoordinator =
-      if (adaptiveExecutionEnabled && supportsCoordinator) {
-        val coordinator =
-          new ExchangeCoordinator(
-            targetPostShuffleInputSize,
-            minNumPostShufflePartitions)
-        children.zip(requiredChildDistributions).map {
-          case (e: ShuffleExchangeExec, _) =>
-            // This child is an Exchange, we need to add the coordinator.
-            e.copy(coordinator = Some(coordinator))
-          case (child, distribution) =>
-            // If this child is not an Exchange, we need to add an Exchange for now.
-            // Ideally, we can try to avoid this Exchange. However, when we reach here,
-            // there are at least two children operators (because if there is a single child
-            // and we can avoid Exchange, supportsCoordinator will be false and we
-            // will not reach here.). Although we can make two children have the same number of
-            // post-shuffle partitions. Their numbers of pre-shuffle partitions may be different.
-            // For example, let's say we have the following plan
-            //         Join
-            //         /  \
-            //       Agg  Exchange
-            //       /      \
-            //    Exchange  t2
-            //      /
-            //     t1
-            // In this case, because a post-shuffle partition can include multiple pre-shuffle
-            // partitions, a HashPartitioning will not be strictly partitioned by the hashcodes
-            // after shuffle. So, even we can use the child Exchange operator of the Join to
-            // have a number of post-shuffle partitions that matches the number of partitions of
-            // Agg, we cannot say these two children are partitioned in the same way.
-            // Here is another case
-            //         Join
-            //         /  \
-            //       Agg1  Agg2
-            //       /      \
-            //   Exchange1  Exchange2
-            //      /         \
-            //     t1         t2
-            // In this case, two Aggs shuffle data with the same column of the join condition.
-            // After we use ExchangeCoordinator, these two Aggs may not be partitioned in the same
-            // way. Let's say that Agg1 and Agg2 both have 5 pre-shuffle partitions and 2
-            // post-shuffle partitions. It is possible that Agg1 fetches those
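
To make the partition-coalescing behavior described in the removed comments concrete, here is a self-contained sketch of the idea (an illustration only, not the ExchangeCoordinator implementation): pre-shuffle partitions are packed, in order, into post-shuffle partitions until adding one more would exceed the target input size.

    // Pack consecutive pre-shuffle partitions (sizes in bytes, summed across all
    // participating shuffles) into post-shuffle partitions against a target size.
    def coalescePartitions(preShuffleSizes: Array[Long], targetSize: Long): Seq[Range] = {
      val splits = scala.collection.mutable.ArrayBuffer(0)
      var running = 0L
      for (i <- preShuffleSizes.indices) {
        if (running > 0 && running + preShuffleSizes(i) > targetSize) {
          splits += i   // close the current post-shuffle partition before index i
          running = 0L
        }
        running += preShuffleSizes(i)
      }
      (splits :+ preShuffleSizes.length).sliding(2).map(s => s(0) until s(1)).toSeq
    }

    // Example: five 30MB pre-shuffle partitions against a 64MB target coalesce
    // into three post-shuffle partitions: [0,1], [2,3] and [4].
    coalescePartitions(Array.fill(5)(30L << 20), 64L << 20).foreach(println)

The comments above hinge on exactly this: once a post-shuffle partition spans several pre-shuffle partitions, the output is no longer strictly hash-partitioned by partition id, so two coalesced children cannot be assumed to be co-partitioned.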


[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition

2019-03-06 Thread GitBox
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support 
nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r263259691
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 ##
 @@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types._
+
+/**
+ * This aims to handle nested column aliasing pattern inside `ColumnPruning` optimizer rule.
+ * If a project or its child references to nested fields, and not all the fields
+ * in a nested attribute are used, we can substitute the by alias attributes; then a project
+ * of the nested fields as aliases on the children of the child will be created.
+ */
+object NestedColumnAliasing {
+
+  def unapply(plan: LogicalPlan)
+    : Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = plan match {
+    case Project(_, child) if canProjectPushThrough(child) =>
+      getAliasSubMap(plan, child)
+    case _ => None
+  }
+
+  /**
+   * Replace nested columns to prune unused nested columns later.
+   */
+  def replaceToAliases(
+      plan: LogicalPlan,
+      nestedFieldToAlias: Map[GetStructField, Alias],
+      attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = plan match {
+    case Project(projectList, g @ GlobalLimit(_, grandChild: LocalLimit)) =>
+      Project(
+        getNewProjectList(projectList, nestedFieldToAlias),
+        g.copy(child = replaceChildrenWithAliases(grandChild, attrToAliases)))
+
+    case Project(projectList, child) =>
+      Project(
+        getNewProjectList(projectList, nestedFieldToAlias),
+        replaceChildrenWithAliases(child, attrToAliases))
+  }
+
+  /**
+   * Return a replaced project list.
+   */
+  private def getNewProjectList(
+      projectList: Seq[NamedExpression],
+      nestedFieldToAlias: Map[GetStructField, Alias]): Seq[NamedExpression] = {
+    projectList.map(_.transform {
+      case f: GetStructField if nestedFieldToAlias.contains(f) =>
+        nestedFieldToAlias(f).toAttribute
+    }.asInstanceOf[NamedExpression])
+  }
+
+  /**
+   * Return a plan with new childen replaced with aliases.
+   */
+  private def replaceChildrenWithAliases(
+      plan: LogicalPlan,
+      attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
+    plan.withNewChildren(plan.children.map { plan =>
+      Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan)
+    })
+  }
+
+  /**
+   * Returns true for those operators that project can be pushed through.
+   */
+  private def canProjectPushThrough(plan: LogicalPlan) = plan match {
+    case _: GlobalLimit => true
+    case _: LocalLimit => true
+    case _: Repartition => true
+    case _: Sample => true
+    case _ => false
+  }
+
+  /**
+   * Similar to [[QueryPlan.references]], but this only returns all attributes
+   * that are explicitly referenced on the root levels in a [[LogicalPlan]].
+   */
+  private def getRootReferences(plan: LogicalPlan): AttributeSet = {
+    def helper(e: Expression): AttributeSet = e match {
+      case attr: AttributeReference => AttributeSet(attr)
+      case _: GetStructField => AttributeSet.empty
+      case es if es.children.nonEmpty => AttributeSet(es.children.flatMap(helper))
+      case _ => AttributeSet.empty
+    }
+    AttributeSet.fromAttributeSets(plan.expressions.map(helper))
+  }
+
+  /**
+   * Returns all the nested fields that are explicitly referenced as [[Expression]]
+   * in a [[LogicalPlan]]. Currently, we only support having [[GetStructField]] in the chain
+   * of the expressions. If the chain contains GetArrayStructFields, GetMapValue, or
+   * GetArrayItem, the nested field substitution will not be performed.
+   */
+  private def getNestedFieldReferences(plan: LogicalPlan): Seq[GetStructField] = {
+    def helper(e: Expression): Seq[GetStructField] = e match {
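
As a usage-level illustration of what this rule enables (a hedged sketch; the table name and generated alias ids are hypothetical): selecting a single nested field through a limit should leave only that field below the limit in the optimized plan.

    import org.apache.spark.sql.functions.col

    // Without the rule the whole `name` struct flows through the limit; with
    // nested column aliasing only name.first is projected beneath it.
    val pruned = spark.table("contacts").limit(10).select(col("name.first"))
    pruned.explain(true)
    // Expected shape of the optimized plan (ids are illustrative):
    //   Project [_gen_alias_1#1 AS first#2]
    //   +- GlobalLimit 10
    //      +- LocalLimit 10
    //         +- Project [name#0.first AS _gen_alias_1#1]
    //            +- Relation contacts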

[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition

2019-03-06 Thread GitBox
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support 
nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r263257592
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 ##
 @@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types._
+
+/**
+ * This aims to handle nested column aliasing pattern inside `ColumnPruning` optimizer rule.
+ * If a project or its child references to nested fields, and not all the fields
+ * in a nested attribute are used, we can substitute the by alias attributes; then a project
 
 Review comment:
   typo: `substitute the` -> `substitute them`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org






[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition

2019-03-06 Thread GitBox
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support 
nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r263257374
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 ##
 @@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types._
+
+/**
+ * This aims to handle nested column aliasing pattern inside `ColumnPruning` optimizer rule.
 
 Review comment:
   super nit: ` * This aims to handle a nested column aliasing pattern inside 
the `ColumnPruning` optimizer rule.`






[GitHub] [spark] AmplabJenkins removed a comment on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23947: 
[SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not 
in updateAndGetCompilationStats
URL: https://github.com/apache/spark/pull/23947#issuecomment-470408131
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check 
if code size statistics is empty or not in updateAndGetCompilationStats
URL: https://github.com/apache/spark/pull/23947#issuecomment-470408136
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103115/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23947: 
[SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not 
in updateAndGetCompilationStats
URL: https://github.com/apache/spark/pull/23947#issuecomment-470408136
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103115/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check 
if code size statistics is empty or not in updateAndGetCompilationStats
URL: https://github.com/apache/spark/pull/23947#issuecomment-470408131
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats

2019-03-06 Thread GitBox
SparkQA removed a comment on issue #23947: [SPARK-25863][SPARK-21871][SQL] 
Check if code size statistics is empty or not in updateAndGetCompilationStats
URL: https://github.com/apache/spark/pull/23947#issuecomment-470361512
 
 
   **[Test build #103115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103115/testReport)** for PR 23947 at commit [`c359b8d`](https://github.com/apache/spark/commit/c359b8ddf4d52d2529d7a0b7dd24f962e128dd6e).





[GitHub] [spark] SparkQA commented on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats

2019-03-06 Thread GitBox
SparkQA commented on issue #23947: [SPARK-25863][SPARK-21871][SQL] Check if 
code size statistics is empty or not in updateAndGetCompilationStats
URL: https://github.com/apache/spark/pull/23947#issuecomment-470407653
 
 
   **[Test build #103115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103115/testReport)** for PR 23947 at commit [`c359b8d`](https://github.com/apache/spark/commit/c359b8ddf4d52d2529d7a0b7dd24f962e128dd6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24002: [SPARK-26742][K8S] Update 
Kubernetes-Client version to 4.1.2
URL: https://github.com/apache/spark/pull/24002#issuecomment-470405704
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24002: [SPARK-26742][K8S] Update 
Kubernetes-Client version to 4.1.2
URL: https://github.com/apache/spark/pull/24002#issuecomment-470406127
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24002: [SPARK-26742][K8S] Update 
Kubernetes-Client version to 4.1.2
URL: https://github.com/apache/spark/pull/24002#issuecomment-470405610
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Jeffwan commented on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
Jeffwan commented on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client 
version to 4.1.2
URL: https://github.com/apache/spark/pull/24002#issuecomment-470405747
 
 
   Please hold this PR until @shaneknapp confirms. Not sure if minikube got 
updated on the Jenkins side. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24002: [SPARK-26742][K8S] Update 
Kubernetes-Client version to 4.1.2
URL: https://github.com/apache/spark/pull/24002#issuecomment-470405704
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24002: [SPARK-26742][K8S] Update 
Kubernetes-Client version to 4.1.2
URL: https://github.com/apache/spark/pull/24002#issuecomment-470405610
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Jeffwan opened a new pull request #24002: [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2

2019-03-06 Thread GitBox
Jeffwan opened a new pull request #24002: [SPARK-26742][K8S] Update 
Kubernetes-Client version to 4.1.2
URL: https://github.com/apache/spark/pull/24002
 
 
   ## What changes were proposed in this pull request?
   https://github.com/apache/spark/pull/23814 was reverted because the Jenkins 
integration tests failed. After the minikube upgrade, kubernetes-client 4.1.2 
works with Kubernetes v1.13, so we can bring this change back. 
   
   Reference: 
   [Bump Kubernetes Client Version to 
4.1.2](https://issues.apache.org/jira/browse/SPARK-26742)
   [Original PR](https://github.com/apache/spark/pull/23814)
   [Kubernetes client upgrade for Spark 
2.4](https://github.com/apache/spark/pull/23993)
   
   
   https://issues.apache.org/jira/browse/SPARK-26742
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
   (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
   
   Please review http://spark.apache.org/contributing.html before opening a 
pull request.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] prodeezy commented on issue #22573: [SPARK-25558][SQL] Pushdown predicates for nested fields in DataSource Strategy

2019-03-06 Thread GitBox
prodeezy commented on issue #22573: [SPARK-25558][SQL] Pushdown predicates for 
nested fields in DataSource Strategy
URL: https://github.com/apache/spark/pull/22573#issuecomment-470403048
 
 
   @dbtsai, can you point me to the JIRA that tracks the maps/arrays support? 
Thanks!
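
   For context, this PR covers predicates on nested struct fields; predicates on 
map and array elements are the follow-up work being asked about. An illustrative 
sketch (the `contacts` table and its columns are assumptions, not from the PR):

   ```scala
   // Given a SparkSession `spark` and a table with nested columns:

   // Covered here: a predicate on a nested struct field.
   spark.sql("SELECT id FROM contacts WHERE name.middle = 'X.'")

   // The follow-up: predicates on map and array elements.
   spark.sql("SELECT id FROM contacts WHERE relatives['brother'].first = 'John'")
   spark.sql("SELECT id FROM contacts WHERE friends[0].last = 'Smith'")
   ```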


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sandeep-katta commented on issue #22466: [SPARK-25464][SQL] Create Database to the location, only if it is empty or does not exist.

2019-03-06 Thread GitBox
sandeep-katta commented on issue #22466: [SPARK-25464][SQL] Create Database to 
the location, only if it is empty or does not exist.
URL: https://github.com/apache/spark/pull/22466#issuecomment-470403068
 
 
   @gatorsmile I have updated the SQL migration guide


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #22466: [SPARK-25464][SQL] Create Database to the location, only if it is empty or does not exist.

2019-03-06 Thread GitBox
SparkQA commented on issue #22466: [SPARK-25464][SQL] Create Database to the 
location, only if it is empty or does not exist.
URL: https://github.com/apache/spark/pull/22466#issuecomment-470403101
 
 
   **[Test build #103124 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103124/testReport)**
 for PR 22466 at commit 
[`6532466`](https://github.com/apache/spark/commit/653246692eaccaec5da1e22d80ce8f7982d1161f).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#discussion_r263250824
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
 ##
 @@ -103,42 +43,13 @@ class ParquetSchemaPruningSuite
   Nil)
   }
 
-  testSchemaPruning("select a single complex field array and its parent struct 
array") {
-val query = sql("select friends.middle, friends from contacts where p=1")
-checkScan(query,
-  "struct>>")
-checkAnswer(query.orderBy("id"),
-  Row(Array("Z."), Array(Row("Susan", "Z.", "Smith"))) ::
-  Row(Array.empty[String], Array.empty[Row]) ::
-  Nil)
-  }
-
-  testSchemaPruning("select a single complex field from a map entry and its 
parent map entry") {
-val query =
-  sql("select relatives[\"brother\"].middle, relatives[\"brother\"] from 
contacts where p=1")
-checkScan(query,
-  
"struct>>")
-checkAnswer(query.orderBy("id"),
-  Row("Y.", Row("John", "Y.", "Doe")) ::
-  Row(null, null) ::
-  Nil)
-  }
-
   testSchemaPruning("select a single complex field and the partition column") {
 val query = sql("select name.middle, p from contacts")
 checkScan(query, "struct>")
 checkAnswer(query.orderBy("id"),
   Row("X.", 1) :: Row("Y.", 1) :: Row(null, 2) :: Row(null, 2) :: Nil)
 
 Review comment:
   Ah, sorry, it is not due to schema merging.
   
   But the schema inferred by ORC differs from the one inferred by Parquet. We 
can test it on the current master branch like this:
   
   ```scala
   withTempPath { dir =>
     val path = dir.getCanonicalPath

     makeDataSourceFile(contacts, new File(path + "/contacts/p=1"))
     makeDataSourceFile(briefContacts, new File(path + "/contacts/p=2"))

     spark.read.format(dataSourceName).load(path + "/contacts")
       .createOrReplaceTempView("contacts")
     spark.sql("select * from contacts").printSchema()
   }
   ```
   
   When `dataSourceName` is parquet, the schema is:
   ```
   root
    |-- id: integer (nullable = true)
    |-- name: struct (nullable = true)
    |    |-- first: string (nullable = true)
    |    |-- middle: string (nullable = true)
    |    |-- last: string (nullable = true)
    |-- address: string (nullable = true)
    |-- pets: integer (nullable = true)
    |-- friends: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- first: string (nullable = true)
    |    |    |-- middle: string (nullable = true)
    |    |    |-- last: string (nullable = true)
    |-- relatives: map (nullable = true)
    |    |-- key: string
    |    |-- value: struct (valueContainsNull = true)
    |    |    |-- first: string (nullable = true)
    |    |    |-- middle: string (nullable = true)
    |    |    |-- last: string (nullable = true)
   ```
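
   (For readers without the suite open: the fixtures implied by that output look 
roughly like the sketch below, reconstructed from the printed schema; the names 
and defaults are assumptions, not copied from the suite.)

   ```scala
   case class FullName(first: String, middle: String, last: String)

   case class Contact(
       id: Int,
       name: FullName,
       address: String,
       pets: Int,
       friends: Array[FullName] = Array.empty,
       relatives: Map[String, FullName] = Map.empty)

   // The p=2 partition is written with a narrower record, which is where the
   // two formats can diverge when inferring a schema across partitions.
   case class BriefName(first: String, last: String)
   case class BriefContact(id: Int, name: BriefName, address: String)
   ```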

[GitHub] [spark] felixcheungu removed a comment on issue #23521: [SPARK-26604][CORE] Clean up channel registration for StreamManager

2019-03-06 Thread GitBox
felixcheungu removed a comment on issue #23521: [SPARK-26604][CORE] Clean up 
channel registration for StreamManager
URL: https://github.com/apache/spark/pull/23521#issuecomment-470290769
 
 
   I think we need to backport this into branch-2.4 @felixcheung 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23946: [SPARK-26860][PySpark] 
[SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark 
documentation
URL: https://github.com/apache/spark/pull/23946#issuecomment-470397967
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/8583/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23946: [SPARK-26860][PySpark] 
[SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark 
documentation
URL: https://github.com/apache/spark/pull/23946#issuecomment-470397959
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix 
for RangeBetween and RowsBetween docs to be in sync with spark documentation
URL: https://github.com/apache/spark/pull/23946#issuecomment-470397959
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix 
for RangeBetween and RowsBetween docs to be in sync with spark documentation
URL: https://github.com/apache/spark/pull/23946#issuecomment-470397967
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/8583/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

2019-03-06 Thread GitBox
SparkQA commented on issue #23946: [SPARK-26860][PySpark] [SparkR] Fix for 
RangeBetween and RowsBetween docs to be in sync with spark documentation
URL: https://github.com/apache/spark/pull/23946#issuecomment-470396985
 
 
   **[Test build #103123 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103123/testReport)**
 for PR 23946 at commit 
[`b349544`](https://github.com/apache/spark/commit/b3495445fa1944134b59ac7b8a16115d7979ce95).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sandeep-katta commented on a change in pull request #22466: [SPARK-25464][SQL] Create Database to the location, only if it is empty or does not exist.

2019-03-06 Thread GitBox
sandeep-katta commented on a change in pull request #22466: [SPARK-25464][SQL] 
Create Database to the location, only if it is empty or does not exist.
URL: https://github.com/apache/spark/pull/22466#discussion_r263247189
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
 ##
 @@ -2370,4 +2370,17 @@ class HiveDDLSuite
   ))
 }
   }
+
+  test("SPARK-25464 create a database with a non empty location") {
 
 Review comment:
   Yes, [the test code is 
here](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L1119)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner

2019-03-06 Thread GitBox
felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] 
Refactor ArrowRRunner and RRunner to share one BaseRRunner
URL: https://github.com/apache/spark/pull/23977#discussion_r263245255
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/r/ArrowRRunner.scala
 ##
 @@ -202,14 +175,17 @@ class ArrowRRunner(
   // Likewise, there looks no way to send each batch in streaming 
format via socket
   // connection. See ARROW-4512.
   // So, it reads the whole Arrow streaming-formatted binary at 
once for now.
-  val in = new 
ByteArrayReadableSeekableByteChannel(readByteArrayData(length))
+  val buffer = new Array[Byte](length)
 
 Review comment:
   why replace `readByteArrayData`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner

2019-03-06 Thread GitBox
felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] 
Refactor ArrowRRunner and RRunner to share one BaseRRunner
URL: https://github.com/apache/spark/pull/23977#discussion_r263245012
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala
 ##
 @@ -546,14 +548,23 @@ case class FlatMapGroupsInRWithArrowExec(
 
   override protected def doExecute(): RDD[InternalRow] = {
 child.execute().mapPartitionsInternal { iter =>
-  val grouped = GroupedIterator(iter, groupingAttributes, child.output)
+  val grouped = GroupedIterator(iter, groupingAttributes, 
child.output).filter(_._2.hasNext)
   val getKey = ObjectOperator.deserializeRowToObject(keyDeserializer, 
groupingAttributes)
-  val runner = new ArrowRRunner(func, packageNames, broadcastVars, 
inputSchema,
-SQLConf.get.sessionLocalTimeZone, RRunnerModes.DATAFRAME_GAPPLY)
 
-  val groupedByRKey = grouped.map { case (key, rowIter) =>
-val newKey = rowToRBytes(getKey(key).asInstanceOf[Row])
-(newKey, rowIter)
+  // Iterating over keys is relatively cheap.
+  val keys: Iterator[Array[Byte]] =
+grouped.map { case (key, rowIter) => 
rowToRBytes(getKey(key).asInstanceOf[Row]) }
+  val groupedByRKey: Iterator[Iterator[InternalRow]] =
+grouped.map { case (key, rowIter) => rowIter }
+
+  val runner = new ArrowRRunner(func, packageNames, broadcastVars, 
inputSchema,
+SQLConf.get.sessionLocalTimeZone, RRunnerModes.DATAFRAME_GAPPLY) {
+protected override def bufferedWrite(
+dataOut: DataOutputStream)(writeFunc: ByteArrayOutputStream => 
Unit): Unit = {
+  super.bufferedWrite(dataOut)(writeFunc)
+  // Don't forget we're sending keys additionally.
+  keys.foreach(dataOut.write)
 
 Review comment:
   It's a bit hard for me to visualize... isn't each row associated with a key, 
which could be different? If I understand correctly, before it was a pairing of 
`(key, row)` or `(key, rowIter)` - how does it handle this now?
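
   For readers following the thread, the subtlety here is that two `map` views 
over one iterator share a single cursor; a minimal plain-Scala illustration 
(not Spark code):

   ```scala
   val grouped = Iterator(("k1", Seq(1, 2).iterator), ("k2", Seq(3).iterator))
   val keys   = grouped.map { case (key, _)  => key }   // lazy view 1
   val groups = grouped.map { case (_, rows) => rows }  // lazy view 2

   println(keys.next())  // "k1" -- consumes grouped's first element
   groups.next()         // yields k2's rows, not k1's: the views interleave
   ```

   So key i stays paired with group i only if consumption is interleaved in 
lock-step, e.g. each group's rows are drained (buffered) before the matching 
key is read, which appears to be what the `bufferedWrite` override relies on.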


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner

2019-03-06 Thread GitBox
felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] 
Refactor ArrowRRunner and RRunner to share one BaseRRunner
URL: https://github.com/apache/spark/pull/23977#discussion_r263246347
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/api/r/RRunner.scala
 ##
 @@ -44,380 +35,149 @@ private[spark] class RRunner[U](
 isDataFrame: Boolean = false,
 colNames: Array[String] = null,
 mode: Int = RRunnerModes.RDD)
-  extends Logging {
-  protected var bootTime: Double = _
-  private var dataStream: DataInputStream = _
-  val readData = numPartitions match {
-case -1 =>
-  serializer match {
-case SerializationFormats.STRING => readStringData _
-case _ => readByteArrayData _
-  }
-case _ => readShuffledData _
-  }
-
-  def compute(
-  inputIterator: Iterator[_],
-  partitionIndex: Int): Iterator[U] = {
-// Timing start
-bootTime = System.currentTimeMillis / 1000.0
-
-// we expect two connections
-val serverSocket = new ServerSocket(0, 2, 
InetAddress.getByName("localhost"))
-val listenPort = serverSocket.getLocalPort()
-
-// The stdout/stderr is shared by multiple tasks, because we use one daemon
-// to launch child process as worker.
-val errThread = RRunner.createRWorker(listenPort)
-
-// We use two sockets to separate input and output, then it's easy to 
manage
-// the lifecycle of them to avoid deadlock.
-// TODO: optimize it to use one socket
-
-// the socket used to send out the input of task
-serverSocket.setSoTimeout(10000)
-dataStream = try {
-  val inSocket = serverSocket.accept()
-  RRunner.authHelper.authClient(inSocket)
-  startStdinThread(inSocket.getOutputStream(), inputIterator, 
partitionIndex)
-
-  // the socket used to receive the output of task
-  val outSocket = serverSocket.accept()
-  RRunner.authHelper.authClient(outSocket)
-  val inputStream = new BufferedInputStream(outSocket.getInputStream)
-  new DataInputStream(inputStream)
-} finally {
-  serverSocket.close()
-}
-
-try {
-  newReaderIterator(dataStream, errThread)
-} catch {
-  case e: Exception =>
-throw new SparkException("R computation failed with\n " + 
errThread.getLines(), e)
-}
-  }
+  extends BaseRRunner[IN, OUT](
+func,
+deserializer,
+serializer,
+packageNames,
+broadcastVars,
+numPartitions,
+isDataFrame,
+colNames,
+mode) {
 
   protected def newReaderIterator(
-  dataStream: DataInputStream, errThread: BufferedStreamThread): 
Iterator[U] = {
-new Iterator[U] {
-  def next(): U = {
-val obj = _nextObj
-if (hasNext()) {
-  _nextObj = read()
-}
-obj
+  dataStream: DataInputStream, errThread: BufferedStreamThread): 
ReaderIterator = {
+new ReaderIterator(dataStream, errThread) {
+  private val readData = numPartitions match {
+case -1 =>
+  serializer match {
+case SerializationFormats.STRING => readStringData _
+case _ => readByteArrayData _
+  }
+case _ => readShuffledData _
   }
 
-  private var _nextObj = read()
-
-  def hasNext(): Boolean = {
-val hasMore = _nextObj != null
-if (!hasMore) {
-  dataStream.close()
+  private def readShuffledData(length: Int): (Int, Array[Byte]) = {
+length match {
+  case length if length == 2 =>
+val hashedKey = dataStream.readInt()
+val contentPairsLength = dataStream.readInt()
+val contentPairs = new Array[Byte](contentPairsLength)
+dataStream.readFully(contentPairs)
+(hashedKey, contentPairs)
+  case _ => null
 }
-hasMore
   }
-}
-  }
 
-  protected def writeData(
-  dataOut: DataOutputStream,
-  printOut: PrintStream,
-  iter: Iterator[_]): Unit = {
-def writeElem(elem: Any): Unit = {
-  if (deserializer == SerializationFormats.BYTE) {
-val elemArr = elem.asInstanceOf[Array[Byte]]
-dataOut.writeInt(elemArr.length)
-dataOut.write(elemArr)
-  } else if (deserializer == SerializationFormats.ROW) {
-dataOut.write(elem.asInstanceOf[Array[Byte]])
-  } else if (deserializer == SerializationFormats.STRING) {
-// write string(for StringRRDD)
-// scalastyle:off println
-printOut.println(elem)
-// scalastyle:on println
+  private def readByteArrayData(length: Int): Array[Byte] = {
+length match {
+  case length if length > 0 =>
+val obj = new Array[Byte](length)
+dataStream.readFully(obj)
+obj
+  case _ => null
+}
   }
-}
 
-for (elem <- iter) {
-  elem match {
-case (key, innerIter: Iterator[_]) =>
-  for (innerElem <- innerIter) {
-writeElem(innerElem)
-  }
- 

[GitHub] [spark] felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner

2019-03-06 Thread GitBox
felixcheung commented on a change in pull request #23977: [SPARK-26923][SQL][R] 
Refactor ArrowRRunner and RRunner to share one BaseRRunner
URL: https://github.com/apache/spark/pull/23977#discussion_r263244744
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala
 ##
 @@ -546,14 +548,23 @@ case class FlatMapGroupsInRWithArrowExec(
 
   override protected def doExecute(): RDD[InternalRow] = {
 child.execute().mapPartitionsInternal { iter =>
-  val grouped = GroupedIterator(iter, groupingAttributes, child.output)
+  val grouped = GroupedIterator(iter, groupingAttributes, 
child.output).filter(_._2.hasNext)
   val getKey = ObjectOperator.deserializeRowToObject(keyDeserializer, 
groupingAttributes)
-  val runner = new ArrowRRunner(func, packageNames, broadcastVars, 
inputSchema,
-SQLConf.get.sessionLocalTimeZone, RRunnerModes.DATAFRAME_GAPPLY)
 
-  val groupedByRKey = grouped.map { case (key, rowIter) =>
-val newKey = rowToRBytes(getKey(key).asInstanceOf[Row])
-(newKey, rowIter)
+  // Iterating over keys is relatively cheap.
+  val keys: Iterator[Array[Byte]] =
+grouped.map { case (key, rowIter) => 
rowToRBytes(getKey(key).asInstanceOf[Row]) }
+  val groupedByRKey: Iterator[Iterator[InternalRow]] =
+grouped.map { case (key, rowIter) => rowIter }
+
+  val runner = new ArrowRRunner(func, packageNames, broadcastVars, 
inputSchema,
+SQLConf.get.sessionLocalTimeZone, RRunnerModes.DATAFRAME_GAPPLY) {
+protected override def bufferedWrite(
+dataOut: DataOutputStream)(writeFunc: ByteArrayOutputStream => 
Unit): Unit = {
+  super.bufferedWrite(dataOut)(writeFunc)
+  // Don't forget we're sending keys additionally.
+  keys.foreach(dataOut.write)
 
 Review comment:
   so we expect `keys` to be small?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #22466: [SPARK-25464][SQL] Create Database to the location, only if it is empty or does not exist.

2019-03-06 Thread GitBox
SparkQA commented on issue #22466: [SPARK-25464][SQL] Create Database to the 
location, only if it is empty or does not exist.
URL: https://github.com/apache/spark/pull/22466#issuecomment-470395577
 
 
   **[Test build #103122 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103122/testReport)**
 for PR 22466 at commit 
[`6f4c9ce`](https://github.com/apache/spark/commit/6f4c9ce71a4d5951e3306c047263b37dfeb657db).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] codeborui commented on issue #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
codeborui commented on issue #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001#issuecomment-470395593
 
 
   @xuanyuanking 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001#issuecomment-470394959
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001#issuecomment-470394070
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001#issuecomment-470395080
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001#issuecomment-470394959
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001#issuecomment-470394070
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #23996: [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23996: [MINOR][BUILD] Add 2 maven 
properties(hive.classifier and hive.parquet.group)
URL: https://github.com/apache/spark/pull/23996#issuecomment-470393882
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #23996: [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23996: [MINOR][BUILD] Add 2 maven 
properties(hive.classifier and hive.parquet.group)
URL: https://github.com/apache/spark/pull/23996#issuecomment-470393882
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #23996: [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23996: [MINOR][BUILD] Add 2 maven 
properties(hive.classifier and hive.parquet.group)
URL: https://github.com/apache/spark/pull/23996#issuecomment-470393884
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103114/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#issuecomment-470393759
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#issuecomment-470393765
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/8582/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #23996: [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23996: [MINOR][BUILD] Add 2 maven 
properties(hive.classifier and hive.parquet.group)
URL: https://github.com/apache/spark/pull/23996#issuecomment-470393884
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103114/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] codeborui opened a new pull request #24001: [SPARK-27080][SQL]bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-06 Thread GitBox
codeborui opened a new pull request #24001: [SPARK-27080][SQL]bug fix: 
mergeWithMetastoreSchema with uniform lower case comparison
URL: https://github.com/apache/spark/pull/24001
 
 
   (cherry picked from commit f47a765)
   
   ## What changes were proposed in this pull request?
   
   (Please fill in changes proposed in this fix)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
   (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
   
   Please review http://spark.apache.org/contributing.html before opening a 
pull request.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23943: [SPARK-27034][SQL] Nested schema 
pruning for ORC
URL: https://github.com/apache/spark/pull/23943#issuecomment-470393765
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/8582/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23943: [SPARK-27034][SQL] Nested schema 
pruning for ORC
URL: https://github.com/apache/spark/pull/23943#issuecomment-470393759
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #23996: [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)

2019-03-06 Thread GitBox
SparkQA removed a comment on issue #23996: [MINOR][BUILD] Add 2 maven 
properties(hive.classifier and hive.parquet.group)
URL: https://github.com/apache/spark/pull/23996#issuecomment-470334837
 
 
   **[Test build #103114 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103114/testReport)**
 for PR 23996 at commit 
[`1f977a5`](https://github.com/apache/spark/commit/1f977a5e5404c01325ef08473f58ed0c49e8be6e).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #23996: [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)

2019-03-06 Thread GitBox
SparkQA commented on issue #23996: [MINOR][BUILD] Add 2 maven 
properties(hive.classifier and hive.parquet.group)
URL: https://github.com/apache/spark/pull/23996#issuecomment-470393474
 
 
   **[Test build #103114 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103114/testReport)**
 for PR 23996 at commit 
[`1f977a5`](https://github.com/apache/spark/commit/1f977a5e5404c01325ef08473f58ed0c49e8be6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on issue #23521: [SPARK-26604][CORE] Clean up channel registration for StreamManager

2019-03-06 Thread GitBox
viirya commented on issue #23521: [SPARK-26604][CORE] Clean up channel 
registration for StreamManager
URL: https://github.com/apache/spark/pull/23521#issuecomment-470392984
 
 
   If it can't be directly merged to branch-2.4, I can make a PR for it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
SparkQA commented on issue #23943: [SPARK-27034][SQL] Nested schema pruning for 
ORC
URL: https://github.com/apache/spark/pull/23943#issuecomment-470392739
 
 
   **[Test build #103121 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103121/testReport)**
 for PR 23943 at commit 
[`8ac4aed`](https://github.com/apache/spark/commit/8ac4aed746aae252d17c20474bf29edddb8b6ca9).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sandeep-katta commented on issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should not be supported for stream-stream Outer Joins

2019-03-06 Thread GitBox
sandeep-katta commented on issue #22824: [SPARK-25834] [Structured 
Streaming]Update Mode should not be supported for stream-stream Outer Joins
URL: https://github.com/apache/spark/pull/22824#issuecomment-470391935
 
 
   @koeninger @jose-torres, are there any further comments that need to be 
handled?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on issue #23521: [SPARK-26604][CORE] Clean up channel registration for StreamManager

2019-03-06 Thread GitBox
felixcheung commented on issue #23521: [SPARK-26604][CORE] Clean up channel 
registration for StreamManager
URL: https://github.com/apache/spark/pull/23521#issuecomment-470391521
 
 
   I think we need to backport this into branch-2.4


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on issue #23842: [SPARK-26927]Fix race condition may cause dynamic allocation not working

2019-03-06 Thread GitBox
felixcheung commented on issue #23842: [SPARK-26927]Fix race condition may 
cause dynamic allocation not working
URL: https://github.com/apache/spark/pull/23842#issuecomment-470391381
 
 
   FYI, Jenkins hasn't run on this


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] felixcheung commented on issue #23975: [SPARK-27056][MESOS]Remove start-shuffle-service.sh

2019-03-06 Thread GitBox
felixcheung commented on issue #23975: [SPARK-27056][MESOS]Remove  
start-shuffle-service.sh
URL: https://github.com/apache/spark/pull/23975#issuecomment-470390980
 
 
   @tnachen @susanxhuynh do you know about this for mesos?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] chandulal edited a comment on issue #23990: [SPARK-27061][K8S] Expose Driver UI port on driver service to access …

2019-03-06 Thread GitBox
chandulal edited a comment on issue #23990: [SPARK-27061][K8S] Expose Driver UI 
port on driver service to access …
URL: https://github.com/apache/spark/pull/23990#issuecomment-470349577
 
 
   > IIRC a headless service doesn't expose the port outside the k8s cluster, 
right?
   > 
   > If that's the case, in which situations is this useful?
   
   Yes, it won't expose the port outside of the cluster. 
   
   Our use case is that we have users who submit Spark jobs to a Kubernetes 
cluster, but they have no Kubernetes background and no access to the cluster, 
while we still need to provide the Spark job logs and UI for debugging. There 
are two ways to do that:
   1)  Port-forward the driver pod on 4040. (This won't work, because users 
don't have access to the Kubernetes cluster.)
   2)  Expose port 4040 on the driver service. Then we can easily relay the UI 
through an Nginx reverse proxy on all driver services, as in the sketch below.
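
   A minimal sketch of option 2 at the feature-step level, using the fabric8 
builder API (the `spark-ui` port name and the surrounding wiring are 
illustrative assumptions, not necessarily what this PR does):

   ```scala
   import io.fabric8.kubernetes.api.model.ServicePortBuilder

   val driverUIPort = 4040  // in Spark this would come from spark.ui.port

   // Add the UI port alongside the driver-rpc and blockmanager ports when
   // building the (headless) driver service.
   val uiServicePort = new ServicePortBuilder()
     .withName("spark-ui")
     .withPort(driverUIPort)
     .withNewTargetPort(driverUIPort)
     .build()
   ```

   An Nginx reverse proxy inside the cluster can then route to 
`<driver-service-name>.<namespace>.svc:4040`.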


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vanzin commented on a change in pull request #23842: [SPARK-26927]Fix race condition may cause dynamic allocation not working

2019-03-06 Thread GitBox
vanzin commented on a change in pull request #23842: [SPARK-26927]Fix race 
condition may cause dynamic allocation not working
URL: https://github.com/apache/spark/pull/23842#discussion_r263240842
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
 ##
 @@ -725,10 +740,15 @@ private[spark] class ExecutorAllocationManager(
 if (stageIdToNumRunningTask.contains(stageId)) {
   stageIdToNumRunningTask(stageId) += 1
 }
-// This guards against the race condition in which the 
`SparkListenerTaskStart`
-// event is posted before the `SparkListenerBlockManagerAdded` event, 
which is
-// possible because these events are posted in different threads. (see 
SPARK-4951)
-if (!allocationManager.executorIds.contains(executorId)) {
+// This guards against the following race condition:
+// 1. The `SparkListenerTaskStart` event is posted before the
+// `SparkListenerExecutorAdded` event
+// 2. The `SparkListenerExecutorRemoved` event is posted before the
+// `SparkListenerTaskStart` event
+// Above cases are possible because these events are posted in 
different threads.
+// (see SPARK-4951 SPARK-26927)
+if (!allocationManager.executorIds.contains(executorId) &&
+  !allocationManager.removedExecutorIds.contains(executorId)) {
 
 Review comment:
   You didn't understand my suggestion.
   
   I'm asking you to, instead of keeping a list of removed executors here, just 
ask the scheduler whether that executor exists. The scheduler keeps the 
authoritative list of executors.
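
   A minimal sketch of that suggestion, using the names already in scope in the 
diff above and assuming an `isExecutorActive`-style query on the allocation 
client (the method name here is an assumption):

   ```scala
   // Instead of a local removedExecutorIds set, consult the scheduler,
   // which owns the authoritative view of live executors.
   if (!allocationManager.executorIds.contains(executorId) &&
       client.isExecutorActive(executorId)) {
     // Only now treat executorId as a newly discovered executor.
     allocationManager.onExecutorAdded(executorId)
   }
   ```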


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] chandulal commented on a change in pull request #23990: [SPARK-27061][K8S] Expose Driver UI port on driver service to access …

2019-03-06 Thread GitBox
chandulal commented on a change in pull request #23990: [SPARK-27061][K8S] 
Expose Driver UI port on driver service to access …
URL: https://github.com/apache/spark/pull/23990#discussion_r263240701
 
 

 ##
 File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala
 ##
 @@ -55,6 +56,9 @@ private[spark] class DriverServiceFeatureStep(
   private val driverBlockManagerPort = kubernetesConf.sparkConf.getInt(
 config.DRIVER_BLOCK_MANAGER_PORT.key, DEFAULT_BLOCKMANAGER_PORT)
 
+  val driverUIPort = SparkUI.getUIPort(kubernetesConf
+.sparkConf)
 
 Review comment:
   Made these changes and pushed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] chandulal commented on a change in pull request #23990: [SPARK-27061][K8S] Expose Driver UI port on driver service to access …

2019-03-06 Thread GitBox
chandulal commented on a change in pull request #23990: [SPARK-27061][K8S] 
Expose Driver UI port on driver service to access …
URL: https://github.com/apache/spark/pull/23990#discussion_r263240684
 
 

 ##
 File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
 ##
 @@ -45,6 +45,7 @@ private[spark] object Constants {
   // Default and fixed ports
   val DEFAULT_DRIVER_PORT = 7078
   val DEFAULT_BLOCKMANAGER_PORT = 7079
+  val DEFAULT_UI_PORT = 4040
 
 Review comment:
   Made these changes and pushed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC

2019-03-06 Thread GitBox
viirya commented on a change in pull request #23943: [SPARK-27034][SQL] Nested 
schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#discussion_r263240456
 
 

 ##
 File path: sql/core/benchmarks/OrcNestedSchemaPruningBenchmark-results.txt
 ##
 @@ -6,35 +6,35 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
 Selection:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 
-Top-level column              113            196          89         8.8         113.0       1.0X
-Nested column                1316           1639         240         0.8        1315.5       0.1X
+Top-level column              116            151          36         8.6         116.3       1.0X
+Nested column                 544            604          31         1.8         544.5       0.2X
 
 Review comment:
   I think the read performance here is determined more by the underlying 
file-format library than by the Spark side. As with Parquet, on the Spark side 
we provide the correctly pruned nested schema to the ORC library; the actual 
pruning of nested fields while reading data is done by the ORC library. I'm 
not sure we have much room to optimize on the Spark side for this. Of course 
I'm open to any suggestions I'm missing right now.
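   To make "correctly pruned nested schema" concrete, here is a minimal sketch 
using a hypothetical schema (the `org.apache.spark.sql.types` API below is 
real; the plumbing that hands the schema to the ORC reader is elided):
   
   ```scala
   import org.apache.spark.sql.types._
   
   // Hypothetical full file schema:
   //   struct<id:int, name:struct<first:string, last:string>>
   val fullSchema = StructType(Seq(
     StructField("id", IntegerType),
     StructField("name", StructType(Seq(
       StructField("first", StringType),
       StructField("last", StringType))))))
   
   // For a query that touches only name.first, the schema handed to the ORC
   // library keeps just that leaf; the library then skips the other columns.
   val prunedSchema = StructType(Seq(
     StructField("name", StructType(Seq(
       StructField("first", StringType))))))
   ```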





[GitHub] [spark] HyukjinKwon commented on a change in pull request #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

2019-03-06 Thread GitBox
HyukjinKwon commented on a change in pull request #23946: 
[SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be 
in sync with spark documentation
URL: https://github.com/apache/spark/pull/23946#discussion_r263240174
 
 

 ##
 File path: python/pyspark/sql/window.py
 ##
 @@ -97,6 +97,33 @@ def rowsBetween(start, end):
 and ``Window.currentRow`` to specify special boundary values, rather 
than using integral
 values directly.
 
+A row based boundary is based on the position of the row within the partition.
+An offset indicates the number of rows above or below the current row at which the
+frame for the current row starts or ends. For instance, given a row based sliding
+frame with a lower bound offset of -1 and an upper bound offset of +2, the frame
+for the row with index 5 would range from index 4 to index 7.
+
+>>> from pyspark.sql import Window
+>>> from pyspark.sql import functions as func
+>>> from pyspark.sql import SQLContext
+>>> sc = SparkContext.getOrCreate()
+>>> sqlContext = SQLContext(sc)
+>>> tup = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]
+>>> df = sqlContext.createDataFrame(tup, ["id", "category"])
+>>> window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
+>>> df.withColumn("sum", func.sum("id").over(window)).show()
++---+--------+---+
+| id|category|sum|
++---+--------+---+
+|  1|       b|  3|
+|  2|       b|  5|
+|  3|       b|  3|
+|  1|       a|  2|
+|  1|       a|  3|
+|  2|       a|  2|
++---+--------+---+
+
 
 Review comment:
   You can change the doctest running codes from:
   
   ```python
   import doctest
   SparkContext('local[4]', 'PythonTest')
   (failure_count, test_count) = doctest.testmod()
   ```
   
   to:
   
   ```python
   import doctest
   import pyspark.sql.window
   
   SparkContext('local[4]', 'PythonTest')
   globs = pyspark.sql.window.__dict__.copy()
   (failure_count, test_count) = doctest.testmod(
       pyspark.sql.window, globs=globs,
       optionflags=doctest.NORMALIZE_WHITESPACE)
   ```
   
   so that:
   
   1. it doesn't need to add ``
   2. when the tests are skipped, it shows the correct fully qualified module 
names like `pyspark.sql.window...`, rather than `__main__. ...`.





[GitHub] [spark] SparkQA commented on issue #23670: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-03-06 Thread GitBox
SparkQA commented on issue #23670: [SPARK-26601][SQL] Make broadcast-exchange 
thread pool configurable
URL: https://github.com/apache/spark/pull/23670#issuecomment-470386400
 
 
   **[Test build #103120 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103120/testReport)**
 for PR 23670 at commit 
[`27b0ec4`](https://github.com/apache/spark/commit/27b0ec46ad7e0ffae096c3520359c86e56ca6ebc).





[GitHub] [spark] viirya commented on a change in pull request #23915: [SPARK-24252][SQL] Add v2 catalog plugin system

2019-03-06 Thread GitBox
viirya commented on a change in pull request #23915: [SPARK-24252][SQL] Add v2 
catalog plugin system
URL: https://github.com/apache/spark/pull/23915#discussion_r263238897
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
 ##
 @@ -620,6 +622,12 @@ class SparkSession private(
*/
   @transient lazy val catalog: Catalog = new CatalogImpl(self)
 
+  @transient private lazy val catalogs = new mutable.HashMap[String, CatalogPlugin]()
+
+  private[sql] def catalog(name: String): CatalogPlugin = synchronized {
+catalogs.getOrElseUpdate(name, Catalogs.load(name, sessionState.conf))
 
 Review comment:
   Looks like we don't support unloading or reloading a catalog plugin?
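   For illustration only, a minimal sketch of what unload/reload support could 
look like on top of such a `getOrElseUpdate` cache (a hypothetical helper, not 
part of this PR; `T` stands in for `CatalogPlugin` and `load` for 
`Catalogs.load`):
   
   ```scala
   import scala.collection.mutable
   
   class CatalogCache[T](load: String => T) {
     private val catalogs = new mutable.HashMap[String, T]()
   
     def catalog(name: String): T = synchronized {
       catalogs.getOrElseUpdate(name, load(name))
     }
   
     // Dropping the cached instance forces a fresh load on the next lookup,
     // which would amount to a reload.
     def invalidate(name: String): Unit = synchronized {
       catalogs.remove(name)
     }
   }
   ```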





[GitHub] [spark] viirya commented on issue #23670: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-03-06 Thread GitBox
viirya commented on issue #23670: [SPARK-26601][SQL] Make broadcast-exchange 
thread pool configurable
URL: https://github.com/apache/spark/pull/23670#issuecomment-470385336
 
 
   retest this please.





[GitHub] [spark] maropu commented on issue #23982: [SQL][MINOR] Reconcile the join types between data frame and sql interface

2019-03-06 Thread GitBox
maropu commented on issue #23982: [SQL][MINOR] Reconcile the join types between 
data frame and sql interface
URL: https://github.com/apache/spark/pull/23982#issuecomment-470384537
 
 
   WDYT? @dongjoon-hyun @gatorsmile @cloud-fan 





[GitHub] [spark] caneGuy commented on issue #23670: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-03-06 Thread GitBox
caneGuy commented on issue #23670: [SPARK-26601][SQL] Make broadcast-exchange 
thread pool configurable
URL: https://github.com/apache/spark/pull/23670#issuecomment-470384498
 
 
   Sorry to bother you @viirya, but I think the failed case is still not 
related to this PR.
   Any suggestions? Thanks.
   `org.scalatest.exceptions.TestFailedException: 
scala.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, 
scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => 
org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row])
 was false
   Stacktrace
   sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
scala.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, 
scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => 
org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row])
 was false
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)`





[GitHub] [spark] AmplabJenkins removed a comment on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23970: [SPARK-27054][BUILD][SQL] 
Remove the Calcite dependency
URL: https://github.com/apache/spark/pull/23970#issuecomment-470384269
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103112/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #23970: [SPARK-27054][BUILD][SQL] 
Remove the Calcite dependency
URL: https://github.com/apache/spark/pull/23970#issuecomment-470384265
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23970: [SPARK-27054][BUILD][SQL] Remove the 
Calcite dependency
URL: https://github.com/apache/spark/pull/23970#issuecomment-470384269
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103112/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #23970: [SPARK-27054][BUILD][SQL] Remove the 
Calcite dependency
URL: https://github.com/apache/spark/pull/23970#issuecomment-470384265
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-06 Thread GitBox
SparkQA removed a comment on issue #23970: [SPARK-27054][BUILD][SQL] Remove the 
Calcite dependency
URL: https://github.com/apache/spark/pull/23970#issuecomment-470325496
 
 
   **[Test build #103112 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103112/testReport)**
 for PR 23970 at commit 
[`66a1704`](https://github.com/apache/spark/commit/66a1704a6436c60a16f2d4daae27c333e73358e3).





[GitHub] [spark] SparkQA commented on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-06 Thread GitBox
SparkQA commented on issue #23970: [SPARK-27054][BUILD][SQL] Remove the Calcite 
dependency
URL: https://github.com/apache/spark/pull/23970#issuecomment-470383916
 
 
   **[Test build #103112 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103112/testReport)**
 for PR 23970 at commit 
[`66a1704`](https://github.com/apache/spark/commit/66a1704a6436c60a16f2d4daae27c333e73358e3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] caneGuy commented on issue #23826: [SPARK-26914][SQL] Fix scheduler pool may be unpredictable when we only want to use default pool and do not set spark.scheduler.pool for the session

2019-03-06 Thread GitBox
caneGuy commented on issue #23826: [SPARK-26914][SQL] Fix scheduler pool may be 
unpredictable when we only want to use default pool and do not set 
spark.scheduler.pool for the session
URL: https://github.com/apache/spark/pull/23826#issuecomment-470383325
 
 
   No; with two non-default pools there can also be cases where we want to 
submit to pool 'A' but the job is always submitted to 'B' @srowen 
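   For context, pool selection is normally driven by a session-local property. 
A minimal runnable sketch of that mechanism (the pool name "A" is illustrative 
and assumes a fair-scheduler allocation file that defines it):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   object PoolSelectionSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[2]")
         .appName("pool-selection")
         .getOrCreate()
       // Jobs submitted from this thread should go to pool "A"...
       spark.sparkContext.setLocalProperty("spark.scheduler.pool", "A")
       spark.range(100).count()
       // ...and clearing the property should fall back to the default pool.
       spark.sparkContext.setLocalProperty("spark.scheduler.pool", null)
       spark.stop()
     }
   }
   ```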





[GitHub] [spark] liupc commented on a change in pull request #23842: [SPARK-26927]Fix race condition may cause dynamic allocation not working

2019-03-06 Thread GitBox
liupc commented on a change in pull request #23842: [SPARK-26927]Fix race 
condition may cause dynamic allocation not working
URL: https://github.com/apache/spark/pull/23842#discussion_r263236018
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
 ##
 @@ -725,10 +740,15 @@ private[spark] class ExecutorAllocationManager(
 if (stageIdToNumRunningTask.contains(stageId)) {
   stageIdToNumRunningTask(stageId) += 1
 }
-// This guards against the race condition in which the 
`SparkListenerTaskStart`
-// event is posted before the `SparkListenerBlockManagerAdded` event, 
which is
-// possible because these events are posted in different threads. (see 
SPARK-4951)
-if (!allocationManager.executorIds.contains(executorId)) {
+// This guards against the following race condition:
+// 1. The `SparkListenerTaskStart` event is posted before the
+// `SparkListenerExecutorAdded` event
+// 2. The `SparkListenerExecutorRemoved` event is posted before the
+// `SparkListenerTaskStart` event
+// Above cases are possible because these events are posted in 
different threads.
+// (see SPARK-4951 SPARK-26927)
+if (!allocationManager.executorIds.contains(executorId) &&
+  !allocationManager.removedExecutorIds.contains(executorId)) {
 
 Review comment:
   Or we change the `AsyncEventQueue` for executor management to a SyncQueue  
which directly calls listener's callbacks.
   e.g:
   ```
   class SyncQueue extends SparkListenerBus {
   
   def post(event: SparkListenerEvent): Unit = {
   super.postToAll(event)
   
   }
   
   }
   ```





[GitHub] [spark] SparkQA commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
SparkQA commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove 
useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470381933
 
 
   **[Test build #103119 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103119/testReport)**
 for PR 24000 at commit 
[`389456a`](https://github.com/apache/spark/commit/389456a026fb3e2f64e8aefc2111c2be2f2ff5d8).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24000: [SPARK-27079][MINOR][SQL] Fix 
typo & Remove useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470381656
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] sujith71955 commented on issue #23921: [SPARK-26560][SQL]:Repeat select on HiveUDF fails

2019-03-06 Thread GitBox
sujith71955 commented on issue #23921: [SPARK-26560][SQL]:Repeat select on 
HiveUDF fails
URL: https://github.com/apache/spark/pull/23921#issuecomment-470381787
 
 
   cc @srowen @HyukjinKwon 





[GitHub] [spark] AmplabJenkins removed a comment on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24000: [SPARK-27079][MINOR][SQL] Fix 
typo & Remove useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470381658
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/8581/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & 
Remove useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470381658
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/8581/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
AmplabJenkins commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & 
Remove useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470381656
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
AmplabJenkins removed a comment on issue #24000: [SPARK-27079][MINOR][SQL] Fix 
typo & Remove useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470377580
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] HyukjinKwon commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & Remove useless imports & Add missing `override` annotation

2019-03-06 Thread GitBox
HyukjinKwon commented on issue #24000: [SPARK-27079][MINOR][SQL] Fix typo & 
Remove useless imports & Add missing `override` annotation
URL: https://github.com/apache/spark/pull/24000#issuecomment-470381465
 
 
   ok to test





[GitHub] [spark] dilipbiswal commented on issue #23982: [SQL][MINOR] Reconcile the join types between data frame and sql interface

2019-03-06 Thread GitBox
dilipbiswal commented on issue #23982: [SQL][MINOR] Reconcile the join types 
between data frame and sql interface
URL: https://github.com/apache/spark/pull/23982#issuecomment-470380969
 
 
   @maropu OK.. I will do the 2nd option. Thank you.





[GitHub] [spark] liupc commented on a change in pull request #23842: [SPARK-26927]Fix race condition may cause dynamic allocation not working

2019-03-06 Thread GitBox
liupc commented on a change in pull request #23842: [SPARK-26927]Fix race 
condition may cause dynamic allocation not working
URL: https://github.com/apache/spark/pull/23842#discussion_r263231449
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
 ##
 @@ -725,10 +740,15 @@ private[spark] class ExecutorAllocationManager(
 if (stageIdToNumRunningTask.contains(stageId)) {
   stageIdToNumRunningTask(stageId) += 1
 }
-// This guards against the race condition in which the 
`SparkListenerTaskStart`
-// event is posted before the `SparkListenerBlockManagerAdded` event, 
which is
-// possible because these events are posted in different threads. (see 
SPARK-4951)
-if (!allocationManager.executorIds.contains(executorId)) {
+// This guards against the following race condition:
+// 1. The `SparkListenerTaskStart` event is posted before the
+// `SparkListenerExecutorAdded` event
+// 2. The `SparkListenerExecutorRemoved` event is posted before the
+// `SparkListenerTaskStart` event
+// Above cases are possible because these events are posted in 
different threads.
+// (see SPARK-4951 SPARK-26927)
+if (!allocationManager.executorIds.contains(executorId) &&
+  !allocationManager.removedExecutorIds.contains(executorId)) {
 
 Review comment:
   @vanzin 
   
   > I don't believe the scheduler would post a task start if the executor was 
not tracked internally already.
   
   ~~It can happen, because the `ListenerBus` now works with several 
`AsyncEventQueue`s: `allocationManager.executorIds` differs from 
`CoarseGrainedSchedulerBackend.executorDataMap` in that an executor is added 
to `allocationManager.executorIds` only when the `SparkListenerExecutorAdded` 
event is polled and processed by the `AsyncEventQueue`.
   So executor allocation is actually not kept in sync with task scheduling 
for now.~~
   
   I looked closer; it seems to be because we post the 
`SparkListenerExecutorAdded` event and then call `makeOffers` immediately:
   ```
   listenerBus.post(
     SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
   makeOffers()
   ```
   
https://github.com/apache/spark/blob/a0e26cffc5c0163a780e86d37183b0cebbc7b24c/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L230
   
   Maybe we should post these `SparkListenerExecutorAdded` events synchronously 
for the `appManagement` queue?





[GitHub] [spark] dilipbiswal commented on issue #23995: [MINOR][SQL][TEST] Include usage example for generating output for single test in SQLQueryTestSuite

2019-03-06 Thread GitBox
dilipbiswal commented on issue #23995: [MINOR][SQL][TEST] Include usage example 
for generating output for single test in SQLQueryTestSuite
URL: https://github.com/apache/spark/pull/23995#issuecomment-470380454
 
 
   Thanks a lot @HyukjinKwon 




