[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...

2018-01-18 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/20091
  
Thanks for coding it up @jiangxb1987 !

So if I understand it correctly, the cases where this PR helps are:
* the max partitioner is not eligible, since it is at least an order of magnitude smaller, and
* the user has explicitly set 'spark.default.parallelism', and
* the value of 'spark.default.parallelism' is lower than the max partitioner's partition count.
** Since the max partitioner was discarded for being at least an order of magnitude smaller, default parallelism is even worse - even though it was user specified (see the sketch below).

Does it impact any other use case or flow? I want to make sure I am not 
missing anything.
If it is strictly this, then I agree that the PR makes sense. It is a fairly 
suboptimal situation which we are hopefully not worsening - even if we are 
ignoring the user-specified value (by relying on existing behavior :-) )
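
To make that reading concrete, here is a rough, purely illustrative sketch of when the change kicks in (names like `Scenario` and `prChangesOutcome` are mine, not the PR's actual code):

```scala
// Illustrative only: the followup changes the chosen number of partitions exactly when all
// three bullets above hold, i.e. the existing max partitioner is reused instead of the
// (smaller) user-specified spark.default.parallelism.
case class Scenario(
    maxPartitionerIsEligible: Boolean,   // negation of bullet 1
    userSetDefaultParallelism: Boolean,  // bullet 2
    defaultParallelism: Int,
    maxPartitionerPartitions: Int)

def prChangesOutcome(s: Scenario): Boolean =
  !s.maxPartitionerIsEligible &&
    s.userSetDefaultParallelism &&
    s.defaultParallelism < s.maxPartitionerPartitions

// Example: a 1000-partition partitioner deemed ineligible, with spark.default.parallelism
// explicitly set to 100 -> the followup reuses the partitioner instead of the user value.
assert(prChangesOutcome(Scenario(
  maxPartitionerIsEligible = false,
  userSetDefaultParallelism = true,
  defaultParallelism = 100,
  maxPartitionerPartitions = 1000)))
```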


---




[GitHub] spark issue #20295: [WIP][SPARK-23011] Support alternative function form wit...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20295
  
How do we turn a single group column into a series? Just repeat the group 
column?


---




[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20177
  
can you fix the test?


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20026
  
cc @jerryshao 


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19862
  
I don't agree that this is a small change, and users running Spark versions prior to 2.0 
won't get this patch, as we don't backport performance improvement patches.

Overall this patch won't bring much benefit to Spark and may not be worth the 
time to review it.


---




[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-18 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162552121
  
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -67,31 +69,32 @@ object Partitioner {
   None
 }
 
-if (isEligiblePartitioner(hasMaxPartitioner, rdds)) {
+val defaultNumPartitions = if (rdd.context.conf.contains("spark.default.parallelism")) {
+  rdd.context.defaultParallelism
+} else {
+  rdds.map(_.partitions.length).max
+}
+
+// If the existing max partitioner is an eligible one, or its partitions number is larger
+// than the default number of partitions, use the existing partitioner.
+if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
+defaultNumPartitions < hasMaxPartitioner.get.getNumPartitions)) {
--- End diff --

> It depends on how you define "default".

I don't see any ambiguity here - am I missing something?
To rephrase my point - this proposed PR has an impact only if the user has 
explicitly set 'spark.default.parallelism' - else it is a noop.

What is the concern here? That users have set incorrect values for 
spark.default.parallelism?

> If we want to respect spark.default.parallelism strictly, we should not 
reuse partitioner at all.

I agree with you - we should not have - except that ship sailed a long 
time back: this has been the behavior in Spark since at least 0.5, and I don't 
have context before that.
Historically, default parallelism was added later - the behavior used to be 
"largest partitioner if set, or largest partition count when no partitioner is set". 
When default parallelism was introduced, probably (I guess) for 
backward compatibility, that behavior was continued.


#20002 surgically fixed only the case where the inferred partition count was off 
by at least an order of magnitude.
When it is off by an order *and* the user has explicitly specified 
spark.default.parallelism, rely on the user-provided value - else preserve existing 
behavior.
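
To make the "off by at least an order" part concrete, here is a small hedged sketch of the eligibility check as I read #20002 (the log10-based threshold is my paraphrase, not a verbatim quote of the code):

```scala
import scala.math.log10

// Paraphrase of the check from #20002: the existing partitioner counts as eligible unless the
// largest upstream partition count is at least an order of magnitude (10x) larger than the
// partitioner's own number of partitions.
def isEligible(partitionerPartitions: Int, maxUpstreamPartitions: Int): Boolean =
  log10(maxUpstreamPartitions) - log10(partitionerPartitions) < 1

// A 20-partition partitioner vs. upstream RDDs with up to 10000 partitions: off by more than
// an order, so it is discarded and the user-specified spark.default.parallelism (if any) wins.
assert(!isEligible(partitionerPartitions = 20, maxUpstreamPartitions = 10000))
// 2000 vs. 10000 is within an order of magnitude, so the partitioner is kept.
assert(isEligible(partitionerPartitions = 2000, maxUpstreamPartitions = 10000))
```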


---




[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-18 Thread fjh100456
Github user fjh100456 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162551602
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 500
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => 
ParquetOptions.shortParquetCompressionCodecNames(name).name()
+  case "orc" => OrcOptions.shortOrcCompressionCodecNames(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): 
Seq[String] = {
+val hadoopConf = spark.sessionState.newHadoopConf()
+val codecs = format.toLowerCase match {
+  case "parquet" => for {
+footer <- readAllFootersWithoutSummaryFiles(new Path(path), 
hadoopConf)
+block <- footer.getParquetMetadata.getBlocks.asScala
+column <- block.getColumns.asScala
+  } yield column.getCodec.name()
+  case "orc" => new File(path).listFiles().filter{ file =>
+file.isFile && !file.getName.endsWith(".crc") && file.getName != 
"_SUCCESS"
+  }.map { orcFile =>
+
OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+  }.toSeq
+}
+codecs.distinct
+  }
+
+  private def createTable(
+  rootDir: File,
+  tableName: String,
+  isPartitioned: Boolean,
+  format: String,
+  compressionCodec: Option[String]): Unit = {
+val tblProperties = compressionCodec match {
+  case Some(prop) => 
s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+  case _ => ""
+}
+val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" 
else ""
+sql(
+  s"""
+|CREATE TABLE $tableName(a int)
+|$partitionCreate
+|STORED AS $format
+|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+|$tblProperties
+  """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+   

[GitHub] spark issue #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] Push epoch u...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19583
  
also cc @JoshRosen 


---




[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162551462
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 500
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => 
ParquetOptions.shortParquetCompressionCodecNames(name).name()
+  case "orc" => OrcOptions.shortOrcCompressionCodecNames(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): 
Seq[String] = {
+val hadoopConf = spark.sessionState.newHadoopConf()
+val codecs = format.toLowerCase match {
+  case "parquet" => for {
+footer <- readAllFootersWithoutSummaryFiles(new Path(path), 
hadoopConf)
+block <- footer.getParquetMetadata.getBlocks.asScala
+column <- block.getColumns.asScala
+  } yield column.getCodec.name()
+  case "orc" => new File(path).listFiles().filter{ file =>
+file.isFile && !file.getName.endsWith(".crc") && file.getName != 
"_SUCCESS"
+  }.map { orcFile =>
+
OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+  }.toSeq
+}
+codecs.distinct
+  }
+
+  private def createTable(
+  rootDir: File,
+  tableName: String,
+  isPartitioned: Boolean,
+  format: String,
+  compressionCodec: Option[String]): Unit = {
+val tblProperties = compressionCodec match {
+  case Some(prop) => 
s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+  case _ => ""
+}
+val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" 
else ""
+sql(
+  s"""
+|CREATE TABLE $tableName(a int)
+|$partitionCreate
+|STORED AS $format
+|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+|$tblProperties
+  """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+  

[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162551351
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 500
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => 
ParquetOptions.shortParquetCompressionCodecNames(name).name()
+  case "orc" => OrcOptions.shortOrcCompressionCodecNames(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): 
Seq[String] = {
+val hadoopConf = spark.sessionState.newHadoopConf()
+val codecs = format.toLowerCase match {
+  case "parquet" => for {
+footer <- readAllFootersWithoutSummaryFiles(new Path(path), 
hadoopConf)
+block <- footer.getParquetMetadata.getBlocks.asScala
+column <- block.getColumns.asScala
+  } yield column.getCodec.name()
+  case "orc" => new File(path).listFiles().filter{ file =>
+file.isFile && !file.getName.endsWith(".crc") && file.getName != 
"_SUCCESS"
+  }.map { orcFile =>
+
OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+  }.toSeq
+}
+codecs.distinct
+  }
+
+  private def createTable(
+  rootDir: File,
+  tableName: String,
+  isPartitioned: Boolean,
+  format: String,
+  compressionCodec: Option[String]): Unit = {
+val tblProperties = compressionCodec match {
+  case Some(prop) => 
s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+  case _ => ""
+}
+val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" 
else ""
+sql(
+  s"""
+|CREATE TABLE $tableName(a int)
+|$partitionCreate
+|STORED AS $format
+|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+|$tblProperties
+  """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+  

[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19054#discussion_r162551159
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ---
@@ -220,45 +220,76 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 operator.withNewChildren(children)
   }
 
+  private def isSubset(biggerSet: Seq[Expression], smallerSet: 
Seq[Expression]): Boolean =
+smallerSet.length <= biggerSet.length &&
+  smallerSet.forall(x => biggerSet.exists(_.semanticEquals(x)))
+
   private def reorder(
   leftKeys: Seq[Expression],
   rightKeys: Seq[Expression],
-  expectedOrderOfKeys: Seq[Expression],
-  currentOrderOfKeys: Seq[Expression]): (Seq[Expression], 
Seq[Expression]) = {
-val leftKeysBuffer = ArrayBuffer[Expression]()
-val rightKeysBuffer = ArrayBuffer[Expression]()
+  expectedOrderOfKeys: Seq[Expression], // comes from child's output 
partitioning
+  currentOrderOfKeys: Seq[Expression]): // comes from join predicate
+  (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {
+
+assert(leftKeys.length == rightKeys.length)
+
+val allLeftKeys = ArrayBuffer[Expression]()
+val allRightKeys = ArrayBuffer[Expression]()
+val reorderedLeftKeys = ArrayBuffer[Expression]()
+val reorderedRightKeys = ArrayBuffer[Expression]()
+val processedIndicies = mutable.Set[Int]()
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = currentOrderOfKeys.indexWhere(e => 
e.semanticEquals(expression))
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  val index = currentOrderOfKeys.zipWithIndex.find { case (currKey, i) 
=>
+!processedIndicies.contains(i) && 
currKey.semanticEquals(expression)
+  }.get._2
+  processedIndicies.add(index)
+
+  reorderedLeftKeys.append(leftKeys(index))
+  allLeftKeys.append(leftKeys(index))
+
+  reorderedRightKeys.append(rightKeys(index))
+  allRightKeys.append(rightKeys(index))
 })
-(leftKeysBuffer, rightKeysBuffer)
+
+// If len(currentOrderOfKeys) > len(expectedOrderOfKeys), then the 
re-ordering won't have
+// all the keys. Append the remaining keys to the end so that we are 
covering all the keys
+for (i <- leftKeys.indices) {
+  if (!processedIndicies.contains(i)) {
+allLeftKeys.append(leftKeys(i))
+allRightKeys.append(rightKeys(i))
+  }
+}
+
+assert(allLeftKeys.length == leftKeys.length)
+assert(allRightKeys.length == rightKeys.length)
+assert(reorderedLeftKeys.length == reorderedRightKeys.length)
+
+(allLeftKeys, reorderedLeftKeys, allRightKeys, reorderedRightKeys)
   }
 
   private def reorderJoinKeys(
   leftKeys: Seq[Expression],
   rightKeys: Seq[Expression],
   leftPartitioning: Partitioning,
-  rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) 
= {
+  rightPartitioning: Partitioning):
+  (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {
+
 if (leftKeys.forall(_.deterministic) && 
rightKeys.forall(_.deterministic)) {
   leftPartitioning match {
-case HashPartitioning(leftExpressions, _)
-  if leftExpressions.length == leftKeys.length &&
-leftKeys.forall(x => 
leftExpressions.exists(_.semanticEquals(x))) =>
+case HashPartitioning(leftExpressions, _) if isSubset(leftKeys, 
leftExpressions) =>
   reorder(leftKeys, rightKeys, leftExpressions, leftKeys)
--- End diff --

if `leftPartitioning` is `HashPartitioning`, we don't need to care about 
`rightPartitioning` at all?


---




[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19054#discussion_r162550613
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 ---
@@ -220,45 +220,76 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
 operator.withNewChildren(children)
   }
 
+  private def isSubset(biggerSet: Seq[Expression], smallerSet: 
Seq[Expression]): Boolean =
+smallerSet.length <= biggerSet.length &&
+  smallerSet.forall(x => biggerSet.exists(_.semanticEquals(x)))
+
   private def reorder(
   leftKeys: Seq[Expression],
   rightKeys: Seq[Expression],
-  expectedOrderOfKeys: Seq[Expression],
-  currentOrderOfKeys: Seq[Expression]): (Seq[Expression], 
Seq[Expression]) = {
-val leftKeysBuffer = ArrayBuffer[Expression]()
-val rightKeysBuffer = ArrayBuffer[Expression]()
+  expectedOrderOfKeys: Seq[Expression], // comes from child's output 
partitioning
+  currentOrderOfKeys: Seq[Expression]): // comes from join predicate
+  (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {
+
+assert(leftKeys.length == rightKeys.length)
+
+val allLeftKeys = ArrayBuffer[Expression]()
+val allRightKeys = ArrayBuffer[Expression]()
+val reorderedLeftKeys = ArrayBuffer[Expression]()
+val reorderedRightKeys = ArrayBuffer[Expression]()
+val processedIndicies = mutable.Set[Int]()
 
 expectedOrderOfKeys.foreach(expression => {
-  val index = currentOrderOfKeys.indexWhere(e => 
e.semanticEquals(expression))
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
+  val index = currentOrderOfKeys.zipWithIndex.find { case (currKey, i) 
=>
+!processedIndicies.contains(i) && 
currKey.semanticEquals(expression)
+  }.get._2
+  processedIndicies.add(index)
+
+  reorderedLeftKeys.append(leftKeys(index))
+  allLeftKeys.append(leftKeys(index))
+
+  reorderedRightKeys.append(rightKeys(index))
+  allRightKeys.append(rightKeys(index))
 })
-(leftKeysBuffer, rightKeysBuffer)
+
+// If len(currentOrderOfKeys) > len(expectedOrderOfKeys), then the 
re-ordering won't have
+// all the keys. Append the remaining keys to the end so that we are 
covering all the keys
+for (i <- leftKeys.indices) {
+  if (!processedIndicies.contains(i)) {
+allLeftKeys.append(leftKeys(i))
+allRightKeys.append(rightKeys(i))
+  }
+}
+
+assert(allLeftKeys.length == leftKeys.length)
+assert(allRightKeys.length == rightKeys.length)
+assert(reorderedLeftKeys.length == reorderedRightKeys.length)
+
+(allLeftKeys, reorderedLeftKeys, allRightKeys, reorderedRightKeys)
   }
 
   private def reorderJoinKeys(
   leftKeys: Seq[Expression],
   rightKeys: Seq[Expression],
   leftPartitioning: Partitioning,
-  rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) 
= {
+  rightPartitioning: Partitioning):
+  (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {
--- End diff --

We should add some documentation to explain what the return value is.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19175
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19175
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86374/
Test PASSed.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19175
  
**[Test build #86374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86374/testReport)**
 for PR 19175 at commit 
[`d1133ca`](https://github.com/apache/spark/commit/d1133caf92fee9201f9637b828c1e6a52d715eff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

2018-01-18 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/20297#discussion_r162550217
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -48,14 +48,16 @@ public synchronized void disconnect() {
 
   @Override
   public synchronized void kill() {
-disconnect();
-if (childProc != null) {
-  if (childProc.isAlive()) {
-childProc.destroyForcibly();
+if (!isDisposed()) {
+  setState(State.KILLED);
--- End diff --

+1, I had the same question in the last review. We should figure it out.


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-18 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162549759
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -261,37 +263,93 @@ private[spark] class MemoryStore(
   // If this task attempt already owns more unroll memory than is 
necessary to store the
   // block, then release the extra memory that will not be used.
   val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
-  releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, 
excessUnrollMemory)
+  releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory)
   transferUnrollToStorage(size)
   true
 }
   }
+
   if (enoughStorageMemory) {
 entries.synchronized {
-  entries.put(blockId, entry)
+  entries.put(blockId, createMemoryEntry())
 }
 logInfo("Block %s stored as values in memory (estimated size %s, 
free %s)".format(
   blockId, Utils.bytesToString(size), 
Utils.bytesToString(maxMemory - blocksMemoryUsed)))
 Right(size)
   } else {
 assert(currentUnrollMemoryForThisTask >= 
unrollMemoryUsedByThisBlock,
   "released too much unroll memory")
+Left(unrollMemoryUsedByThisBlock)
+  }
+} else {
+  Left(unrollMemoryUsedByThisBlock)
+}
+  }
+
+  /**
+   * Attempt to put the given block in memory store as values.
+   *
+   * It's possible that the iterator is too large to materialize and store 
in memory. To avoid
+   * OOM exceptions, this method will gradually unroll the iterator while 
periodically checking
+   * whether there is enough free memory. If the block is successfully 
materialized, then the
+   * temporary unroll memory used during the materialization is 
"transferred" to storage memory,
+   * so we won't acquire more memory than is actually needed to store the 
block.
+   *
+   * @return in case of success, the estimated size of the stored data. In 
case of failure, return
+   * an iterator containing the values of the block. The returned 
iterator will be backed
+   * by the combination of the partially-unrolled block and the 
remaining elements of the
+   * original input iterator. The caller must either fully consume 
this iterator or call
+   * `close()` on it in order to free the storage memory consumed 
by the partially-unrolled
+   * block.
+   */
+  private[storage] def putIteratorAsValues[T](
+  blockId: BlockId,
+  values: Iterator[T],
+  classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = 
{
+
+// Underlying vector for unrolling the block
+var vector = new SizeTrackingVector[T]()(classTag)
+var arrayValues: Array[T] = null
+var preciseSize: Long = -1
+
+def storeValue(value: T): Unit = {
+  vector += value
+}
+
+def estimateSize(precise: Boolean): Long = {
+  if (precise) {
+// We only call need the precise size after all values unrolled.
+arrayValues = vector.toArray
+preciseSize = SizeEstimator.estimate(arrayValues)
+preciseSize
+  } else {
+vector.estimateSize()
+  }
+}
+
+def createMemoryEntry(): MemoryEntry[T] = {
+  // We successfully unrolled the entirety of this block
+  DeserializedMemoryEntry[T](arrayValues, preciseSize, classTag)
+}
+
+putIterator(blockId, values, classTag, MemoryMode.ON_HEAP, storeValue,
+  estimateSize, createMemoryEntry) match {
+  case Right(storedSize) => Right(storedSize)
+  case Left(unrollMemoryUsedByThisBlock) =>
+// We ran out of space while unrolling the values for this block
+val (unrolledIterator, size) = if (vector != null) {
--- End diff --

Under what situation will `vector` be null?


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-18 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162548350
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -162,26 +162,33 @@ private[spark] class MemoryStore(
   }
 
   /**
-   * Attempt to put the given block in memory store as values.
+   * Attempt to put the given block in memory store as values or bytes.
*
* It's possible that the iterator is too large to materialize and store 
in memory. To avoid
* OOM exceptions, this method will gradually unroll the iterator while 
periodically checking
* whether there is enough free memory. If the block is successfully 
materialized, then the
* temporary unroll memory used during the materialization is 
"transferred" to storage memory,
* so we won't acquire more memory than is actually needed to store the 
block.
*
-   * @return in case of success, the estimated size of the stored data. In 
case of failure, return
-   * an iterator containing the values of the block. The returned 
iterator will be backed
-   * by the combination of the partially-unrolled block and the 
remaining elements of the
-   * original input iterator. The caller must either fully consume 
this iterator or call
-   * `close()` on it in order to free the storage memory consumed 
by the partially-unrolled
-   * block.
+   * @param blockId The block id.
+   * @param values The values which need be stored.
+   * @param classTag the [[ClassTag]] for the block.
+   * @param memoryMode The values saved mode.
+   * @param storeValue Store the record of values to the MemoryStore.
+   * @param estimateSize Get the memory size which used to unroll the 
block. The parameters
+   * determine whether we need precise size.
+   * @param createMemoryEntry Using [[MemoryEntry]] to hold the stored 
values or bytes.
+   * @return if the block is stored successfully, return the stored data 
size. Else return the
+   * memory has used for unroll the block.
*/
-  private[storage] def putIteratorAsValues[T](
+  private def putIterator[T](
   blockId: BlockId,
   values: Iterator[T],
-  classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = 
{
-
+  classTag: ClassTag[T],
+  memoryMode: MemoryMode,
+  storeValue: T => Unit,
+  estimateSize: Boolean => Long,
+  createMemoryEntry: () => MemoryEntry[T]): Either[Long, Long] = {
--- End diff --

Could these function parameters be grouped into a trait?
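
(Assuming the suggestion is to fold the three function parameters into one abstraction, a purely hypothetical sketch of the shape could look like the following; the stand-in types below are mine, not the PR's code.)

```scala
import scala.reflect.ClassTag

// Stand-ins for the Spark-internal types referenced in the diff, only so this sketch compiles.
sealed trait MemoryEntry[T]
case class DeserializedMemoryEntry[T](values: Array[T], size: Long, tag: ClassTag[T])
  extends MemoryEntry[T]

// Hypothetical shape of the suggestion: one object carries the three hooks that putIterator
// currently receives as separate function parameters (storeValue, estimateSize,
// createMemoryEntry).
trait ValuesHolder[T] {
  def storeValue(value: T): Unit            // accumulate one record while unrolling
  def estimateSize(precise: Boolean): Long  // size estimate during / after unrolling
  def buildEntry(): MemoryEntry[T]          // the MemoryEntry to put into `entries`
}
```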


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-18 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162548052
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -233,17 +235,13 @@ private[spark] class MemoryStore(
 }
 
 if (keepUnrolling) {
-  // We successfully unrolled the entirety of this block
-  val arrayValues = vector.toArray
-  vector = null
-  val entry =
-new DeserializedMemoryEntry[T](arrayValues, 
SizeEstimator.estimate(arrayValues), classTag)
-  val size = entry.size
+  // get the precise size
+  val size = estimateSize(true)
--- End diff --

It seems deserialized values do not have a **precise** size, even with 
`SizeEstimator.estimate(arrayValues)`. This could be confusing.


---




[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19285
  
It's just a refactor so I'd like to target it for 2.4


---




[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162549412
  
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -67,31 +69,32 @@ object Partitioner {
   None
 }
 
-if (isEligiblePartitioner(hasMaxPartitioner, rdds)) {
+val defaultNumPartitions = if (rdd.context.conf.contains("spark.default.parallelism")) {
+  rdd.context.defaultParallelism
+} else {
+  rdds.map(_.partitions.length).max
+}
+
+// If the existing max partitioner is an eligible one, or its partitions number is larger
+// than the default number of partitions, use the existing partitioner.
+if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
+defaultNumPartitions < hasMaxPartitioner.get.getNumPartitions)) {
--- End diff --

It depends on how you define "default". In this case, if we can benefit 
from reusing an existing partitioner, we should pick that partitioner. If we 
want to respect `spark.default.parallelism` strictly, we should not reuse a 
partitioner at all.

For this particular case, picking the existing partitioner is obviously the 
better choice, and it was the behavior before #20002, so I'm +1 on this change.


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/28/
Test PASSed.


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20316
  
**[Test build #86377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86377/testReport)**
 for PR 20316 at commit 
[`b3fb8f2`](https://github.com/apache/spark/commit/b3fb8f22d25878b69166b80b8926256a811796ca).


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20316
  
retest this please


---




[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-18 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162548620
  
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -67,31 +69,32 @@ object Partitioner {
   None
 }
 
-if (isEligiblePartitioner(hasMaxPartitioner, rdds)) {
+val defaultNumPartitions = if (rdd.context.conf.contains("spark.default.parallelism")) {
+  rdd.context.defaultParallelism
+} else {
+  rdds.map(_.partitions.length).max
+}
+
+// If the existing max partitioner is an eligible one, or its partitions number is larger
+// than the default number of partitions, use the existing partitioner.
+if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
+defaultNumPartitions < hasMaxPartitioner.get.getNumPartitions)) {
--- End diff --


There are multiple cases here.

a) spark.default.parallelism is not set by the user.
For this case, the PR is a noop.

b) maxPartitions is at least an order of magnitude higher than the max partitioner.

b.1) If spark.default.parallelism is not set, the PR is a noop.

b.2) spark.default.parallelism is explicitly set by the user.

This is the change in behavior which has been introduced - rely on the 
user-specified value instead of trying to infer it when the inferred value is off by 
at least an order of magnitude (see the sketch below).


If users were setting suboptimal values for "spark.default.parallelism", 
then there will be a change in behavior - though I would argue this is the 
*expected* behavior given the documentation of 'spark.default.parallelism'.
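
As a hedged illustration of case b.2 (the partition counts, config value, and data below are made up for the example):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("default-parallelism-example")
  .setMaster("local[4]")
  .set("spark.default.parallelism", "100")  // explicitly set by the user (case b.2)
val sc = new SparkContext(conf)

// One input has an existing partitioner with only 8 partitions; the other input has far more
// partitions, so the existing partitioner is "off by at least an order" and gets discarded.
val small = sc.parallelize(1 to 10).map(i => (i, i)).partitionBy(new HashPartitioner(8))
val large = sc.parallelize(1 to 50000, numSlices = 2000).map(i => (i % 10, i))

// join() without an explicit partitioner goes through Partitioner.defaultPartitioner; under
// the behavior discussed in this thread it should fall back to the user-specified value
// rather than to a partition count inferred from the inputs.
val joined = large.join(small)
println(joined.getNumPartitions)  // expected: 100, i.e. spark.default.parallelism
```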



---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86376/testReport)**
 for PR 20277 at commit 
[`3972093`](https://github.com/apache/spark/commit/397209342646a253a56650df8a00dfb6d66c834e).


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/27/
Test PASSed.


---




[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-18 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19340
  
@viirya yes, you're right in your analysis. Where in the doc should we put 
this?

@srowen please, if you think this is ok, may you start a build? Thanks.


---




[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

2018-01-18 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/20297#discussion_r162548031
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java ---
@@ -95,15 +95,15 @@ protected synchronized void send(Message msg) throws 
IOException {
   }
 
   @Override
-  public void close() throws IOException {
+  public synchronized void close() throws IOException {
 if (!closed) {
--- End diff --

=> isOpen


---




[GitHub] spark pull request #20324: [SPARK-23091][ML] Incorrect unit test for approxQ...

2018-01-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20324#discussion_r162547532
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
@@ -154,24 +154,24 @@ class DataFrameStatSuite extends QueryTest with 
SharedSQLContext {
   val Array(d1, d2) = df.stat.approxQuantile("doubles", Array(q1, q2), 
epsilon)
   val Array(s1, s2) = df.stat.approxQuantile("singles", Array(q1, q2), 
epsilon)
 
-  val error_single = 2 * 1000 * epsilon
-  val error_double = 2 * 2000 * epsilon
+  val errorSingle = 1000 * epsilon
+  val errorDouble = 2.0 * errorSingle
 
-  assert(math.abs(single1 - q1 * n) < error_single)
-  assert(math.abs(double2 - 2 * q2 * n) < error_double)
-  assert(math.abs(s1 - q1 * n) < error_single)
-  assert(math.abs(s2 - q2 * n) < error_single)
-  assert(math.abs(d1 - 2 * q1 * n) < error_double)
-  assert(math.abs(d2 - 2 * q2 * n) < error_double)
+  assert(math.abs(single1 - q1 * n) < errorSingle)
--- End diff --

Seems the intervals are inclusive, so this might be `<=` instead of `<`?
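
For context, the bound in these assertions follows from the approxQuantile contract: for a requested quantile q with relative error epsilon over n rows, the rank of the returned value should be within epsilon * n of q * n. A small hedged illustration (the column name and data are made up; a SparkSession named `spark` is assumed to be in scope):

```scala
// Illustration of the error bound discussed above; assumes a SparkSession `spark` in scope.
import spark.implicits._

val n = 1000
val epsilon = 0.1
val q = 0.5
val df = (1 to n).toDF("singles")

val Array(estimate) = df.stat.approxQuantile("singles", Array(q), epsilon)
// For the data 1..n the value equals its rank, so the rank guarantee translates directly
// into assertions of the form `math.abs(s1 - q1 * n) <= errorSingle` used in the test.
assert(math.abs(estimate - q * n) <= epsilon * n)
```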


---




[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162547306
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala 
---
@@ -50,7 +50,14 @@ private[sql] trait ColumnarBatchScan extends 
CodegenSupport {
   dataType: DataType,
   nullable: Boolean): ExprCode = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(columnVar, dataType, ordinal)
+val value = if (dataType.isInstanceOf[StructType]) {
+  // `ColumnVector.getStruct` is different from 
`InternalRow.getStruct`, it only takes an
+  // `ordinal` parameter.
+  s"$columnVar.getStruct($ordinal)"
+} else {
+  ctx.getValue(columnVar, dataType, ordinal)
+}
--- End diff --

ah I didn't know there is such an API. I'll use it instead.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86375/
Test PASSed.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #86375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86375/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86372/
Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86372 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86372/testReport)**
 for PR 20277 at commit 
[`37c82e6`](https://github.com/apache/spark/commit/37c82e6ed07664eed8c3e8ff10e1e6c582e69d33).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20327
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20327
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-18 Thread gerashegalov
GitHub user gerashegalov opened a pull request:

https://github.com/apache/spark/pull/20327

[SPARK-12963][CORE] NM host for driver end points

## What changes were proposed in this pull request?

Driver endpoints on YARN in cluster mode are potentially bound to 
incorrect network interfaces, because the host name is not retrieved from YARN 
as it is in the executor container case.
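
For readers unfamiliar with the issue: inside a YARN container, the NodeManager's host name is exposed through the container environment, and the driver's endpoints should bind there rather than to whatever the local JVM resolves. A hedged sketch of the idea (the environment variable and config key below are my reading of YARN/Spark conventions, not necessarily what this patch does):

```scala
import org.apache.spark.SparkConf

// Illustrative only: bind the driver's RPC endpoints to the host name YARN assigned to the
// container rather than whatever the local JVM resolves. NM_HOST is the standard environment
// variable YARN sets inside containers; spark.driver.bindAddress is an existing Spark config.
val conf = new SparkConf()
sys.env.get("NM_HOST").foreach { nmHost =>
  conf.set("spark.driver.bindAddress", nmHost)
}
```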

## How was this patch tested?

On a cluster previously experiencing `Service 'Driver' failed after 16 
retries  (on a random free port) ...`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gerashegalov/spark 
gera/driver-host-on-in-yarn-cluster-mode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20327


commit 016323a8d163e31052776a50590b47d9a38b6cdb
Author: Gera Shegalov 
Date:   2018-01-19T06:05:32Z

[SPARK-12963][CORE] NM host for driver end points




---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18277
  
This change looks reasonable to me for now. But I'm also concerned about 
the behavior change. A note into release notes should be good or maybe we need 
a note at migration guide in `RDD Programming Guide`.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #86375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86375/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86367/
Test PASSed.


---




[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20316
  
**[Test build #86367 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86367/testReport)**
 for PR 20316 at commit 
[`e05fb06`](https://github.com/apache/spark/commit/e05fb06ae59b633ec6adc1a2943edd9ebd6951d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18277
  
retest this please.


---




[GitHub] spark issue #20326: [SPARK-23155][DEPLOY] log.server.url links in SHS

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20326
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20326: [SPARK-23155][DEPLOY] log.server.url links in SHS

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20326
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20326: [SPARK-23155][DEPLOY] log.server.url links in SHS

2018-01-18 Thread gerashegalov
GitHub user gerashegalov opened a pull request:

https://github.com/apache/spark/pull/20326

[SPARK-23155][DEPLOY] log.server.url links in SHS

## What changes were proposed in this pull request?

Ensure driver/executor log availability via Spark History Server UI even if 
the original NodeManagers are gone. The patch is a minimum viable prototype 
that assumes access to NM's yarn conf with a fixed RPC port on a SHS machine 
because it does not add anything to the event log regarding YARN nodeId.

## How was this patch tested?

Manually tested with a pseudo-distributed YARN cluster. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gerashegalov/spark gera/shs-aggregated-logs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20326


commit 1a8c7909e9102bbc99c06521b660c20dc4f0d5aa
Author: Gera Shegalov 
Date:   2018-01-18T07:54:25Z

[SPARK-23155][DEPLOY] log.server.url links in SHS




---




[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues

2018-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20323
  
> If Travis CI can not handle the full traffic of Apache Spark PRs, we may 
run it for only Java code change PRs.

@dongjoon-hyun, do you know if Travis CI supports exclusion/inclusion of 
only changed files? I was under the impression that Travis CI doesn't have this 
feature, although AppVeyor does.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20325: [SPARK-22808][DOCS] add insertInto when save hive built ...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20325
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-18 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19892
  
I'm generally ok with these small Python API wrapper additions getting
merged as long as the risk of breaking anything is low - and here it is low,
since it's just API parity.
On Fri, 19 Jan 2018 at 06:08, Holden Karau  wrote:

> I mean I think it might have a chance, generally speaking we've allowed
> outstanding PRs to be merged after the freeze. Since there are outstanding
> blockers on the branch preventing us from cutting RC2 maybe its ok to move
> forward if we can do it quickly? Of course I defer to MLNick :)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

2018-01-18 Thread brandonJY
GitHub user brandonJY opened a pull request:

https://github.com/apache/spark/pull/20325

[SPARK-22808][DOCS] add insertInto when save hive built dataframe

## What changes were proposed in this pull request?

Based on https://issues.apache.org/jira/browse/SPARK-22808 and
https://issues.apache.org/jira/browse/SPARK-16803, insertInto should be
used instead of saveAsTable when a DataFrame is built on a Hive table.
The existing example code in this doc is not affected. Additional example
code is not added for now, since we may patch saveAsTable later.
So this change only edits the doc at the moment.
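
For illustration, a minimal sketch of the pattern this doc change describes; the
table names (`src`, `src_copy`) and the filter are made up for the example, not
taken from the doc:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a Hive-backed session, an existing Hive table "src",
// and a pre-created target table "src_copy" with a compatible schema.
object InsertIntoSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("insertInto sketch")
      .enableHiveSupport()
      .getOrCreate()

    // DataFrame built on an existing Hive table.
    val df = spark.table("src").filter("key > 10")

    // insertInto appends rows into the existing table, reusing its definition.
    df.write.insertInto("src_copy")

    // saveAsTable, by contrast, manages the table from the DataFrame's own
    // schema, which is the situation this doc change steers readers away from
    // when the DataFrame originates from a Hive table.
    spark.stop()
  }
}
```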

## How was this patch tested?

manual tested


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brandonJY/spark SPARK-22808

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20325


commit a40a8fae927eb63dabb8461f02f737d5b25ce5e6
Author: Brandon Jiang 
Date:   2018-01-19T05:48:22Z

[SPARK-22808][DOCS] add insertInto when save hive built dataframe

based on https://issues.apache.org/jira/browse/SPARK-22808 &
https://issues.apache.org/jira/browse/SPARK-16803, insertInto should be
used instead of saveAsTable when dataframe is built on Hive table.
Example code in this doc does not affect. Additional example code is not
added in the moment, due to we may patch for the saveAsTable later.
So just editing the doc at the moment.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20324
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20324
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86368/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20324
  
**[Test build #86368 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86368/testReport)**
 for PR 20324 at commit 
[`4eef6d6`](https://github.com/apache/spark/commit/4eef6d6b127ababb81f79ec4e8f4168f8fe89e34).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20298
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86365/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20298
  
**[Test build #86365 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86365/testReport)**
 for PR 20298 at commit 
[`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2018-01-18 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19285
  
Are we targeting this to 2.3 or 2.4?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162536698
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala 
---
@@ -50,7 +50,14 @@ private[sql] trait ColumnarBatchScan extends 
CodegenSupport {
   dataType: DataType,
   nullable: Boolean): ExprCode = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(columnVar, dataType, ordinal)
+val value = if (dataType.isInstanceOf[StructType]) {
+  // `ColumnVector.getStruct` is different from 
`InternalRow.getStruct`, it only takes an
+  // `ordinal` parameter.
+  s"$columnVar.getStruct($ordinal)"
+} else {
+  ctx.getValue(columnVar, dataType, ordinal)
+}
--- End diff --

Can't we use this API?
```scala
  /**
   * Returns the specialized code to access a value from a column vector 
for a given `DataType`.
   */
  def getValue(vector: String, rowId: String, dataType: DataType): String = 
{
...
  }
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162536737
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala
 ---
@@ -127,8 +127,14 @@ class VectorizedHashMapGenerator(
 
 def genEqualsForKeys(groupingKeys: Seq[Buffer]): String = {
   groupingKeys.zipWithIndex.map { case (key: Buffer, ordinal: Int) =>
-s"""(${ctx.genEqual(key.dataType, 
ctx.getValue(s"vectors[$ordinal]", "buckets[idx]",
-  key.dataType), key.name)})"""
+// `ColumnVector.getStruct` is different from 
`InternalRow.getStruct`, it only takes an
+// `ordinal` parameter.
+val value = if (key.dataType.isInstanceOf[StructType]) {
+  s"vectors[$ordinal].getStruct(buckets[idx])"
+} else {
+  ctx.getValue(s"vectors[$ordinal]", "buckets[idx]", key.dataType)
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86369/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20316
  
**[Test build #86369 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86369/testReport)**
 for PR 20316 at commit 
[`b3fb8f2`](https://github.com/apache/spark/commit/b3fb8f22d25878b69166b80b8926256a811796ca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vectors.sp...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86373/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vectors.sp...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20275
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vectors.sp...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20275
  
**[Test build #86373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86373/testReport)**
 for PR 20275 at commit 
[`f3a4329`](https://github.com/apache/spark/commit/f3a4329d1ec0358de485ea0955f1dbb590342caf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162534791
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java 
---
@@ -53,166 +41,83 @@ public int numNulls() {
   @Override
   public void close() {
 if (childColumns != null) {
-  for (int i = 0; i < childColumns.length; i++) {
-childColumns[i].close();
+  for (ArrowColumnVector childColumn : childColumns) {
--- End diff --

the performance is the same; it's just a more standard way to iterate over an 
array in Java


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19301
  
I believe this has been fixed, can we close it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19293: [SPARK-22079][SQL] Serializer in HiveOutputWriter miss l...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19293
  
If it's too hard to write a UT, can we have a code snippet that reproduces 
this bug and put it in the PR description?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19285
  
overall looks good


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162534339
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -261,37 +263,93 @@ private[spark] class MemoryStore(
   // If this task attempt already owns more unroll memory than is 
necessary to store the
   // block, then release the extra memory that will not be used.
   val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
-  releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, 
excessUnrollMemory)
+  releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory)
   transferUnrollToStorage(size)
   true
 }
   }
+
   if (enoughStorageMemory) {
 entries.synchronized {
-  entries.put(blockId, entry)
+  entries.put(blockId, createMemoryEntry())
 }
 logInfo("Block %s stored as values in memory (estimated size %s, 
free %s)".format(
   blockId, Utils.bytesToString(size), 
Utils.bytesToString(maxMemory - blocksMemoryUsed)))
 Right(size)
   } else {
 assert(currentUnrollMemoryForThisTask >= 
unrollMemoryUsedByThisBlock,
   "released too much unroll memory")
+Left(unrollMemoryUsedByThisBlock)
+  }
+} else {
+  Left(unrollMemoryUsedByThisBlock)
+}
+  }
+
+  /**
+   * Attempt to put the given block in memory store as values.
+   *
+   * It's possible that the iterator is too large to materialize and store 
in memory. To avoid
+   * OOM exceptions, this method will gradually unroll the iterator while 
periodically checking
+   * whether there is enough free memory. If the block is successfully 
materialized, then the
+   * temporary unroll memory used during the materialization is 
"transferred" to storage memory,
+   * so we won't acquire more memory than is actually needed to store the 
block.
--- End diff --

let's not duplicate this documentation


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162534289
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -162,26 +162,33 @@ private[spark] class MemoryStore(
   }
 
   /**
-   * Attempt to put the given block in memory store as values.
+   * Attempt to put the given block in memory store as values or bytes.
*
* It's possible that the iterator is too large to materialize and store 
in memory. To avoid
* OOM exceptions, this method will gradually unroll the iterator while 
periodically checking
* whether there is enough free memory. If the block is successfully 
materialized, then the
* temporary unroll memory used during the materialization is 
"transferred" to storage memory,
* so we won't acquire more memory than is actually needed to store the 
block.
*
-   * @return in case of success, the estimated size of the stored data. In 
case of failure, return
-   * an iterator containing the values of the block. The returned 
iterator will be backed
-   * by the combination of the partially-unrolled block and the 
remaining elements of the
-   * original input iterator. The caller must either fully consume 
this iterator or call
-   * `close()` on it in order to free the storage memory consumed 
by the partially-unrolled
-   * block.
+   * @param blockId The block id.
+   * @param values The values which need be stored.
+   * @param classTag the [[ClassTag]] for the block.
+   * @param memoryMode The values saved mode.
+   * @param storeValue Store the record of values to the MemoryStore.
+   * @param estimateSize Get the memory size which used to unroll the 
block. The parameters
+   * determine whether we need precise size.
+   * @param createMemoryEntry Using [[MemoryEntry]] to hold the stored 
values or bytes.
+   * @return if the block is stored successfully, return the stored data 
size. Else return the
+   * memory has used for unroll the block.
*/
-  private[storage] def putIteratorAsValues[T](
+  private def putIterator[T](
   blockId: BlockId,
   values: Iterator[T],
-  classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = 
{
-
+  classTag: ClassTag[T],
+  memoryMode: MemoryMode,
+  storeValue: T => Unit,
+  estimateSize: Boolean => Long,
+  createMemoryEntry: () => MemoryEntry[T]): Either[Long, Long] = {
--- End diff --

instead of passing 3 functions, I'd like to introduce 
```
class ValuesHolder {
  def store(value)
  def estimatedSize()
  def build(): MemoryEntry
}
```
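
To make the suggestion concrete, a rough Scala sketch of what such an
abstraction could look like; every name and signature below is an assumption
for illustration, not the actual Spark internals:

```scala
// Stand-in for Spark's MemoryEntry[T]; only here so the sketch is self-contained.
trait MemoryEntrySketch[T]

// One object bundling the three callbacks that putIterator currently takes.
trait ValuesHolder[T] {
  def storeValue(value: T): Unit              // called per record while unrolling
  def estimatedSize(precise: Boolean): Long   // memory used by the held values so far
  def build(): MemoryEntrySketch[T]           // produce the entry once unrolling succeeds
}

// putIterator would then shrink to something like:
//   private def putIterator[T](
//       blockId: BlockId,
//       values: Iterator[T],
//       classTag: ClassTag[T],
//       memoryMode: MemoryMode,
//       valuesHolder: ValuesHolder[T]): Either[Long, Long] = ...
```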


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19892
  
I mean I think it might have a chance, generally speaking we've allowed 
outstanding PRs to be merged after the freeze. Since there are outstanding 
blockers on the branch preventing us from cutting RC2, maybe it's ok to move 
forward if we can do it quickly? Of course I defer to MLNick :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19175
  
**[Test build #86374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86374/testReport)**
 for PR 19175 at commit 
[`d1133ca`](https://github.com/apache/spark/commit/d1133caf92fee9201f9637b828c1e6a52d715eff).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18277
  
Jenkins OK to test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19175
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19420: [SPARK-22191] [SQL] Add hive serde example with serde pr...

2018-01-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19420
  
I love more examples, but is there a place we plan to put this in the 
documentation?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19420: [SPARK-22191] [SQL] Add hive serde example with s...

2018-01-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19420#discussion_r162532658
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/hive/JavaSparkHiveExample.java
 ---
@@ -124,6 +124,13 @@ public static void main(String[] args) {
 // ...
 // $example off:spark_hive$
 
+// Hive serde's are also supported with serde properties.
+   String sqlQuery = "CREATE TABLE src_serde(key decimal(38,18), value 
int) USING hive" +
--- End diff --

The indentation here doesn't match the rest of the function.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...

2018-01-18 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/spark/pull/20260
  
I will fix the error in this PR after 
https://github.com/apache/spark/pull/20249#issuecomment-358720962 is merged


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19420: [SPARK-22191] [SQL] Add hive serde example with serde pr...

2018-01-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19420
  
Jenkins OK to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17185
  
I agree it's a valid use case, do you wanna bring it up to date? sorry for 
the delay!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17123
  
cc @WeichenXu123 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r162532163
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4279,6 +4273,425 @@ def test_unsupported_types(self):
 df.groupby('id').apply(f).collect()
 
 
+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not 
installed")
+class GroupbyAggPandasUDFTests(ReusedSQLTestCase):
+
+@property
+def data(self):
+from pyspark.sql.functions import array, explode, col, lit
+return self.spark.range(10).toDF('id') \
+.withColumn("vs", array([lit(i * 1.0) + col('id') for i in 
range(20, 30)])) \
+.withColumn("v", explode(col('vs'))) \
+.drop('vs') \
+.withColumn('w', lit(1.0))
+
+@property
+def python_plus_one(self):
+from pyspark.sql.functions import udf
+
+@udf('double')
+def plus_one(v):
+assert isinstance(v, (int, float))
+return v + 1
+return plus_one
+
+@property
+def pandas_scalar_plus_two(self):
+import pandas as pd
+from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+@pandas_udf('double', PandasUDFType.SCALAR)
+def plus_two(v):
+assert isinstance(v, pd.Series)
+return v + 2
+return plus_two
+
+@property
+def pandas_agg_mean_udf(self):
+from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+@pandas_udf('double', PandasUDFType.GROUP_AGG)
+def avg(v):
+return v.mean()
+return avg
+
+@property
+def pandas_agg_sum_udf(self):
+from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+@pandas_udf('double', PandasUDFType.GROUP_AGG)
+def sum(v):
+return v.sum()
+return sum
+
+@property
+def pandas_agg_weighted_mean_udf(self):
+import numpy as np
+from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+@pandas_udf('double', PandasUDFType.GROUP_AGG)
+def weighted_mean(v, w):
+return np.average(v, weights=w)
+return weighted_mean
+
+def test_basic(self):
+from pyspark.sql.functions import col, lit, sum, mean
+
+df = self.data
+weighted_mean_udf = self.pandas_agg_weighted_mean_udf
+
+# Groupby one column and aggregate one UDF with literal
+result1 = df.groupby('id').agg(weighted_mean_udf(df.v, 
lit(1.0))).sort('id')
+expected1 = 
df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id')
--- End diff --

Oh sorry, I misread the code; I thought it was testing 
`pandas_agg_mean_udf`. Then this is totally fine, we don't need the manual test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-01-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19876
  
also maybe @dbtsai ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20306


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162531810
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -152,19 +198,11 @@ public final ColumnarRow getStruct(int rowId) {
 return new ColumnarRow(this, rowId);
   }
 
-  /**
-   * A special version of {@link #getStruct(int)}, which is only used as 
an adapter for Spark
-   * codegen framework, the second parameter is totally ignored.
-   */
-  public final ColumnarRow getStruct(int rowId, int size) {
-return getStruct(rowId);
-  }
-
   /**
* Returns the array for rowId.
*/
   public final ColumnarArray getArray(int rowId) {
-return new ColumnarArray(arrayData(), getArrayOffset(rowId), 
getArrayLength(rowId));
+return new ColumnarArray(getChild(0), getArrayOffset(rowId), 
getArrayLength(rowId));
--- End diff --

Why change this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20306
  
thanks, merging to master/2.3!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162531441
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java 
---
@@ -53,166 +41,83 @@ public int numNulls() {
   @Override
   public void close() {
 if (childColumns != null) {
-  for (int i = 0; i < childColumns.length; i++) {
-childColumns[i].close();
+  for (ArrowColumnVector childColumn : childColumns) {
--- End diff --

Will this be faster?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162531459
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java 
---
@@ -53,166 +41,83 @@ public int numNulls() {
   @Override
   public void close() {
 if (childColumns != null) {
-  for (int i = 0; i < childColumns.length; i++) {
-childColumns[i].close();
+  for (ArrowColumnVector childColumn : childColumns) {
--- End diff --

Should also apply similar change to `WritableColumnVector`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162531289
  
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -67,31 +69,32 @@ object Partitioner {
   None
 }
 
-if (isEligiblePartitioner(hasMaxPartitioner, rdds)) {
+val defaultNumPartitions = if 
(rdd.context.conf.contains("spark.default.parallelism")) {
+  rdd.context.defaultParallelism
+} else {
+  rdds.map(_.partitions.length).max
+}
+
+// If the existing max partitioner is an eligible one, or its 
partitions number is larger
+// than the default number of partitions, use the existing partitioner.
+if (hasMaxPartitioner.nonEmpty && 
(isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
+defaultNumPartitions < hasMaxPartitioner.get.getNumPartitions)) {
--- End diff --

This is the core change. I think it makes sense as it fixes a regression in 
https://github.com/apache/spark/pull/20002

If the partitioner is not eligible, but its numPartitions is larger than the 
default one, we should still pick this partitioner instead of creating a new 
one.
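
Put another way, a small self-contained sketch of just this decision (simplified
types, not the real `Partitioner` code):

```scala
object PartitionerChoiceSketch {
  // Keep the existing max partitioner when it is eligible, or when its
  // partition count already exceeds the default number of partitions.
  def useExistingPartitioner(
      existingNumPartitions: Option[Int],
      isEligible: Boolean,
      defaultNumPartitions: Int): Boolean =
    existingNumPartitions.exists(n => isEligible || defaultNumPartitions < n)

  def main(args: Array[String]): Unit = {
    // Ineligible partitioner, but still larger than a low user-set
    // spark.default.parallelism: keep it (the regression fix described above).
    println(useExistingPartitioner(Some(100), isEligible = false, defaultNumPartitions = 20)) // true
    // No existing partitioner: fall back to the default number of partitions.
    println(useExistingPartitioner(None, isEligible = false, defaultNumPartitions = 20))      // false
  }
}
```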


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20025: [SPARK-22837][SQL]Session timeout checker does not work ...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20025
  
**[Test build #86370 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86370/testReport)**
 for PR 20025 at commit 
[`5b3c06f`](https://github.com/apache/spark/commit/5b3c06f07f017c9a7b1dc51a4461fbf1c63ac350).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20025: [SPARK-22837][SQL]Session timeout checker does not work ...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20025
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86370/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20025: [SPARK-22837][SQL]Session timeout checker does not work ...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20025
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18983: [SPARK-21771][SQL]remove useless hive client in SparkSQL...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18983
  
LGTM, although I'm not very familiar with the thrift server code...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/26/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


