date:20180119

[GitHub] spark pull request #20302: [SPARK-23094] Fix invalid character handling in J...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20302#discussion_r162682668
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala
 ---
@@ -105,4 +107,36 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest {
       )
     }
   }
+
+  test("invalid json with leading nulls - from file (multiLine=true)") {
+    import testImplicits._
+    withTempDir { tempDir =>
+      val path = tempDir.getAbsolutePath
+      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
+      val expected = s"""$badJson\n{"a":1}\n"""
+      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
+      val df =
+        spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path)
+      checkAnswer(df, Row(null, expected))
+    }
+  }
+
+  test("invalid json with leading nulls - from file (multiLine=false)") {
+    import testImplicits._
+    withTempDir { tempDir =>
+      val path = tempDir.getAbsolutePath
+      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
+      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
+      val df =
+        spark.read.format(dataSourceName).option("multiLine", false).schema(schema).load(path)
+      checkAnswer(df, Seq(Row(1, null), Row(null, badJson)))
+    }
+  }
+
+  test("invalid json with leading nulls - from dataset") {
--- End diff --

See the PR https://github.com/apache/spark/pull/20331


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...

2018-01-19 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/20331

[SPARK-23158] [SQL] Move HadoopFsRelationTest test suites from sql/hive 
to sql/core

## What changes were proposed in this pull request?
The test suites that extend HadoopFsRelationTest are not in sql/hive 
packages, but their directories are in sql/hive. We should move them to 
sql/core.

## How was this patch tested?
The existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark moveTests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20331


commit f7693f0abfe0923868c1918ddcaeaece2c107c5d
Author: gatorsmile 
Date:   2018-01-19T16:57:50Z

fix




---


[GitHub] spark pull request #20302: [SPARK-23094] Fix invalid character handling in J...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20302#discussion_r162682091
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala
 ---
@@ -105,4 +107,36 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest {
       )
     }
   }
+
+  test("invalid json with leading nulls - from file (multiLine=true)") {
+    import testImplicits._
+    withTempDir { tempDir =>
+      val path = tempDir.getAbsolutePath
+      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
+      val expected = s"""$badJson\n{"a":1}\n"""
+      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
+      val df =
+        spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path)
+      checkAnswer(df, Row(null, expected))
+    }
+  }
+
+  test("invalid json with leading nulls - from file (multiLine=false)") {
+    import testImplicits._
+    withTempDir { tempDir =>
+      val path = tempDir.getAbsolutePath
+      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
+      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
+      val df =
+        spark.read.format(dataSourceName).option("multiLine", false).schema(schema).load(path)
+      checkAnswer(df, Seq(Row(1, null), Row(null, badJson)))
+    }
+  }
+
+  test("invalid json with leading nulls - from dataset") {
--- End diff --

This test suite is still in the `org.apache.spark.sql.sources` package. We 
should move these test suites to `sql/core`.


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19340
  
Thanks, I didn't know it existed.


---


[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86391/
Test PASSed.


---


[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20319
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues

2018-01-19 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20323
  
The only downside is spreading the CI across many different systems. I 
know we added AppVeyor because it was the only way to test on Windows (right?). 
Adding Travis too, just for Java style checks, is more questionable. Yes, it has 
nothing to do with Jenkins, though. I think we've just punted on this and 
accepted that Java style checks need to be run manually once in a while.

One middle ground is to enable style checks in the Jenkins jobs other than the 
PR builder. You still wouldn't catch violations at the time a PR is submitted, 
but you would at least catch them automatically and promptly.


---


[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20319
  
**[Test build #86391 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86391/testReport)**
 for PR 20319 at commit 
[`b6e06e8`](https://github.com/apache/spark/commit/b6e06e8e280f97560a342e287072f0b49e85bb79).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest `
  * `class GaussianMixtureSuite extends MLTest with DefaultReadWriteTest `


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19340
  
**[Test build #4065 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4065/testReport)**
 for PR 19340 at commit 
[`fda93ae`](https://github.com/apache/spark/commit/fda93aeadd782d520f32eb34475e3a7fa349c425).


---


[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues

2018-01-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20323
  
I'm wondering why we are okay with AppVeyor but not okay with Travis CI. :)


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19340
  
I think it may not be responding right now, for whatever reason. I use 
https://spark-prs.appspot.com/ to view and trigger tests.


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19340
  
@srowen sorry, I don't know why, but it seems that I cannot start new 
Jenkins jobs for this PR... Could you whitelist it or trigger a new test, please? 
Thanks.


---


[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues

2018-01-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20323
  
@HyukjinKwon, Travis CI will trigger on every commit. What I mean is that our 
script can check only the Java changes.

@sameeragarwal, Travis CI runs independently on the Travis CI site, like 
AppVeyor. That is exactly why I added Travis CI.
- Travis CI will finish faster than Jenkins.
- Travis CI will not add time or any extra load to Jenkins.

Please see [this](https://travis-ci.org/dongjoon-hyun/spark/builds).


---


[GitHub] spark pull request #20321: [SPARK-23152][ML] - Correctly guard against empty...

2018-01-19 Thread tovbinm

Github user tovbinm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20321#discussion_r162678515
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -109,7 +109,7 @@ abstract class Classifier[
       case None =>
         // Get number of classes from dataset itself.
         val maxLabelRow: Array[Row] = dataset.select(max($(labelCol))).take(1)
-        if (maxLabelRow.isEmpty) {
+        if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
--- End diff --

@dongjoon-hyun done.
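
For context, here is a minimal sketch of why the extra null check in the diff above matters. It assumes that a global `max` over an empty dataset yields a single row holding null rather than no rows at all, which is how global aggregates behave in Spark SQL. `NullGuardSketch`, `FakeRow` and `hasNoLabels` are hypothetical names used only for illustration; the sketch does not use any Spark classes and is not the code from the PR.

```scala
object NullGuardSketch {
  // Hypothetical stand-in for a Spark Row: a sequence of possibly-null values.
  type FakeRow = Seq[Any]

  // Mirrors the guard in the diff: "no row at all" and "one row holding null"
  // are both treated as "the dataset has no labels".
  def hasNoLabels(maxLabelRow: Array[FakeRow]): Boolean =
    maxLabelRow.isEmpty || maxLabelRow(0)(0) == null

  def main(args: Array[String]): Unit = {
    // Assumed shape of the result of max(label) over an empty dataset.
    val emptyDatasetResult: Array[FakeRow] = Array(Seq(null))
    // Shape of the result when the largest label is 2.0.
    val normalResult: Array[FakeRow] = Array(Seq(2.0))

    println(hasNoLabels(emptyDatasetResult)) // the isEmpty check alone would miss this case
    println(hasNoLabels(normalResult))
  }
}
```

With only the original `isEmpty` check, the first case would fall through to the code that reads the label value from the row.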


---


[GitHub] spark issue #18983: [SPARK-21771][SQL]remove useless hive client in SparkSQL...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18983
  
cc @liufengdb 


---


[GitHub] spark pull request #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20316


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20316
  
Thanks! Merged to master/2.3


---


[GitHub] spark pull request #20025: [SPARK-22837][SQL]Session timeout checker does no...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20025#discussion_r162673886
  
--- Diff: 
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java
 ---
@@ -23,11 +23,7 @@
 import java.util.ArrayList;
 import java.util.Date;
 import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
-import java.util.concurrent.Future;
-import java.util.concurrent.LinkedBlockingQueue;
-import java.util.concurrent.ThreadPoolExecutor;
-import java.util.concurrent.TimeUnit;
--- End diff --

Please revert this change.


---


[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20087
  
@fjh100456 Thanks for working on it! It is pretty close to being merged.


---


[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162673688
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 500
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => 
ParquetOptions.shortParquetCompressionCodecNames(name).name()
+  case "orc" => OrcOptions.shortOrcCompressionCodecNames(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): 
Seq[String] = {
+val hadoopConf = spark.sessionState.newHadoopConf()
+val codecs = format.toLowerCase match {
+  case "parquet" => for {
+footer <- readAllFootersWithoutSummaryFiles(new Path(path), 
hadoopConf)
+block <- footer.getParquetMetadata.getBlocks.asScala
+column <- block.getColumns.asScala
+  } yield column.getCodec.name()
+  case "orc" => new File(path).listFiles().filter{ file =>
+file.isFile && !file.getName.endsWith(".crc") && file.getName != 
"_SUCCESS"
+  }.map { orcFile =>
+
OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+  }.toSeq
+}
+codecs.distinct
+  }
+
+  private def createTable(
+  rootDir: File,
+  tableName: String,
+  isPartitioned: Boolean,
+  format: String,
+  compressionCodec: Option[String]): Unit = {
+val tblProperties = compressionCodec match {
+  case Some(prop) => 
s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+  case _ => ""
+}
+val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" 
else ""
+sql(
+  s"""
+|CREATE TABLE $tableName(a int)
+|$partitionCreate
+|STORED AS $format
+|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+|$tblProperties
+  """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+

[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162672650
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 500
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => 
ParquetOptions.shortParquetCompressionCodecNames(name).name()
+  case "orc" => OrcOptions.shortOrcCompressionCodecNames(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): 
Seq[String] = {
+val hadoopConf = spark.sessionState.newHadoopConf()
+val codecs = format.toLowerCase match {
+  case "parquet" => for {
+footer <- readAllFootersWithoutSummaryFiles(new Path(path), 
hadoopConf)
+block <- footer.getParquetMetadata.getBlocks.asScala
+column <- block.getColumns.asScala
+  } yield column.getCodec.name()
+  case "orc" => new File(path).listFiles().filter{ file =>
+file.isFile && !file.getName.endsWith(".crc") && file.getName != 
"_SUCCESS"
+  }.map { orcFile =>
+
OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+  }.toSeq
+}
+codecs.distinct
+  }
+
+  private def createTable(
+  rootDir: File,
+  tableName: String,
+  isPartitioned: Boolean,
+  format: String,
+  compressionCodec: Option[String]): Unit = {
+val tblProperties = compressionCodec match {
+  case Some(prop) => 
s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+  case _ => ""
+}
+val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" 
else ""
+sql(
+  s"""
+|CREATE TABLE $tableName(a int)
+|$partitionCreate
+|STORED AS $format
+|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+|$tblProperties
+  """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+

[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162671108
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,349 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 50
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => ParquetOptions.getParquetCompressionCodecName(name)
+  case "orc" => OrcOptions.getORCCompressionCodecName(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): 
Seq[String] = {
+val hadoopConf = spark.sessionState.newHadoopConf()
+val codecs = format.toLowerCase match {
+  case "parquet" => for {
+footer <- readAllFootersWithoutSummaryFiles(new Path(path), 
hadoopConf)
+block <- footer.getParquetMetadata.getBlocks.asScala
+column <- block.getColumns.asScala
+  } yield column.getCodec.name()
+  case "orc" => new File(path).listFiles().filter { file =>
+file.isFile && !file.getName.endsWith(".crc") && file.getName != 
"_SUCCESS"
+  }.map { orcFile =>
+
OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+  }.toSeq
+}
+codecs.distinct
+  }
+
+  private def createTable(
+  rootDir: File,
+  tableName: String,
+  isPartitioned: Boolean,
+  format: String,
+  compressionCodec: Option[String]): Unit = {
+val tblProperties = compressionCodec match {
+  case Some(prop) => 
s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+  case _ => ""
+}
+val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" 
else ""
+sql(
+  s"""
+|CREATE TABLE $tableName(a int)
+|$partitionCreate
+|STORED AS $format
+|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+|$tblProperties
+  """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+  partitionV

[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162673292
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -260,17 +282,21 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
   def checkForTableWithoutCompressProp(format: String, compressCodecs: List[String]): Unit = {
     Seq(true, false).foreach { isPartitioned =>
       Seq(true, false).foreach { convertMetastore =>
-        checkTableCompressionCodecForCodecs(
-          format,
-          isPartitioned,
-          convertMetastore,
-          compressionCodecs = compressCodecs,
-          tableCompressionCodecs = List(null)) {
-          case (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) =>
-            // Always expect session-level take effect
-            assert(sessionCompressionCodec == realCompressionCodec)
-            assert(checkTableSize(format, sessionCompressionCodec,
-              isPartitioned, convertMetastore, tableSize))
+        Seq(true, false).foreach { usingCTAS =>
+          checkTableCompressionCodecForCodecs(
+            format,
+            isPartitioned,
+            convertMetastore,
+            usingCTAS,
+            compressionCodecs = compressCodecs,
+            tableCompressionCodecs = List(null)) {
+            case
+              (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) =>
--- End diff --

The same here.


---


[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162671801
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala 
---
@@ -0,0 +1,349 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, 
ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest 
with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+(0 until 
maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+try {
+  spark.catalog.dropTempView("table_source")
+} finally {
+  super.afterAll()
+}
+  }
+
+  private val maxRecordNum = 50
+
+  private def getConvertMetastoreConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = 
format.toLowerCase match {
+case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = 
format.toLowerCase match {
+case "parquet" => ParquetOutputFormat.COMPRESSION
+case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+format.toLowerCase match {
+  case "parquet" => ParquetOptions.getParquetCompressionCodecName(name)
+  case "orc" => OrcOptions.getORCCompressionCodecName(name)
+}
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): Seq[String] = {
+    val hadoopConf = spark.sessionState.newHadoopConf()
+    val codecs = format.toLowerCase match {
+      case "parquet" => for {
+        footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf)
+        block <- footer.getParquetMetadata.getBlocks.asScala
+        column <- block.getColumns.asScala
+      } yield column.getCodec.name()
+      case "orc" => new File(path).listFiles().filter { file =>
+        file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS"
+      }.map { orcFile =>
+        OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+      }.toSeq
+    }
+    codecs.distinct
+  }
+
+  private def createTable(
+      rootDir: File,
+      tableName: String,
+      isPartitioned: Boolean,
+      format: String,
+      compressionCodec: Option[String]): Unit = {
+    val tblProperties = compressionCodec match {
+      case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+      case _ => ""
+    }
+    val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else ""
+    sql(
+      s"""
+        |CREATE TABLE $tableName(a int)
+        |$partitionCreate
+        |STORED AS $format
+        |LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+        |$tblProperties
+      """.stripMargin)
+  }
+
+  private def writeDataToTable(
+  tableName: String,
+  partitionV

[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162673245
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala ---
@@ -260,17 +282,21 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
   def checkForTableWithoutCompressProp(format: String, compressCodecs: List[String]): Unit = {
     Seq(true, false).foreach { isPartitioned =>
       Seq(true, false).foreach { convertMetastore =>
-        checkTableCompressionCodecForCodecs(
-          format,
-          isPartitioned,
-          convertMetastore,
-          compressionCodecs = compressCodecs,
-          tableCompressionCodecs = List(null)) {
-          case (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) =>
-            // Always expect session-level take effect
-            assert(sessionCompressionCodec == realCompressionCodec)
-            assert(checkTableSize(format, sessionCompressionCodec,
-              isPartitioned, convertMetastore, tableSize))
+        Seq(true, false).foreach { usingCTAS =>
+          checkTableCompressionCodecForCodecs(
+            format,
+            isPartitioned,
+            convertMetastore,
+            usingCTAS,
+            compressionCodecs = compressCodecs,
+            tableCompressionCodecs = List(null)) {
+            case
+              (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) =>
+              // Always expect session-level take effect
+              assert(sessionCompressionCodec == realCompressionCodec)
+              assert(checkTableSize(format, sessionCompressionCodec,
+                  isPartitioned, convertMetastore, usingCTAS, tableSize))
--- End diff --

```
assert(checkTableSize(
  format, sessionCompressionCodec, isPartitioned, convertMetastore, usingCTAS, tableSize))
```


---


[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162672130
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -82,4 +82,7 @@ object ParquetOptions {
     "snappy" -> CompressionCodecName.SNAPPY,
     "gzip" -> CompressionCodecName.GZIP,
     "lzo" -> CompressionCodecName.LZO)
+
+  def getParquetCompressionCodecName(name: String): String =
+    shortParquetCompressionCodecNames(name).name()
--- End diff --

```Scala
def getParquetCompressionCodecName(name: String): String = {
  shortParquetCompressionCodecNames(name).name()
}
```


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86388/
Test PASSed.


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20316
  
**[Test build #86388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86388/testReport)** for PR 20316 at commit [`ad976fe`](https://github.com/apache/spark/commit/ad976fe175e9cc07cfff859dd7f7331ad424aa8e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20319
  
**[Test build #86391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86391/testReport)** for PR 20319 at commit [`b6e06e8`](https://github.com/apache/spark/commit/b6e06e8e280f97560a342e287072f0b49e85bb79).


---


[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-19 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/20319
  
Jenkins, add to whitelist


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19340
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/41/
Test PASSed.


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19340
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19340
  
Jenkins, retest this please


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19340
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19340
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/40/
Test PASSed.


---


[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...

2018-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20275


---


[GitHub] spark issue #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vectors.sp...

2018-01-19 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20275
  
Merged to master


---


[GitHub] spark pull request #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19340#discussion_r162650427
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -546,10 +577,112 @@ object KMeans {
       .run(data)
   }
 
+  private[spark] def validateInitMode(initMode: String): Boolean = {
+    initMode match {
+      case KMeans.RANDOM => true
+      case KMeans.K_MEANS_PARALLEL => true
+      case _ => false
+    }
+  }
+
+  private[spark] def validateDistanceMeasure(distanceMeasure: String): Boolean = {
+    distanceMeasure match {
+      case DistanceMeasure.EUCLIDEAN => true
+      case DistanceMeasure.COSINE => true
+      case _ => false
+    }
+  }
+}
+
+/**
+ * A vector with its norm for fast distance computation.
+ *
+ * @see [[org.apache.spark.mllib.clustering.KMeans#fastSquaredDistance]]
--- End diff --

This seems to fail the doc build for some reason. You can just remove it.


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19340
  
**[Test build #4064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4064/testReport)** for PR 19340 at commit [`5ed87ea`](https://github.com/apache/spark/commit/5ed87ea9d946228dbf84d624e019008bb98219c7).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2018-01-19 Thread brad-kaiser

Github user brad-kaiser commented on the issue:

https://github.com/apache/spark/pull/19041
  
Hey @vanzin, I just wanted to follow up and see if you've had a chance to 
look at this. Thanks!


---


[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19340
  
**[Test build #4064 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4064/testReport)** for PR 19340 at commit [`5ed87ea`](https://github.com/apache/spark/commit/5ed87ea9d946228dbf84d624e019008bb98219c7).


---


[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20324
  
**[Test build #86390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86390/testReport)** for PR 20324 at commit [`673c520`](https://github.com/apache/spark/commit/673c52042a70b5dfc061dd053ae2e6553a4a2612).


---


[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20324
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/39/
Test PASSed.


---


[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20324
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20328: [SPARK-23000] [TEST] Keep Derby DB Location Uncha...

2018-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20328


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20328
  
thanks, merging to master/2.3!


---


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86387/
Test FAILed.


---


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20276
  
**[Test build #86387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86387/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-19 Thread icexelloss

Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/19872
  
@ueshin I think all comments are addressed. Can you take a final look? 
Thanks!


---


[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-01-19 Thread icexelloss

Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r162635572
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4279,6 +4273,425 @@ def test_unsupported_types(self):
 df.groupby('id').apply(f).collect()
 
 
+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
+class GroupbyAggPandasUDFTests(ReusedSQLTestCase):
+
+    @property
+    def data(self):
+        from pyspark.sql.functions import array, explode, col, lit
+        return self.spark.range(10).toDF('id') \
+            .withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \
+            .withColumn("v", explode(col('vs'))) \
+            .drop('vs') \
+            .withColumn('w', lit(1.0))
+
+    @property
+    def python_plus_one(self):
+        from pyspark.sql.functions import udf
+
+        @udf('double')
+        def plus_one(v):
+            assert isinstance(v, (int, float))
+            return v + 1
+        return plus_one
+
+    @property
+    def pandas_scalar_plus_two(self):
+        import pandas as pd
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.SCALAR)
+        def plus_two(v):
+            assert isinstance(v, pd.Series)
+            return v + 2
+        return plus_two
+
+    @property
+    def pandas_agg_mean_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def avg(v):
+            return v.mean()
+        return avg
+
+    @property
+    def pandas_agg_sum_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def sum(v):
+            return v.sum()
+        return sum
+
+    @property
+    def pandas_agg_weighted_mean_udf(self):
+        import numpy as np
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def weighted_mean(v, w):
+            return np.average(v, weights=w)
+        return weighted_mean
+
+    def test_basic(self):
+        from pyspark.sql.functions import col, lit, sum, mean
+
+        df = self.data
+        weighted_mean_udf = self.pandas_agg_weighted_mean_udf
+
+        # Groupby one column and aggregate one UDF with literal
+        result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id')
+        expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id')
--- End diff --

Ah, no worries. Thanks for the clarification.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20328
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86384/
Test PASSed.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20328
  
**[Test build #86384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86384/testReport)** for PR 20328 at commit [`b9aa879`](https://github.com/apache/spark/commit/b9aa879104ab010700e5f19c457fd791cc255ff7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20216: [SPARK-23024][WEB-UI]Spark ui about the contents ...

2018-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20216


---


[GitHub] spark issue #20216: [SPARK-23024][WEB-UI]Spark ui about the contents of the ...

2018-01-19 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20216
  
Merged to master


---


[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #86389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86389/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad).


---


[GitHub] spark issue #20295: [WIP][SPARK-23011] Support alternative function form wit...

2018-01-19 Thread icexelloss

Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/20295
  
Yep, that's correct.


---


[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #4063 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4063/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad).


---


[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20330
  
Jenkins add to whitelist


---


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20297
  
**[Test build #4062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4062/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20297
  
**[Test build #4060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4060/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread smurakozi

Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162622469
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -65,10 +68,13 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We
     }.map { job =>
       val jobId = job.jobId
       val status = job.status
-      val jobDescription = store.lastStageAttempt(job.stageIds.max).description
-      val displayJobDescription = jobDescription
-        .map(UIUtils.makeDescription(_, "", plainText = true).text)
-        .getOrElse("")
+      val (_, lastStageDescription) = lastStageNameAndDescription(store, job)
+      val displayJobDescription =
+        if (lastStageDescription.isEmpty) {
+          job.name

Using job.name instead of "" to behave more like the pre-2.3 version:

https://github.com/smurakozi/spark/blob/772e4648d95bda3353723337723543c741ea8476/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala#L70


---


[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread smurakozi

Github user smurakozi commented on the issue:

https://github.com/apache/spark/pull/20330
  
cc @jiangxb1987, @srowen, @vanzin  


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20316
  
**[Test build #86388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86388/testReport)** for PR 20316 at commit [`ad976fe`](https://github.com/apache/spark/commit/ad976fe175e9cc07cfff859dd7f7331ad424aa8e).


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/38/
Test PASSed.


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20316
  
retest this please


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20316
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86386/
Test FAILed.


---


[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20316
  
**[Test build #86386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86386/testReport)** for PR 20316 at commit [`ad976fe`](https://github.com/apache/spark/commit/ad976fe175e9cc07cfff859dd7f7331ad424aa8e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread smurakozi

Github user smurakozi commented on the issue:

https://github.com/apache/spark/pull/20330
  
@guoxiaolongzte could you please check if this change fixes the issue you 
have observed?


---


[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20330
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread smurakozi

GitHub user smurakozi opened a pull request:

https://github.com/apache/spark/pull/20330

[SPARK-23121][core] Fix for ui becoming unaccessible for long running streaming apps

## What changes were proposed in this pull request?

The allJobs page and the job page read the last stage attempt and the DAG 
visualization from the store, but for long-running jobs that data is not 
guaranteed to be retained, which leads to exceptions when these pages are rendered.

To fix this, `store.lastStageAttempt(stageId)` and 
`store.operationGraphForJob(jobId)` are wrapped in `store.asOption`, and default 
values are used if the info is missing.
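
As a rough illustration of the approach described above (not the code from this PR), the sketch below wraps a throwing lookup in an `asOption`-style helper and falls back to a default value. `SketchStore`, `lastStageDescription`, and `describe` are hypothetical names invented for the example, and it assumes that a missing entry surfaces as a `NoSuchElementException`.

```scala
// Hypothetical stand-in for the UI's status store: lookups throw once an
// entry has been cleaned up, which is the failure mode described above.
final class SketchStore(stages: Map[Int, String]) {
  def lastStageDescription(stageId: Int): String =
    stages.getOrElse(stageId, throw new NoSuchElementException(s"stage $stageId"))

  // Turn a throwing lookup into an Option; a missing entry becomes None.
  // Assumption: missing data is signalled by NoSuchElementException.
  def asOption[T](fn: => T): Option[T] =
    try Some(fn) catch { case _: NoSuchElementException => None }
}

object AsOptionSketch {
  // Use the stored description when present, otherwise the caller's fallback.
  def describe(store: SketchStore, stageId: Int, fallback: String): String =
    store.asOption(store.lastStageDescription(stageId)).getOrElse(fallback)
}
```

With this shape, a page can keep rendering with the fallback text when the store has already evicted the stage, instead of propagating the exception.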

## How was this patch tested?

Manual testing of the UI, also using the test command reported in 
SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/smurakozi/spark SPARK-23121

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20330


commit 94d50b42d6bf233afd398049c95386920c21c252
Author: Sandor Murakozi 
Date:   2018-01-19T10:59:36Z

Fixed issue caused by the store cleaning up old stages

commit d60ae4f39337b91118324064c6a3dc58a3fc2832
Author: Sandor Murakozi 
Date:   2018-01-19T11:33:27Z

JobPage doesn't break if operationGraphForJob is not in the store for a jobid

commit 832378d25245126c285e794fadcaea019b70a78a
Author: Sandor Murakozi 
Date:   2018-01-19T11:34:59Z

lastStageNameAndDescription uses store.lastStageAttempt

commit 6525ef4eda0bf65bbbcb842495341afc8c5971ad
Author: Sandor Murakozi 
Date:   2018-01-19T12:15:33Z

Changed message in case of missing DAG visualization info




---


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20297
  
**[Test build #4061 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4061/testReport)**
 for PR 20297 at commit 
[`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20026: [SPARK-22838][Core] Avoid unnecessary copying of ...

2018-01-19 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20026#discussion_r162610474
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -152,7 +153,7 @@ private class DiskBlockData(
 file: File,
 blockSize: Long) extends BlockData {
 
-  override def toInputStream(): InputStream = new FileInputStream(file)
+  override def toInputStream(): InputStream = new NioBufferedFileInputStream(file)
--- End diff --

IIUC, the returned `InputStream` will be deserialized in `BlockManager`, and 
the deserializer will copy the data from direct memory to on-heap memory. 
Otherwise, how would we access the POJOs?

So unless we purely manipulate binary data, we have to copy the data 
on-heap. Please correct me if I'm wrong.

Besides, I think this is not a hotspot, so the memory copy should not 
add much overhead.
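
To make the copy in question concrete, here is a small sketch. It assumes plain Java serialization as a stand-in for Spark's serializer, and the object and method names (`DirectToHeapSketch`, `roundTrip`, and so on) are hypothetical. It only shows that turning bytes held in a direct buffer back into an object involves moving them onto the JVM heap first.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

object DirectToHeapSketch {
  // Serialize a value with plain Java serialization (stand-in for a real serializer).
  def serialize(value: java.io.Serializable): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.toByteArray
  }

  // Rebuild the object from an off-heap buffer; the bytes are copied on-heap first.
  def deserializeFromDirect(direct: ByteBuffer): AnyRef = {
    val onHeap = new Array[Byte](direct.remaining())
    direct.get(onHeap) // copy from direct memory into a heap array
    val in = new ObjectInputStream(new ByteArrayInputStream(onHeap))
    try in.readObject() finally in.close()
  }

  // Place the serialized bytes in direct memory, then read the object back.
  def roundTrip(value: java.io.Serializable): AnyRef = {
    val payload = serialize(value)
    val direct = ByteBuffer.allocateDirect(payload.length)
    direct.put(payload)
    direct.flip()
    deserializeFromDirect(direct)
  }
}
```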


---


[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86383/
Test FAILed.


---


[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20087
  
**[Test build #86383 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86383/testReport)**
 for PR 20087 at commit 
[`99271d6`](https://github.com/apache/spark/commit/99271d670a0aed444ad624d56304d94490eed0cb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20276
  
**[Test build #86387 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86387/testReport)**
 for PR 20276 at commit 
[`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).


---


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/37/
Test PASSed.


---


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20281: [SPARK-23089][STS] Recreate session log directory...

2018-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20281


---


[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-19 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20281
  
thanks, merging to master/2.3!


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86381/
Test PASSed.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20328
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20328
  
**[Test build #86381 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86381/testReport)**
 for PR 20328 at commit 
[`5b97119`](https://github.com/apache/spark/commit/5b971190485468ebdc436dd98bad4e61fbc574bc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20329: Merge pull request #1 from apache/master

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20329
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #20329: Merge pull request #1 from apache/master

2018-01-19 Thread simon-wind

Github user simon-wind closed the pull request at:

https://github.com/apache/spark/pull/20329


---


[GitHub] spark issue #20329: Merge pull request #1 from apache/master

2018-01-19 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20329
  
@simon-wind, this seems to have been opened by mistake. Could you close it, please?


---


[GitHub] spark issue #20329: Merge pull request #1 from apache/master

2018-01-19 Thread simon-wind

Github user simon-wind commented on the issue:

https://github.com/apache/spark/pull/20329
  
merge latest branch


---


[GitHub] spark pull request #20329: Merge pull request #1 from apache/master

2018-01-19 Thread simon-wind

GitHub user simon-wind opened a pull request:

https://github.com/apache/spark/pull/20329

Merge pull request #1 from apache/master

Fork The  Latest Version

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/simon-wind/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20329


commit 27dd069615237f601da2d3d9edc403824f0dd6af
Author: Simon <1031131669@...>
Date:   2017-06-09T03:59:45Z

Merge pull request #1 from apache/master

Fork The  Latest Version




---


[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20281
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20281
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86385/
Test PASSed.


---


[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20281
  
**[Test build #86385 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86385/testReport)**
 for PR 20281 at commit 
[`8b4eb1c`](https://github.com/apache/spark/commit/8b4eb1c33c525ba3eaab79fe1efa4f61fba7367f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20328
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86378/
Test PASSed.


---


[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20328
  
**[Test build #86378 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86378/testReport)**
 for PR 20328 at commit 
[`a7359a9`](https://github.com/apache/spark/commit/a7359a9634966851c14be02cbd6468e5c41a4347).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SessionStateSuite extends SparkFunSuite `
  * `class HiveSessionStateSuite extends SessionStateSuite with TestHiveSingleton `


---

