[GitHub] spark pull request #20302: [SPARK-23094] Fix invalid character handling in J...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20302#discussion_r162682668 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala --- @@ -105,4 +107,36 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest { ) } } + + test("invalid json with leading nulls - from file (multiLine=true)") { +import testImplicits._ +withTempDir { tempDir => + val path = tempDir.getAbsolutePath + Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path) + val expected = s"""$badJson\n{"a":1}\n""" + val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType) + val df = +spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path) + checkAnswer(df, Row(null, expected)) +} + } + + test("invalid json with leading nulls - from file (multiLine=false)") { +import testImplicits._ +withTempDir { tempDir => + val path = tempDir.getAbsolutePath + Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path) + val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType) + val df = +spark.read.format(dataSourceName).option("multiLine", false).schema(schema).load(path) + checkAnswer(df, Seq(Row(1, null), Row(null, badJson))) +} + } + + test("invalid json with leading nulls - from dataset") { --- End diff -- See the PR https://github.com/apache/spark/pull/20331 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
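The tests quoted above exercise Spark's permissive JSON parsing: rows that fail to parse are routed into the `_corrupt_record` column instead of failing the read. A minimal standalone sketch of that behavior follows; the local session setup and the `badJson` value are assumptions for illustration, not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object CorruptRecordSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    // Hypothetical malformed input standing in for the suite's `badJson`.
    val badJson = "\u0000\u0000{not json"
    val path = java.nio.file.Files.createTempDirectory("json-sketch").toString
    Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)

    // Declaring _corrupt_record in the schema captures unparseable input.
    val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)

    // multiLine=false parses line by line: the valid line yields (1, null),
    // the bad line yields (null, badJson) — matching the multiLine=false test.
    spark.read.format("json").option("multiLine", false).schema(schema).load(path).show()

    // multiLine=true treats the whole file as one document, so a single bad
    // prefix corrupts the entire record — matching the multiLine=true test.
    spark.read.format("json").option("multiLine", true).schema(schema).load(path).show()

    spark.stop()
  }
}
```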
[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/20331 [SPARK-23158] [SQL] Move HadoopFsRelationTest test suites from sql/hive to sql/core ## What changes were proposed in this pull request? The test suites that extend HadoopFsRelationTest are not in sql/hive packages, but their files live under sql/hive. We should move them to sql/core. ## How was this patch tested? The existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark moveTests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20331 commit f7693f0abfe0923868c1918ddcaeaece2c107c5d Author: gatorsmile Date: 2018-01-19T16:57:50Z fix
[GitHub] spark pull request #20302: [SPARK-23094] Fix invalid character handling in J...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20302#discussion_r162682091 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala --- @@ -105,4 +107,36 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest { ) } } + + test("invalid json with leading nulls - from file (multiLine=true)") { +import testImplicits._ +withTempDir { tempDir => + val path = tempDir.getAbsolutePath + Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path) + val expected = s"""$badJson\n{"a":1}\n""" + val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType) + val df = +spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path) + checkAnswer(df, Row(null, expected)) +} + } + + test("invalid json with leading nulls - from file (multiLine=false)") { +import testImplicits._ +withTempDir { tempDir => + val path = tempDir.getAbsolutePath + Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path) + val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType) + val df = +spark.read.format(dataSourceName).option("multiLine", false).schema(schema).load(path) + checkAnswer(df, Seq(Row(1, null), Row(null, badJson))) +} + } + + test("invalid json with leading nulls - from dataset") { --- End diff -- This test suite is still in `org.apache.spark.sql.sources`. We should move these test suites to `/sql/core`.
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19340 Thanks, I didn't know of its existence.
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86391/ Test PASSed.
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20319 Merged build finished. Test PASSed.
[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20323 The only downside is spreading the CI here across many different systems. I know we added AppVeyor because it was the only way to test on Windows (right?). Adding Travis too just for Java style checks is more questionable. Yes, it has nothing to do with Jenkins though. I think we've just punted on this and accepted that Java style checks need to be executed manually once in a while. One middle ground is to enable style checks in the Jenkins jobs besides the PR builder. You still don't catch violations at the time a PR is submitted, but at least you catch them automatically and promptly.
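For contributors who want to run the style checks mentioned above manually before pushing, a sketch of the usual invocation follows. This assumes a current Spark source checkout; the exact script names and locations may vary by branch:

```shell
# Run Spark's Java style check locally (wraps the Maven checkstyle plugin).
# Assumes the working directory is the root of a Spark checkout.
./dev/lint-java

# Scala style, for completeness (wraps scalastyle):
./dev/lint-scala
```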
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20319 **[Test build #86391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86391/testReport)** for PR 20319 at commit [`b6e06e8`](https://github.com/apache/spark/commit/b6e06e8e280f97560a342e287072f0b49e85bb79). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest ` * `class GaussianMixtureSuite extends MLTest with DefaultReadWriteTest `
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19340 **[Test build #4065 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4065/testReport)** for PR 19340 at commit [`fda93ae`](https://github.com/apache/spark/commit/fda93aeadd782d520f32eb34475e3a7fa349c425).
[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20323 I'm wondering why we are okay with AppVeyor but not okay with Travis CI. :)
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19340 I think it may not be responding now for whatever reason. I use https://spark-prs.appspot.com/ to view and trigger tests
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19340 @srowen sorry, I don't know why, but it seems that I cannot start new Jenkins jobs for this PR... Could you whitelist it or trigger a new test, please? Thanks.
[GitHub] spark issue #20323: [BUILD][MINOR] Fix java style check issues
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20323 @HyukjinKwon . Travis CI will trigger on every commit. What I mean is our script can check the Java changes only. @sameeragarwal . Travis CI runs independently on the Travis CI site, like AppVeyor. That is the exact reason why I added Travis CI. - Travis CI will finish faster than Jenkins. - Travis CI will not add time or any load to Jenkins. Please see [this](https://travis-ci.org/dongjoon-hyun/spark/builds).
[GitHub] spark pull request #20321: [SPARK-23152][ML] - Correctly guard against empty...
Github user tovbinm commented on a diff in the pull request: https://github.com/apache/spark/pull/20321#discussion_r162678515 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala --- @@ -109,7 +109,7 @@ abstract class Classifier[ case None => // Get number of classes from dataset itself. val maxLabelRow: Array[Row] = dataset.select(max($(labelCol))).take(1) -if (maxLabelRow.isEmpty) { +if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) { --- End diff -- @dongjoon-hyun done.
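The diff above guards against two failure modes when inferring the number of classes: `select(max(labelCol)).take(1)` returns an empty array for an empty dataset, and a row whose single cell is null when every label is null. A hedged Scala sketch of that guard, extracted as a hypothetical helper (the name `requireMaxLabel` and the error message are illustrative, not from the PR):

```scala
import org.apache.spark.sql.Row

// Sketch of the guard from the diff: reject both an empty result
// (empty dataset) and a null max (all labels null) before reading
// the value that drives the class count.
def requireMaxLabel(maxLabelRow: Array[Row]): Double = {
  if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
    throw new IllegalArgumentException(
      "Classifier found no valid labels: dataset is empty or all labels are null.")
  }
  maxLabelRow(0).getDouble(0)
}
```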
[GitHub] spark issue #18983: [SPARK-21771][SQL]remove useless hive client in SparkSQL...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18983 cc @liufengdb
[GitHub] spark pull request #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20316
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20316 Thanks! Merged to master/2.3
[GitHub] spark pull request #20025: [SPARK-22837][SQL]Session timeout checker does no...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20025#discussion_r162673886 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java --- @@ -23,11 +23,7 @@ import java.util.ArrayList; import java.util.Date; import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; -import java.util.concurrent.Future; -import java.util.concurrent.LinkedBlockingQueue; -import java.util.concurrent.ThreadPoolExecutor; -import java.util.concurrent.TimeUnit; --- End diff -- revert this back.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20087 @fjh100456 Thanks for working on it! It is pretty close to being merged.
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162673688 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala --- @@ -0,0 +1,321 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive + +import java.io.File + +import scala.collection.JavaConverters._ + +import org.apache.hadoop.fs.Path +import org.apache.orc.OrcConf.COMPRESS +import org.apache.parquet.hadoop.ParquetOutputFormat +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark.sql.execution.datasources.orc.OrcOptions +import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, ParquetTest} +import org.apache.spark.sql.hive.orc.OrcFileOperator +import org.apache.spark.sql.hive.test.TestHiveSingleton +import org.apache.spark.sql.internal.SQLConf + +class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with BeforeAndAfterAll { + import spark.implicits._ + + override def beforeAll(): Unit = { +super.beforeAll() +(0 until maxRecordNum).toDF("a").createOrReplaceTempView("table_source") + } + + override def afterAll(): Unit = { +try { + spark.catalog.dropTempView("table_source") +} finally { + super.afterAll() +} + } + + private val maxRecordNum = 500 + + private def getConvertMetastoreConfName(format: String): String = format.toLowerCase match { +case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key +case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key + } + + private def getSparkCompressionConfName(format: String): String = format.toLowerCase match { +case "parquet" => SQLConf.PARQUET_COMPRESSION.key +case "orc" => SQLConf.ORC_COMPRESSION.key + } + + private def getHiveCompressPropName(format: String): String = format.toLowerCase match { +case "parquet" => ParquetOutputFormat.COMPRESSION +case "orc" => COMPRESS.getAttribute + } + + private def normalizeCodecName(format: String, name: String): String = { +format.toLowerCase match { + case "parquet" => ParquetOptions.shortParquetCompressionCodecNames(name).name() + case "orc" => OrcOptions.shortOrcCompressionCodecNames(name) +} + } + + private def getTableCompressionCodec(path: String, format: String): Seq[String] = { +val hadoopConf = 
spark.sessionState.newHadoopConf() +val codecs = format.toLowerCase match { + case "parquet" => for { +footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf) +block <- footer.getParquetMetadata.getBlocks.asScala +column <- block.getColumns.asScala + } yield column.getCodec.name() + case "orc" => new File(path).listFiles().filter{ file => +file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS" + }.map { orcFile => + OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString + }.toSeq +} +codecs.distinct + } + + private def createTable( + rootDir: File, + tableName: String, + isPartitioned: Boolean, + format: String, + compressionCodec: Option[String]): Unit = { +val tblProperties = compressionCodec match { + case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')" + case _ => "" +} +val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else "" +sql( + s""" +|CREATE TABLE $tableName(a int) +|$partitionCreate +|STORED AS $format +|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName' +|$tblProperties + """.stripMargin) + } + + private def writeDataToTable( + tableName: String, +
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162672650 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala --- @@ -0,0 +1,321 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive + +import java.io.File + +import scala.collection.JavaConverters._ + +import org.apache.hadoop.fs.Path +import org.apache.orc.OrcConf.COMPRESS +import org.apache.parquet.hadoop.ParquetOutputFormat +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark.sql.execution.datasources.orc.OrcOptions +import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, ParquetTest} +import org.apache.spark.sql.hive.orc.OrcFileOperator +import org.apache.spark.sql.hive.test.TestHiveSingleton +import org.apache.spark.sql.internal.SQLConf + +class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with BeforeAndAfterAll { + import spark.implicits._ + + override def beforeAll(): Unit = { +super.beforeAll() +(0 until maxRecordNum).toDF("a").createOrReplaceTempView("table_source") + } + + override def afterAll(): Unit = { +try { + spark.catalog.dropTempView("table_source") +} finally { + super.afterAll() +} + } + + private val maxRecordNum = 500 + + private def getConvertMetastoreConfName(format: String): String = format.toLowerCase match { +case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key +case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key + } + + private def getSparkCompressionConfName(format: String): String = format.toLowerCase match { +case "parquet" => SQLConf.PARQUET_COMPRESSION.key +case "orc" => SQLConf.ORC_COMPRESSION.key + } + + private def getHiveCompressPropName(format: String): String = format.toLowerCase match { +case "parquet" => ParquetOutputFormat.COMPRESSION +case "orc" => COMPRESS.getAttribute + } + + private def normalizeCodecName(format: String, name: String): String = { +format.toLowerCase match { + case "parquet" => ParquetOptions.shortParquetCompressionCodecNames(name).name() + case "orc" => OrcOptions.shortOrcCompressionCodecNames(name) +} + } + + private def getTableCompressionCodec(path: String, format: String): Seq[String] = { +val hadoopConf = 
spark.sessionState.newHadoopConf() +val codecs = format.toLowerCase match { + case "parquet" => for { +footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf) +block <- footer.getParquetMetadata.getBlocks.asScala +column <- block.getColumns.asScala + } yield column.getCodec.name() + case "orc" => new File(path).listFiles().filter{ file => +file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS" + }.map { orcFile => + OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString + }.toSeq +} +codecs.distinct + } + + private def createTable( + rootDir: File, + tableName: String, + isPartitioned: Boolean, + format: String, + compressionCodec: Option[String]): Unit = { +val tblProperties = compressionCodec match { + case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')" + case _ => "" +} +val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else "" +sql( + s""" +|CREATE TABLE $tableName(a int) +|$partitionCreate +|STORED AS $format +|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName' +|$tblProperties + """.stripMargin) + } + + private def writeDataToTable( + tableName: String, +
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162671108 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala --- @@ -0,0 +1,349 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive + +import java.io.File + +import scala.collection.JavaConverters._ + +import org.apache.hadoop.fs.Path +import org.apache.orc.OrcConf.COMPRESS +import org.apache.parquet.hadoop.ParquetOutputFormat +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark.sql.execution.datasources.orc.OrcOptions +import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, ParquetTest} +import org.apache.spark.sql.hive.orc.OrcFileOperator +import org.apache.spark.sql.hive.test.TestHiveSingleton +import org.apache.spark.sql.internal.SQLConf + +class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with BeforeAndAfterAll { + import spark.implicits._ + + override def beforeAll(): Unit = { +super.beforeAll() +(0 until maxRecordNum).toDF("a").createOrReplaceTempView("table_source") + } + + override def afterAll(): Unit = { +try { + spark.catalog.dropTempView("table_source") +} finally { + super.afterAll() +} + } + + private val maxRecordNum = 50 + + private def getConvertMetastoreConfName(format: String): String = format.toLowerCase match { +case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key +case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key + } + + private def getSparkCompressionConfName(format: String): String = format.toLowerCase match { +case "parquet" => SQLConf.PARQUET_COMPRESSION.key +case "orc" => SQLConf.ORC_COMPRESSION.key + } + + private def getHiveCompressPropName(format: String): String = format.toLowerCase match { +case "parquet" => ParquetOutputFormat.COMPRESSION +case "orc" => COMPRESS.getAttribute + } + + private def normalizeCodecName(format: String, name: String): String = { +format.toLowerCase match { + case "parquet" => ParquetOptions.getParquetCompressionCodecName(name) + case "orc" => OrcOptions.getORCCompressionCodecName(name) +} + } + + private def getTableCompressionCodec(path: String, format: String): Seq[String] = { +val hadoopConf = spark.sessionState.newHadoopConf() +val 
codecs = format.toLowerCase match { + case "parquet" => for { +footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf) +block <- footer.getParquetMetadata.getBlocks.asScala +column <- block.getColumns.asScala + } yield column.getCodec.name() + case "orc" => new File(path).listFiles().filter { file => +file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS" + }.map { orcFile => + OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString + }.toSeq +} +codecs.distinct + } + + private def createTable( + rootDir: File, + tableName: String, + isPartitioned: Boolean, + format: String, + compressionCodec: Option[String]): Unit = { +val tblProperties = compressionCodec match { + case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')" + case _ => "" +} +val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else "" +sql( + s""" +|CREATE TABLE $tableName(a int) +|$partitionCreate +|STORED AS $format +|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName' +|$tblProperties + """.stripMargin) + } + + private def writeDataToTable( + tableName: String, + partitionV
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162673292 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala --- @@ -260,17 +282,21 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo def checkForTableWithoutCompressProp(format: String, compressCodecs: List[String]): Unit = { Seq(true, false).foreach { isPartitioned => Seq(true, false).foreach { convertMetastore => -checkTableCompressionCodecForCodecs( - format, - isPartitioned, - convertMetastore, - compressionCodecs = compressCodecs, - tableCompressionCodecs = List(null)) { - case (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) => -// Always expect session-level take effect -assert(sessionCompressionCodec == realCompressionCodec) -assert(checkTableSize(format, sessionCompressionCodec, - isPartitioned, convertMetastore, tableSize)) +Seq(true, false).foreach { usingCTAS => + checkTableCompressionCodecForCodecs( +format, +isPartitioned, +convertMetastore, +usingCTAS, +compressionCodecs = compressCodecs, +tableCompressionCodecs = List(null)) { +case + (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) => --- End diff -- The same here.
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162671801 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala --- @@ -0,0 +1,349 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive + +import java.io.File + +import scala.collection.JavaConverters._ + +import org.apache.hadoop.fs.Path +import org.apache.orc.OrcConf.COMPRESS +import org.apache.parquet.hadoop.ParquetOutputFormat +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark.sql.execution.datasources.orc.OrcOptions +import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, ParquetTest} +import org.apache.spark.sql.hive.orc.OrcFileOperator +import org.apache.spark.sql.hive.test.TestHiveSingleton +import org.apache.spark.sql.internal.SQLConf + +class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with BeforeAndAfterAll { + import spark.implicits._ + + override def beforeAll(): Unit = { +super.beforeAll() +(0 until maxRecordNum).toDF("a").createOrReplaceTempView("table_source") + } + + override def afterAll(): Unit = { +try { + spark.catalog.dropTempView("table_source") +} finally { + super.afterAll() +} + } + + private val maxRecordNum = 50 + + private def getConvertMetastoreConfName(format: String): String = format.toLowerCase match { +case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key +case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key + } + + private def getSparkCompressionConfName(format: String): String = format.toLowerCase match { +case "parquet" => SQLConf.PARQUET_COMPRESSION.key +case "orc" => SQLConf.ORC_COMPRESSION.key + } + + private def getHiveCompressPropName(format: String): String = format.toLowerCase match { +case "parquet" => ParquetOutputFormat.COMPRESSION +case "orc" => COMPRESS.getAttribute + } + + private def normalizeCodecName(format: String, name: String): String = { +format.toLowerCase match { + case "parquet" => ParquetOptions.getParquetCompressionCodecName(name) + case "orc" => OrcOptions.getORCCompressionCodecName(name) +} + } + + private def getTableCompressionCodec(path: String, format: String): Seq[String] = { +val hadoopConf = spark.sessionState.newHadoopConf() +val 
codecs = format.toLowerCase match { + case "parquet" => for { +footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf) +block <- footer.getParquetMetadata.getBlocks.asScala +column <- block.getColumns.asScala + } yield column.getCodec.name() + case "orc" => new File(path).listFiles().filter { file => +file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS" + }.map { orcFile => + OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString + }.toSeq +} +codecs.distinct + } + + private def createTable( + rootDir: File, + tableName: String, + isPartitioned: Boolean, + format: String, + compressionCodec: Option[String]): Unit = { +val tblProperties = compressionCodec match { + case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')" + case _ => "" +} +val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else "" +sql( + s""" +|CREATE TABLE $tableName(a int) +|$partitionCreate +|STORED AS $format +|LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName' +|$tblProperties + """.stripMargin) + } + + private def writeDataToTable( + tableName: String, + partitionV
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162673245 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala --- @@ -260,17 +282,21 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo def checkForTableWithoutCompressProp(format: String, compressCodecs: List[String]): Unit = { Seq(true, false).foreach { isPartitioned => Seq(true, false).foreach { convertMetastore => -checkTableCompressionCodecForCodecs( - format, - isPartitioned, - convertMetastore, - compressionCodecs = compressCodecs, - tableCompressionCodecs = List(null)) { - case (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) => -// Always expect session-level take effect -assert(sessionCompressionCodec == realCompressionCodec) -assert(checkTableSize(format, sessionCompressionCodec, - isPartitioned, convertMetastore, tableSize)) +Seq(true, false).foreach { usingCTAS => + checkTableCompressionCodecForCodecs( +format, +isPartitioned, +convertMetastore, +usingCTAS, +compressionCodecs = compressCodecs, +tableCompressionCodecs = List(null)) { +case + (tableCompressionCodec, sessionCompressionCodec, realCompressionCodec, tableSize) => + // Always expect session-level take effect + assert(sessionCompressionCodec == realCompressionCodec) + assert(checkTableSize(format, sessionCompressionCodec, + isPartitioned, convertMetastore, usingCTAS, tableSize)) --- End diff -- ``` assert(checkTableSize( format, sessionCompressionCodec, isPartitioned, convertMetastore, usingCTAS, tableSize)) ```
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162672130 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala --- @@ -82,4 +82,7 @@ object ParquetOptions { "snappy" -> CompressionCodecName.SNAPPY, "gzip" -> CompressionCodecName.GZIP, "lzo" -> CompressionCodecName.LZO) + + def getParquetCompressionCodecName(name: String): String = +shortParquetCompressionCodecNames(name).name() --- End diff -- ```Scala def getParquetCompressionCodecName(name: String): String = { shortParquetCompressionCodecNames(name).name() } ```
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20316 Merged build finished. Test PASSed.
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86388/ Test PASSed.
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20316 **[Test build #86388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86388/testReport)** for PR 20316 at commit [`ad976fe`](https://github.com/apache/spark/commit/ad976fe175e9cc07cfff859dd7f7331ad424aa8e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20319 **[Test build #86391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86391/testReport)** for PR 20319 at commit [`b6e06e8`](https://github.com/apache/spark/commit/b6e06e8e280f97560a342e287072f0b49e85bb79).
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20319 Jenkins, add to whitelist
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19340 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/41/ Test PASSed.
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19340 Merged build finished. Test PASSed.
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19340 Jenkins, retest this please
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19340 Merged build finished. Test PASSed.
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19340 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/40/ Test PASSed.
[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20275
[GitHub] spark issue #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vectors.sp...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20275 Merged to master
[GitHub] spark pull request #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19340#discussion_r162650427 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -546,10 +577,112 @@ object KMeans { .run(data) } + private[spark] def validateInitMode(initMode: String): Boolean = { +initMode match { + case KMeans.RANDOM => true + case KMeans.K_MEANS_PARALLEL => true + case _ => false +} + } + + private[spark] def validateDistanceMeasure(distanceMeasure: String): Boolean = { +distanceMeasure match { + case DistanceMeasure.EUCLIDEAN => true + case DistanceMeasure.COSINE => true + case _ => false +} + } +} + +/** + * A vector with its norm for fast distance computation. + * + * @see [[org.apache.spark.mllib.clustering.KMeans#fastSquaredDistance]] --- End diff -- This seems to fail the doc build for some reason. You can just remove it.
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19340 **[Test build #4064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4064/testReport)** for PR 19340 at commit [`5ed87ea`](https://github.com/apache/spark/commit/5ed87ea9d946228dbf84d624e019008bb98219c7). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data
Github user brad-kaiser commented on the issue: https://github.com/apache/spark/pull/19041 Hey @vanzin, I just wanted to follow up and see if you've had a chance to look at this. Thanks!
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19340 **[Test build #4064 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4064/testReport)** for PR 19340 at commit [`5ed87ea`](https://github.com/apache/spark/commit/5ed87ea9d946228dbf84d624e019008bb98219c7).
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20324 **[Test build #86390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86390/testReport)** for PR 20324 at commit [`673c520`](https://github.com/apache/spark/commit/673c52042a70b5dfc061dd053ae2e6553a4a2612).
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/39/ Test PASSed.
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20324 Merged build finished. Test PASSed.
[GitHub] spark pull request #20328: [SPARK-23000] [TEST] Keep Derby DB Location Uncha...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20328
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20328 thanks, merging to master/2.3!
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86387/ Test FAILed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Merged build finished. Test FAILed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20276 **[Test build #86387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86387/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/19872 @ueshin I think all comments are addressed. Can you take a final look? Thanks!
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r162635572 --- Diff: python/pyspark/sql/tests.py --- @@ -4279,6 +4273,425 @@ def test_unsupported_types(self): df.groupby('id').apply(f).collect() +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed") +class GroupbyAggPandasUDFTests(ReusedSQLTestCase): + +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))) \ +.drop('vs') \ +.withColumn('w', lit(1.0)) + +@property +def python_plus_one(self): +from pyspark.sql.functions import udf + +@udf('double') +def plus_one(v): +assert isinstance(v, (int, float)) +return v + 1 +return plus_one + +@property +def pandas_scalar_plus_two(self): +import pandas as pd +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.SCALAR) +def plus_two(v): +assert isinstance(v, pd.Series) +return v + 2 +return plus_two + +@property +def pandas_agg_mean_udf(self): +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.GROUP_AGG) +def avg(v): +return v.mean() +return avg + +@property +def pandas_agg_sum_udf(self): +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.GROUP_AGG) +def sum(v): +return v.sum() +return sum + +@property +def pandas_agg_weighted_mean_udf(self): +import numpy as np +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.GROUP_AGG) +def weighted_mean(v, w): +return np.average(v, weights=w) +return weighted_mean + +def test_basic(self): +from pyspark.sql.functions import col, lit, sum, mean + +df = self.data +weighted_mean_udf = self.pandas_agg_weighted_mean_udf + +# Groupby one column and aggregate one UDF with literal +result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id') +expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id') --- End diff -- Ah. No worries. Thanks for clarification.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20328 Merged build finished. Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86384/ Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20328 **[Test build #86384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86384/testReport)** for PR 20328 at commit [`b9aa879`](https://github.com/apache/spark/commit/b9aa879104ab010700e5f19c457fd791cc255ff7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20216: [SPARK-23024][WEB-UI]Spark ui about the contents ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20216
[GitHub] spark issue #20216: [SPARK-23024][WEB-UI]Spark ui about the contents of the ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20216 Merged to master
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #86389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86389/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad).
[GitHub] spark issue #20295: [WIP][SPARK-23011] Support alternative function form wit...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20295 Yep, that's correct.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #4063 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4063/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad).
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20330 Jenkins add to whitelist
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4062/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4060/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162622469 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -65,10 +68,13 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We }.map { job => val jobId = job.jobId val status = job.status - val jobDescription = store.lastStageAttempt(job.stageIds.max).description - val displayJobDescription = jobDescription -.map(UIUtils.makeDescription(_, "", plainText = true).text) -.getOrElse("") + val (_, lastStageDescription) = lastStageNameAndDescription(store, job) + val displayJobDescription = +if (lastStageDescription.isEmpty) { + job.name --- End diff -- Using job.name instead of "" to behave more like the pre-2.3 version: https://github.com/smurakozi/spark/blob/772e4648d95bda3353723337723543c741ea8476/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala#L70
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user smurakozi commented on the issue: https://github.com/apache/spark/pull/20330 cc @jiangxb1987, @srowen, @vanzin
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20316 **[Test build #86388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86388/testReport)** for PR 20316 at commit [`ad976fe`](https://github.com/apache/spark/commit/ad976fe175e9cc07cfff859dd7f7331ad424aa8e).
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20316 Merged build finished. Test PASSed.
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/38/ Test PASSed.
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20316 retest this please
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20316 Merged build finished. Test FAILed.
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20316 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86386/ Test FAILed.
[GitHub] spark issue #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20316 **[Test build #86386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86386/testReport)** for PR 20316 at commit [`ad976fe`](https://github.com/apache/spark/commit/ad976fe175e9cc07cfff859dd7f7331ad424aa8e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user smurakozi commented on the issue: https://github.com/apache/spark/pull/20330 @guoxiaolongzte could you please check if this change fixes the issue you have observed?
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Can one of the admins verify this patch?
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
GitHub user smurakozi opened a pull request: https://github.com/apache/spark/pull/20330 [SPARK-23121][core] Fix for ui becoming unaccessible for long running streaming apps ## What changes were proposed in this pull request? The allJobs and the job pages attempt to use stage attempt and DAG visualization from the store, but for long running jobs they are not guaranteed to be retained, leading to exceptions when these pages are rendered. To fix it `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing. ## How was this patch tested? Manual testing of the UI, also using the test command reported in SPARK-23121: ./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark You can merge this pull request into a Git repository by running: $ git pull https://github.com/smurakozi/spark SPARK-23121 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20330.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20330 commit 94d50b42d6bf233afd398049c95386920c21c252 Author: Sandor Murakozi Date: 2018-01-19T10:59:36Z Fixed issue caused by the store cleaning up old stages commit d60ae4f39337b91118324064c6a3dc58a3fc2832 Author: Sandor Murakozi Date: 2018-01-19T11:33:27Z JobPage doesn't break if operationGraphForJob is not in the store for a jobid commit 832378d25245126c285e794fadcaea019b70a78a Author: Sandor Murakozi Date: 2018-01-19T11:34:59Z lastStageNameAndDescription uses store.lastStageAttempt commit 6525ef4eda0bf65bbbcb842495341afc8c5971ad Author: Sandor Murakozi Date: 2018-01-19T12:15:33Z Changed message in case of missing DAG visualization info
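The `store.asOption` wrapping that the PR description mentions can be sketched as follows. This is a hypothetical illustration, not the actual Spark code: `retainedStages`, `lastStageAttempt`, and `displayDescription` are stand-in names for the `AppStatusStore` lookups, which throw `NoSuchElementException` once old stage data has been evicted.

```scala
object AsOptionSketch {
  // Stand-in for the retained stage data; in the real store this shrinks
  // over time for long running apps as old stages are cleaned up.
  private val retainedStages = Map(1 -> "count at HdfsWordCount.scala:52")

  // A lookup that throws when the stage attempt is no longer retained.
  def lastStageAttempt(stageId: Int): String =
    retainedStages.getOrElse(
      stageId, throw new NoSuchElementException(s"no stage with id $stageId"))

  // The asOption idea: turn a throwing lookup into an Option.
  def asOption[T](fn: => T): Option[T] =
    try Some(fn) catch { case _: NoSuchElementException => None }

  // Callers fall back to a default instead of failing the page render.
  def displayDescription(stageId: Int, jobName: String): String =
    asOption(lastStageAttempt(stageId)).getOrElse(jobName)
}
```

With this shape, a page rendering a job whose stages have been evicted gets the fallback value rather than propagating the exception to the UI.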
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4061/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20026: [SPARK-22838][Core] Avoid unnecessary copying of ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20026#discussion_r162610474 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -152,7 +153,7 @@ private class DiskBlockData( file: File, blockSize: Long) extends BlockData { - override def toInputStream(): InputStream = new FileInputStream(file) + override def toInputStream(): InputStream = new NioBufferedFileInputStream(file) --- End diff -- IIUC, the returned `InputStream` will be deserialized in `BlockManager`, and the deserializer will copy the data from direct memory to on-heap memory; otherwise how would we access the POJOs? So unless we are purely manipulating binary data, we have to copy the data to on-heap. Please correct me if I'm wrong. Besides, I think this is not a hotspot, so the memory copying should not add significant overhead.
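The copy jerryshao describes can be illustrated with a small sketch of my own (not the Spark code): even when a file is read through an off-heap direct NIO buffer, handing the bytes to anything that needs a plain byte array, such as a deserializer, forces a copy onto the heap.

```scala
import java.io.File
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption

object DirectBufferCopySketch {
  // Read a whole (small) file through an off-heap direct buffer, then copy
  // the contents into an on-heap array, which is what a deserializer that
  // materializes objects ultimately needs.
  def readViaDirectBuffer(file: File): Array[Byte] = {
    val channel = FileChannel.open(file.toPath, StandardOpenOption.READ)
    try {
      val direct = ByteBuffer.allocateDirect(channel.size().toInt) // off-heap
      while (direct.hasRemaining && channel.read(direct) >= 0) {}
      direct.flip()
      val onHeap = new Array[Byte](direct.remaining()) // the on-heap copy
      direct.get(onHeap)
      onHeap
    } finally {
      channel.close()
    }
  }
}
```

The point of the comment is that this final `get` into a heap array is unavoidable whenever POJOs have to be reconstructed, so the buffered NIO read changes where the bytes are staged, not whether they are copied.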
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86383/ Test FAILed.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Merged build finished. Test FAILed.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #86383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86383/testReport)** for PR 20087 at commit [`99271d6`](https://github.com/apache/spark/commit/99271d670a0aed444ad624d56304d94490eed0cb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20276 **[Test build #86387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86387/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/37/ Test PASSed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Merged build finished. Test PASSed.
[GitHub] spark pull request #20281: [SPARK-23089][STS] Recreate session log directory...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20281
[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20281 thanks, merging to master/2.3!
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86381/ Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20328 Merged build finished. Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20328 **[Test build #86381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86381/testReport)** for PR 20328 at commit [`5b97119`](https://github.com/apache/spark/commit/5b971190485468ebdc436dd98bad4e61fbc574bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20329: Merge pull request #1 from apache/master
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20329 Can one of the admins verify this patch?
[GitHub] spark pull request #20329: Merge pull request #1 from apache/master
Github user simon-wind closed the pull request at: https://github.com/apache/spark/pull/20329
[GitHub] spark issue #20329: Merge pull request #1 from apache/master
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20329 @simon-wind, this seems to have been opened by mistake. Could you close it, please?
[GitHub] spark issue #20329: Merge pull request #1 from apache/master
Github user simon-wind commented on the issue: https://github.com/apache/spark/pull/20329 merge latest branch
[GitHub] spark pull request #20329: Merge pull request #1 from apache/master
GitHub user simon-wind opened a pull request: https://github.com/apache/spark/pull/20329 Merge pull request #1 from apache/master Fork The Latest Version ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/simon-wind/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20329.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20329 commit 27dd069615237f601da2d3d9edc403824f0dd6af Author: Simon <1031131669@...> Date: 2017-06-09T03:59:45Z Merge pull request #1 from apache/master Fork The Latest Version
[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20281 Merged build finished. Test PASSed.
[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20281 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86385/ Test PASSed.
[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20281 **[Test build #86385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86385/testReport)** for PR 20281 at commit [`8b4eb1c`](https://github.com/apache/spark/commit/8b4eb1c33c525ba3eaab79fe1efa4f61fba7367f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20328 Merged build finished. Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86378/ Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20328 **[Test build #86378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86378/testReport)** for PR 20328 at commit [`a7359a9`](https://github.com/apache/spark/commit/a7359a9634966851c14be02cbd6468e5c41a4347). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SessionStateSuite extends SparkFunSuite ` * `class HiveSessionStateSuite extends SessionStateSuite with TestHiveSingleton `