[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22960

Ur, maybe I wasn't clear about the point. The refactoring scope of this PR is limited to the new tests here.
```
test("from_csv uses DDL strings for defining a schema - java")
test("roundtrip to_csv -> from_csv")
test("roundtrip from_csv -> to_csv")
test("infers schemas of a CSV string and pass to to from_csv")
test("Support to_csv in SQL")
test("Support from_csv in SQL")
```
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20944

Please describe the manual tests and how they relate to an actual use case.
[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22960

Yes. It would be great if we do that in this PR. When I did a similar thing for ORC (`port tests from Parquet to ORC`, `port from old ORC to new ORC`), I received the same comments.
[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r231404180

--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
     column(jc)
   })

+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+          function(x, ...) {
+            if (class(x) == "character") {
+              col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+            } else {
+              col <- x@jc
--- End diff --

Yup, only literals work; columns don't.
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r231403827

--- Diff: R/pkg/R/functions.R ---
@@ -319,6 +319,27 @@ setMethod("acos",
     column(jc)
   })

+#' @details
+#' \code{approx_count_distinct}: Returns the approximate number of distinct items in a group.
+#'
+#' @rdname column_aggregate_functions
+#' @aliases approx_count_distinct approx_count_distinct,Column-method
+#' @examples
+#'
+#' \dontrun{
+#' head(select(df, approx_count_distinct(df$gear)))
+#' head(select(df, approx_count_distinct(df$gear, 0.02)))
+#' head(select(df, countDistinct(df$gear, df$cyl)))
+#' head(select(df, n_distinct(df$gear)))
+#' head(distinct(select(df, "gear")))}
--- End diff --

we only need one set - they are both `@rdname column_aggregate_functions`, so this will duplicate all the other examples
[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r231403096

--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
     column(jc)
   })

+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+          function(x, ...) {
+            if (class(x) == "character") {
+              col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+            } else {
+              col <- x@jc
--- End diff --

you are saying this `select(df, schema_of_csv(df$schemaCol))` is not allowed?
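[Editor's note] For context, a minimal Scala sketch of the limitation discussed above: `schema_of_json` only accepts a foldable (literal) argument, so inferring a schema from a per-row column fails at analysis time. The DataFrame and column names are invented for illustration.

```scala
import org.apache.spark.sql.functions.schema_of_json

// A one-row frame with a JSON string column (hypothetical data).
val df = spark.range(1).selectExpr("""'{"name":"Bob"}' as json""")

// Works: the argument is a literal, so the schema can be inferred once
// at analysis time and returned as a DDL-formatted string.
df.select(schema_of_json("""{"name":"Bob"}""")).show(false)

// Fails analysis: df("json") is a non-foldable column, and there is no
// single schema to infer for a whole column of possibly differing rows.
// df.select(schema_of_json(df("json")))
```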
[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22960 **[Test build #98542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98542/testReport)** for PR 22960 at commit [`1d3a31b`](https://github.com/apache/spark/commit/1d3a31b478622a8e76dfeef0f71973aa71730859).
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20944

Sorry, why was this change required? I don't see that https://github.com/apache/spark/pull/20944#issuecomment-379525776 is addressed. Can you elaborate, please?

Why do we make `org.apache.derby` shared? Ideally, minor or maintenance versions of `derby` can be bumped up, and they shouldn't be shared unless there's a strong reason to keep them shared, for instance, class resolution failing otherwise.

How did you reproduce this, and why was no unit test added?

I found an actual issue while working on Apache Livy's Spark 2.4 support. I am still investigating how it relates to the test failures, but at the very least I see that this specific commit matters, since the Apache Livy unit tests pass without it.

Adding @vanzin
[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r231402726

--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }

+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
+  stopifnot(require("withr", quietly = TRUE))
+  numPartitions <- if (!is.null(numPartitions)) {
+    numToInt(numPartitions)
+  } else {
+    1
+  }
+  fileName <- tempfile()
--- End diff --

might need to give it a dir prefix to use - the tempfile default is not CRAN compliant, and there could be ACL issues
[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r231402235

--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }

+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
--- End diff --

btw, is it worthwhile to check the arrow package version?
[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r231402297

--- Diff: R/pkg/R/SQLContext.R ---
@@ -172,15 +196,17 @@ getDefaultSqlSource <- function() {
 createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0,
                             numPartitions = NULL) {
   sparkSession <- getSparkSession()
-
+  conf <- callJMethod(sparkSession, "conf")
+  arrowEnabled <- tolower(callJMethod(conf, "get", "spark.sql.execution.arrow.enabled")) == "true"
--- End diff --

I think you can use sparkR.conf
[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r231402063

--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }

+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
+  stopifnot(require("withr", quietly = TRUE))
--- End diff --

is it possible to not depend on this withr?
[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r231401994

--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }

+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
--- End diff --

perhaps best to add a clearer error message?
[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22960

> Sorry, but Porting seems to be not the best way to do this.

I saw a bunch of common code in `Csv`/`JsonExpressionsSuite`, `Csv`/`JsonFunctionsSuite` and `Csv`/`JsonSuite`. I just didn't want to overcomplicate the tests, especially in the cases where there are only small differences; passing functions (with inputs and expected results) to template functions will not make them easier to read.

> Could you refactor this by introducing new test helper functions?

In any case, I will try that.
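[Editor's note] As one possible shape for such helpers, here is a hedged sketch (names invented, not the PR's actual refactoring) of how the CSV and JSON roundtrip tests could share a template while keeping readable call sites:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.types.StructType

// Hypothetical helper: both suites could call this with their own
// to_*/from_* pair, since the roundtrip logic is otherwise identical.
def checkRoundtrip(
    df: DataFrame,
    to: Column => Column,
    from: (Column, StructType, Map[String, String]) => Column): Unit = {
  val schema = df.schema(0).dataType.asInstanceOf[StructType]
  val roundtripped = df
    .select(to(df("struct")).as("text"))
    .select(from(df("text"), schema, Map.empty).as("struct"))
  checkAnswer(df, roundtripped)  // checkAnswer comes from QueryTest
}

// Usage sketch in CsvFunctionsSuite (lambdas avoid overload ambiguity):
//   checkRoundtrip(df, c => to_csv(c), (c, s, o) => from_csv(c, s, o))
```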
[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231399775

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
+      Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+    val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+    val schema = df.schema(0).dataType.asInstanceOf[StructType]
+    val options = Map.empty[String, String]
+    val readback = df.select(to_csv($"struct").as("csv"))
+      .select(from_csv($"csv", schema, options).as("struct"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+    val df = Seq(Some("1"), None).toDF("csv")
+    val schema = new StructType().add("a", IntegerType)
+    val options = Map.empty[String, String]
+    val readback = df.select(from_csv($"csv", schema, options).as("struct"))
+      .select(to_csv($"struct").as("csv"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to to from_csv") {
+    val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+    val options = Map.empty[String, String].asJava
+    val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), options) as "parsed")
+    val expected = StructType(Seq(StructField(
+      "parsed",
+      StructType(Seq(
+        StructField("_c0", DoubleType, true),
+        StructField("_c1", IntegerType, true),
+        StructField("_c2", StringType, true))))))
+
+    assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --

This is only to double-check that the functions are available (and work) from expressions in Scala. Probably we can make the test smaller.
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22951 **[Test build #98541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98541/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
[GitHub] spark issue #22958: [SPARK-25952][SQL] Passing actual schema to JacksonParse...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22958 @cloud-fan @HyukjinKwon May I ask you to have a look at this PR?
[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22938 @HyukjinKwon Are you ok with the changes?
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15899 Since the issue is closed, this PR will be closed in the next infra cleanup.
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15899 +1 for the decision and closing it.
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15899 I see. Thank you for the clear decision, @rxin ! I'll close the issue as `Won't Fix`. And, could you close this PR, @reggert ?
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22818 LGTM
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15899

Thanks for the example. I didn't even know that was possible in earlier versions. I just looked it up: it looks like Scala 2.11 rewrites for comprehensions into map, filter, and flatMap.

That said, I don't think it's a bad deal that this no longer works, given that it was never intended to work and there has been a deprecation warning. I still maintain that it is risky to support this, because Scala users learn for comprehensions not just for a simple "for filter yield", but as a way to chain multiple generators together, which is not really well supported by Spark (and even where it is, it's an easy way for users to shoot themselves in the foot, because it would be a cartesian product). Rather than faking it as a local collection, users should know an RDD is not one.
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15899

Hi, @rxin , @srowen , @dbtsai , @felixcheung , @gatorsmile , @cloud-fan . I know this was not a recommended style, but there really exist users hitting this issue. And, from Spark 2.4.0, we are releasing a Scala 2.12 build as an experiment. Here, this case shows a regression, because previously the code worked with a warning. I'm +1 on this idea for Spark's Scala 2.12 support. What do you think?
```scala
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1541571276105).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.12.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> (for (n <- sc.parallelize(Seq(1,2,3)) if n > 2) yield n).toDebugString
<console>:25: error: value withFilter is not a member of org.apache.spark.rdd.RDD[Int]
       (for (n <- sc.parallelize(Seq(1,2,3)) if n > 2) yield n).toDebugString
```
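[Editor's note] To make the regression mechanics concrete, here is roughly what the compiler does with that snippet - a sketch of the standard for-comprehension desugaring, not Spark code:

```scala
// A for comprehension with a guard desugars to withFilter + map:
//
//   for (n <- rdd if n > 2) yield n
//     ==>  rdd.withFilter(n => n > 2).map(n => n)
//
// Scala 2.11 fell back to .filter when withFilter was missing (emitting
// a deprecation warning), so this compiled against RDD. Scala 2.12
// removed that fallback, hence the "value withFilter is not a member"
// error above. The explicit equivalent still works on both versions:
val result = sc.parallelize(Seq(1, 2, 3)).filter(n => n > 2)
result.toDebugString
```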
[GitHub] spark pull request #15899: [SPARK-18466] added withFilter method to RDD
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15899#discussion_r231390266

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -387,6 +387,14 @@ abstract class RDD[T: ClassTag](
       preservesPartitioning = true)
   }

+  /**
+   * Return a new RDD containing only the elements that satisfy a predicate.
--- End diff --

Why bother unless we have consensus to introduce this API?
[GitHub] spark pull request #19796: [SPARK-22581][SQL] Catalog api does not allow to ...
Github user timvw closed the pull request at: https://github.com/apache/spark/pull/19796
[GitHub] spark pull request #15899: [SPARK-18466] added withFilter method to RDD
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15899#discussion_r231389555

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -387,6 +387,14 @@ abstract class RDD[T: ClassTag](
       preservesPartitioning = true)
   }

+  /**
+   * Return a new RDD containing only the elements that satisfy a predicate.
--- End diff --

Hi, @reggert . Could you fix the indentation?
[GitHub] spark pull request #22089: [SPARK-25098][SQL]‘Cast’ will return NULL whe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22089
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22943
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22943 Thank you, @wangyum and @cloud-fan . Merged to master.
[GitHub] spark pull request #19796: [SPARK-22581][SQL] Catalog api does not allow to ...
Github user timvw commented on a diff in the pull request: https://github.com/apache/spark/pull/19796#discussion_r231382828

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala ---
@@ -411,7 +410,29 @@ abstract class Catalog {
       tableName: String,
       source: String,
       schema: StructType,
-      options: Map[String, String]): DataFrame
+      options: Map[String, String]): DataFrame = {
+    createTable(tableName, source, schema, options, Nil)
+  }
+
+  /**
+   * :: Experimental ::
+   * (Scala-specific)
+   * Create a table based on the dataset in a data source, a schema, a set of options and a set
+   * of partition columns. Then, returns the corresponding DataFrame.
+   *
+   * @param tableName is either a qualified or unqualified name that designates a table.
+   *                  If no database identifier is provided, it refers to a table in
+   *                  the current database.
+   * @since ???
+   */
+  @Experimental
+  @InterfaceStability.Evolving
+  def createTable(
+      tableName: String,
+      source: String,
+      schema: StructType,
+      options: Map[String, String],
+      partitionColumnNames : Seq[String]): DataFrame
--- End diff --

Imho, having an API without options to specify partitioning in a big-data context is just pointless.
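[Editor's note] For illustration, a hedged sketch of how the proposed overload would be called; the table name, path, and column names are invented, and this overload exists only in the diff above, not in the released API:

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical call against the new five-argument overload: the trailing
// argument names the partition columns, which the existing four-argument
// createTable gives no way to express.
spark.catalog.createTable(
  "events",                                              // tableName
  "parquet",                                             // source
  new StructType().add("id", "long").add("day", "string"),
  Map("path" -> "/tmp/events"),                          // options
  Seq("day"))                                            // partition columns
```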
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r231382309

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -140,16 +140,10 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     c = Calendar.getInstance()
     c.set(2015, 2, 18, 0, 0, 0)
     c.set(Calendar.MILLISECOND, 0)
-    assert(stringToDate(UTF8String.fromString("2015-03-18")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18 ")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18 123142")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18T123123")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18T")).get ===
-      millisToDays(c.getTimeInMillis))
+    Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ", "2015-03-18 123142",
--- End diff --

ah i see
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22932 Could you review this, @gatorsmile ?
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r231381218

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -140,16 +140,10 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     c = Calendar.getInstance()
     c.set(2015, 2, 18, 0, 0, 0)
     c.set(Calendar.MILLISECOND, 0)
-    assert(stringToDate(UTF8String.fromString("2015-03-18")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18 ")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18 123142")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18T123123")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18T")).get ===
-      millisToDays(c.getTimeInMillis))
+    Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ", "2015-03-18 123142",
--- End diff --

New test cases (with space padding) are added; e.g. ` 2015-03-18` and ` 2015-03-18 `.
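[Editor's note] Spelled out, the consolidated pattern from the diff looks like this - a sketch assuming the same `Calendar` setup (`c`) as in the suite above:

```scala
// Every padded or suffixed variant should parse to the same day count,
// which is what folding the separate asserts into one Seq expresses.
Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ",
    "2015-03-18 123142", "2015-03-18T123123", "2015-03-18T").foreach { s =>
  assert(stringToDate(UTF8String.fromString(s)).get ===
    millisToDays(c.getTimeInMillis))
}
```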
[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231380992

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
--- End diff --

The only difference is `from_csv` and `from_json`.
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r231380552

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -140,16 +140,10 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     c = Calendar.getInstance()
     c.set(2015, 2, 18, 0, 0, 0)
     c.set(Calendar.MILLISECOND, 0)
-    assert(stringToDate(UTF8String.fromString("2015-03-18")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18 ")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18 123142")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18T123123")).get ===
-      millisToDays(c.getTimeInMillis))
-    assert(stringToDate(UTF8String.fromString("2015-03-18T")).get ===
-      millisToDays(c.getTimeInMillis))
+    Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ", "2015-03-18 123142",
--- End diff --

the test result doesn't change?
[GitHub] spark pull request #22952: [SPARK-20568][SS] Rename files which are complete...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22952#discussion_r231378889

--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -530,6 +530,8 @@ Here are the details of all the sources in Spark.
         "s3://a/dataset.txt"
         "s3n://a/b/dataset.txt"
         "s3a://a/b/c/dataset.txt"
+
+        renameCompletedFiles: whether to rename completed files in previous batch (default: false). If the option is enabled, input file will be renamed with additional postfix "_COMPLETED_". This is useful to clean up old input files to save space in storage.
--- End diff --

Hi, @HeartSaVioR . Renaming is expensive in S3, isn't it? I don't worry about HDFS, but do you know if there are potential side effects, like performance degradation, in a cloud environment, especially with continuous processing mode?
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22951 Could you rebase this once again, @MaxGekk ?
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22943 Could you review this, @gatorsmile and @cloud-fan ?
[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...
Github user gss2002 commented on the issue: https://github.com/apache/spark/pull/22867

@vanzin you are right! I appreciate the help with this one. I will cut a patch in the AM after testing on a large-scale cluster job that pulls from IBM MQ, ETLs the data, and ships it off to Kafka. But this looks to work:
```
val nonExistentDirectory = new File(
  System.getProperty("java.io.tmpdir"), UUID.randomUUID().toString).toURI.toString
```
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22943 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98540/ Test PASSed.
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22943 Merged build finished. Test PASSed.
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22943 **[Test build #98540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98540/testReport)** for PR 22943 at commit [`b866d65`](https://github.com/apache/spark/commit/b866d65c534d016f814946236b55ff05f79a4490).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98537/ Test FAILed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Merged build finished. Test FAILed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22921 **[Test build #98537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98537/testReport)** for PR 22921 at commit [`af748d5`](https://github.com/apache/spark/commit/af748d5a2680ffbea859f186cf48c97e1d700ee5).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22951 Looks good. I or someone else should take a closer look before getting this in.
[GitHub] spark pull request #22956: [SPARK-25950][SQL] from_csv should respect to spa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22956
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98539/ Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98539/testReport)** for PR 22932 at commit [`ef49a27`](https://github.com/apache/spark/commit/ef49a277d3fd39c6fd91b3fcda65f660b833ec95).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22956: [SPARK-25950][SQL] from_csv should respect to spark.sql....
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22956 Merged to master.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Merged build finished. Test FAILed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98536/ Test FAILed.
[GitHub] spark pull request #22956: [SPARK-25950][SQL] from_csv should respect to spa...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22956#discussion_r231370599

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala ---
@@ -92,8 +93,14 @@ case class CsvToStructs(
     }
   }

+  val nameOfCorruptRecord = SQLConf.get.getConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD)
--- End diff --

Yea, I think so.
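[Editor's note] To make the conf's role concrete, a hedged sketch of the behavior SPARK-25950 targets; the schema, column name, and input are invented for illustration:

```scala
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import spark.implicits._

// from_csv should read the corrupt-record column name from the session
// conf rather than hard-coding "_corrupt_record".
spark.conf.set("spark.sql.columnNameOfCorruptRecord", "_malformed")

val schema = new StructType()
  .add("a", IntegerType)
  .add("_malformed", StringType)  // captures unparseable input rows

Seq("not-an-int").toDF("csv")
  .select(from_csv($"csv", schema, Map("mode" -> "PERMISSIVE")))
  .show(false)
```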
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22921 **[Test build #98536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98536/testReport)** for PR 22921 at commit [`6bcbf79`](https://github.com/apache/spark/commit/6bcbf79a14866c2d6e11bfa7b89a095584cb8228).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98538/ Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98538/testReport)** for PR 22275 at commit [`bf2feec`](https://github.com/apache/spark/commit/bf2feec2ef023177d72ac1137dbd1b3a02eb9a89).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98535/ Test PASSed.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Merged build finished. Test PASSed.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22937 **[Test build #98535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98535/testReport)** for PR 22937 at commit [`b500199`](https://github.com/apache/spark/commit/b50019987da954956e407c55e56a4329f8e5633f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22617: [SPARK-25484][SQL][TEST] Refactor ExternalAppendOnlyUnsa...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22617 Retest this please
[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/22911#discussion_r231359962

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ---
@@ -123,7 +126,11 @@ private[spark] class KubernetesClusterSchedulerBackend(
   }

   override def createDriverEndpoint(properties: Seq[(String, String)]): DriverEndpoint = {
-    new KubernetesDriverEndpoint(rpcEnv, properties)
+    new KubernetesDriverEndpoint(sc.env.rpcEnv, properties)
+  }
+
+  override protected def createTokenManager(): Option[HadoopDelegationTokenManager] = {
+    Some(new HadoopDelegationTokenManager(conf, sc.hadoopConfiguration))
--- End diff --

Yeah, I can always put up a follow-up for that. No worries
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231359624

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {
   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
--- End diff --

Ok. I will try to make a PR and see if we can have a better fix for this. Thanks for the suggestion.
[GitHub] spark issue #22956: [SPARK-25950][SQL] from_csv should respect to spark.sql....
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22956 LGTM
[GitHub] spark pull request #22956: [SPARK-25950][SQL] from_csv should respect to spa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22956#discussion_r231359024

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala ---
@@ -92,8 +93,14 @@ case class CsvToStructs(
     }
   }

+  val nameOfCorruptRecord = SQLConf.get.getConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD)
--- End diff --

should this be private?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231358749

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {
   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
--- End diff --

I wouldn't special-case primitive type while this is a general problem.
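[Editor's note] A hedged illustration of the user-visible symptom SPARK-25942 fixes; the output column names are approximate:

```scala
import spark.implicits._

// Before the fix, grouping a Dataset of primitives surfaced the grouping
// key under the encoder's default name ("value"), clashing with the
// Dataset's own "value" column; aliasing it as "key" (the diff above)
// disambiguates the appended column.
val ds = Seq(1, 2, 3).toDS()
val counts = ds.groupByKey(_ % 2).count()
counts.printSchema()  // first column reads as "key" after the fix
```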
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for ContextBarrierSta...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22165 gentle ping @jiangxb1987
[GitHub] spark issue #22961: [SPARK-25947][SQL] Reduce memory usage in ShuffleExchang...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22961 Can one of the admins verify this patch?
[GitHub] spark pull request #22961: [SPARK-25947][SQL] Reduce memory usage in Shuffle...
GitHub user mu5358271 opened a pull request: https://github.com/apache/spark/pull/22961 [SPARK-25947][SQL] Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns

## What changes were proposed in this pull request?

When sorting rows, ShuffleExchangeExec uses the entire row instead of just the columns referenced in SortOrder to create the RangePartitioner. This causes the RangePartitioner to sample entire rows to create rangeBounds, and can cause OOM issues on the driver when rows contain large fields. This change creates a projection and uses only the columns involved in the SortOrder for the RangePartitioner.

## How was this patch tested?

Started a local spark-shell with a small spark.driver.maxResultSize:
```
spark-shell --master 'local[16]' --conf spark.driver.maxResultSize=128M --driver-memory 4g
```
and ran the following script:
```
import com.google.common.io.Files
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import scala.util.Random

@transient val sc = SparkContext.getOrCreate()
@transient val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

val path = Files.createTempDir().toString

// this creates a dataset with 1024 entries, each 1MB in size, across 16 partitions
sc.parallelize(0 until (1 << 10), sc.defaultParallelism).
  map(_ => Array.fill(1 << 18)(Random.nextInt)).
  toDS.
  write.mode("overwrite").parquet(path)

spark.read.parquet(path).
  orderBy('value(0)).
  write.mode("overwrite").parquet(s"$path-sorted")

spark.read.parquet(s"$path-sorted").show
```
Execution would fail when initializing the RangePartitioner without this change. With this change, execution succeeds and generates a correctly sorted dataset.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mu5358271/spark sort-improvement

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22961.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22961

commit 61288d40475a4145561ea4be566bc63b78c25b5a
Author: shuhengd
Date: 2018-11-06T04:23:18Z
[SPARK-25947][SQL] Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
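[Editor's note] The core of the proposed change, as a hedged sketch simplified from ShuffleExchangeExec; variable names (`sortOrder`, `outputAttributes`, `rdd`) are approximate, not the patch verbatim:

```scala
// Instead of handing whole rows to RangePartitioner's sampler, project
// out only the sort-key columns first, so the rows sampled back to the
// driver carry just the bytes needed to compute rangeBounds.
val sortingExpressions = sortOrder.map(_.child)
val projection =
  UnsafeProjection.create(sortingExpressions, outputAttributes)

// RangePartitioner samples this narrow RDD, not the full-width rows;
// copy() is needed because UnsafeProjection reuses its output buffer.
val sortKeyRdd = rdd.map(row => projection(row).copy())
```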
[GitHub] spark issue #22855: [SPARK-25839] [Core] Implement use of KryoPool in KryoSe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22855 **[Test build #4417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4417/testReport)** for PR 22855 at commit [`60310c0`](https://github.com/apache/spark/commit/60310c0e18613f0c32f19b73e6ac25a49ba25e86).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231350156

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {
   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
--- End diff --

Thanks, I see. For this primitive type case, is the current fix ok? Or should we deal with case classes together?
[GitHub] spark issue #22926: [SPARK-25917][Spark UI] memoryMetrics should be Json ign...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22926

Is this a problem in master at all? The data is serialized with `JacksonMessageWriter`, which seems to be configured properly:
```
private[v1] class JacksonMessageWriter extends MessageBodyWriter[Object]{
  ...
  mapper.setSerializationInclusion(JsonInclude.Include.NON_ABSENT)
```
An easy way to answer that question is to write a unit test.
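[Editor's note] A minimal sketch of the kind of unit test being suggested; the case class is a stand-in, not Spark's actual REST API type:

```scala
import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Stand-in for an API object with an optional field (hypothetical name).
case class ExecutorInfoStub(id: String, memoryMetrics: Option[String])

val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
mapper.setSerializationInclusion(JsonInclude.Include.NON_ABSENT)

// With NON_ABSENT, a None field is omitted entirely instead of being
// rendered as null - the behavior the @JsonIgnore change is after.
val json = mapper.writeValueAsString(ExecutorInfoStub("1", None))
assert(!json.contains("memoryMetrics"))
```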
[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22590

I wonder how important this is. I know `spark-csv` at Databricks supported different quote modes, and that support was dropped when we ported it into Spark - the root cause was replacing the underlying library, Apache Commons CSV, with Univocity. In the few years since, I have only seen one request to revive the quote mode proposed here - so I doubt how important it is. Basically, @MaxGekk described my stand correctly. Can we investigate a way to set arbitrary parse settings options?
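[Editor's note] For reference, a hedged sketch of the two directions discussed; the option name comes from this PR's title and is not a released Spark option, and the univocity call is to my knowledge how the underlying parser exposes this toggle:

```scala
// Direction 1: the dedicated option this PR proposes.
val df = spark.read
  .option("keepQuotes", "true")  // proposed here, not in released Spark
  .csv("/tmp/data.csv")          // hypothetical path

// Direction 2 (what the comment asks to investigate): a generic
// passthrough for arbitrary parser settings. Either way, internally it
// would boil down to toggling the univocity parser setting, roughly:
//   settings.setKeepQuotes(true)  // univocity CsvParserSettings
```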
[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22911#discussion_r231348306

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ---
@@ -123,7 +126,11 @@ private[spark] class KubernetesClusterSchedulerBackend(
   }

   override def createDriverEndpoint(properties: Seq[(String, String)]): DriverEndpoint = {
-    new KubernetesDriverEndpoint(rpcEnv, properties)
+    new KubernetesDriverEndpoint(sc.env.rpcEnv, properties)
+  }
+
+  override protected def createTokenManager(): Option[HadoopDelegationTokenManager] = {
+    Some(new HadoopDelegationTokenManager(conf, sc.hadoopConfiguration))
--- End diff --

Ah, ok I get it now. I can do that. I'll try to include support for (3), but it depends on how much I have to touch other parts of the code. Hopefully not much.
[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22590 They should be documented in the API docs, like `DataFrameReader.scala`. For the site, we should avoid doc duplication - it's a general issue with how to document options.
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r231346067

--- Diff: docs/configuration.md ---
@@ -266,6 +266,40 @@ of the most common options to set are:
     Only has effect in Spark standalone mode or Mesos cluster deploy mode.

+  spark.driver.log.dfsDir
+  (none)
+
+    Base directory in which Spark driver logs are synced, if spark.driver.log.persistToDfs.enabled
+    is true. Within this base directory, each application logs the driver logs to an application specific file.
+    Users may want to set this to a unified location like an HDFS directory so driver log files can be persisted
+    for later usage. This directory should allow any Spark user to read/write files and the Spark History Server
+    user to delete files. Additionally, older logs from this directory are cleaned by
+    Spark History Server if
+    spark.history.fs.driverlog.cleaner.enabled is true and, if they are older than max age configured
+    at spark.history.fs.driverlog.cleaner.maxAge.
--- End diff --

remove space after `>`
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r231346390
```
--- Diff: docs/configuration.md ---
@@ -266,6 +266,40 @@ of the most common options to set are:
   Only has effect in Spark standalone mode or Mesos cluster deploy mode.
+
+  spark.driver.log.dfsDir
+  (none)
+
+Base directory in which Spark driver logs are synced, if spark.driver.log.persistToDfs.enabled
+is true. Within this base directory, each application logs the driver logs to an application specific file.
+Users may want to set this to a unified location like an HDFS directory so driver log files can be persisted
+for later usage. This directory should allow any Spark user to read/write files and the Spark History Server
+user to delete files. Additionally, older logs from this directory are cleaned by
+ Spark History Server if
+spark.history.fs.driverlog.cleaner.enabled is true and, if they are older than max age configured
+at spark.history.fs.driverlog.cleaner.maxAge.
+
+
+  spark.driver.log.persistToDfs.enabled
+  false
+
+If true, spark application running in client mode will write driver logs to a persistent storage, configured
+in spark.driver.log.dfsDir. If spark.driver.log.dfsDir is not configured, driver logs
+will not be persisted. Additionally, enable the cleaner by setting spark.history.fs.driverlog.cleaner.enabled
+to true in Spark History Server.
```
--- End diff --
no space after `>`
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r231346161
```
--- Diff: docs/configuration.md ---
@@ -266,6 +266,40 @@ of the most common options to set are:
   Only has effect in Spark standalone mode or Mesos cluster deploy mode.
+
+  spark.driver.log.dfsDir
+  (none)
+
+Base directory in which Spark driver logs are synced, if spark.driver.log.persistToDfs.enabled
+is true. Within this base directory, each application logs the driver logs to an application specific file.
+Users may want to set this to a unified location like an HDFS directory so driver log files can be persisted
+for later usage. This directory should allow any Spark user to read/write files and the Spark History Server
+user to delete files. Additionally, older logs from this directory are cleaned by
+ Spark History Server if
+spark.history.fs.driverlog.cleaner.enabled is true and, if they are older than max age configured
+at spark.history.fs.driverlog.cleaner.maxAge.
```
--- End diff --
s/at/by setting
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r231346117
```
--- Diff: docs/configuration.md ---
@@ -266,6 +266,40 @@ of the most common options to set are:
   Only has effect in Spark standalone mode or Mesos cluster deploy mode.
+
+  spark.driver.log.dfsDir
+  (none)
+
+Base directory in which Spark driver logs are synced, if spark.driver.log.persistToDfs.enabled
+is true. Within this base directory, each application logs the driver logs to an application specific file.
+Users may want to set this to a unified location like an HDFS directory so driver log files can be persisted
+for later usage. This directory should allow any Spark user to read/write files and the Spark History Server
+user to delete files. Additionally, older logs from this directory are cleaned by
```
--- End diff --
...cleaned by the...
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r231346507
```
--- Diff: docs/configuration.md ---
@@ -266,6 +266,40 @@ of the most common options to set are:
   Only has effect in Spark standalone mode or Mesos cluster deploy mode.
+
+  spark.driver.log.dfsDir
+  (none)
+
+Base directory in which Spark driver logs are synced, if spark.driver.log.persistToDfs.enabled
+is true. Within this base directory, each application logs the driver logs to an application specific file.
+Users may want to set this to a unified location like an HDFS directory so driver log files can be persisted
+for later usage. This directory should allow any Spark user to read/write files and the Spark History Server
+user to delete files. Additionally, older logs from this directory are cleaned by
+ Spark History Server if
+spark.history.fs.driverlog.cleaner.enabled is true and, if they are older than max age configured
+at spark.history.fs.driverlog.cleaner.maxAge.
+
+
+  spark.driver.log.persistToDfs.enabled
+  false
+
+If true, spark application running in client mode will write driver logs to a persistent storage, configured
+in spark.driver.log.dfsDir. If spark.driver.log.dfsDir is not configured, driver logs
+will not be persisted. Additionally, enable the cleaner by setting spark.history.fs.driverlog.cleaner.enabled
+to true in Spark History Server.
+
+
+  spark.driver.log.layout
+  %d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
+
+The layout for the driver logs that are synced to spark.driver.log.dfsDir. If
+spark.driver.log.persistToDfs.enabled is true and this configuration is used. If this is not configured,
```
--- End diff --
No need to mention the `enabled` option here.
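Taken together, a minimal sketch of enabling the feature these docs describe - the config keys come from the diff above, while the master URL and HDFS path are hypothetical example values:
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("yarn")                                    // client mode on YARN (example)
  .set("spark.driver.log.persistToDfs.enabled", "true") // turn on driver log persistence
  .set("spark.driver.log.dfsDir", "hdfs:///user/spark/driverLogs") // example sync location
```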
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r231346593
```
--- Diff: docs/monitoring.md ---
@@ -202,6 +202,28 @@ Security options for the Spark History Server are covered more detail in the
   applications that fail to rename their event logs listed as in-progress.
+
+spark.history.fs.driverlog.cleaner.enabled
+spark.history.fs.cleaner.enabled
+
+  Specifies whether the History Server should periodically clean up driver logs from storage.
+
+
+
+spark.history.fs.driverlog.cleaner.interval
+spark.history.fs.cleaner.interval
+
+  How often the filesystem driver log history cleaner checks for files to delete.
```
--- End diff --
driver log cleaner
[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22954 So far, the regression tests pass and the newly added test for the R optimization has been verified locally. Let me fix the CRAN test and some nits.
[GitHub] spark issue #22956: [SPARK-25950][SQL] from_csv should respect to spark.sql....
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22956 Looks good. I or someone else should take a closer look before getting this in.
[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/22911#discussion_r231344398
```
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ---
@@ -123,7 +126,11 @@ private[spark] class KubernetesClusterSchedulerBackend(
   }

   override def createDriverEndpoint(properties: Seq[(String, String)]): DriverEndpoint = {
-    new KubernetesDriverEndpoint(rpcEnv, properties)
+    new KubernetesDriverEndpoint(sc.env.rpcEnv, properties)
+  }
+
+  override protected def createTokenManager(): Option[HadoopDelegationTokenManager] = {
+    Some(new HadoopDelegationTokenManager(conf, sc.hadoopConfiguration))
```
--- End diff --
Oh, I was referencing the creation of the `Delegation Token` secret when a `--keytab` is specified. I believe you are right that in client mode you would not need to worry about running this step. But I think the 3rd option would be good to include here. With the introduction of `HadoopDelegationTokenManager` we should remove the creation of the `dtSecret`, and that should be included in this PR if you are introducing this. Therefore, I think it is sensible to refactor `KerberosConfigSpec` to have a generic `secret`, `secretName`, and `secretKey` that would contain either a `DelegationToken` or a `keytab`, such that this code block:
```
private val kerberosConfSpec: Option[KerberosConfigSpec] = (for {
    secretName <- existingSecretName
    secretItemKey <- existingSecretItemKey
  } yield {
    KerberosConfigSpec(
      secret = None,
      secretName = secretName,
      secretItemKey = secretItemKey,
      jobUserName = kubeTokenManager.getCurrentUser.getShortUserName)
  }).orElse(
    if (isKerberosEnabled) {
      keytab.map { ... }
    } else {
      None
    }
```
would return a `kerberosConfSpec` that accounts for either case. That would also mean that you could delete the `HadoopKerberosLogin` method.
[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231344120
```
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
+      Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+    val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+    val schema = df.schema(0).dataType.asInstanceOf[StructType]
+    val options = Map.empty[String, String]
+    val readback = df.select(to_csv($"struct").as("csv"))
+      .select(from_csv($"csv", schema, options).as("struct"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+    val df = Seq(Some("1"), None).toDF("csv")
+    val schema = new StructType().add("a", IntegerType)
+    val options = Map.empty[String, String]
+    val readback = df.select(from_csv($"csv", schema, options).as("struct"))
+      .select(to_csv($"struct").as("csv"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to to from_csv") {
+    val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+    val options = Map.empty[String, String].asJava
+    val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), options) as "parsed")
+    val expected = StructType(Seq(StructField(
+      "parsed",
+      StructType(Seq(
+        StructField("_c0", DoubleType, true),
+        StructField("_c1", IntegerType, true),
+        StructField("_c2", StringType, true))),
+      true)))
+
+    assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
```
--- End diff --
@MaxGekk, wouldn't the tests in `csv-functions.sql` be enough for the SQL support tests?
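For context, a rough sketch of the kind of SQL-level coverage `csv-functions.sql` provides, exercised here from the Scala shell (`spark` is a `SparkSession`; the output comments are illustrative):
```scala
// Exercising the SQL surface directly, as csv-functions.sql does.
spark.sql("SELECT from_csv('1,abc', 'a INT, b STRING')").show()    // struct<a:int,b:string>
spark.sql("SELECT to_csv(named_struct('a', 1, 'b', 'abc'))").show() // "1,abc"
```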
[GitHub] spark issue #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapStatus des...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22894 **[Test build #4416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4416/testReport)** for PR 22894 at commit [`57bdd75`](https://github.com/apache/spark/commit/57bdd7525f3353a6d59772b2a86abbe6a0d5f4ba).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22909: [SPARK-25897][k8s] Hook up k8s integration tests to sbt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22909 Merged build finished. Test FAILed.
[GitHub] spark issue #22909: [SPARK-25897][k8s] Hook up k8s integration tests to sbt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22909 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98532/ Test FAILed.
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22943 **[Test build #98540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98540/testReport)** for PR 22943 at commit [`b866d65`](https://github.com/apache/spark/commit/b866d65c534d016f814946236b55ff05f79a4490).
[GitHub] spark issue #22909: [SPARK-25897][k8s] Hook up k8s integration tests to sbt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22909 **[Test build #98532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98532/testReport)** for PR 22909 at commit [`f07f50c`](https://github.com/apache/spark/commit/f07f50c4e495eb25f92a930e424a579da68c5be6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22943 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4808/ Test PASSed.
[GitHub] spark issue #22943: [SPARK-25098][SQL] Trim the string when cast stringToTim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22943 Merged build finished. Test PASSed.
[GitHub] spark issue #22926: [SPARK-25917][Spark UI] memoryMetrics should be Json ign...
Github user jianjianjiao commented on the issue: https://github.com/apache/spark/pull/22926 @AmplabJenkins Could you please find someone to review this? I believe this is a bug in the Spark UI. Thanks.
[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22960 Merged build finished. Test PASSed.
[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22960 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98531/ Test PASSed.