[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r224318955 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala --- @@ -108,6 +108,16 @@ object TestingUDT { } } +object TestingValueClass { + case class IntWrapper(i: Int) extends AnyVal + case class StrWrapper(s: String) extends AnyVal + + case class ValueClassData( +intField: Int, +wrappedInt: IntWrapper, +strField: String, +wrappedStr: StrWrapper) --- End diff -- We might need a comment to describe what this class looks like in Java. Seems like it has 2 int fields `intField`, `wrappedInt`, and 2 string fields `strField`, `wrappedStr`. I'm not sure it is the same in Scala 2.12, though. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
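For readers following ueshin's point, here is a hedged, pure-Scala sketch (hypothetical names, not part of the PR) of why a value class surfaces as its underlying type in Java:

```scala
object ValueClassSketch {
  // A value class wraps exactly one field and extends AnyVal; the Scala
  // compiler usually elides the wrapper at the bytecode level, so from Java
  // the wrapped values appear as plain ints and Strings -- which is why
  // ValueClassData would look like two int fields and two String fields.
  case class IntWrapper(i: Int) extends AnyVal
  case class StrWrapper(s: String) extends AnyVal

  // After erasure this method compiles roughly to: public int twice(int w)
  def twice(w: IntWrapper): IntWrapper = IntWrapper(w.i * 2)

  def main(args: Array[String]): Unit = {
    assert(twice(IntWrapper(21)) == IntWrapper(42))
    assert(StrWrapper("a").s == "a")
    println("ok")
  }
}
```

Note that boxing back to the wrapper class can still occur (e.g. in generic or reflective contexts), which is likely why the behavior under Scala 2.12 is worth double-checking.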
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22309 Merged build finished. Test FAILed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22309 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97232/ Test FAILed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22309 **[Test build #97232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97232/testReport)** for PR 22309 at commit [`5613217`](https://github.com/apache/spark/commit/5613217771b1929b9f66106468fd2da2c3ea7dec). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r224322470 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources +--- + +In this section, we introduce how to use data source in ML to load data. +Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML. + +**Table of Contents** + +* This will become a table of contents (this text will be scraped). +{:toc} + +## Image data source + +This image data source is used to load image files from a directory. +The loaded DataFrame has one StructType column: "image". containing image data stored as image schema. + + + +[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource) +implements a Spark SQL data source API for loading image data as a DataFrame. + +{% highlight scala %} +scala> spark.read.format("image").load("data/mllib/images/origin") +res1: org.apache.spark.sql.DataFrame = [image: struct] +{% endhighlight %} + + + +[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html) +implements Spark SQL data source API for loading image data as DataFrame. + +{% highlight java %} +Dataset<Row> imagesDF = spark.read().format("image").load("data/mllib/images/origin"); --- End diff -- Can we add a simple transformation to show how the image data source can be utilized?
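A sketch of the kind of follow-up transformation being asked for here. The struct field names (`origin`, `width`, `height`) are an assumption based on the usual image schema and should be confirmed against `ImageSchema`; this requires a Spark runtime and is illustrative only:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: load images, then project and filter on struct fields.
val spark = SparkSession.builder().appName("image-example").getOrCreate()
val imagesDF = spark.read.format("image").load("data/mllib/images/origin")
imagesDF
  .select("image.origin", "image.width", "image.height") // pull fields out of the struct
  .filter("image.width > 100")                           // keep only wider images
  .show(numRows = 5, truncate = false)
```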
[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r224322298 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources +--- + +In this section, we introduce how to use data source in ML to load data. +Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML. + +**Table of Contents** + +* This will become a table of contents (this text will be scraped). +{:toc} + +## Image data source + +This image data source is used to load image files from a directory. +The loaded DataFrame has one StructType column: "image". containing image data stored as image schema. + + + +[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource) +implements a Spark SQL data source API for loading image data as a DataFrame. + +{% highlight scala %} +scala> spark.read.format("image").load("data/mllib/images/origin") +res1: org.apache.spark.sql.DataFrame = [image: struct] +{% endhighlight %} + + + +[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html) +implements Spark SQL data source API for loading image data as DataFrame. + +{% highlight java %} +Dataset<Row> imagesDF = spark.read().format("image").load("data/mllib/images/origin"); +{% endhighlight %} + + + --- End diff -- how about SQL syntax? I think we can use `CREATE TABLE tableA USING LOCATION 'data/image.png'`
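A hedged sketch of the SQL route suggested above. The exact DDL (a `USING image` clause plus `LOCATION`) is an assumption and should be verified against Spark's data source DDL; it also requires a Spark runtime:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("image-sql-example").getOrCreate()

// Assumed syntax: register the image directory as a table, then query it.
spark.sql(
  """CREATE TABLE images
    |USING image
    |LOCATION 'data/mllib/images/origin'""".stripMargin)
spark.sql("SELECT image.origin FROM images").show()
```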
[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r224321873 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources +--- + +In this section, we introduce how to use data source in ML to load data. +Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML. + +**Table of Contents** + +* This will become a table of contents (this text will be scraped). +{:toc} + +## Image data source + +This image data source is used to load image files from a directory. +The loaded DataFrame has one StructType column: "image". containing image data stored as image schema. --- End diff -- Shall we describe which images we can load? For instance, I think this delegates to ImageIO in Java, which allows reading compressed formats like PNG or JPG into a raw image representation like BMP so that OpenCV can handle them.
[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r224321949 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources +--- + +In this section, we introduce how to use data source in ML to load data. +Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML. + +**Table of Contents** + +* This will become a table of contents (this text will be scraped). +{:toc} + +## Image data source + +This image data source is used to load image files from a directory. +The loaded DataFrame has one StructType column: "image". containing image data stored as image schema. --- End diff -- I would also describe the schema structure and what each field means.
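As a concrete starting point for that description, a hedged reconstruction of the "image" struct column (cf. `ImageSchema.columnSchema`; field names, order, and meanings should be double-checked against the Spark source):

```scala
import org.apache.spark.sql.types._

// Assumed layout of the "image" struct column (not verified against this PR):
val imageColumn = StructType(Seq(
  StructField("origin", StringType),     // file path the image was loaded from
  StructField("height", IntegerType),    // image height in pixels
  StructField("width", IntegerType),     // image width in pixels
  StructField("nChannels", IntegerType), // number of color channels
  StructField("mode", IntegerType),      // OpenCV-compatible pixel type
  StructField("data", BinaryType)        // pixel bytes in row-major BGR order
))
```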
[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r224321446 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources +--- + +In this section, we introduce how to use data source in ML to load data. +Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML. --- End diff -- `JSON, JDBC` -> `JSON and JDBC`
[GitHub] spark pull request #22668: [SPARK-25675] [Spark Job History] Job UI page doe...
Github user shivusondur commented on a diff in the pull request: https://github.com/apache/spark/pull/22668#discussion_r224318421 --- Diff: core/src/main/scala/org/apache/spark/ui/PagedTable.scala --- @@ -123,10 +123,9 @@ private[ui] trait PagedTable[T] { /** * Return a page navigation. * - * If the totalPages is 1, the page navigation will be empty * - * If the totalPages is more than 1, it will create a page navigation including a group of - * page numbers and a form to submit the page number. + * It will create a page navigation including a group of page numbers and a form --- End diff -- @gengliangwang @felixcheung I have updated it according to your suggestion. Please check.
[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22685#discussion_r224317853 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -96,7 +95,7 @@ case class DataSource( private val caseInsensitiveOptions = CaseInsensitiveMap(options) private val equality = sparkSession.sessionState.conf.resolver - bucketSpec.map { bucket => + bucketSpec.foreach { bucket => --- End diff -- Yea, this is a legitimate change.
[GitHub] spark pull request #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(numb...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22419#discussion_r224318028 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -1245,3 +1245,27 @@ case class BRound(child: Expression, scale: Expression) with Serializable with ImplicitCastInputTypes { def this(child: Expression) = this(child, Literal(0)) } + +/** + * The number truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(number, scale) - Returns number truncated to scale decimal places. " + +"If scale is omitted, then number is truncated to 0 places. " + +"scale can be negative to truncate (make zero) scale digits left of the decimal point.", + examples = """ +Examples: + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Truncate(child: Expression, scale: Expression) --- End diff -- In that case, it's ok to handle the string as a date. How about only accepting float, double, and decimal for number truncation?
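To make the semantics under discussion concrete, a plain-Scala sketch of number truncation as described in the usage string (truncate toward zero to `scale` decimal places; a negative `scale` zeroes digits left of the point). This is an illustration, not the PR's implementation:

```scala
object TruncateSketch {
  // Truncate toward zero to `scale` decimal places; a negative scale
  // zeroes out digits to the left of the decimal point.
  def truncate(x: BigDecimal, scale: Int): BigDecimal =
    x.setScale(scale, BigDecimal.RoundingMode.DOWN)

  def main(args: Array[String]): Unit = {
    val n = BigDecimal("1234567891.1234567891")
    assert(truncate(n, 4) == BigDecimal("1234567891.1234"))
    assert(truncate(n, 0) == BigDecimal("1234567891"))
    assert(truncate(n, -4) == BigDecimal("1234560000"))
    println("ok")
  }
}
```

`RoundingMode.DOWN` truncates toward zero (unlike `FLOOR`, which would round negative inputs away from zero), matching the "make zero" wording for negative scales.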
[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22676 **[Test build #97235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97235/testReport)** for PR 22676 at commit [`c504356`](https://github.com/apache/spark/commit/c504356b847e183f571a09ce5f808d4a7f229255).
[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22676 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3871/ Test PASSed.
[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22676 Merged build finished. Test PASSed.
[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22676 retest this please
[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97230/ Test PASSed.
[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22594 Merged build finished. Test PASSed.
[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22594 **[Test build #97230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97230/testReport)** for PR 22594 at commit [`04eba30`](https://github.com/apache/spark/commit/04eba3019fa8e05b73823c91db48a50c544e8350). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22688 **[Test build #97234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97234/testReport)** for PR 22688 at commit [`ded852c`](https://github.com/apache/spark/commit/ded852c3f99d9fe904a6b54691ac6c170da9a298).
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Merged build finished. Test PASSed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3870/ Test PASSed.
[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22688#discussion_r224316297 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-25700: do not read schema when writing in other modes except append mode") { +withTempPath { file => + val cls = classOf[SimpleWriteOnlyDataSource] + val path = file.getCanonicalPath + val df = spark.range(5).select('id as 'i, -'id as 'j) --- End diff -- The write path looks like it requires two columns: https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala#L214
[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22688#discussion_r224316130 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-25700: do not read schema when writing in other modes except append mode") { +withTempPath { file => + val cls = classOf[SimpleWriteOnlyDataSource] + val path = file.getCanonicalPath + val df = spark.range(5).select($"id", $"id") --- End diff -- The write path looks like it requires two columns: https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala#L214
[GitHub] spark pull request #22668: [SPARK-25675] [Spark Job History] Job UI page doe...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22668#discussion_r224316034 --- Diff: core/src/main/scala/org/apache/spark/ui/PagedTable.scala --- @@ -123,10 +123,9 @@ private[ui] trait PagedTable[T] { /** * Return a page navigation. * - * If the totalPages is 1, the page navigation will be empty * - * If the totalPages is more than 1, it will create a page navigation including a group of - * page numbers and a form to submit the page number. + * It will create a page navigation including a group of page numbers and a form --- End diff -- true.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22688 I have no idea why it passes locally. I fixed the test.
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22689 **[Test build #97233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97233/testReport)** for PR 22689 at commit [`c309f34`](https://github.com/apache/spark/commit/c309f3464522341f286fd4791d7989dcde988cac).
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22688 Hm, yea, this passed locally, so I expected it was flaky, but it seems I should fix it.
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22689 ok to test
[GitHub] spark pull request #22681: [SPARK-25682][k8s] Package example jars in same t...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22681#discussion_r224314585 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile --- @@ -18,6 +18,7 @@ FROM openjdk:8-alpine ARG spark_jars=jars +ARG example_jars=examples/jars --- End diff -- could we make this optional, in case someone wants to build a smaller image without the examples?
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22688 Seems the same test failed?
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 > The major comments are in the test cases. Could you help clean up the existing test cases? All the comments are addressed and the test cases are corrected.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Merged build finished. Test FAILed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97229/ Test FAILed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22688 **[Test build #97229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97229/testReport)** for PR 22688 at commit [`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SchemaReadAttemptException(m: String) extends RuntimeException(m)`
[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22678#discussion_r224309582 --- Diff: dev/run-tests-jenkins.py --- @@ -39,7 +39,8 @@ def print_err(msg): def post_message_to_github(msg, ghprb_pull_id): print("Attempting to post to Github...") -url = "https://api.github.com/repos/apache/spark/issues/" + ghprb_pull_id + "/comments" +api_url = os.getenv("GITHUB_SERVER_API_URL", "https://api.github.com/repos/apache/spark") --- End diff -- Sure. @kiszk
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22690 Merged build finished. Test PASSed.
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97226/ Test PASSed.
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22690 **[Test build #97226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97226/testReport)** for PR 22690 at commit [`a35b54f`](https://github.com/apache/spark/commit/a35b54fbb000665a87998c14ed940316d45d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22612 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97228/ Test FAILed.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22612 Merged build finished. Test FAILed.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22612 **[Test build #97228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97228/testReport)** for PR 22612 at commit [`067b81d`](https://github.com/apache/spark/commit/067b81d24de7999afe5b9660e89d9a2e41de6d21). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22678: [SPARK-25685][BUILD] Allow running tests in Jenkins in e...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22678 Sorry for closing the conversation by mistake, @dongjoon-hyun. I will update the documentation soon.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22309 **[Test build #97232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97232/testReport)** for PR 22309 at commit [`5613217`](https://github.com/apache/spark/commit/5613217771b1929b9f66106468fd2da2c3ea7dec).
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22309 ok to test
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22674 **[Test build #97231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97231/testReport)** for PR 22674 at commit [`3ffa536`](https://github.com/apache/spark/commit/3ffa536f3c29f6655843a4d45c215393f51e23c9).
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22674 Merged build finished. Test PASSed.
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22674 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3869/ Test PASSed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22309 somehow I lost track of this PR. ok to test
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r224300113 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala --- @@ -108,6 +108,16 @@ object TestingUDT { } } +object TestingValueClass { + case class IntWrapper(i: Int) extends AnyVal --- End diff -- does a value class have to be a case class?
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22661#discussion_r224300031

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark

 import org.apache.spark.sql.execution.joins._
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType

 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *   build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class --jars
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *      Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
  */
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {

-  ignore("broadcast hash join, long key") {
+  def broadcastHashJoinLongKey(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
-    val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
-    runBenchmark("Join w long", N) {
-      val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+    val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+    codegenBenchmark("Join w long", N) {
+      val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
       assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
       df.count()
     }
-
-    /*
-    Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
-    Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-    Join w long:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ----------------------------------------------------------------------------------------
-    Join w long codegen=false             3002 / 3262          7.0         143.2       1.0X
-    Join w long codegen=true               321 /  371         65.3          15.3       9.3X
-    */
   }

-  ignore("broadcast hash join, long key with duplicates") {
+
+  def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
-    val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
--- End diff --

Yes

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
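The refactor above turns each `ignore(...)` test into a named method driven by a shared harness (`codegenBenchmark`, which replaces the old `runBenchmark`). The bare timing pattern behind such a harness can be sketched standalone; `MiniBenchmark` below is an illustrative stand-in, not Spark's `Benchmark` or `SqlBasedBenchmark` classes.

```scala
// Illustrative stand-in for a benchmark timing harness; this is NOT Spark's
// Benchmark/SqlBasedBenchmark, just the bare measurement pattern they build on.
object MiniBenchmark {
  /** Runs `f` `iters` times and returns the best wall-clock time in nanoseconds. */
  def bestTimeNs(iters: Int)(f: => Unit): Long = {
    require(iters > 0, "need at least one iteration")
    var best = Long.MaxValue
    for (_ <- 1 to iters) {
      val start = System.nanoTime()
      f
      best = math.min(best, System.nanoTime() - start)
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val n = 1 << 20
    val t = bestTimeNs(3) {
      // A trivial workload standing in for the joins benchmarked above.
      var acc = 0L
      var i = 0
      while (i < n) { acc += i; i += 1 }
    }
    println(s"best of 3 runs over $n adds: ${t / 1000} us")
  }
}
```

Taking the best of several runs, as real benchmark harnesses do, reduces the noise from JIT warm-up and GC pauses.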
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22674 retest this please
[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97221/ Test PASSed.
[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22692 Merged build finished. Test PASSed.
[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22692 **[Test build #97221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97221/testReport)** for PR 22692 at commit [`4b39ac3`](https://github.com/apache/spark/commit/4b39ac3500d1ee6f8b3d93f4822c6e5f36e30e3b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19330 Thanks!
[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19330 @maropu Thanks, and yes, I'm still here and can keep going if there is interest in this PR. I will update this PR this weekend.
[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22692 sounds reasonable, also cc @tdas @zsxwing @jose-torres
[GitHub] spark pull request #22259: [SPARK-25044][SQL] (take 2) Address translation o...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22259#discussion_r224295469

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -47,7 +48,8 @@ case class ScalaUDF(
     inputTypes: Seq[DataType] = Nil,
     udfName: Option[String] = None,
     nullable: Boolean = true,
-    udfDeterministic: Boolean = true)
+    udfDeterministic: Boolean = true,
+    nullableTypes: Seq[Boolean] = Nil)
--- End diff --

Yes, the test should not pass after removing the `isInstanceOf[KnownNotNull]` condition from the `needsNullCheck` test (https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2160). The idea was to add a `KnownNotNull` node on top of the original node to mark it as null-checked, so the rule won't add redundant null checks even if it is accidentally applied again. I'm not sure of the exact reason why you removed the `isInstanceOf[KnownNotNull]` condition in this PR, but I think it should be left there alongside the new nullable type check. After adding the `nullableTypes` parameter in the test, the issue can be reproduced:
```
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  val a = testRelation.output(0)
  val func = (x: Int, y: Int) => x + y
  val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
  val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil, nullableTypes = false :: false :: Nil)
  val plan = Project(Alias(udf2, "")() :: Nil, testRelation)
  comparePlans(plan.analyze, plan.analyze.analyze)
}
```
BTW, I'm just curious: it looks like `nullableTypes` indicates something opposite to "nullable" used in schema. I would assume that when `nullableTypes` is `Seq(false)`, it means this type is not nullable and we need not add the null check, and vice versa. Did I miss something here?
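The polarity question at the end can be made concrete with a small sketch. Under the reading described above (a `false` entry means the parameter's Scala type, e.g. a primitive `Int`, cannot accept null), a null check is needed exactly where a non-null-safe parameter is fed a nullable input. The object and parameter names below are hypothetical, not Spark's actual rule implementation.

```scala
// Hypothetical sketch of the null-check decision under discussion; not Spark code.
// nullableTypesAcceptNull(i) == false => parameter i cannot take null (e.g. Int),
// so a nullable input feeding it must be guarded with a null check.
object NullCheckSketch {
  def needsNullCheck(
      nullableTypesAcceptNull: Seq[Boolean],
      inputIsNullable: Seq[Boolean]): Seq[Boolean] =
    nullableTypesAcceptNull.zip(inputIsNullable).map { case (acceptsNull, mayBeNull) =>
      // Guard only where a non-null-safe parameter receives possibly-null input.
      !acceptsNull && mayBeNull
    }
}
```

With this reading, `Seq(false)` for an `Int` parameter fed a nullable column yields `Seq(true)` (check needed), which is the inversion relative to schema `nullable` that the comment points out.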
[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22594 **[Test build #97230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97230/testReport)** for PR 22594 at commit [`04eba30`](https://github.com/apache/spark/commit/04eba3019fa8e05b73823c91db48a50c544e8350).
[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3868/ Test PASSed.
[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22594 Merged build finished. Test PASSed.
[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Merged build finished. Test PASSed.
[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97220/ Test PASSed.
[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 **[Test build #97220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97220/testReport)** for PR 21669 at commit [`dd95fca`](https://github.com/apache/spark/commit/dd95fcab754e71e9465f4e46818c3cef09e86c8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22691: [SPARK-24109][CORE] Remove class SnappyOutputStreamWrapp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22691 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97222/ Test FAILed.
[GitHub] spark issue #22691: [SPARK-24109][CORE] Remove class SnappyOutputStreamWrapp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22691 Merged build finished. Test FAILed.
[GitHub] spark issue #22691: [SPARK-24109][CORE] Remove class SnappyOutputStreamWrapp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22691 **[Test build #97222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97222/testReport)** for PR 22691 at commit [`8850c7a`](https://github.com/apache/spark/commit/8850c7a7d563cf6bc46a84b7480b4d338d58b80f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Merged build finished. Test PASSed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3867/ Test PASSed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22688 **[Test build #97229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97229/testReport)** for PR 22688 at commit [`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8).
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22688 retest this please
[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22664 Hi, @peter-toth . Could you review and merge https://github.com/peter-toth/spark/pull/1 which contains the result on EC2 r3.xlarge?
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Merged build finished. Test FAILed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97224/ Test FAILed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22688 **[Test build #97224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97224/testReport)** for PR 22688 at commit [`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SchemaReadAttemptException(m: String) extends RuntimeException(m)`
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22689 @srowen . Yes. We should read only from the finished frames of zstd. When the listener tries to read from an unfinished frame, the zstd input reader throws an exception (unless we set continuous mode to true). Currently it reads the finished frames, but then tries to read from the unfinished frame and throws an exception while loading the web UI. So the solution should be: do not parse the unfinished frame, and load the UI based only on the finished frames. @vanzin has a good idea about the history server. Hi @vanzin, could you please give your inputs? Thanks
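The "read only finished frames" idea can be illustrated without zstd itself: with simple length-prefixed frames, a reader returns every complete frame and silently stops at a truncated trailing one instead of throwing. This is a generic sketch of the approach being proposed, not zstd's actual frame format or Spark's event-log code.

```scala
import java.nio.ByteBuffer

// Generic sketch of "parse only completed frames": each frame here is a 4-byte
// big-endian length followed by that many payload bytes. A truncated trailing
// frame is skipped rather than raising an error, mirroring the proposed fix.
object FrameReaderSketch {
  def readCompleteFrames(bytes: Array[Byte]): Seq[Array[Byte]] = {
    val buf = ByteBuffer.wrap(bytes)
    val frames = Seq.newBuilder[Array[Byte]]
    while (buf.remaining() >= 4) {
      val len = buf.getInt()
      if (len >= 0 && buf.remaining() >= len) {
        val payload = new Array[Byte](len)
        buf.get(payload)
        frames += payload
      } else {
        // Unfinished trailing frame: stop parsing instead of throwing.
        buf.position(buf.limit())
      }
    }
    frames.result()
  }
}
```

A reader built this way serves the UI from whatever data is already complete, which is the behavior wanted for in-progress event logs.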
[GitHub] spark pull request #22594: [SPARK-25674][SQL] If the records are incremented...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/22594#discussion_r224286853 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -70,6 +70,8 @@ class FileScanRDD( private val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles private val ignoreMissingFiles = sparkSession.sessionState.conf.ignoreMissingFiles + // only for test + private val inputMetricsTest = sparkSession.sessionState.conf.contains("spark.inputmetrics.test") --- End diff -- If this place is controlled by `spark.testing`, other unit tests may fail. Yeah, I agree with you; this is a simple change, and it is better to drop this. Thanks @srowen
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22612 **[Test build #97228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97228/testReport)** for PR 22612 at commit [`067b81d`](https://github.com/apache/spark/commit/067b81d24de7999afe5b9660e89d9a2e41de6d21).
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22612 Looking at the logs the failure doesn't seem to be related to this change. It is in HiveVersionSuite.
[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97227/ Test FAILed.
[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22664 **[Test build #97227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97227/testReport)** for PR 22664 at commit [`cf61f1c`](https://github.com/apache/spark/commit/cf61f1c4df40b681f2db8cf233b8fbc0df88598b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22664 Merged build finished. Test FAILed.
[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22664 **[Test build #97227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97227/testReport)** for PR 22664 at commit [`cf61f1c`](https://github.com/apache/spark/commit/cf61f1c4df40b681f2db8cf233b8fbc0df88598b).
[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22664 ok to test
[GitHub] spark pull request #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchm...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22664#discussion_r224272542 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala --- @@ -34,10 +34,15 @@ import org.apache.spark.sql.vectorized.ColumnVector /** * Benchmark to measure data source read performance. - * To run this: - * spark-submit --class + * To run this benchmark: + * {{{ + * 1. without sbt: bin/spark-submit --class --jars , --- End diff -- Could you run `dev/scalastyle` and fix this in your branch?
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22690 **[Test build #97226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97226/testReport)** for PR 22690 at commit [`a35b54f`](https://github.com/apache/spark/commit/a35b54fbb000665a87998c14ed940316d45d).
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22690 Merged build finished. Test PASSed.
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3866/ Test PASSed.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224272133 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestTestHelper.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ListBuffer + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.util.Utils + +object ResourceRequestTestHelper { + def initializeResourceTypes(resourceTypes: List[String]): Unit = { +if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) { + throw new IllegalStateException("This method should not be invoked " + +"since YARN resource types is not available because of old Hadoop version!" ) +} + +val allResourceTypes = new ListBuffer[AnyRef] +val defaultResourceTypes = List( + createResourceTypeInfo("memory-mb"), + createResourceTypeInfo("vcores")) +val customResourceTypes = resourceTypes.map(rt => createResourceTypeInfo(rt)) --- End diff -- `.map { rt => ... 
}`, or `.map(createResourceTypeInfo)`
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224271908 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import org.apache.hadoop.yarn.api.records.Resource +import org.apache.hadoop.yarn.util.Records +import org.scalatest.Matchers + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} +import org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation +import org.apache.spark.deploy.yarn.config._ +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} + +class ResourceRequestHelperSuite extends SparkFunSuite with Matchers { + + private val CUSTOM_RES_1 = "custom-resource-type-1" + private val CUSTOM_RES_2 = "custom-resource-type-2" + private val MEMORY = "memory" + private val CORES = "cores" + private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES + + test("resource request value does not match pattern") { +verifySetResourceRequestsException(List(CUSTOM_RES_1), + Map(CUSTOM_RES_1 -> "**@#"), CUSTOM_RES_1) + } + + test("resource request just unit defined") { +verifySetResourceRequestsException(List(), Map(CUSTOM_RES_1 -> "m"), CUSTOM_RES_1) + } + + test("resource request with null value should not be allowed") { +verifySetResourceRequestsException(List(), null, Map(CUSTOM_RES_1 -> "123"), + "requirement failed: Resource parameter should not be null!") + } + + test("resource request with valid value and invalid unit") { +verifySetResourceRequestsException(List(CUSTOM_RES_1), createResource, + Map(CUSTOM_RES_1 -> "123ppp"), "") + } + + test("resource request with valid value and without unit") { +verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1), 
Map(CUSTOM_RES_1 -> "123"), + Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 123, ""))) + } + + test("resource request with valid value and unit") { +verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1), Map(CUSTOM_RES_1 -> "2g"), + Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 2, "G"))) + } + + test("two resource requests with valid values and units") { +verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1, CUSTOM_RES_2), + Map(CUSTOM_RES_1 -> "123m", CUSTOM_RES_2 -> "10G"), + Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 123, "m"), +CUSTOM_RES_2 -> ResourceInformation(CUSTOM_RES_2, 10, "G"))) + } + + test("empty SparkConf should be valid") { +val sparkConf = new SparkConf() +ResourceRequestHelper.validateResources(sparkConf) + } + + test("just normal resources are defined") { +val sparkConf = new SparkConf() +sparkConf.set(DRIVER_MEMORY.key, "3G") +sparkConf.set(DRIVER_CORES.key, "4") +sparkConf.set(EXECUTOR_MEMORY.key, "4G") +sparkConf.set(EXECUTOR_CORES.key, "2") +ResourceRequestHelper.validateResources(sparkConf) + } + + test("memory defined with new config for executor") { +val sparkConf = new SparkConf() +sparkConf.set(NEW_CONFIG_EXECUTOR_MEMORY, "30G") +verifyValidateResourcesException(sparkConf, NEW_CONFIG_EXECUTOR_MEMORY) + } + + test("cores defined with new config for executor") { +val sparkConf =
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224270997 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.collection.mutable +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation" + + /** + * Validates sparkConf and throws a SparkException if any of standard resources (memory or cores) + * is defined with the property spark.yarn.x.resource.y + */ + def validateResources(sparkConf: SparkConf): Unit = { +val resourceDefinitions = Seq[(String, String)]( + (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"), + (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"), + (DRIVER_MEMORY.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"), + (DRIVER_CORES.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "cores"), + (EXECUTOR_MEMORY.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory"), + (EXECUTOR_CORES.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "cores")) +val errorMessage = new mutable.StringBuilder() + +resourceDefinitions.foreach { case (sparkName, resourceRequest) => + if (sparkConf.contains(resourceRequest)) { +errorMessage.append(s"Error: Do not use $resourceRequest, " + +s"please use $sparkName instead!\n") + } +} + +if (errorMessage.nonEmpty) { + throw new SparkException(errorMessage.toString()) +} + } + + /** + * Sets resource amount with the corresponding unit to the passed resource object. 
+ * @param resources resource values to set + * @param resource resource object to update + */ + def setResourceRequests( + resources: Map[String, String], + resource: Resource): Unit = { +require(resource != null, "Resource parameter should not be null!") + +logDebug(s"Custom resources requested: $resources") +if (!isYarnResourceTypesAvailable()) { + if (resources.nonEmpty) { +logWarning("Ignoring custom resource requests because " + +"the version of YARN does not support it!") + } + return +} + +val resInfoClass = Utils.classForName(RESOURCE_INFO_CLASS) +val setResourceInformationMethod = + resource.getClass.getMethod("setResourceInformation", classOf[String], resInfoClass) +resources.foreach { case (name, rawAmount) => + try { +val AMOUNT_AND_UNIT_REGEX(amountPart, unitPart) = rawAmount +val amount = amountPart.toLong +val unit = unitPart match { + case "g" => "G" + case "t" => "T" + case "p" => "P" + case _ => unitPart +} +logDebug(s"Registering resource with name: $name, amount: $amount, unit: $unit") +val resourceInformation = createResourceInformation( + name, amount, unit, resInfoClass) --- End diff -- Fits in previous line. ---
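The amount/unit handling in the diff above (the `AMOUNT_AND_UNIT_REGEX` match plus the `g`/`t`/`p` upper-casing) is self-contained enough to sketch without Spark or YARN. `ResourceAmountSketch` and `parse` below are illustrative names, not part of the PR.

```scala
// Standalone sketch of the resource-value parsing shown in the diff above.
// It mirrors the quoted regex and unit normalization; not Spark's actual class.
object ResourceAmountSketch {
  private val AmountAndUnit = "([0-9]+)([A-Za-z]*)".r

  /** Parses e.g. "2g" into (2, "G") and "123" into (123, ""). */
  def parse(raw: String): (Long, String) = raw match {
    case AmountAndUnit(amount, unit) =>
      val normalized = unit match {
        case "g" => "G" // the quoted code upper-cases g/t/p before handing off to YARN
        case "t" => "T"
        case "p" => "P"
        case other => other
      }
      (amount.toLong, normalized)
    case _ =>
      throw new IllegalArgumentException(s"Resource value does not match pattern: $raw")
  }
}
```

A value like `**@#` falls through to the error case, which lines up with the "resource request value does not match pattern" test in the suite quoted earlier in this thread.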
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224271778 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import org.apache.hadoop.yarn.api.records.Resource +import org.apache.hadoop.yarn.util.Records +import org.scalatest.Matchers + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} +import org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation +import org.apache.spark.deploy.yarn.config._ +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} + +class ResourceRequestHelperSuite extends SparkFunSuite with Matchers { + + private val CUSTOM_RES_1 = "custom-resource-type-1" + private val CUSTOM_RES_2 = "custom-resource-type-2" + private val MEMORY = "memory" + private val CORES = "cores" + private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES + + test("resource request value does not match pattern") { +verifySetResourceRequestsException(List(CUSTOM_RES_1), + Map(CUSTOM_RES_1 -> "**@#"), CUSTOM_RES_1) + } + + test("resource request just unit defined") { +verifySetResourceRequestsException(List(), Map(CUSTOM_RES_1 -> "m"), CUSTOM_RES_1) + } + + test("resource request with null value should not be allowed") { +verifySetResourceRequestsException(List(), null, Map(CUSTOM_RES_1 -> "123"), + "requirement failed: Resource parameter should not be null!") + } + + test("resource request with valid value and invalid unit") { +verifySetResourceRequestsException(List(CUSTOM_RES_1), createResource, + Map(CUSTOM_RES_1 -> "123ppp"), "") + } + + test("resource request with valid value and without unit") { +verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1), 
Map(CUSTOM_RES_1 -> "123"), + Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 123, ""))) + } + + test("resource request with valid value and unit") { +verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1), Map(CUSTOM_RES_1 -> "2g"), + Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 2, "G"))) + } + + test("two resource requests with valid values and units") { +verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1, CUSTOM_RES_2), + Map(CUSTOM_RES_1 -> "123m", CUSTOM_RES_2 -> "10G"), + Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 123, "m"), +CUSTOM_RES_2 -> ResourceInformation(CUSTOM_RES_2, 10, "G"))) + } + + test("empty SparkConf should be valid") { +val sparkConf = new SparkConf() +ResourceRequestHelper.validateResources(sparkConf) + } + + test("just normal resources are defined") { +val sparkConf = new SparkConf() +sparkConf.set(DRIVER_MEMORY.key, "3G") +sparkConf.set(DRIVER_CORES.key, "4") +sparkConf.set(EXECUTOR_MEMORY.key, "4G") +sparkConf.set(EXECUTOR_CORES.key, "2") +ResourceRequestHelper.validateResources(sparkConf) + } + + test("memory defined with new config for executor") { +val sparkConf = new SparkConf() +sparkConf.set(NEW_CONFIG_EXECUTOR_MEMORY, "30G") +verifyValidateResourcesException(sparkConf, NEW_CONFIG_EXECUTOR_MEMORY) + } + + test("cores defined with new config for executor") { +val sparkConf =
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224270816 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.collection.mutable +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation" + + /** + * Validates sparkConf and throws a SparkException if any of standard resources (memory or cores) + * is defined with the property spark.yarn.x.resource.y + */ + def validateResources(sparkConf: SparkConf): Unit = { +val resourceDefinitions = Seq[(String, String)]( + (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"), --- End diff -- I went and looked at the documentation because I remember this being confusing. The documentation mentions both `memory` and `memory-mb` as being valid, with the latter being preferred. So it sounds to me like you can use either, and that this code should disallow both. You even initialize `memory-mb` in your tests, instead of `memory`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
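A minimal, Spark-free sketch of the validation vanzin is asking for: reject both the `memory` and `memory-mb` spellings (and `cores`) under the `spark.yarn.*.resource.` prefixes, since YARN treats both spellings as the standard memory resource. Note the simplifications: `SparkConf` is stood in for by a plain `Map`, and the prefix constants are written out literally rather than taken from the real config objects.

```scala
// Prefixes written out literally for self-containment; the real code uses
// the YARN_*_RESOURCE_TYPES_PREFIX constants from Spark's YARN config.
val resourcePrefixes = Seq(
  "spark.yarn.am.resource.",
  "spark.yarn.driver.resource.",
  "spark.yarn.executor.resource.")

// Both spellings of the standard memory resource are disallowed, per the review comment.
val standardResources = Seq("memory", "memory-mb", "cores")

def validateResources(conf: Map[String, String]): Unit = {
  for (prefix <- resourcePrefixes; res <- standardResources) {
    val key = prefix + res
    require(!conf.contains(key),
      s"$key is not allowed; use the standard Spark memory/cores configuration instead")
  }
}
```

A custom resource such as `spark.yarn.executor.resource.gpu` passes, while either memory spelling fails fast with a descriptive message.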
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224271778 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala ---
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224271845 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala ---
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22690 Merged build finished. Test FAILed.
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97225/ Test FAILed.
[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22690 **[Test build #97225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97225/testReport)** for PR 22690 at commit [`6c6c1a3`](https://github.com/apache/spark/commit/6c6c1a3ab141353d1867a7d8ab9145da77048980). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22661#discussion_r224270755 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala --- @@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark import org.apache.spark.sql.execution.joins._ import org.apache.spark.sql.functions._ +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.IntegerType /** * Benchmark to measure performance for aggregate primitives. - * To run this: - * build/sbt "sql/test-only *benchmark.JoinBenchmark" - * - * Benchmarks in this file are skipped in normal builds. + * To run this benchmark: + * {{{ + * 1. without sbt: + * bin/spark-submit --class --jars + * 2. build/sbt "sql/test:runMain " + * 3. generate result: + * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/JoinBenchmark-results.txt". + * }}} */ -class JoinBenchmark extends BenchmarkWithCodegen { +object JoinBenchmark extends SqlBasedBenchmark { - ignore("broadcast hash join, long key") { + def broadcastHashJoinLongKey(): Unit = { val N = 20 << 20 val M = 1 << 16 -val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v")) -runBenchmark("Join w long", N) { - val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k")) +val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v")) +codegenBenchmark("Join w long", N) { + val df = spark.range(N).join(dim, (col("id") % M) === col("k")) assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined) df.count() } - -/* -Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5 -Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz -Join w long:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- -Join w long codegen=false3002 / 3262 7.0 143.2 1.0X -Join w long codegen=true 321 / 371 65.3 15.3 9.3X -*/ } - 
 ignore("broadcast hash join, long key with duplicates") { + + def broadcastHashJoinLongKeyWithDuplicates(): Unit = { val N = 20 << 20 val M = 1 << 16 -val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v")) -runBenchmark("Join w long duplicated", N) { - val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k")) - val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k")) +codegenBenchmark("Join w long duplicated", N) { + val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k")) --- End diff -- According to another benchmark case in this file, `broadcast` seems to be put outside of `codegenBenchmark`. What do you think about this? ---
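The point behind moving `broadcast(...)` outside `codegenBenchmark` generalizes: setup work should sit outside the timed closure so only the operation under test is measured. Below is a Spark-free sketch of that pattern; the toy `timed` helper stands in for `codegenBenchmark`, and a `Set` lookup stands in for the broadcast hash join, so none of these names are the real Spark API.

```scala
// Toy stand-in for codegenBenchmark: times a closure and returns its result.
def timed[T](name: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(s"$name took ${(System.nanoTime() - start) / 1000} us")
  result
}

// Setup (the analogue of building the broadcast `dim` relation) is hoisted out
// of the timed region, so the measurement covers only the lookup loop below.
val dim: Set[Long] = (0L until 1000L).map(_ / 10).toSet  // {0, 1, ..., 99}

val matches = timed("join-like lookup") {
  (0L until 10000L).count(id => dim.contains(id % 1000))
}
```

Here `matches` is 1000 (100 hits in every block of 1000 ids, over 10 blocks); had `dim` been built inside the closure, the construction cost would pollute the measurement exactly as the review comment warns.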