[spark] branch branch-3.2 updated: [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 0e5812c49d2 [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

0e5812c49d2 is described below

commit 0e5812c49d2552d8779f94fbaad2fc1b69d8a9e8
Author: Yuming Wang
AuthorDate: Fri Aug 5 11:25:51 2022 +0800

    [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

    ### What changes were proposed in this pull request?
    This PR disables validation of default values when parsing Avro schemas.

    ### Why are the changes needed?
    Spark will throw an exception otherwise when upgrading to Spark 3.2. We fixed the same issue for Hive serde tables before: SPARK-34512.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Unit test.

    Closes #37191 from wangyum/SPARK-39775.

    Authored-by: Yuming Wang
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 5c1b99f441ec5e178290637a9a9e7902aaa116e1)
    Signed-off-by: Wenchen Fan
---
 .../spark/serializer/GenericAvroSerializer.scala   |  4 +--
 .../serializer/GenericAvroSerializerSuite.scala    | 16 +++
 .../apache/spark/sql/avro/AvroDataToCatalyst.scala |  3 +-
 .../org/apache/spark/sql/avro/AvroOptions.scala    |  4 +--
 .../apache/spark/sql/avro/CatalystDataToAvro.scala |  2 +-
 .../apache/spark/sql/avro/AvroFunctionsSuite.scala | 32 ++
 6 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
index c1ef3ee769a..7d2923fdf37 100644
--- a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
@@ -97,7 +97,7 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
     } {
       in.close()
     }
-    new Schema.Parser().parse(new String(bytes, StandardCharsets.UTF_8))
+    new Schema.Parser().setValidateDefaults(false).parse(new String(bytes, StandardCharsets.UTF_8))
   })

 /**
@@ -137,7 +137,7 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
     val fingerprint = input.readLong()
     schemaCache.getOrElseUpdate(fingerprint, {
       schemas.get(fingerprint) match {
-        case Some(s) => new Schema.Parser().parse(s)
+        case Some(s) => new Schema.Parser().setValidateDefaults(false).parse(s)
         case None =>
           throw new SparkException(
             "Error reading attempting to read avro data -- encountered an unknown " +
diff --git a/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala b/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala
index 54e4aebe544..98493c12f59 100644
--- a/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala
@@ -110,4 +110,20 @@ class GenericAvroSerializerSuite extends SparkFunSuite with SharedSparkContext {
       assert(rdd.collect() sameElements Array.fill(10)(datum))
     }
   }
+
+  test("SPARK-39775: Disable validate default values when parsing Avro schemas") {
+    val avroTypeStruct = s"""
+      |{
+      |  "type": "record",
+      |  "name": "struct",
+      |  "fields": [
+      |    {"name": "id", "type": "long", "default": null}
+      |  ]
+      |}
+    """.stripMargin
+    val schema = new Schema.Parser().setValidateDefaults(false).parse(avroTypeStruct)
+
+    val genericSer = new GenericAvroSerializer(conf.getAvroSchema)
+    assert(schema === genericSer.decompress(ByteBuffer.wrap(genericSer.compress(schema))))
+  }
 }
diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
index b4965003ba3..c4a4b16b052 100644
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
@@ -53,7 +53,8 @@ private[avro] case class AvroDataToCatalyst(

   private lazy val avroOptions = AvroOptions(options)

-  @transient private lazy val actualSchema = new Schema.Parser().parse(jsonFormatSchema)
+  @transient private lazy val actualSchema =
+    new Schema.Parser().setValidateDefaults(false).parse(jsonFormatSchema)

   @transient private lazy val expectedSchema = avroOptions.schema.getOrElse(actualSchema)

diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala
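For context on the failure the patch avoids, here is a minimal standalone sketch, not part of the commit; `ValidateDefaultsSketch` is a hypothetical name, and the sketch assumes only plain Avro on the classpath. A `long` field cannot legally default to `null`, so a validating `Schema.Parser` rejects such a schema, while `setValidateDefaults(false)` accepts it:

```scala
import org.apache.avro.{AvroTypeException, Schema}

object ValidateDefaultsSketch {
  // A non-nullable long field with a null default: structurally valid JSON,
  // but rejected by Avro's default-value validation.
  val avroTypeStruct: String =
    """{
      |  "type": "record",
      |  "name": "struct",
      |  "fields": [
      |    {"name": "id", "type": "long", "default": null}
      |  ]
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // With validation on (the default in recent Avro releases), parsing fails.
    try {
      new Schema.Parser().setValidateDefaults(true).parse(avroTypeStruct)
    } catch {
      case e: AvroTypeException => println(s"rejected: ${e.getMessage}")
    }

    // With validation off, the schema parses, restoring the lenient behavior
    // that schemas written by older Spark/Avro versions depend on.
    val schema = new Schema.Parser().setValidateDefaults(false).parse(avroTypeStruct)
    println(schema.getField("id").schema().getType) // LONG
  }
}
```

This is why the patch threads `setValidateDefaults(false)` through every place Spark parses a schema string, rather than rewriting the schemas themselves: the schemas may live in data written before Avro began validating defaults by default.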
[spark] branch branch-3.3 updated: [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new c358ee67615 [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

c358ee67615 is described below

commit c358ee6761539b4a4d12dbe36a4dd1a632a0efeb
Author: Yuming Wang
AuthorDate: Fri Aug 5 11:25:51 2022 +0800

    [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

    ### What changes were proposed in this pull request?
    This PR disables validation of default values when parsing Avro schemas.

    ### Why are the changes needed?
    Spark will throw an exception otherwise when upgrading to Spark 3.2. We fixed the same issue for Hive serde tables before: SPARK-34512.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Unit test.

    Closes #37191 from wangyum/SPARK-39775.

    Authored-by: Yuming Wang
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 5c1b99f441ec5e178290637a9a9e7902aaa116e1)
    Signed-off-by: Wenchen Fan
---
 .../spark/serializer/GenericAvroSerializer.scala   |  4 +--
 .../serializer/GenericAvroSerializerSuite.scala    | 16 +++
 .../apache/spark/sql/avro/AvroDataToCatalyst.scala |  3 +-
 .../org/apache/spark/sql/avro/AvroOptions.scala    |  4 +--
 .../apache/spark/sql/avro/CatalystDataToAvro.scala |  2 +-
 .../apache/spark/sql/avro/AvroFunctionsSuite.scala | 32 ++
 6 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
index c1ef3ee769a..7d2923fdf37 100644
--- a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
@@ -97,7 +97,7 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
     } {
       in.close()
     }
-    new Schema.Parser().parse(new String(bytes, StandardCharsets.UTF_8))
+    new Schema.Parser().setValidateDefaults(false).parse(new String(bytes, StandardCharsets.UTF_8))
   })

 /**
@@ -137,7 +137,7 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
     val fingerprint = input.readLong()
     schemaCache.getOrElseUpdate(fingerprint, {
       schemas.get(fingerprint) match {
-        case Some(s) => new Schema.Parser().parse(s)
+        case Some(s) => new Schema.Parser().setValidateDefaults(false).parse(s)
         case None =>
           throw new SparkException(
             "Error reading attempting to read avro data -- encountered an unknown " +
diff --git a/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala b/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala
index 54e4aebe544..98493c12f59 100644
--- a/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/GenericAvroSerializerSuite.scala
@@ -110,4 +110,20 @@ class GenericAvroSerializerSuite extends SparkFunSuite with SharedSparkContext {
       assert(rdd.collect() sameElements Array.fill(10)(datum))
     }
   }
+
+  test("SPARK-39775: Disable validate default values when parsing Avro schemas") {
+    val avroTypeStruct = s"""
+      |{
+      |  "type": "record",
+      |  "name": "struct",
+      |  "fields": [
+      |    {"name": "id", "type": "long", "default": null}
+      |  ]
+      |}
+    """.stripMargin
+    val schema = new Schema.Parser().setValidateDefaults(false).parse(avroTypeStruct)
+
+    val genericSer = new GenericAvroSerializer(conf.getAvroSchema)
+    assert(schema === genericSer.decompress(ByteBuffer.wrap(genericSer.compress(schema))))
+  }
 }
diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
index b4965003ba3..c4a4b16b052 100644
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
@@ -53,7 +53,8 @@ private[avro] case class AvroDataToCatalyst(

   private lazy val avroOptions = AvroOptions(options)

-  @transient private lazy val actualSchema = new Schema.Parser().parse(jsonFormatSchema)
+  @transient private lazy val actualSchema =
+    new Schema.Parser().setValidateDefaults(false).parse(jsonFormatSchema)

   @transient private lazy val expectedSchema = avroOptions.schema.getOrElse(actualSchema)

diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala
[spark] branch master updated (82dc17cdf7a -> 5c1b99f441e)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 82dc17cdf7a [SPARK-39986][PS][DOC] Better example for Co-grouped Map
     add 5c1b99f441e [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/avro/AvroDataToCatalyst.scala |  3 +-
 .../org/apache/spark/sql/avro/AvroOptions.scala    |  4 +--
 .../apache/spark/sql/avro/CatalystDataToAvro.scala |  2 +-
 .../apache/spark/sql/avro/AvroFunctionsSuite.scala | 32 ++
 .../spark/serializer/GenericAvroSerializer.scala   |  4 +--
 .../serializer/GenericAvroSerializerSuite.scala    | 16 +++
 6 files changed, 55 insertions(+), 6 deletions(-)
[spark] branch master updated (e9e8dcb25fe -> 82dc17cdf7a)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from e9e8dcb25fe [SPARK-39974][INFRA] Create separate static image tag for infra cache
     add 82dc17cdf7a [SPARK-39986][PS][DOC] Better example for Co-grouped Map

No new revisions were added by this update.

Summary of changes:
 examples/src/main/python/sql/arrow.py              | 22 +++---
 .../source/getting_started/quickstart_df.ipynb     |  8
 2 files changed, 15 insertions(+), 15 deletions(-)
[spark] branch master updated: [SPARK-39974][INFRA] Create separate static image tag for infra cache
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e9e8dcb25fe [SPARK-39974][INFRA] Create separate static image tag for infra cache

e9e8dcb25fe is described below

commit e9e8dcb25fe1f5c0d925852c8af5e06ce0935684
Author: Yikun Jiang
AuthorDate: Fri Aug 5 08:57:12 2022 +0900

    [SPARK-39974][INFRA] Create separate static image tag for infra cache

    ### What changes were proposed in this pull request?
    Create a separate tag for the static infra image.

    ### Why are the changes needed?
    Currently, we push the **static image** and the **cache** to the same tag, [`ghcr.io/apache/spark/apache-spark-github-action-image-cache:master`](https://github.com/apache/spark/pkgs/container/spark%2Fapache-spark-github-action-image-cache/versions). The cache and the static image have different image hashes but share one tag, which causes problems in the cases below:

    - **Debugging a job with the static docker image**: you have to find the image hash first. Pulling the cache tag directly raises something like:
      ```
      yikun-x86:~# docker run -ti ghcr.io/yikun/apache-spark-github-action-image-cache:master
      Unable to find image 'ghcr.io/yikun/apache-spark-github-action-image-cache:master' locally
      master: Pulling from yikun/apache-spark-github-action-image-cache
      docker: no matching manifest for linux/amd64 in the manifest list entries.
      ```
    - **Using the static image in CI**, for example when we want to switch to the static image temporarily.
    - **Seeing the history of the last cache easily**, such as system deps/libs.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    - Local test: https://github.com/Yikun/spark/pull/144; the static image tag push [passed](https://github.com/Yikun/spark/runs/7664266955?check_suite_focus=true#step:6:212)
    - Running the static image:
      ```
      root@yikun-x86:~# docker run -ti ghcr.io/yikun/apache-spark-github-action-image-cache:master-static
      Unable to find image 'ghcr.io/yikun/apache-spark-github-action-image-cache:master-static' locally
      master-static: Pulling from yikun/apache-spark-github-action-image-cache
      Digest: sha256:5198fd8111c925b7c92d04427268bcb0e5574bb72cef09808076595f3372bf7b
      Status: Downloaded newer image for ghcr.io/yikun/apache-spark-github-action-image-cache:master-static
      root@3550e09e0e93:/# exit
      ```

    Closes #37402 from Yikun/patch-32.

    Authored-by: Yikun Jiang
    Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_infra_images_cache.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml
index c35b34e201e..bd5685f69b0 100644
--- a/.github/workflows/build_infra_images_cache.yml
+++ b/.github/workflows/build_infra_images_cache.yml
@@ -49,7 +49,7 @@ jobs:
         with:
           context: ./dev/infra/
           push: true
-          tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}
+          tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}-static
           cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}
           cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }},mode=max
       - name: Image digest
[spark] branch master updated: [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new bf8a4c47ac3 [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe

bf8a4c47ac3 is described below

commit bf8a4c47ac3752edf86f1e14e4050e8d202b34a4
Author: Jiaan Geng
AuthorDate: Thu Aug 4 09:59:09 2022 -0700

    [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe

    ### What changes were proposed in this pull request?
    Currently, DS V2 push-down translates `Cast` only if ANSI mode is enabled. In fact, if the cast is safe (e.g. casting a number to a string, or an int to a long), we can translate it too. This PR calls `Cast.canUpCast` so that we can translate `Cast` to a V2 `Cast` safely.

    Note: the rule `SimplifyCasts` optimizes away some safe casts, e.g. int to long, so we may not see the `Cast` at all.

    ### Why are the changes needed?
    Widen the range of `Cast` expressions that DS V2 can push down.

    ### Does this PR introduce _any_ user-facing change?
    'Yes'. `Cast` can be pushed down to the data source in more cases.

    ### How was this patch tested?
    Test cases updated.

    Closes #37388 from beliefer/SPARK-39961.

    Authored-by: Jiaan Geng
    Signed-off-by: Dongjoon Hyun
---
 .../sql/catalyst/util/V2ExpressionBuilder.scala    |  3 +-
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala    | 45 +-
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
index 41415553729..d451c73b39d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
@@ -88,7 +88,8 @@ class V2ExpressionBuilder(e: Expression, isPredicate: Boolean = false) {
       } else {
         None
       }
-    case Cast(child, dataType, _, true) =>
+    case Cast(child, dataType, _, ansiEnabled)
+        if ansiEnabled || Cast.canUpCast(child.dataType, dataType) =>
       generateExpression(child).map(v => new V2Cast(v, dataType))
     case Abs(child, true) => generateExpressionWithName("ABS", Seq(child))
     case Coalesce(children) => generateExpressionWithName("COALESCE", children)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
index 3b226d60643..da4f9175cd5 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
@@ -1109,7 +1109,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
         "CAST(BONUS AS string) LIKE '%30%', CAST(DEPT AS byte) > 1, " +
         "CAST(DEPT AS short) > 1, CAST(BONUS AS decimal(20,2)) > 1200.00]"
     } else {
-      "PushedFilters: [BONUS IS NOT NULL, DEPT IS NOT NULL],"
+      "PushedFilters: [BONUS IS NOT NULL, DEPT IS NOT NULL, CAST(BONUS AS string) LIKE '%30%']"
     }
     checkPushedInfo(df6, expectedPlanFragment6)
     checkAnswer(df6, Seq(Row(2, "david", 1, 1300, true)))
@@ -1199,18 +1199,16 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkPushedInfo(df1, "PushedFilters: [CHAR_LENGTH(NAME) > 2],")
     checkAnswer(df1, Seq(Row("fred", 1), Row("mary", 2)))

-    withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
-      val df2 = sql(
-        """
-          |SELECT *
-          |FROM h2.test.people
-          |WHERE h2.my_strlen(CASE WHEN NAME = 'fred' THEN NAME ELSE "abc" END) > 2
+    val df2 = sql(
+      """
+        |SELECT *
+        |FROM h2.test.people
+        |WHERE h2.my_strlen(CASE WHEN NAME = 'fred' THEN NAME ELSE "abc" END) > 2
         """.stripMargin)
-      checkFiltersRemoved(df2)
-      checkPushedInfo(df2,
-        "PushedFilters: [CHAR_LENGTH(CASE WHEN NAME = 'fred' THEN NAME ELSE 'abc' END) > 2],")
-      checkAnswer(df2, Seq(Row("fred", 1), Row("mary", 2)))
-    }
+    checkFiltersRemoved(df2)
+    checkPushedInfo(df2,
+      "PushedFilters: [CHAR_LENGTH(CASE WHEN NAME = 'fred' THEN NAME ELSE 'abc' END) > 2],")
+    checkAnswer(df2, Seq(Row("fred", 1), Row("mary", 2)))
   } finally {
     JdbcDialects.unregisterDialect(testH2Dialect)
     JdbcDialects.registerDialect(H2Dialect)
@@ -2262,24 +2260,17 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
   }

   test("scan with aggregate push-down: partial push-down AVG with overflow") {
-    def createDataFrame: DataFrame = spark.read
-      .option("partitionColumn", "id")
-
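For context on the new guard, here is a brief sketch, not part of the patch (`CanUpCastSketch` is a made-up name), probing `Cast.canUpCast`, which returns true only for casts that can neither fail nor lose information:

```scala
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{IntegerType, LongType, StringType}

object CanUpCastSketch {
  def main(args: Array[String]): Unit = {
    // Safe up-casts: widening numerics and atomic-to-string never fail or
    // lose information, so they are now translated without ANSI mode.
    println(Cast.canUpCast(IntegerType, LongType))   // true
    println(Cast.canUpCast(IntegerType, StringType)) // true

    // Unsafe casts: narrowing or parsing can fail at runtime, so these are
    // still translated only when ANSI mode is on.
    println(Cast.canUpCast(LongType, IntegerType))   // false
    println(Cast.canUpCast(StringType, IntegerType)) // false
  }
}
```

Under ANSI mode a failing cast is a well-defined error on both sides, so any cast may be pushed down; without it, only these up-casts are guaranteed to behave identically in Spark and in the data source.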
[GitHub] [spark-website] srowen commented on pull request #410: Change some heading tags in last commit to <h3>.
srowen commented on PR #410:
URL: https://github.com/apache/spark-website/pull/410#issuecomment-1205369622

   Oh, BTW, have you fixed the source markdown in the Spark distro? I forgot to ask. Normally we don't edit old releases' generated docs unless it's just not working.
[GitHub] [spark-website] MacrothT opened a new pull request, #410: Change some heading tags in last commit to <h3>.
MacrothT opened a new pull request, #410:
URL: https://github.com/apache/spark-website/pull/410

   Change the level of some HTML heading tags to match the original Markdown headings, which were ###.
[spark-website] branch asf-site updated: Correct some tags/headings and add missing TOC.
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 36b5a3d4f Correct some tags/headings and add missing TOC.

36b5a3d4f is described below

commit 36b5a3d4f29e88ffb3edfddfa52d8fe1c4d7f915
Author: MacrothT <109898529+macro...@users.noreply.github.com>
AuthorDate: Thu Aug 4 08:02:50 2022 -0500

    Correct some tags/headings and add missing TOC.

    Correct mal-encoded tags that caused a mal-formatted HTML doc. Replace Markdown headings with HTML tags so the headings render properly. Add the missing TOC entries.

    Author: MacrothT <109898529+macro...@users.noreply.github.com>

    Closes #409 from MacrothT/patch-1.
---
 site/docs/3.2.1/running-on-kubernetes.html | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/site/docs/3.2.1/running-on-kubernetes.html b/site/docs/3.2.1/running-on-kubernetes.html
index aa43ebbef..039a3acb2 100644
--- a/site/docs/3.2.1/running-on-kubernetes.html
+++ b/site/docs/3.2.1/running-on-kubernetes.html
@@ -183,8 +183,15 @@
   Future Work

-  Configuration
+  Configuration
+    Spark Properties
+    Pod Template Properties
+    Pod Metadata
+    Pod Spec
+    Container spec
+    Resource Allocation and Configuration Overview
+    Stage Level Scheduling Overview

@@ -1446,13 +1453,13 @@ using --conf as means
     3.0.0

-  spark.kubernetes.executor.scheduler.name/td
+  spark.kubernetes.executor.scheduler.name
   (none)
   Specify the scheduler name for each executor pod.
   3.0.0
-/tr
+
   spark.kubernetes.configMap.maxSize
   1572864

@@ -1571,13 +1578,13 @@ using --conf as means
     3.1.3

-/table
+

- Pod template properties
+Pod Template Properties

 See the below table for the full list of pod specifications that will be overwritten by spark.

-### Pod Metadata
+Pod Metadata

 Pod metadata key    Modified value    Description

@@ -1613,7 +1620,7 @@

-### Pod Spec
+Pod Spec

 Pod spec key    Modified value    Description

@@ -1664,7 +1671,7 @@

-### Container spec
+Container Spec

 The following affect the driver and executor containers. All other containers in the pod spec will be unaffected.

@@ -1721,7 +1728,7 @@

-### Resource Allocation and Configuration Overview
+Resource Allocation and Configuration Overview

 Please make sure to have read the Custom Resource Scheduling and Configuration Overview section on the [configuration page](configuration.html). This section only talks about the Kubernetes specific aspects of resource scheduling.

@@ -1731,7 +1738,7 @@

 Spark automatically handles translating the Spark configs spark.{driver/ex
 Kubernetes does not tell Spark the addresses of the resources allocated to each container. For that reason, the user must specify a discovery script that gets run by the executor on startup to discover what resources are available to that executor. You can find an example scripts in `examples/src/main/scripts/getGpusResources.sh`. The script must have execute permissions set and the user should setup permissions to not allow malicious users to modify it. The script should write to STDOUT [...]

-### Stage Level Scheduling Overview
+Stage Level Scheduling Overview

 Stage level scheduling is supported on Kubernetes when dynamic allocation is enabled. This also requires spark.dynamicAllocation.shuffleTracking.enabled to be enabled since Kubernetes doesn't support an external shuffle service at this time. The order in which containers for different profiles is requested from Kubernetes is not guaranteed. Note that since dynamic allocation on Kubernetes requires the shuffle tracking feature, this means that executors from previous stages t [...]

 Note, there is a difference in the way pod template resources are handled between the base default profile and custom ResourceProfiles. Any resources specified in the pod template file will only be used with the base default profile. If you create custom ResourceProfiles be sure to include all necessary resources there since the resources from the template file will not be propagated to custom ResourceProfiles.
[GitHub] [spark-website] srowen closed pull request #409: Correct some tags/headings and add missing TOC.
srowen closed pull request #409: Correct some tags/headings and add missing TOC.
URL: https://github.com/apache/spark-website/pull/409
[GitHub] [spark-website] MacrothT opened a new pull request, #409: Correct some tags/headings and add missing TOC.
MacrothT opened a new pull request, #409:
URL: https://github.com/apache/spark-website/pull/409

   Correct mal-encoded tags that caused a mal-formatted HTML doc. Replace Markdown headings with HTML tags so the headings render properly. Add the missing TOC.
[spark] branch master updated (432b667dea7 -> a7cded5dae0)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 432b667dea7 [SPARK-39959][BUILD][INFRA] Pin roxygen2 version to 7.2.0 in infra
     add a7cded5dae0 [SPARK-39913][BUILD] Upgrade to Arrow 9.0.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 8
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8
 pom.xml                               | 2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)