Re: [PR] [SPARK-47927][SQL]: Fix nullability attribute in UDF decoder [spark]
cloud-fan closed pull request #46156: [SPARK-47927][SQL]: Fix nullability attribute in UDF decoder URL: https://github.com/apache/spark/pull/46156 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47927][SQL]: Fix nullability attribute in UDF decoder [spark]
cloud-fan commented on PR #46156: URL: https://github.com/apache/spark/pull/46156#issuecomment-2081341603 thanks, merging to master/3.5/3.4!
Re: [PR] [SPARK-47927][SQL]: Fix nullability attribute in UDF decoder [spark]
cloud-fan commented on PR #46156: URL: https://github.com/apache/spark/pull/46156#issuecomment-2081341418 good catch!
Re: [PR] [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write [spark]
cloud-fan commented on PR #46240: URL: https://github.com/apache/spark/pull/46240#issuecomment-2081340902 late LGTM
Re: [PR] [SPARK-48002][PYTHON][SS] Add test for observed metrics in PySpark StreamingQueryListener [spark]
WweiL commented on PR #46237: URL: https://github.com/apache/spark/pull/46237#issuecomment-2081327432 @HyukjinKwon I think we can merge this now : )
Re: [PR] [SPARK-47292][SS] safeMapToJValue should consider null typed values [spark]
WweiL commented on PR #46260: URL: https://github.com/apache/spark/pull/46260#issuecomment-2081317940 CC @HeartSaVioR PTAL, thank you!
[PR] [SPARK-47292][SS] safeMapToJValue should consider null typed values [spark]
WweiL opened a new pull request, #46260: URL: https://github.com/apache/spark/pull/46260

### What changes were proposed in this pull request?
Add an additional null check to `safeMapToJValue`. Normally we won't create a `StreamingQueryProgress` with map fields as null. It is also very unlikely in Spark Connect, but it is theoretically possible because we send the JSON directly in Spark Connect, so add this check for additional safety to avoid crashing the server.

### Why are the changes needed?
Minor bug fix.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test

### Was this patch authored or co-authored using generative AI tooling?
No
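The guard this PR describes can be sketched in standalone Python (the real `safeMapToJValue` is Scala code inside Spark; this helper and its names are illustrative only, not the actual implementation):

```python
import json

def safe_map_to_jvalue(m, value_to_json=str):
    """Convert an optional map to a JSON object string.

    Returns JSON null when the map itself is None, mirroring the extra
    null check the PR adds, instead of raising on `m.items()`.
    """
    if m is None:
        return "null"
    return json.dumps({k: value_to_json(v) for k, v in m.items()})
```

Without the `if m is None` branch, a null map coming over the wire would raise before any JSON is produced, which is the server crash the PR guards against.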
Re: [PR] [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args [spark]
panbingkun commented on PR #46259: URL: https://github.com/apache/spark/pull/46259#issuecomment-2081311526

> We can manually verify it through Maven test `build/mvn test -pl mllib-local`:
>
> Before
>
> ![image](https://github.com/apache/spark/assets/1475305/1c002f85-175e-4554-a5a5-b05eab244f9c)
>
> there is a WARNING message: `Warning: Failed to load implementation from: dev.ludovic.netlib.blas.VectorBLAS`
>
> After
>
> ![image](https://github.com/apache/spark/assets/1475305/a83b89c0-944d-45ce-9b96-572448d5d97e)
>
> no WARNING message related to `VectorBLAS`

Yeah
Re: [PR] [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args [spark]
LuciferYang commented on PR #46259: URL: https://github.com/apache/spark/pull/46259#issuecomment-2081310548

We can manually verify it through Maven test:

Before

![image](https://github.com/apache/spark/assets/1475305/1c002f85-175e-4554-a5a5-b05eab244f9c)

there is a WARNING message: `Warning: Failed to load implementation from: dev.ludovic.netlib.blas.VectorBLAS`

After

![image](https://github.com/apache/spark/assets/1475305/a83b89c0-944d-45ce-9b96-572448d5d97e)

no WARNING message related to `VectorBLAS`
Re: [PR] [SPARK-48019] Fix incorrect behavior in ColumnVector/ColumnarArray with dictionary and nulls [spark]
cloud-fan closed pull request #46254: [SPARK-48019] Fix incorrect behavior in ColumnVector/ColumnarArray with dictionary and nulls URL: https://github.com/apache/spark/pull/46254
Re: [PR] [SPARK-48019] Fix incorrect behavior in ColumnVector/ColumnarArray with dictionary and nulls [spark]
cloud-fan commented on PR #46254: URL: https://github.com/apache/spark/pull/46254#issuecomment-2081305430 thanks, merging to master/3.5!
Re: [PR] [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args [spark]
LuciferYang commented on PR #46259: URL: https://github.com/apache/spark/pull/46259#issuecomment-2081303527 Yes, we should keep `JavaModuleOptions`, `extraJavaTestArgs` in `SparkBuild.scala`, and `extraJavaTestArgs` in `pom.xml` consistent.
Re: [PR] [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args [spark]
panbingkun commented on PR #46259: URL: https://github.com/apache/spark/pull/46259#issuecomment-2081303045 cc @LuciferYang
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
panbingkun commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2081300766

> @panbingkun we should add `--add-modules=jdk.incubator.vector` to `extraJavaTestArgs` in `pom.xml` too
>
> https://github.com/apache/spark/blob/64d321926bbcede05d1c145405d503b3431f185b/pom.xml#L305-L323

Okay, let me do it.
Re: [PR] [SPARK-48011][Core] Store LogKey name as a value to avoid generating new string instances [spark]
LuciferYang commented on PR #46249: URL: https://github.com/apache/spark/pull/46249#issuecomment-2081298871 late LGTM
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
LuciferYang commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2081298219

@panbingkun we should add `--add-modules=jdk.incubator.vector` to `pom.xml` too: https://github.com/apache/spark/blob/64d321926bbcede05d1c145405d503b3431f185b/pom.xml#L305-L323
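For reference, the kind of `pom.xml` change being suggested would look roughly like this (an illustrative sketch, not the exact diff from the follow-up PR; the surrounding flags are abbreviated):

```xml
<properties>
  <!-- JVM flags passed to forked test JVMs; keep in sync with
       JavaModuleOptions and extraJavaTestArgs in SparkBuild.scala -->
  <extraJavaTestArgs>
    --add-modules=jdk.incubator.vector
    --add-opens=java.base/java.lang=ALL-UNNAMED
  </extraJavaTestArgs>
</properties>
```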
[PR] [Only for check Docker Image] Check installed packages on ubuntu 22.04 [spark]
panbingkun opened a new pull request, #46258: URL: https://github.com/apache/spark/pull/46258

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-47516][INFRA] Move `remove unused installation package logic` from `each test job` to `create the docker image` [spark]
panbingkun commented on PR #45659: URL: https://github.com/apache/spark/pull/45659#issuecomment-2081287287

> @panbingkun Hi, bingkun, when rebuilding the image in https://github.com/zhengruifeng/spark/actions/runs/8857365994/job/24324764602 I see such warnings:
>
> ```
> #35 [29/31] RUN apt-get remove --purge -y '^aspnet.*' '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' snapd google-chrome-stable microsoft-edge-stable firefox azure-cli google-cloud-sdk mono-devel powershell libgl1-mesa-dri || true
> #35 0.489 Reading package lists...
> #35 0.505 Building dependency tree...
> #35 0.507 Reading state information...
> #35 0.511 E: Unable to locate package ^aspnet.*
> #35 0.511 E: Couldn't find any package by glob '^aspnet.*'
> #35 0.511 E: Couldn't find any package by regex '^aspnet.*'
> #35 0.511 E: Unable to locate package ^dotnet-.*
> #35 0.511 E: Couldn't find any package by glob '^dotnet-.*'
> #35 0.511 E: Couldn't find any package by regex '^dotnet-.*'
> #35 0.511 E: Unable to locate package ^llvm-.*
> #35 0.511 E: Couldn't find any package by glob '^llvm-.*'
> #35 0.511 E: Couldn't find any package by regex '^llvm-.*'
> #35 0.511 E: Unable to locate package ^mongodb-.*
> #35 0.511 E: Couldn't find any package by glob '^mongodb-.*'
> #35 0.511 E: Couldn't find any package by regex '^mongodb-.*'
> #35 0.511 Package 'php-crypt-gpg' is not installed, so not removed
> #35 0.511 Package 'php' is not installed, so not removed
> #35 0.511 E: Unable to locate package snapd
> #35 0.511 E: Unable to locate package google-chrome-stable
> #35 0.511 E: Unable to locate package microsoft-edge-stable
> #35 0.511 E: Unable to locate package firefox
> #35 0.511 E: Unable to locate package azure-cli
> #35 0.511 E: Unable to locate package google-cloud-sdk
> #35 0.511 E: Unable to locate package mono-devel
> #35 0.511 E: Unable to locate package powershell
> #35 DONE 0.5s
>
> #36 [30/31] RUN apt-get autoremove --purge -y
> #36 0.063 Reading package lists...
> #36 0.079 Building dependency tree...
> #36 0.082 Reading state information...
> #36 0.088 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
> #36 DONE 0.4s
> ```
>
> Would you mind helping to check whether this removal is still needed on the new Ubuntu image?

Sure, let me do it. Thanks!
[PR] [SPARK-48024][PYTHON][CONNECT][TESTS] Enable `UDFParityTests.test_udf_timestamp_ntz` [spark]
zhengruifeng opened a new pull request, #46257: URL: https://github.com/apache/spark/pull/46257

### What changes were proposed in this pull request?
Enable `UDFParityTests.test_udf_timestamp_ntz`

### Why are the changes needed?
for test coverage

### Does this PR introduce _any_ user-facing change?
no, test only

### How was this patch tested?
ci and manually test:
```
(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_udf UDFParityTests.test_udf_timestamp_ntz'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.test_parity_udf UDFParityTests.test_udf_timestamp_ntz']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.tests.connect.test_parity_udf UDFParityTests.test_udf_timestamp_ntz (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/90afedde-8472-496c-8741-a3fd5792f6e2/python3__pyspark.sql.tests.connect.test_parity_udf_UDFParityTests.test_udf_timestamp_ntz__7yrowv9l.log)
Finished test(python3): pyspark.sql.tests.connect.test_parity_udf UDFParityTests.test_udf_timestamp_ntz (10s)
Tests passed in 10 seconds
```

### Was this patch authored or co-authored using generative AI tooling?
no
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
dongjoon-hyun commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2081266374 Merged to master. Thank you, @panbingkun and all!
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
dongjoon-hyun closed pull request #46246: [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` URL: https://github.com/apache/spark/pull/46246
Re: [PR] [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' [spark]
dongjoon-hyun commented on PR #46256: URL: https://github.com/apache/spark/pull/46256#issuecomment-2081265499 Thank you all!
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
zhengruifeng commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2081262259 also cc @WeichenXu123
Re: [PR] [SPARK-46744][SPARK-SHELL][SQL][CONNECT][PYTHON][R] Display clear `exit command` for all spark terminal [spark]
github-actions[bot] commented on PR #44769: URL: https://github.com/apache/spark/pull/44769#issuecomment-2081261642 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-44635][CORE] Handle shuffle fetch failures in decommissions [spark]
github-actions[bot] commented on PR #42296: URL: https://github.com/apache/spark/pull/42296#issuecomment-2081261651 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' [spark]
zhengruifeng commented on PR #46256: URL: https://github.com/apache/spark/pull/46256#issuecomment-2081261491 thank you @yaooqinn and @HyukjinKwon
Re: [PR] [Don't review, only for test][SPARK-48022][BUILD] Upgrade `jersey` to `3.1.6` [spark]
panbingkun commented on PR #46252: URL: https://github.com/apache/spark/pull/46252#issuecomment-2081261447

> The below MR may give some hints also for this ticket: bumping Jersey to v3.1.x requires all of Spark to comply with the EE 10 standards, as I found during the Jetty 12 upgrade. #45500
>
> In this particular case, it seems Jetty and Jersey are a bundle deal.

Thank you for the pointer.
Re: [PR] [Don't review, only for test][SPARK-48022][BUILD] Upgrade `jersey` to `3.1.6` [spark]
HiuKwok commented on PR #46252: URL: https://github.com/apache/spark/pull/46252#issuecomment-2081188223 The below MR may give some hints also for this ticket: bumping Jersey to v3.1.x requires all of Spark to comply with the EE 10 standards, as I found during the Jetty 12 upgrade. https://github.com/apache/spark/pull/45500
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
panbingkun commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2080865185

> Thank you for looking into that! Let me know what I should do to update dev.ludovic.netlib further for the needs of Spark

Thanks to everyone for writing in such detail during the previous PR process. Because of this, I can easily analyze and trace the history. ❤️
Re: [PR] [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' [spark]
yaooqinn closed pull request #46256: [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' URL: https://github.com/apache/spark/pull/46256
Re: [PR] [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' [spark]
yaooqinn commented on PR #46256: URL: https://github.com/apache/spark/pull/46256#issuecomment-2080844028 Thank you @zhengruifeng @HyukjinKwon Merged to master.
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
luhenry commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2080831209 Thank you for looking into that! Let me know what I should do to update dev.ludovic.netlib further for the needs of Spark
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
panbingkun commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2080813720

> Before this flag was gated on Java 21 - it's OK to set this on earlier versions? OK if so

Yes, the JDK version of the above manual test environment (local) is `17`.

![image](https://github.com/apache/spark/assets/15246973/dba0297a-e51e-49a2-bdf6-f4268cb51c34)
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
srowen commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2080797254 Before, this flag was gated on Java 21; is it OK to set it on earlier versions? OK if so.
Re: [PR] [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` [spark]
panbingkun commented on PR #46246: URL: https://github.com/apache/spark/pull/46246#issuecomment-2080762133 cc @luhenry @srowen @zhengruifeng @dongjoon-hyun @LuciferYang
Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]
jshmchenxi commented on code in PR #46149: URL: https://github.com/apache/spark/pull/46149#discussion_r1581765866

```
## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala
@@ -35,7 +35,9 @@ import org.apache.spark.util.Utils
 class BasicDriverFeatureStepSuite extends SparkFunSuite {
-  private val CUSTOM_DRIVER_LABELS = Map("labelkey" -> "labelvalue")
+  private val CUSTOM_DRIVER_LABELS = Map(
+    "labelkey" -> "labelvalue",
+    "yunikorn.apache.org/app-id" -> "{{APPID}}")
```

Review Comment: Understood. Actually, the label key used here can be any string. I've updated the test to use a general label key as well as a general annotation key. Also fixed the typo `APPID` -> `APP_ID`.
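The placeholder mechanism this PR discusses can be sketched roughly as follows. This is a minimal Python illustration only; Spark's actual implementation is Scala, and the `expand_placeholders` helper and its signature are hypothetical, not Spark's API.

```python
# Hypothetical helper (not Spark's actual code) illustrating expansion of
# APP_ID / EXECUTOR_ID placeholders in user-supplied Kubernetes labels.
def expand_placeholders(labels, app_id, executor_id=None):
    replacements = {"{{APP_ID}}": app_id}
    if executor_id is not None:
        # Executor pods additionally get an executor id; the driver does not.
        replacements["{{EXECUTOR_ID}}"] = executor_id
    expanded = {}
    for key, value in labels.items():
        for placeholder, actual in replacements.items():
            value = value.replace(placeholder, actual)
        expanded[key] = value
    return expanded

labels = {"labelkey": "labelvalue", "yunikorn.apache.org/app-id": "{{APP_ID}}"}
print(expand_placeholders(labels, app_id="spark-app-123"))
# {'labelkey': 'labelvalue', 'yunikorn.apache.org/app-id': 'spark-app-123'}
```

The point of substituting at pod-build time is that schedulers such as YuniKorn can key off a label whose value is only known once the application id exists.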
Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]
jshmchenxi commented on PR #46149: URL: https://github.com/apache/spark/pull/46149#issuecomment-2080412508 It's been a busy week, sorry for the delay. I'll address your comments today, thanks! @dongjoon-hyun
[PR] [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' [spark]
zhengruifeng opened a new pull request, #46256: URL: https://github.com/apache/spark/pull/46256

### What changes were proposed in this pull request?
1. Pin `pandas==2.2.2` for `pypy3.9`.
2. Also change `pandas<=2.2.2` to `pandas==2.2.2` to avoid unexpected version installation (e.g. for pypy3.8, `pandas<=2.2.2` actually installs version 2.0.3).

### Why are the changes needed?
PyPy has been upgraded.

### Does this PR introduce _any_ user-facing change?
No, test only.

### How was this patch tested?
CI.

### Was this patch authored or co-authored using generative AI tooling?
No.
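The reasoning behind switching `pandas<=2.2.2` to `pandas==2.2.2` can be sketched with a toy model of pip-style resolution. Python for illustration only; the per-interpreter version lists below are assumptions for the sketch, not an authoritative record of which pandas builds exist for each PyPy release.

```python
# An upper-bound specifier picks the highest version *available* for the
# current interpreter, which may be far older than the bound itself.
available = {
    "pypy3.8": ["1.5.3", "2.0.3"],           # assume no 2.2.x build exists
    "pypy3.9": ["1.5.3", "2.0.3", "2.2.2"],
}

def parse(version):
    # Turn "2.2.2" into (2, 2, 2) so versions compare numerically.
    return tuple(int(part) for part in version.split("."))

def resolve_upper_bound(versions, ceiling):
    # pip-style resolution: highest available version satisfying "<= ceiling".
    candidates = [v for v in versions if parse(v) <= parse(ceiling)]
    return max(candidates, key=parse) if candidates else None

print(resolve_upper_bound(available["pypy3.8"], "2.2.2"))  # 2.0.3 (surprise)
print(resolve_upper_bound(available["pypy3.9"], "2.2.2"))  # 2.2.2
```

With `pandas==2.2.2`, the install either gets exactly 2.2.2 or fails loudly, instead of silently testing against an older pandas.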
Re: [PR] [SPARK-47516][INFRA] Move `remove unused installation package logic` from `each test job` to `create the docker image` [spark]
zhengruifeng commented on PR #45659: URL: https://github.com/apache/spark/pull/45659#issuecomment-2080386388 @panbingkun Hi Bingkun, when rebuilding the image in https://github.com/zhengruifeng/spark/actions/runs/8857365994/job/24324764602 I see warnings like these:
```
#35 [29/31] RUN apt-get remove --purge -y '^aspnet.*' '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' snapd google-chrome-stable microsoft-edge-stable firefox azure-cli google-cloud-sdk mono-devel powershell libgl1-mesa-dri || true
#35 0.489 Reading package lists...
#35 0.505 Building dependency tree...
#35 0.507 Reading state information...
#35 0.511 E: Unable to locate package ^aspnet.*
#35 0.511 E: Couldn't find any package by glob '^aspnet.*'
#35 0.511 E: Couldn't find any package by regex '^aspnet.*'
#35 0.511 E: Unable to locate package ^dotnet-.*
#35 0.511 E: Couldn't find any package by glob '^dotnet-.*'
#35 0.511 E: Couldn't find any package by regex '^dotnet-.*'
#35 0.511 E: Unable to locate package ^llvm-.*
#35 0.511 E: Couldn't find any package by glob '^llvm-.*'
#35 0.511 E: Couldn't find any package by regex '^llvm-.*'
#35 0.511 E: Unable to locate package ^mongodb-.*
#35 0.511 E: Couldn't find any package by glob '^mongodb-.*'
#35 0.511 EPackage 'php-crypt-gpg' is not installed, so not removed
#35 0.511 Package 'php' is not installed, so not removed
#35 0.511 : Couldn't find any package by regex '^mongodb-.*'
#35 0.511 E: Unable to locate package snapd
#35 0.511 E: Unable to locate package google-chrome-stable
#35 0.511 E: Unable to locate package microsoft-edge-stable
#35 0.511 E: Unable to locate package firefox
#35 0.511 E: Unable to locate package azure-cli
#35 0.511 E: Unable to locate package google-cloud-sdk
#35 0.511 E: Unable to locate package mono-devel
#35 0.511 E: Unable to locate package powershell
#35 DONE 0.5s

#36 [30/31] RUN apt-get autoremove --purge -y
#36 0.063 Reading package lists...
#36 0.079 Building dependency tree...
#36 0.082 Reading state information...
#36 0.088 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
#36 DONE 0.4s
```
None of the packages are found, so nothing is actually removed. Would you mind helping check whether this removal step is still needed on the Ubuntu 24.04 image?
Re: [PR] [SPARK-48012][SQL] SPJ: Support Transform Expressions for One Side Shuffle [spark]
szehon-ho commented on PR #46255: URL: https://github.com/apache/spark/pull/46255#issuecomment-2080381160

Some implementation notes.

SPARK-41471 works by making the ShuffleExchangeExec side of the join have a KeyGroupedPartitioning, which is created from the other side's KeyGroupedShuffleSpec as a clone of it (with the other side's partition expressions and values). That way both sides of the join have KeyGroupedPartitioning and SPJ can work.

Code changes:
- Remove the check in KeyGroupedShuffleSpec::canCreatePartitioning that allows only AttributeReference, and add support for TransformExpression.
- Implement TransformExpression.eval() by reusing the code from V2ExpressionUtils. This allows the ShuffleExchangeExec to evaluate the partition key with transform expressions for each row.

Some fixes:
- Normalize the valueMap key type in KeyGroupedPartitioner to use a specific Seq implementation class. Previously the partitioner's map was initialized with keys as Vector but then probed with keys as ArraySeq, and these seem to have different hash codes, so lookups would always create new entries with new partition ids.
- Add support in V2ExpressionUtils for Scala 'static' invoke() methods for ScalarFunctions (currently only the Java static invoke() method is supported). This was needed, for example, in our test Scala YearsTransform.
- Change the test YearsTransform to have the same logic as InMemoryBaseTable. This was pointed out in the [SPARK-41471](https://github.com/apache/spark/pull/42194) PR.

Limitations:
- This feature is disabled if partiallyClustered is enabled. Partially clustered implies the partitioned side of the join has multiple partitions with the same value and does not group them; it is not clear at the moment how the partitioner on the shuffle side could handle that.
- This feature is disabled if allowJoinKeysLessThanPartitionKeys is enabled and partitions are transform expressions. The allowJoinKeysLessThanPartitionKeys feature works by 'grouping' the BatchScanExec's partitions again by join keys. If it is enabled along with this feature, a failure happens when checking that both sides of the join (the ShuffleExchangeExec and the partitioned BatchScanExec side) have the same number of partitions. This actually works in the first optimizer pass, as the ShuffleExchangeExec's KeyGroupedPartitioning is created as a clone of the other side (including partition values). But after that there is a 'grouping' phase triggered here:

```scala
// Now we need to push-down the common partition information to the scan in each child
newLeft = populateCommonPartitionInfo(left, mergedPartValues, leftSpec.joinKeyPositions,
  leftReducers, applyPartialClustering, replicateLeftSide)
newRight = populateCommonPartitionInfo(right, mergedPartValues, rightSpec.joinKeyPositions,
  rightReducers, applyPartialClustering, replicateRightSide)
```

This updates the number of partitions on the BatchScanExec after the grouping by join key, but it does not update the ShuffleExchangeExec's number of partitions. Hence the error in a subsequent optimizer pass:

```
requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
java.lang.IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
	at scala.Predef$.require(Predef.scala:337)
	at org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.<init>(partitioning.scala:550)
	at org.apache.spark.sql.execution.joins.ShuffledJoin.outputPartitioning(ShuffledJoin.scala:49)
	at org.apache.spark.sql.execution.joins.ShuffledJoin.outputPartitioning$(ShuffledJoin.scala:47)
	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputPartitioning(SortMergeJoinExec.scala:39)
	at org.apache.spark.sql.execution.exchange.EnsureRequirements.$anonfun$ensureDistributionAndOrdering$1(EnsureRequirements.scala:66)
	at scala.collection.immutable.Vector1.map(Vector.scala:2140)
	at scala.collection.immutable.Vector1.map(Vector.scala:385)
	at org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering(EnsureRequirements.scala:65)
	at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$1.applyOrElse(EnsureRequirements.scala:657)
	at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$1.applyOrElse(EnsureRequirements.scala:632)
```

This can be reproduced by removing the check and running the relevant unit test added in this PR. Enabling the combination needs more investigation in a follow-up PR.
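The core idea of the change, evaluating a partition transform per row on the shuffle side so rows land in the partition that matches the other side's key-grouped partition values, can be sketched like this. Python for illustration only; Spark's implementation is Scala, and `years_since_epoch` merely mirrors the spirit of a years-based test transform, not Spark's actual YearsTransform code.

```python
from datetime import date

# Hypothetical transform: bucket rows by whole years since the Unix epoch,
# analogous to a years-based partition transform on the scan side.
def years_since_epoch(d):
    return d.year - 1970

def shuffle_partition_id(row_date, partition_values):
    # Evaluate the transform for this row, then map the transformed key to
    # the index of the matching key-grouped partition on the other join side.
    key = years_since_epoch(row_date)
    return partition_values.index(key)

# Partition values reported by the key-grouped (scan) side of the join.
partition_values = [50, 51, 52]  # years 2020, 2021, 2022
print(shuffle_partition_id(date(2021, 6, 1), partition_values))  # 1
```

This is why TransformExpression needs an eval() path: without it, the shuffle side could only hash raw attribute values and would not line up with the transformed partition keys on the scan side.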
[PR] [SPARK-48012][SQL] SPJ: Support Transform Expressions for One Side Shuffle [spark]
szehon-ho opened a new pull request, #46255: URL: https://github.com/apache/spark/pull/46255

### Why are the changes needed?
Support SPJ one-side shuffle when the other side has a partition transform expression.

### How was this patch tested?
New unit test in KeyGroupedPartitioningSuite.

### Was this patch authored or co-authored using generative AI tooling?
No.