Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697826152 ## .github/workflows/build_and_test.yml: ## @@ -135,6 +135,12 @@ jobs: IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME" echo "image_url=$IMG_URL" >> $GITHUB_OUTPUT + maven-test: Review Comment: will revert after maven test pass -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697851854 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -17,20 +17,53 @@ package org.apache.spark.sql.execution.datasources.orc +import java.lang.invoke.MethodHandles +import java.util.{Map => JMap} import java.util.Random -import org.apache.orc.impl.HadoopShimsFactory +import scala.collection.mutable +import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider} + +import org.apache.spark.SparkConf import org.apache.spark.sql.Row import org.apache.spark.sql.test.SharedSparkSession class OrcEncryptionSuite extends OrcTest with SharedSparkSession { import testImplicits._ + override def sparkConf: SparkConf = { +super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", "test:///") + } + + override def beforeAll(): Unit = { +// Backup `CryptoUtils#keyProviderCache` and clear it. +keyProviderCacheRef.entrySet() Review Comment: At this time, the content of `keyProviderCacheRef` might be ``` hadoop.java.security.SecureRandom -> org.apache.orc.impl.NullKeyProvider ```. ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -17,20 +17,53 @@ package org.apache.spark.sql.execution.datasources.orc +import java.lang.invoke.MethodHandles +import java.util.{Map => JMap} import java.util.Random -import org.apache.orc.impl.HadoopShimsFactory +import scala.collection.mutable +import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider} + +import org.apache.spark.SparkConf import org.apache.spark.sql.Row import org.apache.spark.sql.test.SharedSparkSession class OrcEncryptionSuite extends OrcTest with SharedSparkSession { import testImplicits._ + override def sparkConf: SparkConf = { +super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", "test:///") + } + + override def beforeAll(): Unit = { +// Backup `CryptoUtils#keyProviderCache` and clear it. +keyProviderCacheRef.entrySet() Review Comment: At this time, the content of `keyProviderCacheRef` might be ``` hadoop.java.security.SecureRandom -> org.apache.orc.impl.NullKeyProvider ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697852334 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -17,20 +17,53 @@ package org.apache.spark.sql.execution.datasources.orc +import java.lang.invoke.MethodHandles +import java.util.{Map => JMap} import java.util.Random -import org.apache.orc.impl.HadoopShimsFactory +import scala.collection.mutable +import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider} + +import org.apache.spark.SparkConf import org.apache.spark.sql.Row import org.apache.spark.sql.test.SharedSparkSession class OrcEncryptionSuite extends OrcTest with SharedSparkSession { import testImplicits._ + override def sparkConf: SparkConf = { +super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", "test:///") + } + + override def beforeAll(): Unit = { +// Backup `CryptoUtils#keyProviderCache` and clear it. +keyProviderCacheRef.entrySet() + .forEach(e => keyProviderCacheBackup.put(e.getKey, e.getValue)) +keyProviderCacheRef.clear() +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +// Restore `CryptoUtils#keyProviderCache`. +keyProviderCacheRef.clear() Review Comment: At this time, the content of `keyProviderCacheRef` might be ``` memory.java.security.SecureRandom -> org.apache.orc.InMemoryKeystore hadoop.java.security.SecureRandom -> org.apache.orc.impl.KeyProviderImpl ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697852334 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -17,20 +17,53 @@ package org.apache.spark.sql.execution.datasources.orc +import java.lang.invoke.MethodHandles +import java.util.{Map => JMap} import java.util.Random -import org.apache.orc.impl.HadoopShimsFactory +import scala.collection.mutable +import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider} + +import org.apache.spark.SparkConf import org.apache.spark.sql.Row import org.apache.spark.sql.test.SharedSparkSession class OrcEncryptionSuite extends OrcTest with SharedSparkSession { import testImplicits._ + override def sparkConf: SparkConf = { +super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", "test:///") + } + + override def beforeAll(): Unit = { +// Backup `CryptoUtils#keyProviderCache` and clear it. +keyProviderCacheRef.entrySet() + .forEach(e => keyProviderCacheBackup.put(e.getKey, e.getValue)) +keyProviderCacheRef.clear() +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +// Restore `CryptoUtils#keyProviderCache`. +keyProviderCacheRef.clear() Review Comment: At this time, the content of `keyProviderCacheRef` might be ``` memory.java.security.SecureRandom -> org.apache.orc.InMemoryKeystore hadoop.java.security.SecureRandom -> org.apache.orc.impl.KeyProviderImpl ``` ![image](https://github.com/user-attachments/assets/e14d70e5-5936-480c-a715-32c466284ce6) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259618173 For other test cases, such as `OrcV1QuerySuite`, the code has been modified to print the contents of `CryptoUtils#keyProviderCache` in `afterAll`. Before this PR, the contents of `CryptoUtils#keyProviderCache` were: ``` hadoop.java.security.SecureRandom -> org.apache.orc.impl.KeyProviderImpl@17fade83 ``` After this PR, the contents of `CryptoUtils#keyProviderCache` are: ``` hadoop.java.security.SecureRandom -> org.apache.orc.impl.NullKeyProvider@54f3abc2 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259623836 cc @dongjoon-hyun I'm not sure if the original intention of adding `spark.hadoop.hadoop.security.key.provider.path` in `SparkBuild.scala` was to have it take effect globally, so please let me know if we do not need this pr. Thanks ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259627620 > To achieve this, the pr also refactors `OrcEncryptionSuite`: > 1. Overrides `beforeAll` to back up the contents of `CryptoUtils#keyProviderCache`. > 2. Overrides `afterAll` to restore the contents of `CryptoUtils#keyProviderCache`. > > This ensures that `CryptoUtils#keyProviderCache` is isolated during the test process of `OrcEncryptionSuite`. Without this change, `OrcEncryptionSuite` would fail because, after this PR, the implementation corresponding to key `hadoop.java.security.SecureRandom` in `CryptoUtils#keyProviderCache` has been designated as `org.apache.orc.impl.NullKeyProvider` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
dongjoon-hyun commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259746801 Thank you. Will take a look before the end of this week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
JeonDaehong commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259749276 Hello, I am Mr. Jeon, a developer in Korea. I would like to become a Committer for Apache Spark. Could you please give me some advice on how to achieve this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259749298 Thanks @dongjoon-hyun ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2260973317 > +1, LGTM, @LuciferYang . Please don't forget to remove `build_and_test.yml` change~ done :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
dongjoon-hyun commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2261208315 The failure of `SorterSuite` is a known flaky one. ``` [info] *** 1 SUITE ABORTED *** [error] Error: Total 4117, Failed 0, Errors 1, Passed 4116, Ignored 8, Canceled 2 [error] Error during tests: [error] org.apache.spark.util.collection.SorterSuite [error] (core / Test / test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 3192 s (53:12), completed Jul 31, 2024, 6:12:53 PM ``` Let me merge this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
dongjoon-hyun closed pull request #47543: [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` URL: https://github.com/apache/spark/pull/47543 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
dongjoon-hyun commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2261211466 Merged to master/3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
dongjoon-hyun commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-226163 This is reverted from branch-3.5 due to the compilation failure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2261831205 OK, I will submit a PR for branch-3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org