Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on code in PR #47543:
URL: https://github.com/apache/spark/pull/47543#discussion_r1697826152


##
.github/workflows/build_and_test.yml:
##
@@ -135,6 +135,12 @@ jobs:
 IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME"
 echo "image_url=$IMG_URL" >> $GITHUB_OUTPUT
 
+  maven-test:

Review Comment:
   will revert after maven test pass



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on code in PR #47543:
URL: https://github.com/apache/spark/pull/47543#discussion_r1697851854


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala:
##
@@ -17,20 +17,53 @@
 
 package org.apache.spark.sql.execution.datasources.orc
 
+import java.lang.invoke.MethodHandles
+import java.util.{Map => JMap}
 import java.util.Random
 
-import org.apache.orc.impl.HadoopShimsFactory
+import scala.collection.mutable
 
+import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider}
+
+import org.apache.spark.SparkConf
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.SharedSparkSession
 
 class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   import testImplicits._
 
+  override def sparkConf: SparkConf = {
+super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", 
"test:///")
+  }
+
+  override def beforeAll(): Unit = {
+// Backup `CryptoUtils#keyProviderCache` and clear it.
+keyProviderCacheRef.entrySet()

Review Comment:
   At this time, the content of `keyProviderCacheRef` might be 
   
   ```
   hadoop.java.security.SecureRandom -> org.apache.orc.impl.NullKeyProvider
   ```.
   
   



##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala:
##
@@ -17,20 +17,53 @@
 
 package org.apache.spark.sql.execution.datasources.orc
 
+import java.lang.invoke.MethodHandles
+import java.util.{Map => JMap}
 import java.util.Random
 
-import org.apache.orc.impl.HadoopShimsFactory
+import scala.collection.mutable
 
+import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider}
+
+import org.apache.spark.SparkConf
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.SharedSparkSession
 
 class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   import testImplicits._
 
+  override def sparkConf: SparkConf = {
+super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", 
"test:///")
+  }
+
+  override def beforeAll(): Unit = {
+// Backup `CryptoUtils#keyProviderCache` and clear it.
+keyProviderCacheRef.entrySet()

Review Comment:
   At this time, the content of `keyProviderCacheRef` might be 
   
   ```
   hadoop.java.security.SecureRandom -> org.apache.orc.impl.NullKeyProvider
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on code in PR #47543:
URL: https://github.com/apache/spark/pull/47543#discussion_r1697852334


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala:
##
@@ -17,20 +17,53 @@
 
 package org.apache.spark.sql.execution.datasources.orc
 
+import java.lang.invoke.MethodHandles
+import java.util.{Map => JMap}
 import java.util.Random
 
-import org.apache.orc.impl.HadoopShimsFactory
+import scala.collection.mutable
 
+import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider}
+
+import org.apache.spark.SparkConf
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.SharedSparkSession
 
 class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   import testImplicits._
 
+  override def sparkConf: SparkConf = {
+super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", 
"test:///")
+  }
+
+  override def beforeAll(): Unit = {
+// Backup `CryptoUtils#keyProviderCache` and clear it.
+keyProviderCacheRef.entrySet()
+  .forEach(e => keyProviderCacheBackup.put(e.getKey, e.getValue))
+keyProviderCacheRef.clear()
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+// Restore `CryptoUtils#keyProviderCache`.
+keyProviderCacheRef.clear()

Review Comment:
   At this time, the content of `keyProviderCacheRef` might be
   
   ```
   memory.java.security.SecureRandom -> org.apache.orc.InMemoryKeystore
   hadoop.java.security.SecureRandom -> org.apache.orc.impl.KeyProviderImpl
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on code in PR #47543:
URL: https://github.com/apache/spark/pull/47543#discussion_r1697852334


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala:
##
@@ -17,20 +17,53 @@
 
 package org.apache.spark.sql.execution.datasources.orc
 
+import java.lang.invoke.MethodHandles
+import java.util.{Map => JMap}
 import java.util.Random
 
-import org.apache.orc.impl.HadoopShimsFactory
+import scala.collection.mutable
 
+import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider}
+
+import org.apache.spark.SparkConf
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.SharedSparkSession
 
 class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   import testImplicits._
 
+  override def sparkConf: SparkConf = {
+super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", 
"test:///")
+  }
+
+  override def beforeAll(): Unit = {
+// Backup `CryptoUtils#keyProviderCache` and clear it.
+keyProviderCacheRef.entrySet()
+  .forEach(e => keyProviderCacheBackup.put(e.getKey, e.getValue))
+keyProviderCacheRef.clear()
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+// Restore `CryptoUtils#keyProviderCache`.
+keyProviderCacheRef.clear()

Review Comment:
   At this time, the content of `keyProviderCacheRef` might be
   
   ```
   memory.java.security.SecureRandom -> org.apache.orc.InMemoryKeystore
   hadoop.java.security.SecureRandom -> org.apache.orc.impl.KeyProviderImpl
   ```
   
   
![image](https://github.com/user-attachments/assets/e14d70e5-5936-480c-a715-32c466284ce6)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2259618173

   For other test cases, such as `OrcV1QuerySuite`, the code has been modified 
to print the contents of `CryptoUtils#keyProviderCache` in `afterAll`.
   
   Before this PR, the contents of `CryptoUtils#keyProviderCache` were:
   
   ```
   hadoop.java.security.SecureRandom -> 
org.apache.orc.impl.KeyProviderImpl@17fade83
   ```
   
   After this PR, the contents of `CryptoUtils#keyProviderCache` are:
   
   ```
   hadoop.java.security.SecureRandom -> 
org.apache.orc.impl.NullKeyProvider@54f3abc2
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2259623836

   cc @dongjoon-hyun I'm not sure if the original intention of adding 
`spark.hadoop.hadoop.security.key.provider.path` in `SparkBuild.scala` was to 
have it take effect globally, so please let me know if we do not need this pr. 
Thanks ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2259627620

   > To achieve this, the pr also refactors `OrcEncryptionSuite`:
   > 1. Overrides `beforeAll` to back up the contents of 
`CryptoUtils#keyProviderCache`.
   > 2. Overrides `afterAll` to restore the contents of 
`CryptoUtils#keyProviderCache`.
   > 
   > This ensures that `CryptoUtils#keyProviderCache` is isolated during the 
test process of `OrcEncryptionSuite`.
   
   Without this change, `OrcEncryptionSuite` would fail because, after this PR, 
the implementation corresponding to key `hadoop.java.security.SecureRandom` in 
`CryptoUtils#keyProviderCache` has been designated as 
`org.apache.orc.impl.NullKeyProvider`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


dongjoon-hyun commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2259746801

   Thank you. Will take a look before the end of this week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


JeonDaehong commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2259749276

   Hello, I am Mr. Jeon, a developer in Korea.
   I would like to become a Committer for Apache Spark. Could you please give 
me some advice on how to achieve this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub


LuciferYang commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2259749298

   Thanks @dongjoon-hyun  ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-31 Thread via GitHub


LuciferYang commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2260973317

   > +1, LGTM, @LuciferYang . Please don't forget to remove 
`build_and_test.yml` change~
   
   done :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-31 Thread via GitHub


dongjoon-hyun commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2261208315

   The failure of `SorterSuite` is a known flaky one.
   ```
   [info] *** 1 SUITE ABORTED ***
   [error] Error: Total 4117, Failed 0, Errors 1, Passed 4116, Ignored 8, 
Canceled 2
   [error] Error during tests:
   [error]  org.apache.spark.util.collection.SorterSuite
   [error] (core / Test / test) sbt.TestsFailedException: Tests unsuccessful
   [error] Total time: 3192 s (53:12), completed Jul 31, 2024, 6:12:53 PM
   ```
   
   Let me merge this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-31 Thread via GitHub


dongjoon-hyun closed pull request #47543: [SPARK-49066][SQL][TESTS] Refactor 
`OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` 
effective only within `OrcEncryptionSuite`
URL: https://github.com/apache/spark/pull/47543


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-31 Thread via GitHub


dongjoon-hyun commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2261211466

   Merged to master/3.5.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-31 Thread via GitHub


dongjoon-hyun commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-226163

   This is reverted from branch-3.5 due to the compilation failure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-31 Thread via GitHub


LuciferYang commented on PR #47543:
URL: https://github.com/apache/spark/pull/47543#issuecomment-2261831205

   OK, I will submit a PR for branch-3.5
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org