vmalakhin commented on pull request #34383:
URL: https://github.com/apache/spark/pull/34383#issuecomment-951302403
> @vmalakhin can you put more details in the PR description?
>
> > Redundant exclusions were removed for hadoop-cloud module
>
> This doesn't fit the description "What changes were proposed in this pull request"
>
> > Currently Hadoop ABFS connector (for Azure Data Lake Storage Gen2) is broken due to missing dependency.
>
> Hm can you share more details? what missing dependency and how is that related to Spark?
>
> > So the only change is inclusion of jackson-mapper-asl-1.9.13.jar.
>
> the PR restores transitive dependency for `jackson-mapper-asl`, `jackson-core-asl`, and `jackson-core`. Do we need the other 2?
>
> also cc @steveloughran
OK - there are some details posted under SPARK-37102, but if I try to access ADLS Gen2, the following exception occurs:
```
>>> df=sqlContext.read.parquet("new_test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/pyspark/sql/readwriter.py", line 361, in parquet
    return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/pyspark/sql/utils.py", line 178, in deco
    return f(*a, **kw)
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o30.parquet.
: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper
    at org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.parseListFilesResponse(AbfsHttpOperation.java:508)
    at org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:374)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:274)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:205)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:181)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:454)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:179)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:301)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:957)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:927)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:909)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:406)
    at org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
    at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$1(HadoopFSUtils.scala:95)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.util.HadoopFSUtils$.parallelListLeafFilesInternal(HadoopFSUtils.scala:85)
    at org.apache.spark.util.HadoopFSUtils$.parallelListLeafFiles(HadoopFSUtils.scala:69)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.bulkListLeafFiles(InMemoryFileIndex.scala:158)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.listLeafFiles(InMemoryFileIndex.scala:131)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.refresh0(InMemoryFileIndex.scala:94)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(InMemoryFileIndex.scala:66)
    at org.apache.spark.sql.execution.datasources.DataSource.createInMemoryFileIndex(DataSource.scala:567)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:409)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:209)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:209)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:553)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.ObjectMapper
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 46 more
```
So the jar that provides org.codehaus.jackson.map.ObjectMapper is not present on the classpath (i.e. under the jars dir).
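For reference, the diagnosis can be confirmed from a PySpark shell (a minimal sketch on my part, not part of the fix; it only asks the driver JVM via the py4j gateway whether the class from the stack trace is loadable):

```python
# Check whether the Jackson 1.x class from the stack trace can be loaded
# by the driver JVM; on an unpatched hadoop-cloud build this should fail.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
try:
    spark._jvm.java.lang.Class.forName("org.codehaus.jackson.map.ObjectMapper")
    print("org.codehaus.jackson.map.ObjectMapper is on the classpath")
except Exception as e:
    print("not on the classpath:", e)
```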
I've compared the jars directory outputs for the ```./dev/make-distribution.sh --name custom-spark-default --tgz --pip -Pkubernetes -Phadoop-cloud``` build configuration, and the only difference is jackson-mapper-asl-1.9.13.jar. So I can limit the change to this one jar only.
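That comparison can be reproduced with a short script (a sketch only; the two directory paths below are placeholders for the extracted distributions being compared):

```python
# List jars that appear in one distribution's jars/ directory but not the other.
import os

baseline = set(os.listdir("spark-dist-baseline/jars"))
candidate = set(os.listdir("spark-dist-candidate/jars"))
print("only in candidate:", sorted(candidate - baseline))
print("only in baseline:", sorted(baseline - candidate))
```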