vmalakhin commented on pull request #34383:
URL: https://github.com/apache/spark/pull/34383#issuecomment-951302403
> @vmalakhin can you put more details in the PR description?
>
> > Redundant exclusions were removed for hadoop-cloud module
>
> This doesn't fit the description "What changes were proposed in this pull request"
>
> > Currently Hadoop ABFS connector (for Azure Data Lake Storage Gen2) is broken due to missing dependency.
>
> Hm can you share more details? what missing dependency and how is that related to Spark?
>
> > So the only change is inclusion of jackson-mapper-asl-1.9.13.jar.
>
> the PR restores transitive dependency for `jackson-mapper-asl`, `jackson-core-asl`, and `jackson-core`. Do we need the other 2?
>
> also cc @steveloughran
OK - there are some details posted under SPARK-37102, but if I try to access ADLS Gen2, the following exception occurs:
```
>>> df=sqlContext.read.parquet("new_test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/pyspark/sql/readwriter.py", line 361, in parquet
    return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/pyspark/sql/utils.py", line 178, in deco
    return f(*a, **kw)
  File "spark/spark-3.3.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o30.parquet.
: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper
    at org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.parseListFilesResponse(AbfsHttpOperation.java:508)
    at org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:374)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:274)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:205)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:181)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:454)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:179)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:301)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:957)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:927)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:909)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:406)
    at org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
    at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$1(HadoopFSUtils.scala:95)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.util.HadoopFSUtils$.parallelListLeafFilesInternal(HadoopFSUtils.scala:85)
    at org.apache.spark.util.HadoopFSUtils$.parallelListLeafFiles(HadoopFSUtils.scala:69)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.bulkListLeafFiles(InMemoryFileIndex.scala:158)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.listLeafFiles(InMemoryFileIndex.scala:131)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.refresh0(InMemoryFileIndex.scala:94)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(InMemoryFileIndex.scala:66)
    at org.apache.spark.sql.execution.datasources.DataSource.createInMemoryFileIndex(DataSource.scala:567)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:409)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:209)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:209)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:553)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.ObjectMapper
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 46 more
```
So the jar that provides org.codehaus.jackson.map.ObjectMapper is not present on the classpath (i.e. under the jars dir).
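For reference, the diagnosis can be confirmed from a PySpark shell (a minimal sketch on my part, not part of the fix; it only asks the driver JVM via the py4j gateway whether the class from the stack trace is loadable):

```python
# Check whether the Jackson 1.x class from the stack trace can be loaded
# by the driver JVM; on an unpatched hadoop-cloud build this should fail.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
try:
    spark._jvm.java.lang.Class.forName("org.codehaus.jackson.map.ObjectMapper")
    print("org.codehaus.jackson.map.ObjectMapper is on the classpath")
except Exception as e:
    print("not on the classpath:", e)
```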
I've compared the jars directory outputs for the ```./dev/make-distribution.sh --name custom-spark-default --tgz --pip -Pkubernetes -Phadoop-cloud``` build configuration, and the only difference is jackson-mapper-asl-1.9.13.jar. So I can limit the change to this one jar only.
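That comparison can be reproduced with a short script (a sketch only; the two directory paths below are placeholders for the extracted distributions being compared):

```python
# List jars that appear in one distribution's jars/ directory but not the other.
import os

baseline = set(os.listdir("spark-dist-baseline/jars"))
candidate = set(os.listdir("spark-dist-candidate/jars"))
print("only in candidate:", sorted(candidate - baseline))
print("only in baseline:", sorted(baseline - candidate))
```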