Hi all, I've upgraded my test cluster to Spark 3 and changed my committer to "directory", and I still get this error. The documentation is somewhat obscure on this point. Do I need to add a third-party jar to support the new committers?
java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol

On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <murat.migdiso...@gmail.com> wrote:
> Hello all,
> we have a hadoop cluster (using yarn) using s3 as filesystem with s3guard
> enabled.
> We are using hadoop 3.2.1 with spark 2.4.5.
>
> When I try to save a dataframe in parquet format, I get the following
> exception:
> java.lang.ClassNotFoundException:
> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>
> My relevant spark configurations are as follows:
>
> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a": "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
> "fs.s3a.committer.name": "magic",
> "fs.s3a.committer.magic.enabled": true,
> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>
> While spark streaming fails with the exception above, apache beam succeeds
> writing parquet files.
> What might be the problem?
>
> Thanks in advance
>
> --
> "Talkers aren’t good doers. Rest assured that we’re going there to use
> our hands, not our tongues."
> W. Shakespeare
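For what it's worth, the S3A settings quoted above are Hadoop-side keys; when set through Spark they normally need the `spark.hadoop.` prefix, and the `org.apache.spark.internal.io.cloud.PathOutputCommitProtocol` class in the stack trace lives in Spark's optional `spark-hadoop-cloud` module, which is not on the classpath by default. A minimal sketch of a submission wiring this up (the artifact coordinate/version and the job file name are assumptions; adjust for your build):

```shell
# Sketch only: assumes spark-hadoop-cloud is available for your Spark/Scala
# version; if not published for your release, the jar must be supplied manually.
spark-submit \
  --packages org.apache.spark:spark-hadoop-cloud_2.12:3.0.0 \
  --conf spark.hadoop.fs.s3a.committer.name=directory \
  --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
  --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
  my_job.py
```

Note the old `com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol` class name from the Spark 2.4 setup was the Hortonworks-packaged variant; on stock Spark 3 the `org.apache.spark.internal.io.cloud` names apply instead.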