Re: Guava dependency issue
we shade guava in our fat jar/assembly jar/application jar

On Tue, May 8, 2018 at 12:31 PM, Marcelo Vanzin wrote:
> Using a custom Guava version with Spark is not that simple. Spark
> shades Guava, but a lot of libraries Spark uses do not - the main one
> being all of the Hadoop ones, and they need a quite old Guava.
>
> So you have two options: shade/relocate Guava in your application, or
> use spark.{driver|executor}.userClassPathFirst.
>
> There really isn't anything easier until we get shaded Hadoop client
> libraries...
>
> On Tue, May 8, 2018 at 8:44 AM, Stephen Boesch wrote:
> > [quoted message and stack traces trimmed; they appear in full below]
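(For reference, a minimal sketch of the shading approach described above, using the maven-shade-plugin. The relocation prefix "myshaded" and the plugin version are illustrative choices, not from this thread:)

    <!-- In pom.xml: relocate Guava inside the fat jar so the application's
         Guava cannot collide with the old Guava that Hadoop pulls in. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.1.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <relocations>
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>myshaded.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>

(The relocation rewrites the Guava package references in the classes bundled into the shaded jar, so the application calls its own bundled copy no matter which Guava version the cluster classpath provides.)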
Re: Guava dependency issue
Using a custom Guava version with Spark is not that simple. Spark
shades Guava, but a lot of libraries Spark uses do not - the main one
being all of the Hadoop ones, and they need a quite old Guava.

So you have two options: shade/relocate Guava in your application, or
use spark.{driver|executor}.userClassPathFirst.

There really isn't anything easier until we get shaded Hadoop client
libraries...

On Tue, May 8, 2018 at 8:44 AM, Stephen Boesch wrote:
> [quoted message and stack traces trimmed; they appear in full below]
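(A sketch of the second option above; the class and jar names are placeholders. Note that spark.driver.userClassPathFirst applies to the driver only in cluster mode, so these are usually passed via spark-submit rather than set in code:)

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --class com.example.MyApp \
      myapp-assembly.jar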
Re: Guava dependency issue
I downgraded to spark 2.0.1 and it fixed that *particular* runtime
exception: but then a similar one appears when saving to parquet:

An SOF question on this was created a month ago and today further details
plus an open bounty were added to it:

https://stackoverflow.com/questions/49713485/spark-error-with-google-guava-library-java-lang-nosuchmethoderror-com-google-c

The new but similar exception is shown below:

The hack to downgrade to 2.0.1 does help - i.e. execution proceeds
*further*: but then the above error does happen when writing out to
*parquet*.

18/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
        at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
        at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
        at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
        at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6

2018-05-07 10:30 GMT-07:00 Stephen Boesch:
> I am intermittently running into guava dependency issues across multiple
> spark projects. I have tried maven shade / relocate but it does not
> resolve the issues.
>
> The current project is extremely simple: *no* additional dependencies
> beyond scala, spark, and scalatest - yet the issues remain (and yes mvn
> clean was re-applied).
>
> Is there a reliable approach to handling the versioning for guava within
> spark dependency projects?
>
> [INFO] ------------------------------------------------------------------------
> [INFO] Building ccapps_final 1.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ ccapps_final ---
> 18/05/07 10:24:00 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> [WARNING]
> java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
>         at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
>         at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
>         at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
>         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
>         at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
>         at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
>         at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
>         at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
>         at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
>         at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
>         at scala.Option.getOrElse(Option.scala:121)
>         at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
>         at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
>         at scala.Option.getOrElse(Option.scala:121)
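(A quick way to confirm which jar is winning for Guava when one of these NoSuchMethodErrors appears - a diagnostic sketch, not from the thread; it can be run in spark-shell or early in the application:)

    // Ask the JVM where it actually loaded Guava's CacheBuilder from.
    // If this points at an old Guava (e.g. one bundled alongside Hadoop),
    // that explains the NoSuchMethodError above.
    // getCodeSource can be null for bootstrap-loaded classes, hence Option.
    val guavaSource = Option(
      classOf[com.google.common.cache.CacheBuilder[_, _]]
        .getProtectionDomain.getCodeSource)
      .map(_.getLocation)
    println(s"CacheBuilder loaded from: ${guavaSource.getOrElse("<unknown>")}")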