Hello, I am testing Ozone and running into some problems. The part of Ozone I am interested in is OzoneFS, since I work with Hadoop. I installed the latest Cloudera distribution, which includes Ozone 0.5, followed the instructions, and I can now run any hdfs dfs command against Ozone; it works great. However, when I try to take it a step further and run YARN applications on top of it (MapReduce or Spark jobs), I get errors.
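For reference, here is roughly what I am running. The volume, bucket, host, and jar path below are placeholders for my actual values (9862 is, as far as I know, the default Ozone Manager RPC port):

```shell
# Plain filesystem access works fine, e.g.:
hdfs dfs -mkdir o3fs://bucket1.vol1.ozone-om.lan:9862/testdir
hdfs dfs -ls o3fs://bucket1.vol1.ozone-om.lan:9862/

# The Spark job that fails (SparkPi example, YARN client mode);
# the examples jar path is whatever your distribution ships:
spark-submit \
  --master yarn \
  --conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.ozone-om.lan:9862 \
  --class org.apache.spark.examples.SparkPi \
  /path/to/spark-examples.jar 10
```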
I did add this configuration in Spark:

    spark.yarn.access.hadoopFileSystems=o3fs://[bucket].[volume].[hostname]:[port]

I tried to Google the error messages but did not find any useful information on how to make this work. The documentation does not mention any additional configuration needed in YARN or Spark, so I assumed it would work seamlessly. What am I missing? Can YARN work on top of OzoneFS?

Thank you,
Guy Shilo

Here are the errors:

20/08/04 13:49:50 WARN io.KeyOutputStream: Encountered exception java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.GroupMismatchException: 219b808a-b83b-459b-936f-3c57e9a9aa0e: group-62B2B3BD903A not found. on the pipeline Pipeline[ Id: 3a7640a2-e590-4cf0-8a6a-62b2b3bd903a, Nodes: 0adf0299-2fff-4d8f-acb1-f3eec172a33f{ip: 192.168.171.132, host: cloudera4.lan, networkLocation: /default-rack, certSerialId: null}5df9cf32-f15b-423a-86e1-d704f518422e{ip: 192.168.171.128, host: cloudera2.lan, networkLocation: /default-rack, certSerialId: null}219b808a-b83b-459b-936f-3c57e9a9aa0e{ip: 192.168.171.129, host: cloudera3.lan, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN, leaderId:0adf0299-2fff-4d8f-acb1-f3eec172a33f, CreationTimestamp2020-08-04T10:37:10.893Z]. The last committed block length is 0, uncommitted data length is 576985 retry count 0
20/08/04 13:49:50 INFO io.BlockOutputStreamEntryPool: Allocating block with ExcludeList {datanodes = [], containerIds = [], pipelineIds = [PipelineID=3a7640a2-e590-4cf0-8a6a-62b2b3bd903a]}
20/08/04 13:51:13 WARN io.KeyOutputStream: Encountered exception java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.GroupMismatchException: 219b808a-b83b-459b-936f-3c57e9a9aa0e: group-BD900690DB5D not found. on the pipeline Pipeline[ Id: 64900222-0e69-43c4-99ce-bd900690db5d, Nodes: 0adf0299-2fff-4d8f-acb1-f3eec172a33f{ip: 192.168.171.132, host: cloudera4.lan, networkLocation: /default-rack, certSerialId: null}219b808a-b83b-459b-936f-3c57e9a9aa0e{ip: 192.168.171.129, host: cloudera3.lan, networkLocation: /default-rack, certSerialId: null}5df9cf32-f15b-423a-86e1-d704f518422e{ip: 192.168.171.128, host: cloudera2.lan, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN, leaderId:219b808a-b83b-459b-936f-3c57e9a9aa0e, CreationTimestamp2020-08-04T10:49:13.381Z]. The last committed block length is 0, uncommitted data length is 576985 retry count 0
20/08/04 13:51:13 INFO io.BlockOutputStreamEntryPool: Allocating block with ExcludeList {datanodes = [], containerIds = [], pipelineIds = [PipelineID=3a7640a2-e590-4cf0-8a6a-62b2b3bd903a, PipelineID=64900222-0e69-43c4-99ce-bd900690db5d]}
20/08/04 13:51:13 INFO yarn.Client: Deleted staging directory o3fs://bucket1.tests/user/root/.sparkStaging/application_1596537475392_0001
20/08/04 13:51:13 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.IOException: INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:402)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:347)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:458)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:509)
    at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.close(OzoneFSOutputStream.java:56)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:70)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:415)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:372)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:687)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:905)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:180)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:505)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2527)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:944)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:935)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:816)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:848)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
    ... 38 more
20/08/04 13:51:13 INFO server.AbstractConnector: Stopped Spark@10ad20cb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/08/04 13:51:13 INFO ui.SparkUI: Stopped Spark web UI at http://cloudera2.lan:4040
20/08/04 13:51:13 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
20/08/04 13:51:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
20/08/04 13:51:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/08/04 13:51:13 INFO cluster.YarnClientSchedulerBackend: Stopped
20/08/04 13:51:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/08/04 13:51:13 INFO memory.MemoryStore: MemoryStore cleared
20/08/04 13:51:13 INFO storage.BlockManager: BlockManager stopped
20/08/04 13:51:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/08/04 13:51:13 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
20/08/04 13:51:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/08/04 13:51:13 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.io.IOException: INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
    [identical stack trace to the SparkContext error above, omitted for brevity]
20/08/04 13:51:13 INFO util.ShutdownHookManager: Shutdown hook called