Hi Guy Shilo,

Thanks for trying out Ozone!
From the stack trace, it looks like Ozone Manager is unable to allocate blocks. Can you check whether you are able to write to Ozone at all by trying `hdfs dfs -put` on o3fs? (There is a concrete sketch of such a test at the end of this message.)

Thanks,
Vivek Subramanian

On Tue, Aug 4, 2020 at 8:22 AM guy shilo <ni...@hotmail.com> wrote:

> Hello
>
> I am testing Ozone and facing some problems. The part of Ozone I am
> interested in is OzoneFS, since I work with Hadoop.
> I installed the latest Cloudera distribution, which includes Ozone 0.5,
> followed the instructions, and now I am able to run any hdfs dfs command
> against Ozone, and it looks great.
> However, when I try to take it a step further and run a YARN application
> on top of it (MR or Spark jobs), I get errors.
>
> I did add this configuration in Spark:
> spark.yarn.access.hadoopFileSystems=o3fs://[bucket].[volume].[hostname]:[port]
>
> I tried to google the error messages but did not find useful information
> on how to make it work.
> The documentation does not mention any additional configuration that
> should be done in YARN or Spark, so I assumed it would work seamlessly.
>
> What am I missing? Can YARN work on top of OzoneFS?
>
> Thank you
>
> Guy Shilo
>
> Here are the errors:
>
> 20/08/04 13:49:50 WARN io.KeyOutputStream: Encountered exception
> java.io.IOException: Unexpected Storage Container Exception:
> java.util.concurrent.CompletionException:
> java.util.concurrent.CompletionException:
> org.apache.ratis.protocol.GroupMismatchException:
> 219b808a-b83b-459b-936f-3c57e9a9aa0e: group-62B2B3BD903A not found. on the
> pipeline Pipeline[ Id: 3a7640a2-e590-4cf0-8a6a-62b2b3bd903a, Nodes:
> 0adf0299-2fff-4d8f-acb1-f3eec172a33f{ip: 192.168.171.132, host: cloudera4.lan, networkLocation: /default-rack, certSerialId: null}
> 5df9cf32-f15b-423a-86e1-d704f518422e{ip: 192.168.171.128, host: cloudera2.lan, networkLocation: /default-rack, certSerialId: null}
> 219b808a-b83b-459b-936f-3c57e9a9aa0e{ip: 192.168.171.129, host: cloudera3.lan, networkLocation: /default-rack, certSerialId: null},
> Type:RATIS, Factor:THREE, State:OPEN,
> leaderId:0adf0299-2fff-4d8f-acb1-f3eec172a33f,
> CreationTimestamp2020-08-04T10:37:10.893Z].
> The last committed block length is 0, uncommitted data length is 576985 retry count 0
> 20/08/04 13:49:50 INFO io.BlockOutputStreamEntryPool: Allocating block
> with ExcludeList {datanodes = [], containerIds = [], pipelineIds =
> [PipelineID=3a7640a2-e590-4cf0-8a6a-62b2b3bd903a]}
> 20/08/04 13:51:13 WARN io.KeyOutputStream: Encountered exception
> java.io.IOException: Unexpected Storage Container Exception:
> java.util.concurrent.CompletionException:
> java.util.concurrent.CompletionException:
> org.apache.ratis.protocol.GroupMismatchException:
> 219b808a-b83b-459b-936f-3c57e9a9aa0e: group-BD900690DB5D not found. on the
> pipeline Pipeline[ Id: 64900222-0e69-43c4-99ce-bd900690db5d, Nodes:
> 0adf0299-2fff-4d8f-acb1-f3eec172a33f{ip: 192.168.171.132, host: cloudera4.lan, networkLocation: /default-rack, certSerialId: null}
> 219b808a-b83b-459b-936f-3c57e9a9aa0e{ip: 192.168.171.129, host: cloudera3.lan, networkLocation: /default-rack, certSerialId: null}
> 5df9cf32-f15b-423a-86e1-d704f518422e{ip: 192.168.171.128, host: cloudera2.lan, networkLocation: /default-rack, certSerialId: null},
> Type:RATIS, Factor:THREE, State:OPEN,
> leaderId:219b808a-b83b-459b-936f-3c57e9a9aa0e,
> CreationTimestamp2020-08-04T10:49:13.381Z].
> The last committed block length is 0, uncommitted data length is 576985 retry count 0
> 20/08/04 13:51:13 INFO io.BlockOutputStreamEntryPool: Allocating block
> with ExcludeList {datanodes = [], containerIds = [], pipelineIds =
> [PipelineID=3a7640a2-e590-4cf0-8a6a-62b2b3bd903a,
> PipelineID=64900222-0e69-43c4-99ce-bd900690db5d]}
> 20/08/04 13:51:13 INFO yarn.Client: Deleted staging directory
> o3fs://bucket1.tests/user/root/.sparkStaging/application_1596537475392_0001
> 20/08/04 13:51:13 ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.IOException: INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException:
> Allocated 0 blocks. Requested 1 blocks
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:402)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:347)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:458)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:509)
>         at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.close(OzoneFSOutputStream.java:56)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:70)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:415)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>         at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:372)
>         at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:687)
>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:905)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:180)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:505)
>         at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2527)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:944)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>         at scala.Option.getOrElse(Option.scala:121)
>         at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:935)
>         at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
>         at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException:
> Allocated 0 blocks. Requested 1 blocks
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:816)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:848)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
>         ... 38 more
> 20/08/04 13:51:13 INFO server.AbstractConnector: Stopped Spark@10ad20cb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
> 20/08/04 13:51:13 INFO ui.SparkUI: Stopped Spark web UI at http://cloudera2.lan:4040
> 20/08/04 13:51:13 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 20/08/04 13:51:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
> 20/08/04 13:51:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
> 20/08/04 13:51:13 INFO cluster.YarnClientSchedulerBackend: Stopped
> 20/08/04 13:51:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 20/08/04 13:51:13 INFO memory.MemoryStore: MemoryStore cleared
> 20/08/04 13:51:13 INFO storage.BlockManager: BlockManager stopped
> 20/08/04 13:51:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
> 20/08/04 13:51:13 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
> 20/08/04 13:51:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 20/08/04 13:51:13 INFO spark.SparkContext: Successfully stopped SparkContext
> Exception in thread "main" java.io.IOException: INTERNAL_ERROR
> org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks.
> Requested 1 blocks
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:402)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:347)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:458)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:509)
>         at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.close(OzoneFSOutputStream.java:56)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:70)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:415)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>         at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:372)
>         at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:687)
>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:905)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:180)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:505)
>         at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2527)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:944)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>         at scala.Option.getOrElse(Option.scala:121)
>         at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:935)
>         at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
>         at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks.
> Requested 1 blocks
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:816)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:848)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
>         ... 38 more
> 20/08/04 13:51:13 INFO util.ShutdownHookManager: Shutdown hook called
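
P.S. To make the `hdfs dfs -put` suggestion concrete, here is a minimal smoke test sketch. The file name and the volume/bucket/host values (vol1, bucket1, <om-host>) are placeholders; substitute the ones from your cluster. The port in the o3fs URI is the OM RPC port, which defaults to 9862:

  # Write a small file straight to o3fs, bypassing YARN/Spark entirely.
  # If this fails too, the problem is in Ozone itself, not in the
  # YARN/Spark integration.
  echo "hello ozone" > /tmp/ozone-smoke-test.txt
  hdfs dfs -put /tmp/ozone-smoke-test.txt o3fs://bucket1.vol1.<om-host>:9862/ozone-smoke-test.txt

  # Read it back to confirm the key was actually committed:
  hdfs dfs -cat o3fs://bucket1.vol1.<om-host>:9862/ozone-smoke-test.txt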
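If the plain put fails as well, the GroupMismatchException lines in your log ("group-62B2B3BD903A not found") suggest the datanodes no longer have the Raft groups backing the pipelines the client was handed, so it may also be worth listing the pipelines SCM currently knows about. The exact subcommand depends on the Ozone release; one of these should work:

  # Ozone 0.4/0.5 era:
  ozone scmcli listPipelines

  # Newer Ozone releases:
  ozone admin pipeline list

On a healthy cluster this should show OPEN pipelines of Type:RATIS, Factor:THREE matching the pipeline IDs the client is trying to write to.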