In accumulo-env.sh, we set the location of HADOOP_CONF_DIR as below and add it to the classpath:
## Accumulo logs directory. Referenced by logger config.
ACCUMULO_LOG_DIR="${ACCUMULO_LOG_DIR:-${basedir}/logs}"
## Hadoop installation
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
## Hadoop configuration
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}"
## Zookeeper installation
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/opt/zookeeper}"
...
CLASSPATH="${CLASSPATH}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${ZK_JARS}:${HADOOP_HOME}/share/hadoop/client/*:${HADOOP_HOME}/share/hadoop/common/*:${HADOOP_HOME}/share/hadoop/hdfs/*"
export CLASSPATH
...
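To confirm that HADOOP_CONF_DIR actually lands on the runtime classpath, one quick check (assuming the accumulo launcher script is on the PATH; "accumulo classpath" prints the resolved entries):

accumulo classpath | grep etc/hadoop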
From: Arvind Shyamsundar <[email protected]>
Date: Friday, January 20, 2023 at 12:55 PM
To: [email protected], Samudrala, Ranganath [USA] <[email protected]>
Subject: RE: [External] Re: Accumulo with S3
Vaguely rings a bell - in case it's a classpath issue, double-check that your
accumulo-site includes the HADOOP_CONF folder in the classpath:
https://github.com/apache/fluo-muchos/blob/3c5d48958b27a6d38226aba286f1fb275aceac90/ansible/roles/accumulo/templates/accumulo-site.xml#L95
Arvind Shyamsundar (HE / HIM)
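For reference, a sketch of the kind of entry that template sets, assuming the 1.x-style general.classpaths property; the entries shown are illustrative, not copied from the template. The point is that $HADOOP_CONF_DIR must appear in the list so core-site.xml is visible to Accumulo:

<property>
  <name>general.classpaths</name>
  <value>
    $HADOOP_CONF_DIR,
    $ACCUMULO_HOME/lib/[^.].*.jar,
    $HADOOP_PREFIX/share/hadoop/common/[^.].*.jar
  </value>
</property>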
From: Samudrala, Ranganath [USA] via user <[email protected]>
Sent: Friday, January 20, 2023 9:46 AM
To: [email protected]
Subject: Re: [External] Re: Accumulo with S3
The logic is using "org.apache.hadoop.fs.s3a.S3AFileSystem", as we can see in
the stack trace. Shouldn't it then be using the S3-related configuration in
HADOOP_CONF_DIR? In Hadoop's core-site.xml, we have the S3-related
configuration parameters as below:
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://accumulo-minio:9000</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YYYYYYY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>XXXXXXX</value>
</property>
So, why do we need to create an AWS credentials file? Where do we create it,
and what is the format?
Thanks
Ranga
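For context, when core-site.xml is visible on the classpath, S3A's default credential chain already includes org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider, which reads the fs.s3a.access.key/fs.s3a.secret.key values above. Pinning that provider explicitly is one way to rule out a fallback to other providers; this is a standard S3A property, not something taken from this thread:

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
</property>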
From: Christopher <[email protected]>
Date: Friday, January 20, 2023 at 12:19 PM
To: accumulo-user <[email protected]>, Samudrala, Ranganath [USA]
<[email protected]>
Subject: [External] Re: Accumulo with S3
Based on the error message, it looks like you might need to configure each of
the Accumulo nodes with the AWS credentials file.
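If that route is taken, the standard shared credentials file lives at ~/.aws/credentials for the OS user running each Accumulo process; a minimal sketch with placeholder values:

[default]
aws_access_key_id = YYYYYYY
aws_secret_access_key = XXXXXXX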
On Fri, Jan 20, 2023, 11:43 Samudrala, Ranganath [USA] via user
<[email protected]> wrote:
Hello again!
The next problem I am facing is configuring MinIO S3 with Accumulo. I am
referring to this document:
https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
I have already invoked the command "accumulo init" with and without the option
"--upload-accumulo-props", using accumulo.properties as below:
instance.volumes=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
instance.zookeeper.host=accumulo-zookeeper
general.volume.chooser=org.apache.accumulo.core.spi.fs.PreferredVolumeChooser
general.custom.volume.preferred.logger=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
general.custom.volume.preferred.default=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
Next, when I run the command "accumulo init --add-volumes" with
accumulo.properties as below:
instance.volumes=s3a://minio-s3/accumulo,hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
instance.zookeeper.host=accumulo-zookeeper
general.volume.chooser=org.apache.accumulo.core.spi.fs.PreferredVolumeChooser
general.custom.volume.preferred.logger=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
general.custom.volume.preferred.default=s3a://minio-s3/accumulo
I see an error as below:
ERROR StatusLogger An exception occurred processing Appender MonitorLog
java.lang.RuntimeException: Can't tell if Accumulo is initialized; can't read instance id at s3a://minio-s3/accumulo/instance_id
    at org.apache.accumulo.server.fs.VolumeManager.getInstanceIDFromHdfs(VolumeManager.java:229)
    at org.apache.accumulo.server.ServerInfo.<init>(ServerInfo.java:102)
    at org.apache.accumulo.server.ServerContext.<init>(ServerContext.java:106)
    at org.apache.accumulo.monitor.util.logging.AccumuloMonitorAppender.lambda$new$1(AccumuloMonitorAppender.java:93)
    at org.apache.accumulo.monitor.util.logging.AccumuloMonitorAppender.append(AccumuloMonitorAppender.java:111)
    at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
    at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
    at org.apache.logging.log4j.core.config.LoggerConfig.logParent(LoggerConfig.java:674)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:643)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:612)
    at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:98)
    at org.apache.logging.log4j.core.async.AsyncLogger.actualAsyncLog(AsyncLogger.java:488)
    at org.apache.logging.log4j.core.async.RingBufferLogEvent.execute(RingBufferLogEvent.java:156)
    at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:51)
    at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:29)
    at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.nio.file.AccessDeniedException: s3a://minio-s3/accumulo/instance_id: listStatus on s3a://minio-s3/accumulo/instance_id: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 71HC1ZM3D43W0H67; S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=; Proxy: null), S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=:InvalidAccessKeyId
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:255)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:119)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$21(S3AFileSystem.java:3263)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3262)
    at org.apache.accumulo.server.fs.VolumeManager.getInstanceIDFromHdfs(VolumeManager.java:211)
    ... 23 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 71HC1ZM3D43W0H67; S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=; Proxy: null), S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5397)
    at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$11(S3AFileSystem.java:2595)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2586)
    at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:2153)
    at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:87)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    ... 1 more
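One workaround I could try while debugging, assuming S3A's default credential chain (which also consults the standard AWS environment variables), is exporting the keys in the shell that runs the init command; the values below are placeholders:

export AWS_ACCESS_KEY_ID=YYYYYYY
export AWS_SECRET_ACCESS_KEY=XXXXXXX
accumulo init --add-volumes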
When I invoke the same operation through the hdfs CLI, though, I see no problems:
* hdfs dfs -fs s3a://minio-s3 -ls /
2023-01-20 16:38:51,319 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser]: Sanitizing
XML document destined for handler class
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager]:
Connection [id: 0][route: {}->http://accumulo-minio:9000] can be kept alive
for 60.0 seconds
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.thirdparty.apache.http.impl.conn.DefaultManagedHttpClientConnection]:
http-outgoing-0: set socket timeout to 0
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager]:
Connection released: [id: 0][route: {}->http://accumulo-minio:9000][total
available: 1; route allocated: 1 of 128; total allocated: 1 of 128]
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser]: Parsing XML
response document with handler: class
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
2023-01-20 16:38:51,328 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser]: Examining
listing for bucket: minio-s3
2023-01-20 16:38:51,329 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.request]: Received successful response: 200, AWS Request ID:
173C11CC6FEF29A0
2023-01-20 16:38:51,329 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.requestId]: x-amzn-RequestId: not available
2023-01-20 16:38:51,329 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.requestId]: AWS Request ID: 173C11CC6FEF29A0
2023-01-20 16:38:51,338 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1]
[com.amazonaws.latency]: ServiceName=[Amazon S3], StatusCode=[200],
ServiceEndpoint=[http://accumulo-minio:9000],
RequestType=[ListObjectsV2Request], AWSRequestID=[173C11CC6FEF29A0],
HttpClientPoolPendingCount=0, RetryCapacityConsumed=0,
HttpClientPoolAvailableCount=0, RequestCount=1, HttpClientPoolLeasedCount=0,
ResponseProcessingTime=[71.198], ClientExecuteTime=[297.496],
HttpClientSendRequestTime=[7.255], HttpRequestTime=[119.87],
ApiCallLatency=[279.779], RequestSigningTime=[56.006],
CredentialsRequestTime=[5.091, 0.015], HttpClientReceiveResponseTime=[12.849]
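As an out-of-band check that the same key pair works against MinIO directly, assuming the aws CLI is available (placeholder credentials again):

AWS_ACCESS_KEY_ID=YYYYYYY AWS_SECRET_ACCESS_KEY=XXXXXXX \
  aws --endpoint-url http://accumulo-minio:9000 s3 ls s3://minio-s3/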