Hi, I am trying to set up Atlas on a K8s cluster in AWS with HBase backed by S3.
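To make the setup concrete, the change amounts to the `hbase.rootdir` value in `hbase-site.xml`, roughly as in this sketch (the bucket name and path are placeholders, and any S3 filesystem/credential settings are omitted here):

```xml
<!-- hbase-site.xml (sketch): the only value switched between the local and S3 runs -->
<property>
  <name>hbase.rootdir</name>
  <!-- local run that works: file:///tmp/hbase-root -->
  <value>s3://bucket/path</value>
</property>
```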
Everything works fine when I point the `hbase.rootdir` value to the local filesystem as `file:///tmp/hbase-root`. When I change this to an S3 URI such as `s3://bucket/path`, Atlas fails on startup with a very generic error:

```
2020-03-27 08:20:55,293 INFO - [main:] ~ Loading atlas-application.properties from file:/apache-atlas-2.0.0/conf/atlas-application.properties (ApplicationProperties:123)
2020-03-27 08:20:55,299 INFO - [main:] ~ Using graphdb backend 'org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase' (ApplicationProperties:273)
2020-03-27 08:20:55,299 INFO - [main:] ~ Using storage backend 'hbase2' (ApplicationProperties:284)
2020-03-27 08:20:55,299 INFO - [main:] ~ Using index backend 'solr' (ApplicationProperties:295)
2020-03-27 08:20:55,300 INFO - [main:] ~ Setting solr-wait-searcher property 'true' (ApplicationProperties:301)
2020-03-27 08:20:55,300 INFO - [main:] ~ Setting index.search.map-name property 'false' (ApplicationProperties:305)
2020-03-27 08:20:55,301 INFO - [main:] ~ Property (set to default) atlas.graph.cache.db-cache = true (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO - [main:] ~ Property (set to default) atlas.graph.cache.db-cache-clean-wait = 20 (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO - [main:] ~ Property (set to default) atlas.graph.cache.db-cache-size = 0.5 (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO - [main:] ~ Property (set to default) atlas.graph.cache.tx-cache-size = 15000 (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO - [main:] ~ Property (set to default) atlas.graph.cache.tx-dirty-size = 120 (ApplicationProperties:318)
2020-03-27 08:20:55,316 INFO - [main:] ~
########################################################################################
        Atlas Server (STARTUP)
        project.name:        apache-atlas
        project.description: Metadata Management and Data Governance Platform over Hadoop
        build.user:          root
        build.epoch:         1585085591537
        project.version:     2.0.0
        build.version:       2.0.0
        vc.revision:         release
        vc.source.url:       scm:git:git://git.apache.org/atlas.git/atlas-webapp
######################################################################################## (Atlas:215)
2020-03-27 08:20:55,316 INFO - [main:] ~ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Atlas:216)
2020-03-27 08:20:55,316 INFO - [main:] ~ Server starting with TLS ? false on port 21000 (Atlas:217)
2020-03-27 08:20:55,316 INFO - [main:] ~ <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< (Atlas:218)
2020-03-27 08:20:55,961 INFO - [main:] ~ No authentication method configured. Defaulting to simple authentication (LoginProcessor:102)
2020-03-27 08:20:56,078 WARN - [main:] ~ Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (NativeCodeLoader:60)
2020-03-27 08:20:56,100 INFO - [main:] ~ Logged in user root (auth:SIMPLE) (LoginProcessor:77)
2020-03-27 08:20:56,703 INFO - [main:] ~ Not running setup per configuration atlas.server.run.setup.on.start. (SetupSteps$SetupRequired:189)
2020-03-27 08:20:58,679 WARN - [main:] ~ Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties (MetricsConfig:134)
2020-03-27 08:23:08,702 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)
java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getDeployment(HBaseStoreManager.java:358)
        at org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getFeatures(HBaseStoreManager.java:397)
        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1256)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:160)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:131)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:111)
        at org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.getGraphInstance(AtlasJanusGraphDatabase.java:165)
        at org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.getGraph(AtlasJanusGraphDatabase.java:263)
        at org.apache.atlas.repository.graph.AtlasGraphProvider.getGraphInstance(AtlasGraphProvider.java:52)
        at org.apache.atlas.repository.graph.AtlasGraphProvider.get(AtlasGraphProvider.java:98)
        at org.apache.atlas.repository.graph.AtlasGraphProvider$$EnhancerBySpringCGLIB$$b936b499.CGLIB$get$1(<generated>)
        at org.apache.atlas.repository.graph.AtlasGraphProvider$$EnhancerBySpringCGLIB$$b936b499$$FastClassBySpringCGLIB$$fd3f07c6.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
        ... ... ...
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
        at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:98)
        at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at org.janusgraph.diskstorage.hbase2.HBaseStoreManager.ensureTableExists(HBaseStoreManager.java:732)
        at org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getLocalKeyPartition(HBaseStoreManager.java:518)
        at org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getDeployment(HBaseStoreManager.java:355)
        ... 92 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=16, exceptions:
Fri Mar 27 08:20:59 UTC 2020, RpcRetryingCaller{globalStartTime=1585297258222, pause=100, maxAttempts=16}, org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
        ... ... ...
```

The main bone of contention, I think, is this line, which I do not see when running with the local rootdir:

`2020-03-27 08:20:58,679 WARN - [main:] ~ Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties (MetricsConfig:134)`

Please note that I am not running Hadoop/HDFS for now and do not intend to; we will guarantee consistency to S3 later using other tooling, or move to DynamoDB or something else. Please also note that I have restarted the ZooKeeper, HBase, and Atlas instances numerous times while testing this setup, so the only persistent state I have is the HBase initialization data stored in S3. However, the HBase master and regionserver logs themselves show no errors at all when running with S3 as the rootdir.
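For completeness, this is roughly how Atlas/JanusGraph is pointed at HBase in `atlas-application.properties` (a sketch; the ZooKeeper hostname is a placeholder and everything else is left at the distribution defaults):

```properties
# atlas-application.properties (sketch): JanusGraph storage backend, matching the
# "Using storage backend 'hbase2'" line in the startup log
atlas.graph.storage.backend=hbase2
# ZooKeeper quorum the HBase client connects to -- placeholder hostname
atlas.graph.storage.hostname=<zookeeper-quorum>
```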
I have also dropped into an hbase shell and checked the corresponding HBase web interfaces to confirm that HBase with S3 is working fine. This makes me believe there is an issue in the JanusGraph <-> HBase interaction, which I am not sure how to debug. I have symlinked the `/atlas/conf/hbase` directory to the actual HBase conf dir at `/hbase/conf`. Any pointers would be helpful here, as I think I am going in blind. :)

Best,
Krish