Hi,

I am trying to set up Atlas on a K8s cluster in AWS with HBase backed by S3.

Everything works fine when I point the `hbase.rootdir` value to the
local filesystem as `file:///tmp/hbase-root`.
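
For reference, the relevant `hbase-site.xml` entries look like this (the bucket and path below are placeholders, not my real values):

```xml
<!-- Working local-filesystem setup -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///tmp/hbase-root</value>
</property>

<!-- The S3 variant that fails for me (placeholder bucket/path) -->
<property>
  <name>hbase.rootdir</name>
  <value>s3://bucket/path</value>
</property>
```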

When I change this to an S3 URI such as `s3://bucket/path`, starting
Atlas fails with a very generic error:

```

2020-03-27 08:20:55,293 INFO  - [main:] ~ Loading
atlas-application.properties from
file:/apache-atlas-2.0.0/conf/atlas-application.properties
(ApplicationProperties:123)
2020-03-27 08:20:55,299 INFO  - [main:] ~ Using graphdb backend
'org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase'
(ApplicationProperties:273)
2020-03-27 08:20:55,299 INFO  - [main:] ~ Using storage backend 'hbase2'
(ApplicationProperties:284)
2020-03-27 08:20:55,299 INFO  - [main:] ~ Using index backend 'solr'
(ApplicationProperties:295)
2020-03-27 08:20:55,300 INFO  - [main:] ~ Setting solr-wait-searcher
property 'true' (ApplicationProperties:301)
2020-03-27 08:20:55,300 INFO  - [main:] ~ Setting index.search.map-name
property 'false' (ApplicationProperties:305)
2020-03-27 08:20:55,301 INFO  - [main:] ~ Property (set to default)
atlas.graph.cache.db-cache = true (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO  - [main:] ~ Property (set to default)
atlas.graph.cache.db-cache-clean-wait = 20 (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO  - [main:] ~ Property (set to default)
atlas.graph.cache.db-cache-size = 0.5 (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO  - [main:] ~ Property (set to default)
atlas.graph.cache.tx-cache-size = 15000 (ApplicationProperties:318)
2020-03-27 08:20:55,301 INFO  - [main:] ~ Property (set to default)
atlas.graph.cache.tx-dirty-size = 120 (ApplicationProperties:318)
2020-03-27 08:20:55,316 INFO  - [main:] ~
########################################################################################
                               Atlas Server (STARTUP)

        project.name:   apache-atlas
        project.description:    Metadata Management and Data Governance
Platform over Hadoop
        build.user:     root
        build.epoch:    1585085591537
        project.version:        2.0.0
        build.version:  2.0.0
        vc.revision:    release
        vc.source.url:  scm:git:git://git.apache.org/atlas.git/atlas-webapp
########################################################################################
(Atlas:215)
2020-03-27 08:20:55,316 INFO  - [main:] ~ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Atlas:216)
2020-03-27 08:20:55,316 INFO  - [main:] ~ Server starting with TLS ? false
on port 21000 (Atlas:217)
2020-03-27 08:20:55,316 INFO  - [main:] ~ <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
(Atlas:218)
2020-03-27 08:20:55,961 INFO  - [main:] ~ No authentication method
configured.  Defaulting to simple authentication (LoginProcessor:102)
2020-03-27 08:20:56,078 WARN  - [main:] ~ Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
(NativeCodeLoader:60)
2020-03-27 08:20:56,100 INFO  - [main:] ~ Logged in user root (auth:SIMPLE)
(LoginProcessor:77)
2020-03-27 08:20:56,703 INFO  - [main:] ~ Not running setup per
configuration atlas.server.run.setup.on.start.
(SetupSteps$SetupRequired:189)
2020-03-27 08:20:58,679 WARN  - [main:] ~ Cannot locate configuration:
tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
(MetricsConfig:134)
2020-03-27 08:23:08,702 WARN  - [main:] ~ Unexpected exception during
getDeployment() (HBaseStoreManager:399)
java.lang.RuntimeException:
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in
storage backend
        at
org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getDeployment(HBaseStoreManager.java:358)
        at
org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getFeatures(HBaseStoreManager.java:397)
        at
org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1256)
        at
org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:160)
        at
org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:131)
        at
org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:111)
        at
org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.getGraphInstance(AtlasJanusGraphDatabase.java:165)
        at
org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.getGraph(AtlasJanusGraphDatabase.java:263)
        at
org.apache.atlas.repository.graph.AtlasGraphProvider.getGraphInstance(AtlasGraphProvider.java:52)
        at
org.apache.atlas.repository.graph.AtlasGraphProvider.get(AtlasGraphProvider.java:98)
        at
org.apache.atlas.repository.graph.AtlasGraphProvider$$EnhancerBySpringCGLIB$$b936b499.CGLIB$get$1(<generated>)
        at
org.apache.atlas.repository.graph.AtlasGraphProvider$$EnhancerBySpringCGLIB$$b936b499$$FastClassBySpringCGLIB$$fd3f07c6.invoke(<generated>)
        at
org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
...
...
...
        at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
        at
org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:98)
        at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary
failure in storage backend
        at
org.janusgraph.diskstorage.hbase2.HBaseStoreManager.ensureTableExists(HBaseStoreManager.java:732)
        at
org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getLocalKeyPartition(HBaseStoreManager.java:518)
        at
org.janusgraph.diskstorage.hbase2.HBaseStoreManager.getDeployment(HBaseStoreManager.java:355)
        ... 92 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=16, exceptions:
Fri Mar 27 08:20:59 UTC 2020,
RpcRetryingCaller{globalStartTime=1585297258222, pause=100,
maxAttempts=16}, org.apache.hadoop.hbase.PleaseHoldException:
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
...
...
...
```

The main bone of contention, I think, is this line: `2020-03-27 08:20:58,679
WARN  - [main:] ~ Cannot locate configuration: tried
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
(MetricsConfig:134)`, which I do not see when running with the local rootdir.
Please note that I am not running Hadoop/HDFS for now and do not intend to;
we will guarantee consistency to S3 in the future using other tech, or move
to DynamoDB or something else.
Please also note that I restarted the ZK, HBase, and Atlas instances
numerous times during this setup, so the only persistence I have is the
HBase initialization data stored in S3.
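
To rule that warning out, I could drop a minimal metrics2 config on the classpath; as far as I understand, metrics2 just falls back to defaults when no config file is found, so this is only to silence the warning. A sketch (the values are illustrative; `FileSink` is the stock sink shipped with Hadoop):

```properties
# hadoop-metrics2-s3a-file-system.properties (minimal sketch)
*.period=60
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
```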

However, the HBase master and regionserver logs themselves show no error at
all when running with S3 as the rootdir. I have also dropped into an hbase
shell and checked the corresponding HBase web interfaces to confirm that
HBase with S3 is working fine.

This makes me believe that there is an issue with the JanusGraph <-> HBase
interaction, which I am not sure how to debug. I have symlinked the
`/atlas/conf/hbase` directory to the actual HBase conf dir at `/hbase/conf`.
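
The link was created roughly like this (shown here against scratch paths under `/tmp` so the commands are runnable as-is; in my cluster the real paths are `/hbase/conf` and `/atlas/conf/hbase`):

```shell
# Recreate the conf-dir link layout under /tmp; swap in the real paths
# (/hbase/conf -> /atlas/conf/hbase) on an actual node.
mkdir -p /tmp/confdemo/hbase/conf /tmp/confdemo/atlas/conf
ln -sfn /tmp/confdemo/hbase/conf /tmp/confdemo/atlas/conf/hbase
readlink /tmp/confdemo/atlas/conf/hbase
```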

Any pointers will be helpful here, as I think I am going in blind over here.
:)

Best,
Krish
