Minor correction: The HBase jar files are on the classpath, just in a different order.
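The ordering claim is easy to check with a small diagnostic of the kind described in the messages below. This is a minimal sketch (the class name `ClasspathDump` is mine, not from the thread); it assumes a Java 8 style application classloader, where the app and extension loaders are `URLClassLoader` instances — on Java 9+ the loop simply skips loaders that are not `URLClassLoader`s:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathDump {

    // Print the configuration-related environment variables that
    // HBase/Hadoop resource loading depends on.
    static void printEnvironment() {
        for (String name : new String[]{
                "HADOOP_CONF_DIR", "HBASE_CONF_DIR", "FLINK_CONF_DIR",
                "HIVE_CONF_DIR", "YARN_CONF_DIR"}) {
            System.out.println(name + " = " + System.getenv(name));
        }
    }

    // Walk the classloader chain and print the URLs of every
    // URLClassLoader in it. Configuration resources such as
    // hbase-site.xml are looked up through this chain, so the order
    // of these entries determines which copy is found first.
    static void printClassPath(ClassLoader loader) {
        for (; loader != null; loader = loader.getParent()) {
            if (loader instanceof URLClassLoader) {
                System.out.println("--> URLClassLoader = " + loader);
                for (URL url : ((URLClassLoader) loader).getURLs()) {
                    System.out.println("----> ClassPath = " + url);
                }
            }
        }
    }

    public static void main(String[] args) {
        printEnvironment();
        printClassPath(Thread.currentThread().getContextClassLoader());
    }
}
```

Running this both on the client and inside the job (as done below) makes the difference between the two classpaths visible.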
On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes <ni...@basjes.nl> wrote:

> I did some more digging.
>
> I added extra code to print both the environment variables and the
> classpath that is used by the HBaseConfiguration to load the resource files.
> I call this both locally and during startup of the job (i.e. these logs
> arrive in the jobmanager.log on the cluster).
>
> Summary of what I found locally:
>
> Environment
> 2017-10-24 08:50:15,612 INFO com.bol.bugreports.Main - HADOOP_CONF_DIR = /etc/hadoop/conf/
> 2017-10-24 08:50:15,613 INFO com.bol.bugreports.Main - HBASE_CONF_DIR = /etc/hbase/conf/
> 2017-10-24 08:50:15,613 INFO com.bol.bugreports.Main - FLINK_CONF_DIR = /usr/local/flink-1.3.2/conf
> 2017-10-24 08:50:15,613 INFO com.bol.bugreports.Main - HIVE_CONF_DIR = /etc/hive/conf/
> 2017-10-24 08:50:15,613 INFO com.bol.bugreports.Main - YARN_CONF_DIR = /etc/hadoop/conf/
>
> ClassPath
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - --> HBaseConfiguration: URLClassLoader = sun.misc.Launcher$AppClassLoader@1b6d3586
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-shaded-hadoop2-uber-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/joda-time-2.9.1.jar
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/log4j-1.2.17.jar
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/slf4j-log4j12-1.7.7.jar
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-dist_2.11-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/home/nbasjes/FlinkHBaseConnect/
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 08:50:15,614 INFO com.bol.bugreports.Main - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/etc/hbase/conf/
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el6_9.x86_64/lib/tools.jar
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/activation-1.1.jar
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/aopalliance-1.0.jar
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-i18n-2.0.0-M15.jar
> 2017-10-24 08:50:15,615 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-kerberos-codec-2.0.0-M
> ...
>
> On the cluster node in the jobmanager.log:
>
> ENVIRONMENT
> 2017-10-24 10:50:19,971 INFO com.bol.bugreports.Main - HADOOP_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
> 2017-10-24 10:50:19,971 INFO com.bol.bugreports.Main - TEZ_CONF_DIR = /etc/tez/conf
> 2017-10-24 10:50:19,973 INFO com.bol.bugreports.Main - YARN_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
> 2017-10-24 10:50:19,973 INFO com.bol.bugreports.Main - LOG_DIRS = /var/log/hadoop-yarn/containers/application_1503304315746_0062/container_1503304315746_0062_01_000001
> 2017-10-24 10:50:19,973 INFO com.bol.bugreports.Main - HADOOP_YARN_HOME = /usr/hdp/2.3.4.0-3485/hadoop-yarn
> 2017-10-24 10:50:19,974 INFO com.bol.bugreports.Main - HADOOP_HOME = /usr/hdp/2.3.4.0-3485/hadoop
> 2017-10-24 10:50:19,975 INFO com.bol.bugreports.Main - HDP_VERSION = 2.3.4.0-3485
>
> And the classpath:
>
> 2017-10-24 10:50:19,977 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/15/flink-hbase-connect-1.0-SNAPSHOT.jar
> 2017-10-24 10:50:19,977 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-dist_2.11-1.3.2.jar
> 2017-10-24 10:50:19,977 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-python_2.11-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-shaded-hadoop2-uber-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/joda-time-2.9.1.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/log4j-1.2.17.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/slf4j-log4j12-1.7.7.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/14/log4j.properties
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/11/logback.xml
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/16/flink-dist_2.11-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/10/flink-conf.yaml
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/container_1503304315746_0062_01_000001/
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-nfs-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-annotations-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-azure-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-aws-2.7.1.2.3.4.0-3485.jar
>
> So apparently everything HBase-related that was specified clientside is missing
> when the task is running on my cluster.
>
> The thing is that when running for example a Pig script everything works
> perfectly fine on this cluster as it is configured right now.
> Also the config 'shouldn't' (I think) need anything different, because this
> application only needs the HBase client (jar, packaged into the application)
> and the HBase zookeeper settings (present on the machine where it is
> started).
>
> Niels Basjes
>
> On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
>
>> Till, do you have some idea what is going on? I do not see any meaningful
>> difference between Niels' code and HBaseWriteStreamExample.java. There is
>> also a very similar issue on the mailing list: "Flink can't read hdfs
>> namenode logical url"
>>
>> Piotrek
>>
>> On 22 Oct 2017, at 12:56, Niels Basjes <ni...@basjes.nl> wrote:
>>
>> Hi,
>>
>> Yes, on all nodes the same /etc/hbase/conf/hbase-site.xml that
>> contains the correct settings for hbase to find zookeeper.
>> That is why adding that file as an additional resource to the
>> configuration works.
>> I have created a very simple project that reproduces the problem on my
>> setup:
>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>
>> Niels Basjes
>>
>> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:
>>
>>> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
>>> machines? If yes, could you share your code?
>>>
>>> On 20 Oct 2017, at 16:29, Niels Basjes <ni...@basjes.nl> wrote:
>>>
>>> I look at the logfiles from the Hadoop Yarn webinterface, i.e. actually
>>> looking in the jobmanager.log of the container running the Flink task.
>>> That is where I was able to find these messages.
>>>
>>> I do the
>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>> in all places directly after the HBaseConfiguration.create();
>>> That way I simply force the task to look on the actual Hadoop node for
>>> the same file it already loaded locally.
>>>
>>> The reason I'm suspecting Flink is that the clientside part of the
>>> Flink application does have the right settings and the task/job actually
>>> running in the cluster does not have the same settings.
>>> So it seems that in the transition into the cluster the application does not
>>> copy everything it has available locally, for some reason.
>>>
>>> There is a very high probability I did something wrong; I'm just not
>>> seeing it at this moment.
>>>
>>> Niels
>>>
>>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> What do you mean by saying:
>>>>
>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>
>>>> The error doesn't come from Flink? Where do you execute
>>>>
>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>
>>>> ?
>>>>
>>>> To me it seems like it is a problem with a misconfigured HBase and not
>>>> something related to Flink.
>>>>
>>>> Piotrek
>>>>
>>>> On 20 Oct 2017, at 13:44, Niels Basjes <ni...@basjes.nl> wrote:
>>>>
>>>> To facilitate you guys helping me I put this test project on github:
>>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>>
>>>> Niels Basjes
>>>>
>>>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
>>>>> cluster where I need to connect to HBase.
>>>>>
>>>>> What I have:
>>>>>
>>>>> In my environment:
>>>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>>>> HIVE_CONF_DIR=/etc/hive/conf/
>>>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>>>>
>>>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the
>>>>> zookeeper hosts for HBase.
>>>>>
>>>>> My test code is this:
>>>>>
>>>>> public class Main {
>>>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>>>>
>>>>>   public static void main(String[] args) throws Exception {
>>>>>     printZookeeperConfig();
>>>>>     final StreamExecutionEnvironment env =
>>>>>       StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>>>     env.createInput(new HBaseSource()).print();
>>>>>     env.execute("HBase config problem");
>>>>>   }
>>>>>
>>>>>   public static void printZookeeperConfig() {
>>>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>>>   }
>>>>>
>>>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>>>     @Override
>>>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>>>       table = createTable();
>>>>>       if (table != null) {
>>>>>         scan = getScanner();
>>>>>       }
>>>>>     }
>>>>>
>>>>>     private HTable createTable() {
>>>>>       LOG.info("Initializing HBaseConfiguration");
>>>>>       // Uses files found in the classpath
>>>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>>>       printZookeeperConfig();
>>>>>
>>>>>       try {
>>>>>         return new HTable(hConf, getTableName());
>>>>>       } catch (Exception e) {
>>>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>>>       }
>>>>>       return null;
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     public String getTableName() {
>>>>>       return "bugs:flink";
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected String mapResultToOutType(Result result) {
>>>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected Scan getScanner() {
>>>>>       return new Scan();
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> I run this application with this command on my Yarn cluster (note:
>>>>> first starting a yarn-cluster and then submitting the job yields the same
>>>>> result).
>>>>>
>>>>> flink \
>>>>>   run \
>>>>>   -m yarn-cluster \
>>>>>   --yarncontainer 1 \
>>>>>   --yarnname "Flink on Yarn HBase problem" \
>>>>>   --yarnslots 1 \
>>>>>   --yarnjobManagerMemory 4000 \
>>>>>   --yarntaskManagerMemory 4000 \
>>>>>   --yarnstreaming \
>>>>>   target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>>>
>>>>> Now in the client-side logfile
>>>>> /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see:
>>>>>
>>>>> 1) The classpath actually contains /etc/hbase/conf/ both near the start and
>>>>> at the end.
>>>>>
>>>>> 2) The zookeeper settings of my experimental environment have been picked
>>>>> up by the software:
>>>>>
>>>>> 2017-10-20 11:17:23,973 INFO com.bol.bugreports.Main - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>>>>
>>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>>
>>>>> 2017-10-20 13:17:33,250 INFO com.bol.bugreports.Main - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>>>>
>>>>> and as a consequence:
>>>>>
>>>>> 2017-10-20 13:17:33,368 INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>>>> 2017-10-20 13:17:33,369 WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>>> java.net.ConnectException: Connection refused
>>>>>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>>   at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>> 2017-10-20 13:17:33,475 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>>>>
>>>>> The value 'localhost:2181' has been defined within the HBase jar in
>>>>> the hbase-default.xml as the default value for the zookeeper nodes.
>>>>>
>>>>> As a workaround I currently put this extra line in my code, which I
>>>>> know is nasty but "works on my cluster":
>>>>>
>>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> What is the right way to fix this?
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes
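For completeness: the workaround at the bottom of the thread hard-codes /etc/hbase/conf/hbase-site.xml. A slightly less nasty variant derives the path from the same HBASE_CONF_DIR that the client environment already exports, falling back to the conventional location. This is a sketch only — `HBaseConfLocator` and `hbaseSiteXml` are hypothetical names, not part of the thread's code; the resulting string would be handed to `hbaseConfig.addResource(new Path(...))` exactly as in the workaround:

```java
import java.io.File;

public class HBaseConfLocator {

    // Resolve the hbase-site.xml location: prefer the HBASE_CONF_DIR
    // environment variable, fall back to the conventional /etc/hbase/conf.
    // (Hypothetical helper; the thread itself hard-codes the path.)
    static String hbaseSiteXml(String hbaseConfDir) {
        String dir = (hbaseConfDir != null && !hbaseConfDir.isEmpty())
                ? hbaseConfDir
                : "/etc/hbase/conf";
        return "file:" + new File(dir, "hbase-site.xml").getPath();
    }

    public static void main(String[] args) {
        // The workaround would then read:
        //   hbaseConfig.addResource(new Path(
        //       hbaseSiteXml(System.getenv("HBASE_CONF_DIR"))));
        System.out.println(hbaseSiteXml(System.getenv("HBASE_CONF_DIR")));
    }
}
```

This does not fix the underlying issue (the conf directory not reaching the container classpath); it only removes the hard-coded path from the stop-gap.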