Minor correction: The HBase jar files are on the classpath, just in a
different order.

On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes <ni...@basjes.nl> wrote:

> I did some more digging.
>
> I added extra code to print both the environment variables and the
> classpath that HBaseConfiguration uses to load its resource files.
> I call this both locally and during startup of the job (i.e. these logs
> arrive in the jobmanager.log on the cluster).
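>
> A minimal sketch of that debug code (the helper name is mine; it assumes
> the classloader behind HBaseConfiguration is a URLClassLoader, which holds
> on Java 8):
>
>   // Sketch: dump the environment and the URLs the classloader will search.
>   public static void printEnvironmentAndClassPath() {
>     System.getenv().forEach((key, value) -> LOG.info("{} = {}", key, value));
>     ClassLoader cl = HBaseConfiguration.class.getClassLoader();
>     LOG.info("--> HBaseConfiguration: URLClassLoader = {}", cl);
>     if (cl instanceof java.net.URLClassLoader) {
>       for (java.net.URL url : ((java.net.URLClassLoader) cl).getURLs()) {
>         LOG.info("----> ClassPath = {}", url);
>       }
>     }
>   }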
>
> Summary of what I found locally:
>
> Environment
> 2017-10-24 08:50:15,612 INFO  com.bol.bugreports.Main - HADOOP_CONF_DIR = /etc/hadoop/conf/
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main - HBASE_CONF_DIR = /etc/hbase/conf/
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main - FLINK_CONF_DIR = /usr/local/flink-1.3.2/conf
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main - HIVE_CONF_DIR = /etc/hive/conf/
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main - YARN_CONF_DIR = /etc/hadoop/conf/
>
> ClassPath
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - --> HBaseConfiguration: URLClassLoader = sun.misc.Launcher$AppClassLoader@1b6d3586
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-shaded-hadoop2-uber-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/joda-time-2.9.1.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/log4j-1.2.17.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/slf4j-log4j12-1.7.7.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-dist_2.11-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/home/nbasjes/FlinkHBaseConnect/
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/etc/hbase/conf/
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el6_9.x86_64/lib/tools.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/activation-1.1.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/aopalliance-1.0.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-i18n-2.0.0-M15.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-kerberos-codec-2.0.0-M
> ...
>
>
>
> On the cluster node in the jobmanager.log:
>
> ENVIRONMENT
> 2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main - HADOOP_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
> 2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main - TEZ_CONF_DIR = /etc/tez/conf
> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main - YARN_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main - LOG_DIRS = /var/log/hadoop-yarn/containers/application_1503304315746_0062/container_1503304315746_0062_01_000001
> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main - HADOOP_YARN_HOME = /usr/hdp/2.3.4.0-3485/hadoop-yarn
> 2017-10-24 10:50:19,974 INFO  com.bol.bugreports.Main - HADOOP_HOME = /usr/hdp/2.3.4.0-3485/hadoop
> 2017-10-24 10:50:19,975 INFO  com.bol.bugreports.Main - HDP_VERSION = 2.3.4.0-3485
>
> And the classpath:
>
> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/15/flink-hbase-connect-1.0-SNAPSHOT.jar
> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-dist_2.11-1.3.2.jar
> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-python_2.11-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-shaded-hadoop2-uber-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/joda-time-2.9.1.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/log4j-1.2.17.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/slf4j-log4j12-1.7.7.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/14/log4j.properties
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/11/logback.xml
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/16/flink-dist_2.11-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/10/flink-conf.yaml
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/container_1503304315746_0062_01_000001/
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-nfs-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-annotations-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-azure-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-aws-2.7.1.2.3.4.0-3485.jar
>
>
> So apparently everything HBase-related that was specified client-side is
> missing when the task is running on my cluster.
>
> The thing is that when I run, for example, a Pig script, everything works
> perfectly fine on this cluster as it is configured right now.
> Also the configuration 'shouldn't' (I think) need anything different, because
> this application only needs the HBase client (jar, packaged into the
> application) and the HBase zookeeper settings (present on the machine where
> the job is started).
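>
> A quick cluster-side sanity check is to ask the classloader where (if
> anywhere) it finds an hbase-site.xml; a minimal sketch (the log line is mine):
>
>   // Sketch: null means no hbase-site.xml is visible on this classpath,
>   // so HBaseConfiguration falls back to the bundled hbase-default.xml.
>   java.net.URL hbaseSite = HBaseConfiguration.class.getClassLoader().getResource("hbase-site.xml");
>   LOG.info("----> hbase-site.xml resolved to: {}", hbaseSite);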
>
> Niels Basjes
>
>
>
>
>
> On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <pi...@data-artisans.com>
> wrote:
>
>> Till, do you have some idea what is going on? I do not see any meaningful
>> difference between Niels's code and HBaseWriteStreamExample.java. There is
>> also a very similar issue on the mailing list: “Flink can't read hdfs
>> namenode logical url”.
>>
>> Piotrek
>>
>> On 22 Oct 2017, at 12:56, Niels Basjes <ni...@basjes.nl> wrote:
>>
>> Hi,
>>
>> Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which
>> contains the correct settings for HBase to find zookeeper.
>> That is why adding that file as an additional resource to the
>> configuration works.
>> I have created a very simple project that reproduces the problem on my
>> setup:
>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>
>> Niels Basjes
>>
>>
>> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artisans.com>
>> wrote:
>>
>>> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
>>> machines? If yes, could you share your code?
>>>
>>> On 20 Oct 2017, at 16:29, Niels Basjes <ni...@basjes.nl> wrote:
>>>
>>> I looked at the logfiles via the Hadoop YARN web interface, i.e. by
>>> actually looking in the jobmanager.log of the container running the Flink
>>> task. That is where I was able to find these messages.
>>>
>>> I do the
>>>   hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>> in all places directly after the HBaseConfiguration.create();
>>> That way I simply force the task to look on the actual Hadoop node for
>>> the same file it already loaded locally.
>>>
>>> The reason I'm suspecting Flink is that the client-side part of the Flink
>>> application does have the right settings, while the task/job actually
>>> running in the cluster does not.
>>> So it seems that, in the transition into the cluster, the application does
>>> not copy everything it has available locally for some reason.
>>>
>>> There is a very high probability that I did something wrong; I'm just not
>>> seeing it at this moment.
>>>
>>> Niels
>>>
>>>
>>>
>>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <pi...@data-artisans.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> What do you mean by saying:
>>>>
>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>
>>>>
>>>> The error doesn’t come from Flink? Where do you execute
>>>>
>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>
>>>> ?
>>>>
>>>> To me it seems like it is a problem with misconfigured HBase and not
>>>> something related to Flink.
>>>>
>>>> Piotrek
>>>>
>>>> On 20 Oct 2017, at 13:44, Niels Basjes <ni...@basjes.nl> wrote:
>>>>
>>>> To facilitate you guys helping me I put this test project on github:
>>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>>
>>>> Niels Basjes
>>>>
>>>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a Flink 1.3.2 application that I want to run on a Hadoop YARN
>>>>> cluster, where I need to connect to HBase.
>>>>>
>>>>> What I have:
>>>>>
>>>>> In my environment:
>>>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>>>> HIVE_CONF_DIR=/etc/hive/conf/
>>>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>>>>
>>>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the
>>>>> zookeeper hosts for HBase.
>>>>>
>>>>> My test code is this:
>>>>>
>>>>> // Imports needed to make this example compile (package names as used by
>>>>> // Flink 1.3 and HBase 1.x):
>>>>> import static java.nio.charset.StandardCharsets.UTF_8;
>>>>>
>>>>> import org.apache.flink.addons.hbase.AbstractTableInputFormat;
>>>>> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>> import org.apache.hadoop.hbase.client.Result;
>>>>> import org.apache.hadoop.hbase.client.Scan;
>>>>> import org.slf4j.Logger;
>>>>> import org.slf4j.LoggerFactory;
>>>>>
>>>>> public class Main {
>>>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>>>>
>>>>>   public static void main(String[] args) throws Exception {
>>>>>     printZookeeperConfig();
>>>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>>>     env.createInput(new HBaseSource()).print();
>>>>>     env.execute("HBase config problem");
>>>>>   }
>>>>>
>>>>>   public static void printZookeeperConfig() {
>>>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>>>   }
>>>>>
>>>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>>>     @Override
>>>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>>>       table = createTable();
>>>>>       if (table != null) {
>>>>>         scan = getScanner();
>>>>>       }
>>>>>     }
>>>>>
>>>>>     private HTable createTable() {
>>>>>       LOG.info("Initializing HBaseConfiguration");
>>>>>       // Uses the hbase-default.xml/hbase-site.xml files found in the classpath
>>>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>>>       printZookeeperConfig();
>>>>>
>>>>>       try {
>>>>>         return new HTable(hConf, getTableName());
>>>>>       } catch (Exception e) {
>>>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>>>       }
>>>>>       return null;
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     public String getTableName() {
>>>>>       return "bugs:flink";
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected String mapResultToOutType(Result result) {
>>>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected Scan getScanner() {
>>>>>       return new Scan();
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>>
>>>>> I run this application with this command on my YARN cluster (note:
>>>>> first starting a yarn-cluster and then submitting the job yields the same
>>>>> result):
>>>>>
>>>>> flink \
>>>>>     run \
>>>>>     -m yarn-cluster \
>>>>>     --yarncontainer 1 \
>>>>>     --yarnname "Flink on Yarn HBase problem" \
>>>>>     --yarnslots                     1     \
>>>>>     --yarnjobManagerMemory          4000  \
>>>>>     --yarntaskManagerMemory         4000  \
>>>>>     --yarnstreaming                       \
>>>>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>>>
>>>>> Now in the client-side logfile
>>>>> /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see:
>>>>>
>>>>> 1) The classpath actually contains /etc/hbase/conf/ both near the start
>>>>> and at the end.
>>>>>
>>>>> 2) The zookeeper settings of my experimental environment have been picked
>>>>> up by the software:
>>>>>
>>>>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>>>>
>>>>>
>>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>>
>>>>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>>>>
>>>>>
>>>>> and as a consequence
>>>>>
>>>>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>>>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>>> java.net.ConnectException: Connection refused
>>>>>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>>   at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>>>>
>>>>>
>>>>>
>>>>> The value 'localhost:2181' comes from the hbase-default.xml packaged
>>>>> inside the HBase jar, which defines 'localhost' as the default zookeeper
>>>>> quorum (and 2181 as the default client port).
>>>>>
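>>>>> A handy way to confirm where that value came from is Hadoop's
>>>>> Configuration.getPropertySources; a minimal sketch (the log line is mine,
>>>>> and sources are only reported when Hadoop tracked them):
>>>>>
>>>>> // Sketch: ask Hadoop which resource(s) supplied the quorum value; in the
>>>>> // failing case this should name hbase-default.xml.
>>>>> org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>>> String[] sources = hConf.getPropertySources("hbase.zookeeper.quorum");
>>>>> LOG.info("hbase.zookeeper.quorum came from: {}", java.util.Arrays.toString(sources));
>>>>>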
>>>>> As a workaround I currently put this extra line in my code, which I
>>>>> know is nasty but "works on my cluster":
>>>>>
>>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>
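>>>>> In context, the workaround sits directly after the configuration is
>>>>> created; a sketch of the relevant part of createTable():
>>>>>
>>>>> // Sketch: load the defaults found on the classpath, then explicitly layer
>>>>> // the node-local hbase-site.xml on top so the real quorum wins.
>>>>> org.apache.hadoop.conf.Configuration hbaseConfig = HBaseConfiguration.create();
>>>>> hbaseConfig.addResource(new org.apache.hadoop.fs.Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>> LOG.info("Zookeeper after workaround = {}", hbaseConfig.get("hbase.zookeeper.quorum"));
>>>>>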
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> What is the right way to fix this?
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>>
>>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
