Did you change the names in the slaves, masters, etc. files from localhost to the server name?
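If it helps, here is a minimal Java sketch for spotting leftover loopback entries in those files. The file names and the ACCUMULO_CONF_DIR variable come from the thread below; the class and method names (LoopbackHostCheck, isLoopbackEntry, findLoopbackEntries) are made up for illustration, and the check is a plain string match, not a DNS lookup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: flags loopback entries in Accumulo host files. */
final class LoopbackHostCheck {

    // File names as listed in the thread: $ACCUMULO_CONF_DIR/{masters,monitor,slaves,gc,tracers}
    static final List<String> HOST_FILES =
            Arrays.asList("masters", "monitor", "slaves", "gc", "tracers");

    /** True if a host-file line names a loopback address (string check only). */
    static boolean isLoopbackEntry(String line) {
        String host = line.trim().toLowerCase();
        if (host.isEmpty() || host.startsWith("#")) {
            return false; // blank lines and comments are ignored
        }
        return host.equals("localhost")
                || host.startsWith("localhost.")
                || host.startsWith("127.")
                || host.equals("::1");
    }

    /** Returns the trimmed entries from the given lines that point at loopback. */
    static List<String> findLoopbackEntries(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (isLoopbackEntry(line)) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String confDir = System.getenv("ACCUMULO_CONF_DIR");
        if (confDir == null) {
            System.err.println("ACCUMULO_CONF_DIR is not set");
            return;
        }
        for (String name : HOST_FILES) {
            Path file = Paths.get(confDir, name);
            if (!Files.isReadable(file)) {
                continue; // assumption: a missing file is simply skipped
            }
            for (String hit : findLoopbackEntries(Files.readAllLines(file))) {
                System.out.println(name + ": " + hit);
            }
        }
    }
}
```

Run it on the Accumulo server itself; any line it prints is an entry that would probably need to become the FQDN.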
On Sat, Apr 11, 2015 at 7:47 PM Ryan <[email protected]> wrote:
> Sorry to bring back an old thread, but I'm working with Accumulo at a
> hackathon and am running into this same issue of being unable to
> connect to zookeeper from the local machine (the program hangs at
> inst.getConnector).
>
> Dave, what did you fix in Hadoop to get it to work? I changed the 5
> mentioned conf files to my server's domain name, deleted the hdfs accumulo
> directory, and reinstalled using accumulo init. With the domain name there,
> I'm now unable to even start Accumulo.
>
> Josh, I recall you helped me with this a little while back on RHEL. I've
> been poring through my notes but have yet to find a solution.
>
> Any help would be greatly appreciated.
>
> Thanks!
> Ryan
>
> On Tue, Mar 3, 2015 at 5:21 PM, Josh Elser <[email protected]> wrote:
>
>> Excellent! Happy to hear it.
>>
>> Simple problem, but multiple places to fix it in :)
>>
>> David Patterson wrote:
>>
>>> Josh, I just wanted to close the loop on this problem. I redid the
>>> installation making sure there were no references to localhost or
>>> 127.0.0.1. There was a problem in Hadoop that I was able to solve with
>>> the help of the Hadoop user group.
>>>
>>> The combo of no localhosts and the correct hadoop configuration and
>>> initialization has worked.
>>>
>>> I am now able to run code from my Windows machine in Eclipse that
>>> references the Accumulo store in my cloud machine and get the correct
>>> answers back.
>>>
>>> Thank you for your help.
>>>
>>> Dave Patterson
>>>
>>> On Thu, Feb 19, 2015 at 4:29 PM, Josh Elser <[email protected]> wrote:
>>>
>>> Ah! There's the rub.
>>>
>>> > At this point, I see that the ThriftTransportKey has a host name:
>>> > "localhost" and a port of "9997".
>>>
>>> Double check your configuration files:
>>> $ACCUMULO_CONF_DIR/{masters,monitor,slaves,gc,tracers}
>>>
>>> These files control what network interface your Accumulo processes
>>> bind on. Because they only bound to localhost, your application
>>> worked when run on the same machine, but not on any remote machine.
>>>
>>> Typically, you want to put the FQDN in these files.
>>>
>>> David Patterson wrote:
>>>
>>> Josh and anyone else interested,
>>>
>>> More data on this problem.
>>>
>>> I have tried debugging the code in Eclipse (running it on my Windows
>>> machine). The ZooKeeperInstance is working fine in this remote mode. I
>>> can query the instance, and get the instanceID, instance name,
>>> zookeepers string, and session timeout.
>>>
>>> I've also tried creating a ZooCache and a UUID object with the long
>>> string value of my actual instance identification. If I do
>>> String instanceName = ZooKeeperInstance.lookupInstance(zooCache, uuid);
>>> it is able to return the string name of the instance. So, that part of
>>> the communication seems to be fine.
>>>
>>> The hang-up is still coming on the
>>> instance.getConnector(username, new PasswordToken(password));
>>>
>>> It hangs, and when I ran my code in debug mode on Eclipse, I interrupted
>>> it while it was doing nothing.
>>>
>>> I see a long string of calls going from
>>> ZooKeeperInstance.getConnector
>>> to ConnectorImpl constructor
>>> to ServerClient.execute
>>> to ServerClient.executeRaw
>>> to ServerClient.getConnection(Instance)
>>> to ServerClient.getConnection(Instance, boolean)
>>> to ServerClient.getConnection(Instance, boolean, long)
>>> to ThriftTransportPool.getAnyTransport(List<ThriftTransportKey>, boolean)
>>>
>>> At this point, I see that the ThriftTransportKey has a host name:
>>> "localhost" and a port of "9997".
>>>
>>> From there, it goes to ThriftUtil.createClientTransport,
>>> TTimeoutTransport.create(HostAndPort),
>>> TTimeoutTransport(SocketAddress, long),
>>> SocketAdapter.connect(SocketAddress),
>>> SocketAdapter.connect(SocketAddress, int),
>>> SocketChannelImpl.connect(SocketAddress),
>>> Net.connect(FileDescriptor, InetAddress, int),
>>> Net.connect(ProtocolFamily, FileDescriptor, InetAddress, int)
>>> and finally
>>> Net.connect0(boolean, FileDescriptor, InetAddress, int)
>>>
>>> I guess I don't understand why this is going into Thrift code.
>>>
>>> Is there some authorization I need to provide to let me do a remote
>>> connection into Accumulo (Zookeeper seems happy to work, but is Accumulo
>>> stopping me?)?
>>>
>>> If anyone wants line numbers, etc. I can supply more info.
>>>
>>> Dave Patterson
>>>
>>> On Wed, Feb 18, 2015 at 10:20 AM, Josh Elser <[email protected]> wrote:
>>>
>>> > a) a copy of Zookeeper running on the machine from which I'm
>>> > calling for data
>>> > b) call the "local" zookeeper for data and let it connect to the
>>> > remote node for the data?
>>>
>>> No, a ZooKeeper server does not have to be machine local for you to
>>> use it. It just has to be reachable on the network.
>>>
>>> I'm sorry to say, I'm kind of at a loss. I'm not sure what you are
>>> running into. You could try remote debugging your application on the
>>> "other" cloud machine to see how exactly your code is converting the
>>> instance name into the instanceID (and confirm that the value in the
>>> TCredentials object is, in fact, different than what you expect it
>>> to be).
>>>
>>> As for your local windows machine, I know some people have connected
>>> to Accumulo from Windows before, but it is a YMMV platform.
>>> Hopefully it works just fine because it's Java under the hood, but
>>> we have no tests to guarantee that this does work.
>>>
>>> David Patterson wrote:
>>>
>>> Josh, thanks for your help.
>>> 1) Running on the machine that has the accumulo/hadoop/zookeeper code,
>>> in the accumulo shell for the user name "dave" I see the UUID for my
>>> instance.
>>> 2) Running on the "other" machine, launching the zookeeper client,
>>> pointing to the ip address of the server and issuing the get
>>> /accumulo/instances/{my-instance-name}, I see the same UUID for the
>>> instance.
>>> 3) Running on the "other" machine, when I run my java code to connect to
>>> the remote machine with the proper instance name, userid and password, I
>>> get the INVALID_INSTANCEID as described in detail above.
>>> 4) Running on my normal machine (Windows) running eclipse where I've
>>> developed the code, if I run the code as a Java Application, it hangs.
>>> 5) Running on my windows machine, if I debug the application, I can
>>> interrupt it when it hangs up and it is waiting on the line with
>>> Connector connector = instance.getConnector(acUserName, new PasswordToken(acPassword));
>>>
>>> Can my application create a connector to a remote machine's
>>> ZookeeperInstance and reference it from "afar"? Do I have to have:
>>> a) a copy of Zookeeper running on the machine from which I'm calling
>>> for data
>>> b) call the "local" zookeeper for data and let it connect to the remote
>>> node for the data?
>>>
>>> The code I'm writing receives a row identifier as a String parameter,
>>> creates a Scanner, sets the range to a single row (same value for both
>>> ends of the range) and iterates over the (one and only) row.
>>>
>>> I'm using Accumulo 1.6.1, Hadoop 2.6.0, and zookeeper 3.4.6, Java 7
>>> (Oracle). The two cloud machines are running Ubuntu 14.04.
>>>
>>> Thanks.
>>>
>>> Dave
>>>
>>> On Tue, Feb 17, 2015 at 5:24 PM, Josh Elser <[email protected]> wrote:
>>>
>>> Oops, sorry. I used '>' to denote the shell prompt. The bits below
>>> where it converted them to a quote is just meant to denote commands
>>> that are run inside the zkCli :)
>>>
>>> Josh Elser wrote:
>>>
>>> If you're using the same exact code on both machines, it sounds like you
>>> might have something unexpected going on with your networking.
>>>
>>> Accumulo can share ZooKeeper and HDFS instances -- it uses the notion of
>>> an InstanceID to do this. The InstanceID is a UUID assigned to an
>>> Accumulo instance during `accumulo init`. Because a UUID is hard to
>>> memorize, and you need to identify the Accumulo instance you want to
>>> connect to in the client API, there is also a mapping of some
>>> 'easy-to-remember' name to that UUID. For example 'daves_accumulo' maps
>>> to '12345678-1234-1234-123456789012'.
>>>
>>> The error you're seeing is because the UUID your client found from the
>>> `instanceName` is different than the instanceID the Accumulo server has.
>>> A quick sanity check is to look at ZooKeeper:
>>>
>>> zkCli.sh -server your_zk_host:2181
>>> > get /accumulo/instances/your_instance_name
>>>
>>> Compare the value of that node (first line of output) with the instance
>>> ID displayed on the Accumulo monitor (top of the page). They should be
>>> the same.
>>>
>>> I don't think I've ever seen this personally, so I'm not sure what to
>>> guess at how it happened.
>>> It's possible you might have networking messed up and are talking to a
>>> different ZooKeeper than you think you are (common problem if you have
>>> misconfigured a quorum and each ZK node is acting independently instead
>>> of together). A quick fix would be to change the node in ZK to the
>>> correct instance ID.
>>>
>>> zkCli.sh -server your_zk_host:2181
>>> > delete /accumulo/instances/your_instance_name
>>> > create /accumulo/instances/your_instance_name instance_id_from_monitor
>>>
>>> If that doesn't help, please give us some more information (versions
>>> you're using, how you set up the system, anything special you did).
>>>
>>> David Patterson wrote:
>>>
>>> I'm running a very simple test configuration on an Ubuntu 14 machine.
>>> If I run code on that machine I can read the data I've added.
>>>
>>> I'm only using column family name, (empty_text for the qualifier) and
>>> a value -- no authorizations.
>>>
>>> When I run the exact same program (identical jar) on another Ubuntu 14
>>> machine, I get
>>>
>>> org.apache.accumulo.core.client.AccumuloSecurityException: Error
>>> INVALID_INSTANCEID for user dave - Unknown security exception
>>> at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:63)
>>> at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:70)
>>> at org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:240)
>>> at com.iai.diad.data.ImageDAO_A.<init>(ImageDAO_A.java:123)
>>> at com.iai.diad.data.ImageDAO_A.main(ImageDAO_A.java:63)
>>> Caused by: ThriftSecurityException(user:dave, code:INVALID_INSTANCEID)
>>>
>>> The error occurs on the instance.getConnector call (the second line
>>> below)
>>>
>>> instance = new ZooKeeperInstance(instanceName, zooServers);
>>> connector = instance.getConnector(acUserName, new PasswordToken(acPassword));
>>>
>>> One possible source for strangeness is that both of these machines are
>>> on a cloud server. Each of them has 2 ip addresses -- one that is
>>> available from the outside, and one that is available only inside the
>>> cloud. I'm using the outside-the-cloud ip address in the zooServers
>>> string.
>>>
>>> The /etc/hosts file on the machine with the Accumulo data has the
>>> external ip address as the name of the machine. It also has 127.0.0.1
>>> defined as localhost.
>>>
>>> Any suggestions?
>>>
>>> Dave Patterson
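
One more thought on the hang described in the thread above, where the client ends up in a socket connect to "localhost" port 9997. A quick way to tell a network problem from an Accumulo problem might be to probe the advertised host and port directly from the client machine with a timeout. This is only a sketch using plain java.net; the class and method names (PortProbe, canConnect) are made up, and the default of localhost:9997 is just the value reported in the thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Unlike a connect with no timeout, this gives up after timeoutMillis.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9997;
        boolean ok = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + " reachable: " + ok);
    }
}
```

If the probe fails for the server's FQDN on 9997 from the remote machine but succeeds on the server itself, the processes are probably still bound to localhost or a firewall is in the way.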
