Thanks Josh.  But what do you mean by "jstack'ing"?  I'm unfamiliar with
that term.  A better question would be: how can one troubleshoot such a
thing?

By the way, I am the sole user on this cluster.

On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <josh.el...@gmail.com> wrote:

> Ok, this record:
>
> tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN
>
> Means that your tserver is listening on the correct port on all interfaces.
> There shouldn't be issues connecting to the tserver. This is also
> confirmed by the fact that you authenticated and got a Connector (this
> does an RPC to the tserver).
>
> So, your tserver is up, and your client can communicate with it. The
> real question is why the scan is hanging. Perhaps try jstack'ing the
> tserver (taking a thread dump of its JVM) while your client is blocked
> waiting for results.
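
For readers unfamiliar with the term: jstack is the JDK command-line tool that prints the stack trace of every thread in a running JVM, invoked as `jstack <pid>`. The sketch below is only a rough in-process analogue for the client side, using the standard `ThreadMXBean` API to show where each thread (including the one iterating the scanner) is currently blocked. The class name `ThreadDump` is made up for illustration and is not part of Accumulo.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    /** Returns a jstack-like dump of every live thread in this JVM. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip monitor and synchronizer details, which not every JVM supports.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling `ThreadDump.dump()` from a second thread while the scan is stuck would show the frames the client is waiting in; the tserver side still needs `jstack` run against the tserver's own process id.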
>
> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <threadedb...@gmail.com> wrote:
> > "...it's when
> > you make a Connector, and your client will talk to a tabletserver to
> > authenticate, that your program should hang. It would be good to
> > verify that."
> >
> >
> > My program should hang?  Would you expand?  That is exactly what it is
> > doing.  I am able to get a connector.  But when I try to iterate the
> > result of a scan, that's when it hangs.
> >
> >
> >
> >
> > Here's what comes from netstat:
> >
> >
> > $ netstat -na | grep 9997
> > tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN
> > tcp        0      0 204.9.140.36:35679          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53146          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33896          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53282          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53188          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35609          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33901          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35588          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33877          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33946          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53167          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33949          204.9.140.38:9997           ESTABLISHED
> > tcp        0      0 204.9.140.36:35546          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33852          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53125          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33922          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33747          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33961          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33793          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35768          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33917          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33814          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35567          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33444          204.9.140.38:9997           FIN_WAIT2
> > tcp        0      0 204.9.140.36:35701          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33969          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53258          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33831          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53210          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53104          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33789          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33856          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53237          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33835          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35651          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33938          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33041          204.9.140.36:9997           ESTABLISHED
> > tcp        0      0 204.9.140.36:53285          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:53305          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33768          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35630          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33754          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35745          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:35724          204.9.140.36:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:9997           204.9.140.36:33041          ESTABLISHED
> > tcp        0      0 204.9.140.36:53083          204.9.140.37:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:50623          204.9.140.37:9997           ESTABLISHED
> > tcp        0      0 204.9.140.36:33772          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33732          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33874          204.9.140.38:9997           TIME_WAIT
> > tcp        0      0 204.9.140.36:33810          204.9.140.38:9997           TIME_WAIT
> >
> >
> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.el...@gmail.com> wrote:
> >>
> >> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
> >> tserver? Assuming you haven't altered tserv.port.client in
> >> accumulo-site.xml, we want the line for port 9997.
> >>
> >> From my laptop running a tserver on localhost:
> >>
> >> $ netstat -na | grep 9997
> >> tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN
> >>
> >> Depending on the tool you use, you can grep out the pid of the tserver
> >> or just that port itself.
> >>
> >> Just so you know, ZK binds to all available interfaces when it starts,
> >> so it should work seamlessly with localhost or the FQDN for the host.
> >> As such, it shouldn't matter what you provide to the
> >> ZooKeeperInstance. That should connect in all cases for you, it's when
> >> you make a Connector, and your client will talk to a tabletserver to
> >> authenticate, that your program should hang. It would be good to
> >> verify that.
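
One way to verify from the remote client that both hops are reachable is a plain TCP connect with a timeout. This is a minimal sketch using only `java.net`; it does not use the Accumulo client API, the class name `PortCheck` is hypothetical, and the default host is a placeholder. Ports 2181 (ZooKeeper) and 9997 (tserver) are the ones mentioned in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host not resolvable: all count as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass the real FQDN as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        // 2181 is the ZooKeeper port, 9997 the default tserver client port.
        for (int port : new int[] {2181, 9997}) {
            System.out.println(host + ":" + port + " reachable = "
                    + canConnect(host, port, 3000));
        }
    }
}
```

A successful connect only shows that the port accepts connections from the client machine; it says nothing about whether the address the tserver advertises in ZooKeeper is the one the client can reach.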
> >>
> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedb...@gmail.com> wrote:
> >> > All,
> >> >
> >> > Thanks for the responses.
> >> >
> >> > Is this a problem for Accumulo?
> >> > Reverse DNS is yielding my ISP's host name. You know the drill: my IP
> >> > in reverse followed by their domain name, as opposed to my FQDN, which
> >> > is what I use in my config files.
> >> >
> >> > Running Accumulo 1.5.1
> >> > I have only one interface.
> >> > I have the FQDN in both master and slaves files for both Hadoop and
> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the ZooKeepers
> >> > are referenced.
> >> > Also, I am passing in all ZK FQDNs when I instantiate
> >> > ZooKeeperInstance.
> >> > Forward DNS works.
> >> > Reverse DNS... well (see above).
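
To see the forward/reverse mismatch the way a JVM sees it, a small check with `java.net.InetAddress` can help. This is only a sketch: the class name `DnsCheck` is hypothetical, the default host is a placeholder, and it makes no claim about how Accumulo itself resolves names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsCheck {
    /** Forward-resolves name, then reverse-resolves the resulting address. */
    public static String[] roundTrip(String name) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(name); // forward lookup
        String ip = addr.getHostAddress();
        // Reverse lookup on the raw address bytes; the JDK falls back to the
        // textual IP when no name can be found.
        String reverse = InetAddress.getByAddress(addr.getAddress()).getCanonicalHostName();
        return new String[] {ip, reverse};
    }

    public static void main(String[] args) throws UnknownHostException {
        // Placeholder default; pass the FQDN from the config files instead.
        String name = args.length > 0 ? args[0] : "localhost";
        String[] result = roundTrip(name);
        System.out.println(name + " -> " + result[0] + " -> " + result[1]);
    }
}
```

If the last name printed differs from the FQDN passed in, the JVM is seeing the same reverse-DNS mismatch described above.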
> >> >
> >> >
> >> >
> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afu...@apache.org> wrote:
> >> >>
> >> >> Accumulo tservers typically listen on a single interface. If you
> >> >> have a server with multiple interfaces (e.g. loopback and eth0), you
> >> >> might have a problem in which the tablet servers are not listening on
> >> >> externally reachable interfaces. Tablet servers will list the
> >> >> interfaces that they are listening to when they boot, and you can also
> >> >> use tools like lsof to find them.
> >> >>
> >> >> If that is indeed the problem, then you might just need to change your
> >> >> conf/slaves file to use <hostname> instead of localhost, and then
> >> >> restart.
> >> >>
> >> >> Adam
> >> >>
> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedb...@gmail.com> wrote:
> >> >>>
> >> >>>
> >> >>> I have been happily working with Accumulo, but today things changed.
> >> >>> No errors.
> >> >>>
> >> >>> Until now I ran everything server side, which meant the URL was
> >> >>> localhost:2181, and life was good.  Today I tried running some of the
> >> >>> same code as a remote client, which means <host name>:2181.  Things
> >> >>> hang when BatchWriter tries to commit anything, and Scan hangs when it
> >> >>> tries to iterate through a Map.
> >> >>>
> >> >>> Let's focus on the scan part:
> >> >>>
> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes; the loop below then hangs.
> >> >>> for (Entry<Key,Value> entry : scan) {
> >> >>>     def row = entry.getKey().getRow();
> >> >>>     def value = entry.getValue();
> >> >>>     println "value=" + value;
> >> >>> }
> >> >>>
> >> >>> This is what appears in the console :
> >> >>>
> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
> >> >>>
> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
> >> >>>
> >> >>> <and on and on>
> >> >>>
> >> >>>
> >> >>>
> >> >>> The only difference between success and a hang is a URL change, and
> >> >>> of course being remote.
> >> >>>
> >> >>> I don't believe this is a firewall issue.  I shut down the firewall.
> >> >>>
> >> >>> Am I missing something?
> >> >>>
> >> >>> Thanks all.
> >> >>>
> >> >>> --
> >> >>> There are ways and there are ways,
> >> >>>
> >> >>> Geoffry Roberts
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > There are ways and there are ways,
> >> >
> >> > Geoffry Roberts
> >
> >
> >
> >
> > --
> > There are ways and there are ways,
> >
> > Geoffry Roberts
>



-- 
There are ways and there are ways,

Geoffry Roberts
