You can add

set -x

in start-server.sh, and it will show you what the script is trying to do.
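
As a rough illustration of what that tracing looks like, here is a minimal sketch. It is not the real start-server.sh: the function name, the host name, and the echoed message are hypothetical stand-ins. It only shows that, with set -x on, the shell writes each command to stderr after expansion and before running it.

```shell
#!/bin/sh
# Hypothetical stand-in for a start script; NOT the real start-server.sh.
trace_demo() {
  (
    set -x                      # trace each command (after expansion) to stderr
    host="tserver-example"      # hypothetical host name, for illustration only
    echo "Starting tablet server on $host"
  )
}

# Keep only the trace: stderr goes to the capture, stdout is discarded.
trace_output=$(trace_demo 2>&1 >/dev/null)
echo "$trace_output"
```

The same trace can be had without editing the file by running the script as `bash -x ./start-server.sh`, which should make a failing ssh command or a bad path show up in the output.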

-Eric

On Wed, Jul 18, 2012 at 4:12 PM, Cardon, Tejay E
<[email protected]> wrote:
> Eric,
> Good to know about the tracers.  I set up 4 tracers for an 8-node setup, but 
> I'll back that down to just the 1.  As for the .out or .err files on 
> tservers, I've got nothing.  There is no evidence that those servers were 
> ever touched.  I'm thinking the next step would be to execute the 
> start-here.sh script on each tserver and look for errors.  Is that the best 
> approach, and if so, what arguments should I pass?  I'm digging into the 
> start-all.sh script for answers, but if someone already knows what my 
> arguments are.... all the better.
>
> Thanks,
> Tejay
>
> -----Original Message-----
> From: Eric Newton [mailto:[email protected]]
> Sent: Wednesday, July 18, 2012 2:00 PM
> To: [email protected]
> Subject: EXTERNAL: Re: There are no tablet servers
>
> Don't start a tracer on every server.  Just start one on a master server.  
> You won't need more than 1 until you get several hundred servers.
>
> Do you have anything in the .out or .err files on the tserver hosts?
> If the files don't exist, something is failing in the ssh to those hosts.
>
> -Eric
>
> On Wed, Jul 18, 2012 at 2:15 PM, Cardon, Tejay E <[email protected]> 
> wrote:
>> All,
>>
>> I'm running into a strange challenge in my latest Accumulo installation.
>> I've developed some chef recipes for deploying Accumulo, and have
>> tested them on three clusters now with no problems.  Using the same
>> scripts, I recently did another deployment, but I'm having trouble on this one.
>>
>>
>>
>> After installing Accumulo, updating the config files, and setting up
>> passwordless ssh, I ran:
>>
>> ./accumulo init
>>
>>
>>
>> Everything went normally, with me setting the instanceID and password.
>>
>>
>>
>> Then I ran
>>
>> ./start-all.sh
>>
>>
>> Again, everything went smoothly with the following output:
>>
>> bash-3.2$ ./start-all.sh
>>
>> Starting tablet servers and loggers ....... done
>>
>> Starting tablet server on de8-9a-8f-83-be-52
>>
>> Starting logger on de8-9a-8f-83-be-52
>>
>> Starting tablet server on d04-7d-7b-06-5e-48
>>
>> Starting logger on de8-9a-8f-d3-3e-f8
>>
>> Starting tablet server on d04-7d-7b-06-5d-f4
>>
>> Starting logger on d04-7d-7b-06-5e-48
>>
>> Starting logger on d04-7d-7b-06-5d-f4
>>
>> Starting tablet server on de8-9a-8f-d3-3e-f8
>>
>> 18 12:48:50,970 [server.Accumulo] INFO : Attempting to talk to zookeeper
>>
>> 18 12:48:51,182 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
>>
>> 18 12:48:51,568 [server.Accumulo] INFO : Connected to HDFS
>>
>> Starting master on d04-7d-7b-06-5d-80
>>
>> Starting garbage collector on d04-7d-7b-06-5e-ba
>>
>> Starting monitor on d04-7d-7b-06-5e-ba
>>
>> Starting tracer on d04-7d-7b-06-5d-80
>>
>> Starting tracer on de8-9a-8f-d3-3e-f8
>>
>> Starting tracer on d04-7d-7b-06-5e-48
>>
>>
>>
>> I can also run a stop-all.sh with no complaints from the script.
>>
>>
>>
>> However, if I try to start the Accumulo shell, I get
>>
>>
>>
>> bash-3.2$ ./accumulo shell
>>
>> Enter current password for 'hdfs'@'test4': ******
>>
>> 18 13:00:17,906 [impl.ServerClient] WARN : There are no tablet servers: check that zookeeper and accumulo are running.
>>
>>
>>
>> If I check the tablet server machines I find that they do not have any
>> Accumulo processes running, and the master does not have any tablet
>> server logs.  (it does have the tracer logs, however).
>>
>>
>>
>> I've attached the log files here (without the empty ones).  There is
>> an error trying to "clean up old log sort" and a thrift error.
>>
>> I'm at a loss for where to begin on the debugging for this.  Any
>> thoughts would be greatly appreciated.
>>
>>
>>
>>
>>
>> 18 12:48:54,100 [master.CoordinateRecoveryTask] ERROR: Error cleaning up old Log Sort jobs
>> java.io.IOException: Call to /10.1.24.65:50030 failed on local exception: java.io.EOFException
>>
>>
>>
>> 18 12:48:57,016 [impl.ServerClient] DEBUG: ClientService request failed null, retrying ...
>>
>> org.apache.thrift.transport.TTransportException: Failed to connect to a server
>>
>>                 at org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:437)
>>
>>                 at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:145)
>>
>>                 at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:123)
>>
>>                 at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:105)
>>
>>                 at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:71)
>>
>>                 at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:75)
>>
>>                 at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:145)
>>
>>                 at org.apache.accumulo.server.trace.TraceServer.<init>(TraceServer.java:152)
>>
>>                 at org.apache.accumulo.server.trace.TraceServer.main(TraceServer.java:222)
>>
>>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>
>>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>
>>                 at java.lang.reflect.Method.invoke(Method.java:597)
>>
>>                 at org.apache.accumulo.start.Main$1.run(Main.java:89)
>>
>>                 at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
>>
