Re: Workers not running

Alex Coventry Wed, 15 Jan 2014 13:00:06 -0800

This guy<http://lists.zeromq.org/pipermail/zeromq-dev/2013-December/024048.html>
is reporting a very similar segfault.  I will try the advice there.



On Wed, Jan 15, 2014 at 3:48 PM, Alex Coventry <[email protected]> wrote:

> Thanks, this is great.  If I do that super-quickly, I get a segfault in
> libzmq.so, specifically in zmq::socket_base_t::process_term.  (If I'm slow,
> I get "No such file or directory" errors because the worker's trying to
> touch files in subdirs of /mnt/storm which no longer exist.)
>
> This is a pretty horrible result, but it's progress.
>
> Best regards,
> Alex
>
>
> On Wed, Jan 15, 2014 at 3:21 PM, Jon Logan <[email protected]> wrote:
>
>> These issues are annoying to debug. My best solution has been to look in
>> the supervisor log: it prints the long command it used to launch the
>> worker (java -jar .....). Log into that machine, sudo as the storm user,
>> paste that command, and try to execute it yourself. The problem tends to
>> be in launching the worker, and this usually displays the error message.
>> It's a hack, but it usually works.
>>
>>
>> On Wed, Jan 15, 2014 at 3:18 PM, Alex Coventry <[email protected]> wrote:
>>
>>> Thanks, Jon.  Yes, I have topology.debug set to true in the config map
>>> passed to StormSubmitter/submitTopology, and I've changed all the "INFO"
>>> settings to "DEBUG" in logback/cluster.xml.
>>>
>>> I think you might be onto something with checking whether the JVMs are
>>> running.  I noticed in the supervisor log: "Error when trying to kill 8060.
>>> Process is probably already dead."  Nathan Marz says in this 
>>> thread<https://groups.google.com/forum/#!topic/storm-user/UxbMLIfEV1k> that
>>> it suggests native dependencies are not installed correctly.  I'm looking
>>> for ways to test whether this is the case, now.  Any suggestions are
>>> welcome.  Is there a way to kick off a worker thread in the storm repl, to
>>> get some more info about how it's failing?
>>>
>>> Best regards,
>>> Alex
>>>
>>> On Wed, Jan 15, 2014 at 3:04 PM, Jon Logan <[email protected]> wrote:
>>>
>>>> I haven't looked at your logs, but just to clarify, the message you
>>>> describe is probably a debug message. Debug messages say things like the
>>>> emitting and receiving of tuples. I'm not sure if it's enabled by default
>>>> in local mode, but if you want those messages in normal mode, you need to
>>>> set topology.debug: true in the storm.yaml configuration file.
>>>>
>>>>
>>>> You can also look to see if the worker JVMs are actually running on
>>>> other machines. They should be in separate processes.
>>>>
>>>>
>>>> On Wed, Jan 15, 2014 at 1:57 PM, Alex Coventry <[email protected]> wrote:
>>>>
>>>>> Thanks for the confirmation.  It appears that my topology is not
>>>>> running.  When I run the word-count topology in local mode, I see messages
>>>>> like "Emitting: 3 default ["dog"]".  I assume I should be seeing messages
>>>>> like that in the logs when I run the topology in distributed mode, but I'm
>>>>> not.  I'm also not seeing any exceptions.  Log messages related to
>>>>> communication between nimbus, workers and supervisor appear for about three
>>>>> minutes, then everything related to the topology seems to shut down,
>>>>> including the UI's report that it's running.
>>>>>
>>>>> I'm pretty stumped by this, and I'd be grateful for any help.  I took
>>>>> a copy of the logs for nimbus, the supervisor, and the workers, for the
>>>>> lifecycle of the topology I just described:
>>>>>
>>>>>    https://dl.dropboxusercontent.com/u/6414090/logs.tgz
>>>>>
>>>>> I'd really appreciate it if someone with more experience with storm
>>>>> could take a look at them and tell me where I'm going wrong.
>>>>>
>>>>> I'm using storm-0.9.0, storm-starter from git master.  ZK and nimbus
>>>>> are running on the same machine, the supervisor/workers are running on
>>>>> another machine.  Judging from the logs, they are all able to see each
>>>>> other.
>>>>>
>>>>> Best regards,
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 15, 2014 at 7:26 AM, Jason Trost <[email protected]> wrote:
>>>>>
>>>>>> This should show up in one or more worker log in
>>>>>> $STORM_HOME/logs/worker-*.log.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 14, 2014 at 10:25 PM, Alex Coventry <[email protected]> wrote:
>>>>>>
>>>>>>> If I explicitly throw an exception in the storm-starter clojure
>>>>>>> example, as shown in the diff below, it shows up nicely when I run in
>>>>>>> local mode with
>>>>>>>
>>>>>>>   coventry@samjoko:~/storm-starter$ lein run -m storm.starter.clj.word-count
>>>>>>>
>>>>>>> However, when I run it on a storm cluster with a command like
>>>>>>>
>>>>>>>   coventry@samjoko:~/storm-starter$
>>>>>>> ~/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
>>>>>>> storm.starter.clj.word_count count
>>>>>>>
>>>>>>> I am not sure where these errors are being reported.  Are they
>>>>>>> logged anywhere, and if not, can I get them to be?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Alex
>>>>>>>
>>>>>>> index ce2725d..c82fd0f 100644
>>>>>>> --- a/src/clj/storm/starter/clj/word_count.clj
>>>>>>> +++ b/src/clj/storm/starter/clj/word_count.clj
>>>>>>> @@ -11,6 +11,7 @@
>>>>>>>                     "an apple a day keeps the doctor away"]]
>>>>>>>      (spout
>>>>>>>       (nextTuple []
>>>>>>> +       (throw (Exception. "Where does this show up?"))
>>>>>>>         (Thread/sleep 100)
>>>>>>>         (emit-spout! collector [(rand-nth sentences)])
>>>>>>>         )
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
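
Jon's suggestion quoted above (find the worker launch command in the
supervisor log and rerun it by hand) could be sketched roughly as follows.
This is only an illustration under assumptions: the helper name
last_launch_command is hypothetical, and the marker text "Launching worker
with command: " is assumed to precede the command on a single log line, so
check it against your own supervisor log before relying on it.

```shell
#!/bin/sh
# Sketch: print the most recent worker launch command from a supervisor log.
# ASSUMPTION: the supervisor writes the full command on one line, right after
# the marker text below. Adjust MARKER if your log uses different wording.
MARKER='Launching worker with command: '

# last_launch_command is a hypothetical helper, not part of Storm.
# $1: path to the supervisor log file
last_launch_command() {
    # Keep only lines containing the marker, take the newest one,
    # then strip everything up to and including the marker.
    grep -F "$MARKER" "$1" | tail -n 1 | sed "s/^.*$MARKER//"
}
```

To rerun the command as the storm user, something like the following may
work (the supervisor.log location is an assumption, based on the worker logs
living in $STORM_HOME/logs):

  sudo -u storm sh -c "$(last_launch_command "$STORM_HOME/logs/supervisor.log")"

Running it in a terminal this way means any startup error, such as a
segfault in a native library, is printed directly instead of being lost.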
