Thanks again for the help, everyone.  The problem turned out to be an
incomplete installation of jzmq.  I had figured out half of the advice
given in this
thread<http://stackoverflow.com/questions/12115160/compiling-jzmq-on-ubuntu>
(the javac command) on my own, and thought the build had completed after I
did that, but it turns out the other half (touching classdist_noinst.stamp)
was also necessary for some build steps to be triggered.
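
For anyone who hits this later, the combined workaround looks roughly like
this (paths are from my checkout, so adjust as needed):

  cd jzmq/src
  touch classdist_noinst.stamp          # the step I had missed
  CLASSPATH=. javac -d . org/zeromq/*.java
  cd .. && make && sudo make install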

Best regards,
Alex


On Wed, Jan 15, 2014 at 3:59 PM, Alex Coventry <[email protected]> wrote:

> This guy<http://lists.zeromq.org/pipermail/zeromq-dev/2013-December/024048.html>
> is reporting a very similar segfault.  I will try the advice there.
>
>
> On Wed, Jan 15, 2014 at 3:48 PM, Alex Coventry <[email protected]> wrote:
>
>> Thanks, this is great.  If I run that command quickly enough after the
>> topology launches, I get a segfault in libzmq.so, specifically in
>> zmq::socket_base_t::process_term.  (If I'm too slow, I instead get "No
>> such file or directory" errors, because the worker tries to touch files
>> in subdirs of /mnt/storm which no longer exist.)
>>
>> This is a pretty horrible result, but it's progress.
>>
>> Best regards,
>> Alex
>>
>>
>> On Wed, Jan 15, 2014 at 3:21 PM, Jon Logan <[email protected]> wrote:
>>
>>> These issues are annoying to debug.  My best solution has been to look
>>> in the supervisor log; it prints out the long command it used to launch
>>> the worker (java -jar .....).  Log into that machine, sudo as the storm
>>> user, paste that command, and try to execute it yourself.  The problem
>>> tends to be in launching the worker, and running the command by hand
>>> usually surfaces the error message.  It's a hack, but it usually works.
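>>>
>>> Something like this, say (the exact wording of the log line may differ
>>> a bit between versions):
>>>
>>>   grep -o 'Launching worker with command: .*' logs/supervisor.log | tail -1
>>>   sudo su - storm
>>>   # then paste the java command from that log line and run it by hand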
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:18 PM, Alex Coventry <[email protected]> wrote:
>>>
>>>> Thanks, Jon.  Yes, I have topology.debug set to true in the config map
>>>> passed to StormSubmitter/submitTopology, and I've changed all the "INFO"
>>>> settings to "DEBUG" in logback/cluster.xml.
>>>>
>>>> I think you might be onto something with checking whether the JVMs are
>>>> running.  I noticed in the supervisor log: "Error when trying to kill
>>>> 8060.  Process is probably already dead."  Nathan Marz says in this
>>>> thread<https://groups.google.com/forum/#!topic/storm-user/UxbMLIfEV1k>
>>>> that this suggests the native dependencies are not installed correctly.
>>>> I'm now looking for ways to test whether that's the case; any
>>>> suggestions are welcome.  Is there a way to kick off a worker thread in
>>>> the storm repl, to get more info about how it's failing?
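>>>>
>>>> The quickest sanity check I've found so far is to confirm that the
>>>> native libraries exist and link (this assumes the default
>>>> /usr/local/lib install prefix):
>>>>
>>>>   ls -l /usr/local/lib/libjzmq*
>>>>   ldd /usr/local/lib/libjzmq.so | grep libzmq   # should resolve, not say "not found"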
>>>>
>>>> Best regards,
>>>> Alex
>>>>
>>>> On Wed, Jan 15, 2014 at 3:04 PM, Jon Logan <[email protected]> wrote:
>>>>
>>>>> I haven't looked at your logs, but just to clarify, the message you
>>>>> describe is probably a debug message.  Debug messages report things
>>>>> like the emitting and receiving of tuples.  I'm not sure whether
>>>>> they're enabled by default in local mode, but if you want them in
>>>>> normal mode, you need to set topology.debug: true in the storm.yaml
>>>>> configuration file.
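>>>>>
>>>>> E.g., something like this, assuming the default conf location under
>>>>> your storm install (check for an existing topology.debug entry first):
>>>>>
>>>>>   echo 'topology.debug: true' >> $STORM_HOME/conf/storm.yaml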
>>>>>
>>>>>
>>>>> You can also look to see if the worker JVMs are actually running on
>>>>> other machines. They should be in separate processes.
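>>>>>
>>>>> A quick way to check on the supervisor machine (the worker main class
>>>>> in 0.9 is backtype.storm.daemon.worker, if I remember right):
>>>>>
>>>>>   jps -lm | grep backtype.storm.daemon.worker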
>>>>>
>>>>>
>>>>> On Wed, Jan 15, 2014 at 1:57 PM, Alex Coventry <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the confirmation.  It appears that my topology is not
>>>>>> running.  When I run the word-count topology in local mode, I see
>>>>>> messages like "Emitting: 3 default ["dog"]".  I assume I should be
>>>>>> seeing messages like that in the logs when I run the topology in
>>>>>> distributed mode, but I'm not.  I'm also not seeing any exceptions.
>>>>>> Log messages related to communication between nimbus, the workers,
>>>>>> and the supervisor appear for about three minutes; then everything
>>>>>> related to the topology seems to shut down, including the UI's report
>>>>>> that it's running.
>>>>>>
>>>>>> I'm pretty stumped by this, and I'd be grateful for any help.  I took
>>>>>> a copy of the logs for nimbus, the supervisor, and the workers, for the
>>>>>> lifecycle of the topology I just described:
>>>>>>
>>>>>>    https://dl.dropboxusercontent.com/u/6414090/logs.tgz
>>>>>>
>>>>>> I'd really appreciate it if someone with more experience with storm
>>>>>> could take a look at them and tell me where I'm going wrong.
>>>>>>
>>>>>> I'm using storm-0.9.0 and storm-starter from git master.  ZK and
>>>>>> nimbus are running on the same machine; the supervisor/workers are
>>>>>> running on another machine.  Judging from the logs, they are all able
>>>>>> to see each other.
>>>>>>
>>>>>> Best regards,
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 15, 2014 at 7:26 AM, Jason Trost <[email protected]> wrote:
>>>>>>
>>>>>>> This should show up in one or more of the worker logs in
>>>>>>> $STORM_HOME/logs/worker-*.log.
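>>>>>>>
>>>>>>> E.g., to follow them while the topology comes up:
>>>>>>>
>>>>>>>   tail -F $STORM_HOME/logs/worker-*.log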
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 14, 2014 at 10:25 PM, Alex Coventry <[email protected]> wrote:
>>>>>>>
>>>>>>>> If I explicitly throw an exception in the storm-starter clojure
>>>>>>>> example, as shown in the diff below, it shows up nicely when I run
>>>>>>>> in local mode with
>>>>>>>>
>>>>>>>>   coventry@samjoko:~/storm-starter$ lein run -m storm.starter.clj.word-count
>>>>>>>>
>>>>>>>> However, when I run it on a storm cluster with a command like
>>>>>>>>
>>>>>>>>   coventry@samjoko:~/storm-starter$ storm jar
>>>>>>>> ~/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
>>>>>>>> storm.starter.clj.word_count count
>>>>>>>>
>>>>>>>> I am not sure where these errors are being reported.  Are they
>>>>>>>> logged anywhere, and if not, can I get them to be?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> index ce2725d..c82fd0f 100644
>>>>>>>> --- a/src/clj/storm/starter/clj/word_count.clj
>>>>>>>> +++ b/src/clj/storm/starter/clj/word_count.clj
>>>>>>>> @@ -11,6 +11,7 @@
>>>>>>>>                     "an apple a day keeps the doctor away"]]
>>>>>>>>      (spout
>>>>>>>>       (nextTuple []
>>>>>>>> +       (throw (Exception. "Where does this show up?"))
>>>>>>>>         (Thread/sleep 100)
>>>>>>>>         (emit-spout! collector [(rand-nth sentences)])
>>>>>>>>         )
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
