Thanks again for the help, everyone. The problem turned out to be an incomplete installation of jzmq. I had worked out half of the advice given in this thread<http://stackoverflow.com/questions/12115160/compiling-jzmq-on-ubuntu> (the javac command) on my own, and thought the build had completed after I did that, but it seems that the other half (touching classdist_noinst.stamp) was also necessary to trigger some of the build steps.
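For anyone who hits this later, here is a minimal sketch of one way to check whether the JVM can find and link a native library at all. It is not taken from the thread or from Storm: the class name NativeLibCheck is hypothetical, and the default library name "jzmq" is an assumption about what the jzmq build installs. It only tests loading, so it would not catch a crash inside libzmq itself.

```java
// Hypothetical helper, not part of Storm or jzmq.
public class NativeLibCheck {

    /** Returns true if the JVM can load the named native library via java.library.path. */
    static boolean canLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Thrown when the library is missing or cannot be linked.
            System.err.println("Could not load '" + libName + "': " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "jzmq" is an assumed default; pass another name as the first argument.
        String lib = args.length > 0 ? args[0] : "jzmq";
        System.out.println(lib + (canLoad(lib) ? " loaded OK" : " failed to load"));
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```

Running it as the storm user on the supervisor machine, with the same java.library.path setting the worker launch command uses, should show whether the worker JVM would be able to load the library.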
Best regards,
Alex

On Wed, Jan 15, 2014 at 3:59 PM, Alex Coventry <[email protected]> wrote:

> This guy<http://lists.zeromq.org/pipermail/zeromq-dev/2013-December/024048.html> is
> reporting a very similar segfault. I will try the advice there.
>
>
> On Wed, Jan 15, 2014 at 3:48 PM, Alex Coventry <[email protected]> wrote:
>
>> Thanks, this is great. If I do that super-quickly, I get a segfault in
>> libzmq.so, specifically in zmq::socket_base_t::process_term. (If I'm slow,
>> I get "No such file or directory" errors because the worker's trying to
>> touch files in subdirs of /mnt/storm which no longer exist.)
>>
>> This is a pretty horrible result, but it's progress.
>>
>> Best regards,
>> Alex
>>
>>
>> On Wed, Jan 15, 2014 at 3:21 PM, Jon Logan <[email protected]> wrote:
>>
>>> These issues are annoying to debug. My best solution has been to look in
>>> the supervisor log, it'll print out a long command that it used to launch
>>> (java -jar .....). Log into that machine, sudo as the storm user, and paste
>>> that command and try to execute it yourself. It tends to be a problem
>>> launching the worker, and this usually will display the error message. It's
>>> a hack, but it usually works.
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:18 PM, Alex Coventry <[email protected]> wrote:
>>>
>>>> Thanks, Jon. Yes, I have topology.debug set to true in the config map
>>>> passed to StormSubmitter/SubmitTopology, and I've changed all the "INFO"
>>>> settings to "DEBUG" in logback/cluster.xml.
>>>>
>>>> I think you might be onto something with checking whether the JVMs are
>>>> running. I noticed in the supervisor log: "Error when trying to kill 8060.
>>>> Process is probably already dead." Nathan Marz says in this
>>>> thread<https://groups.google.com/forum/#!topic/storm-user/UxbMLIfEV1k> that
>>>> it suggests native dependencies are not installed correctly. I'm looking
>>>> for ways to test whether this is the case, now. Any suggestions are
>>>> welcome. Is there a way to kick off a worker thread in the storm repl, to
>>>> get some more info about how it's failing?
>>>>
>>>> Best regards,
>>>> Alex
>>>>
>>>> On Wed, Jan 15, 2014 at 3:04 PM, Jon Logan <[email protected]> wrote:
>>>>
>>>>> I haven't looked at your logs, but just to clarify, the message you
>>>>> describe is probably a debug message. Debug messages say things like the
>>>>> emitting and receiving of tuples. I'm not sure if it's enabled by default
>>>>> in local mode, but if you want those messages in normal mode, you need to
>>>>> set topology.debug: true in the storm.yaml configuration file.
>>>>>
>>>>>
>>>>> You can also look to see if the worker JVMs are actually running on
>>>>> other machines. They should be in separate processes.
>>>>>
>>>>>
>>>>> On Wed, Jan 15, 2014 at 1:57 PM, Alex Coventry <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the confirmation. It appears that my topology is not
>>>>>> running. When I run the word-count topology in local mode, I see
>>>>>> messages like "Emitting: 3 default ["dog"]". I assume I should be
>>>>>> seeing messages like that in the logs when I run the topology in
>>>>>> distributed mode, but I'm not. I'm also not seeing any exceptions.
>>>>>> Log messages related to communication between nimbus, workers and
>>>>>> supervisor appear for about three minutes, then everything related
>>>>>> to the topology seems to shut down, including the UI's report that
>>>>>> it's running.
>>>>>>
>>>>>> I'm pretty stumped by this, and I'd be grateful for any help. I took
>>>>>> a copy of the logs for nimbus, the supervisor, and the workers, for the
>>>>>> lifecycle of the topology I just described:
>>>>>>
>>>>>> https://dl.dropboxusercontent.com/u/6414090/logs.tgz
>>>>>>
>>>>>> I'd really appreciate it if someone with more experience with storm
>>>>>> could take a look at them and tell me where I'm going wrong.
>>>>>>
>>>>>> I'm using storm-0.9.0, storm-starter from git master. ZK and nimbus
>>>>>> are running on the same machine, the supervisor/workers are running on
>>>>>> another machine. Judging from the logs, they are all able to see each
>>>>>> other.
>>>>>>
>>>>>> Best regards,
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 15, 2014 at 7:26 AM, Jason Trost <[email protected]> wrote:
>>>>>>
>>>>>>> This should show up in one or more worker logs in
>>>>>>> $STORM_HOME/logs/worker-*.log.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 14, 2014 at 10:25 PM, Alex Coventry <[email protected]> wrote:
>>>>>>>
>>>>>>>> If I explicitly throw an exception in the storm-starter clojure
>>>>>>>> example, as shown in the diff below, it shows up nicely when I run in
>>>>>>>> local mode with
>>>>>>>>
>>>>>>>> coventry@samjoko:~/storm-starter$ lein run -m storm.starter.clj.word-count
>>>>>>>>
>>>>>>>> However, when I run it on a storm cluster with a command like
>>>>>>>>
>>>>>>>> coventry@samjoko:~/storm-starter$ ~/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.clj.word_count count
>>>>>>>>
>>>>>>>> I am not sure where these errors are being reported. Are they
>>>>>>>> logged anywhere, and if not, can I get them to be?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> index ce2725d..c82fd0f 100644
>>>>>>>> --- a/src/clj/storm/starter/clj/word_count.clj
>>>>>>>> +++ b/src/clj/storm/starter/clj/word_count.clj
>>>>>>>> @@ -11,6 +11,7 @@
>>>>>>>>                     "an apple a day keeps the doctor away"]]
>>>>>>>>      (spout
>>>>>>>>       (nextTuple []
>>>>>>>> +       (throw (Exception. "Where does this show up?"))
>>>>>>>>         (Thread/sleep 100)
>>>>>>>>         (emit-spout! collector [(rand-nth sentences)])
>>>>>>>>         )
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
