I have not been able to solve the mystery of Generator.java modifications
taking effect only on the master from which "generate" is run and not on any
of the slaves, even though the slaves have the same Generator.class, the same
Nutch jar and the same Nutch .job file as the master.
Changes to other classes, such as Fetcher2, do show up on the slaves.
Is there something about the Generator that would cause this?
To see what I mean:
- edit Generator.java and add this line at the beginning of the
  generate(.....) method:
    LOG.info("Modified Generator Running");
- compile with "ant jar", then rebuild the .job file with "ant job"
- bin/stop-all.sh
- put Generator.class, the jar file and the .job file on all nodes
- bin/start-all.sh
- run bin/nutch generate .... on the master
- when generate is done, run this on all nodes:
$ grep "Modified Generator Running" logs/hadoop.log
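The last step, grepping hadoop.log on every node, can be run from a single machine instead of logging into each one. This is only a minimal sketch under assumptions not stated in the thread: passwordless ssh to each slave, Nutch installed at ~/nutch on every node, and logs written to ~/nutch/logs/hadoop.log. The host names are the hypothetical foo, bar and baz used in sync-to-slaves.sh further down.

```shell
#!/bin/sh
# Sketch: repeat the "grep hadoop.log" check on every node from one place.
# Assumptions (hypothetical): passwordless ssh to each host, and Nutch
# installed at ~/nutch with its logs in ~/nutch/logs on every node.

MARKER="Modified Generator Running"

# Print "<label>: found" or "<label>: missing" for one local log file.
# $1 = label to print, $2 = path to the log file.
check_log() {
  if grep -q "$MARKER" "$2" 2>/dev/null; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Run the same grep remotely on each host given as an argument.
# The ~ is left unexpanded here so the remote shell resolves it.
# Note: an ssh failure is also reported as "missing".
check_nodes() {
  for h in "$@"; do
    if ssh "$h" "grep -q '$MARKER' ~/nutch/logs/hadoop.log"; then
      echo "$h: found"
    else
      echo "$h: missing"
    fi
  done
}
```

For example, after sourcing the script on the master: check_log master ~/nutch/logs/hadoop.log, then check_nodes foo bar baz for the slaves.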
When I do this, I only see "Modified Generator Running" in hadoop.log on the
master where I run bin/nutch generate ...
Logs on the slave machines do *not* contain this, as if an older version of
Generator is running.
Is it possible that Hadoop is caching Java classes?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: Nutch User List <[EMAIL PROTECTED]>
Sent: Saturday, April 12, 2008 3:32:10 AM
Subject: Distributing code changes to nodes
Hi,
When a code change is made (e.g. Foo.java is changed and recompiled), does one
need to copy the new .class file to all other relevant Nutch nodes? Or does one
just need to recompile locally and rebuild the .job file? Or something else?
I'm asking because I'm observing some strange behaviour around this. I
modified Fetcher2 (for NUTCH-629), copied it to my master node, compiled it,
rebuilt the jar, rebuilt the .job file and copied all of that to all other nodes.
Great: I can see from the logs on all nodes that my changes are indeed running
in tasks on all nodes.
But then I also modified Generator a bit, applied
https://issues.apache.org/jira/browse/NUTCH-570, and went through the same
procedure: recompile, build a new jar and .job file, copy everywhere. For some
reason, though, I don't see my changes running on *any* node other than my
master node, which is the node where the bin/nutch generate ... command is issued.
Is there something special about Generator that would cause this?
This is what I do after every code change:
$ cd nutch && bin/stop-all.sh && ant jar job && sh ~/bin/sync-to-slaves.sh &&
bin/start-all.sh
$ cat ~/bin/sync-to-slaves.sh
NUTCH_HOME=~/nutch
for h in foo bar baz; do
  rsync -az $NUTCH_HOME/src $h:$NUTCH_HOME
  rsync -az $NUTCH_HOME/build $h:$NUTCH_HOME
  rsync -az $NUTCH_HOME/conf $h:$NUTCH_HOME
done
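Since the question rests on the slaves really having the same Generator.class, jar and .job file as the master, it may help to compare checksums rather than trust that the rsync succeeded. A minimal sketch, assuming (none of this is from the thread) passwordless ssh, md5sum available on every node, and the same relative path under each user's home directory:

```shell
#!/bin/sh
# Sketch: check that a build artifact is byte-identical on the master and on
# each slave. Assumptions (hypothetical): passwordless ssh, md5sum installed on
# all nodes, and the artifact at the same path relative to $HOME everywhere.

# Print the MD5 digest of a local file (first field of md5sum output).
local_md5() {
  md5sum "$1" | cut -d' ' -f1
}

# $1 = path relative to $HOME; remaining arguments = host names.
# Prints "<host>: same" or "<host>: DIFFERENT" for each host.
compare_artifact() {
  rel="$1"; shift
  want=$(local_md5 "$HOME/$rel")
  for h in "$@"; do
    # \$HOME is escaped so it is expanded by the remote shell, not locally.
    got=$(ssh "$h" "md5sum \"\$HOME/$rel\"" | cut -d' ' -f1)
    if [ "$got" = "$want" ]; then
      echo "$h: same"
    else
      echo "$h: DIFFERENT"
    fi
  done
}
```

For example: compare_artifact nutch/build/classes/org/apache/nutch/crawl/Generator.class foo bar baz (the build/classes location is an assumption about the default Nutch build layout; the same call works for the jar and the .job file).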
Again, I see my Fetcher2 changes being executed everywhere, but not my
Generator changes.
Am I doing something wrong?
Thanks,
Otis