Hi,

I’m running Nutch 2.2.1 on a 3-node Hadoop 1.2.1 cluster, using Gora to store the crawl data in Cassandra. Since Gora 0.3 does not support unions of string and null in the Avro schema, I was advised to use Gora 0.4-SNAPSHOT and bundle it with Nutch when creating the job file.

However, when I run “ant job” in the NUTCH_HOME directory, the 0.3 version is bundled into the job file instead of the 0.4 snapshot. I suppose this is because ant does a full cleanup and copy of the libs, and also because the Cassandra dependency in ivy/ivy.xml is declared with rev=“0.3”.

I changed that to “0.4-SNAPSHOT”, and I’m able to build by moving the snapshot artefacts to /home/hduser/.ivy2/local/org.apache.gora/gora-cassandra/0.4-SNAPSHOT/jars (since this is where the system looks for local jars before looking them up in Maven’s online repo). With this, the job file is built with the 0.4 snapshot bundled, but the other dependencies, such as Thrift, are not being pulled in.

Kindly help me with a permanent solution to this problem.

--
Manikandan Saravanan
Architect - Technology
TheSocialPeople

