I was getting it to do parts of the crawl, but it was not pushing the data to Solr (that was before I moved it to https). I had worked on that for two weeks, and was frustrated and needed to make progress with other parts of the project, so I bailed on the newer nutch and just rolled with 1.2, since that was working.
I'll probably just roll back Solr so it isn't on a secure port; that will take less time (my current constraint) than getting 1.4 to work. Unless... is 1.2 able to crawl https sites? If it can't, then I may have to upgrade anyway.

-- Chris

On Thu, Feb 23, 2012 at 2:14 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
> Yeah, I can confirm it was 1.4.
>
> On Thu, Feb 23, 2012 at 7:05 PM, Christopher Gross <cogr...@gmail.com> wrote:
>
>> I tried using 1.4, but I couldn't get that to work at all.
>
> What is wrong with your configuration? If this is all that is preventing
> you from migrating to 1.4, I would rather get it sorted out now... up to
> yourself?
>
>> It didn't
>> come with a "runbot.sh" script,
>
> You would need to write this yourself... this is because we wish to do many
> different tasks with the runbot.sh; however, a new runbot.sh will replace
> crawl.java (I think) in 1.5.
>
>> I was about to try just forcing the cert by adding
>> "-Djavax.net.ssl.keystore=xxx -Djx.n.s.keypass=xxx" to the nutch line.
>> I'll post back if I have any luck, though from what you're saying I
>> probably won't.
>>
>> I'll try looking into 1.3, unless someone comes back and confirms that
>> it's only in 1.4....
>
> See above...
>
>> Thanks Lewis!
>>
>> -- Chris
>>
>> On Thu, Feb 23, 2012 at 1:59 PM, Lewis John Mcgibbney
>> <lewis.mcgibb...@gmail.com> wrote:
>> > Hi Christopher,
>> >
>> > I don't think Nutch 1.2 could be used with a Solr server running on basic
>> > https authentication.
>> >
>> > Markus committed a nice section of work which addresses this in 1.3 iirc,
>> > or maybe 1.4, I can't remember. Look for the solr.auth property in
>> > nutch-default.xml [0]. I know it might be a pain, but maybe you could try
>> > upgrading; either that or you may need to hack 1.2?
>> >
>> > Can anyone confirm if this is the case?
>> >
>> > Lewis
>> >
>> > [0] http://svn.apache.org/viewvc/nutch/trunk/conf/nutch-default.xml?view=markup
>> >
>> > On Thu, Feb 23, 2012 at 6:48 PM, Christopher Gross <cogr...@gmail.com> wrote:
>> >
>> >> Meant to include this... the output from the runbot.sh script. Not
>> >> that it really says a whole lot...
>> >>
>> >> ----- Index (Step 5 of 8) -----
>> >> SolrIndexer: starting at 2012-02-23 18:18:20
>> >> java.io.IOException: Job failed!
>> >>
>> >> -- Chris
>> >>
>> >> On Thu, Feb 23, 2012 at 1:26 PM, Christopher Gross <cogr...@gmail.com>
>> >> wrote:
>> >> > I have my Solr set up on a secure port, and I think that is causing
>> >> > a problem for nutch (nothing else changed). I don't see anything in
>> >> > the documentation regarding this.
>> >> >
>> >> > My nutch version is 1.2, Solr is 3.4. Here's the line from my
>> >> > runbot.sh script:
>> >> >
>> >> > $NUTCH_HOME/bin/nutch solrindex https://localhost/nutchsolr/
>> >> > $NUTCH_HOME/crawl/crawldb $NUTCH_HOME/crawl/linkdb/
>> >> > $NUTCH_HOME/crawl/segments/*
>> >> >
>> >> > Is there another argument that I should pass in? Does this just not
>> >> > work on a secure port? I'd appreciate any input.
>> >> >
>> >> > Thanks!
>> >> >
>> >> > -- Chris
>> >
>> > --
>> > *Lewis*
>
> --
> *Lewis*
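A minimal sketch of the certificate-trust route Chris mentions, assuming the indexing failure comes from the JVM rejecting Solr's certificate (for example a self-signed one) rather than from a basic-auth challenge; for basic auth, Lewis's pointer to the solr.auth property (Nutch 1.4) is the relevant one. The sketch relies on bin/nutch adding $NUTCH_OPTS to the java command line, and on the standard JSSE properties javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword. Property names are case-sensitive, and a client that verifies a server certificate needs a trust store, not a key store. The function names, trust store path, password, and DRY_RUN switch are hypothetical. System properties set this way only reach the indexer when the job runs in the same JVM (Nutch's local mode); on a Hadoop cluster the task JVMs would need the options instead.

```bash
#!/usr/bin/env bash
# Sketch only (not from the thread): run Nutch's solrindex step against a Solr
# instance served over HTTPS, telling the JVM which certificates to trust.
# Assumptions: the Solr certificate is already in a Java trust store, Nutch runs
# in local mode (one JVM), and all names below are hypothetical.
set -euo pipefail

# Build the standard JSSE trust store options. Property names are case-sensitive.
ssl_opts() {
  local store="$1" pass="$2"
  printf '%s' "-Djavax.net.ssl.trustStore=$store -Djavax.net.ssl.trustStorePassword=$pass"
}

# Run the indexing line from runbot.sh, or only print it when DRY_RUN=1.
# Arguments: NUTCH_HOME, Solr URL, trust store path, trust store password.
solrindex_https() {
  local nutch_home="$1" solr_url="$2" store="$3" pass="$4"
  # bin/nutch adds $NUTCH_OPTS to the java command it runs; local -x keeps the
  # extended value exported only for the duration of this call.
  local -x NUTCH_OPTS="${NUTCH_OPTS:-} $(ssl_opts "$store" "$pass")"
  local cmd=("$nutch_home/bin/nutch" solrindex "$solr_url"
             "$nutch_home/crawl/crawldb" "$nutch_home/crawl/linkdb"
             "$nutch_home"/crawl/segments/*)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "NUTCH_OPTS=$NUTCH_OPTS" "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage from runbot.sh might look like `solrindex_https "$NUTCH_HOME" https://localhost/nutchsolr/ "$NUTCH_HOME/conf/solr-truststore.jks" changeit`, with `DRY_RUN=1` in front to print the options and command instead of running them. The trust store itself could be created from an exported server certificate with something like `keytool -importcert -alias solr -file solr.crt -keystore solr-truststore.jks` (file names again hypothetical).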