I was getting it to do parts of the crawl, but it was not pushing the
data to Solr (that was before I moved it to https).  I had worked on
that for two weeks, and was frustrated and needed to make progress
with other parts of the project, so I bailed on the newer nutch and
just rolled with 1.2, since that was working.

I'll probably just roll Solr back off the secure port; that will
take less time (my current constraint) than getting 1.4 to work.

Unless -- is 1.2 able to crawl https sites?  If it can't do that then
I may have to upgrade....

-- Chris
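
For reference, a minimal sketch of the "force the cert" idea quoted below,
using the full JSSE system property names (they are case-sensitive:
javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword for trusting
the server's certificate). The truststore path and password here are
placeholders, and it is an assumption that bin/nutch hands NUTCH_OPTS to the
JVM in your version (check the script header) and that the Solr client in
1.2 honors these properties; untested against 1.2.

```shell
#!/bin/sh
# Placeholder values (hypothetical): point these at your own truststore.
TRUSTSTORE="${TRUSTSTORE:-/path/to/truststore.jks}"
TRUSTSTORE_PASS="${TRUSTSTORE_PASS:-changeit}"

# Build the JSSE flags for a given truststore path and password.
ssl_opts() {
  printf '%s %s' \
    "-Djavax.net.ssl.trustStore=$1" \
    "-Djavax.net.ssl.trustStorePassword=$2"
}

# Assumption: bin/nutch appends NUTCH_OPTS to the java command line.
NUTCH_OPTS="$NUTCH_OPTS $(ssl_opts "$TRUSTSTORE" "$TRUSTSTORE_PASS")"
export NUTCH_OPTS
echo "NUTCH_OPTS=$NUTCH_OPTS"
```

With that exported, the solrindex line from runbot.sh would be run unchanged.
If the job runs in a separate child JVM rather than locally, the options may
not reach it, so treat this as something to try, not a confirmed fix.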



On Thu, Feb 23, 2012 at 2:14 PM, Lewis John Mcgibbney
<lewis.mcgibb...@gmail.com> wrote:
> Yeah I can confirm it was 1.4
>
> On Thu, Feb 23, 2012 at 7:05 PM, Christopher Gross <cogr...@gmail.com> wrote:
>
>> I tried using 1.4, but I couldn't get that to work at all.
>
> What is wrong with your configuration? If this is all that is preventing
> you from migrating to 1.4, I would rather get it sorted out now... up to
> you?
>
>
>> It didn't
>> come with a "runbot.sh" script,
>
> You would need to write this yourself... this is because we wish to do many
> different tasks with the runbot.sh, however a new runbot.sh will replace
> crawl.java (I think) in 1.5
>
>
>> I was about to try just forcing the cert by adding
>> "-Djavax.net.ssl.keystore=xxx -Djx.n.s.keypass=xxx" to the nutch line.
>>  I'll post back if I have any luck, though from what you're saying I
>> probably won't.
>>
>> I'll try looking into 1.3, unless someone comes back and confirms that
>> it's only in 1.4....
>>
>
> See above...
>
>
>>
>> Thanks Lewis!
>>
>> -- Chris
>>
>>
>>
>> On Thu, Feb 23, 2012 at 1:59 PM, Lewis John Mcgibbney
>> <lewis.mcgibb...@gmail.com> wrote:
>> > Hi Christopher,
>> >
>> > I don't think Nutch 1.2 could be used with a Solr server running on basic
>> > https authentication.
>> >
>> > Markus committed a nice section of work which addresses this in 1.3 iirc,
>> > or maybe 1.4, I can't remember. Look for the solr.auth property in
>> > nutch-default.xml [0]. I know it might be a pain, but maybe you could try
>> > upgrading; either that or you may need to hack 1.2?
>> >
>> > Can anyone confirm if this is the case?
>> >
>> > Lewis
>> >
>> > [0] http://svn.apache.org/viewvc/nutch/trunk/conf/nutch-default.xml?view=markup
>> >
>> > On Thu, Feb 23, 2012 at 6:48 PM, Christopher Gross <cogr...@gmail.com> wrote:
>> >
>> >> Meant to include this...the output from the runbot.sh script.  Not
>> >> that it really says a whole lot...
>> >>
>> >> ----- Index (Step 5 of 8) -----
>> >> SolrIndexer: starting at 2012-02-23 18:18:20
>> >> java.io.IOException: Job failed!
>> >>
>> >> -- Chris
>> >>
>> >>
>> >>
>> >> On Thu, Feb 23, 2012 at 1:26 PM, Christopher Gross <cogr...@gmail.com>
>> >> wrote:
>> >> > I have my Solr set up on a secure port -- and I think that is causing
>> >> > a problem for nutch (nothing else changed.)  I don't see anything in
>> >> > the documentation regarding this.
>> >> >
>> >> > My nutch version is 1.2, Solr is 3.4.  Here's the line from my
>> >> > runbot.sh script:
>> >> >
>> >> > $NUTCH_HOME/bin/nutch solrindex https://localhost/nutchsolr/
>> >> > $NUTCH_HOME/crawl/crawldb $NUTCH_HOME/crawl/linkdb/
>> >> > $NUTCH_HOME/crawl/segments/*
>> >> >
>> >> > Is there another argument that I should pass in?  Does this just not
>> >> > work on a secure port?  I'd appreciate any input.
>> >> >
>> >> > Thanks!
>> >> >
>> >> > -- Chris
>> >>
>> >
>> >
>> >
>> > --
>> > *Lewis*
>>
>
>
>
> --
> *Lewis*
