Re: Index while crawling

.: Abhishek :. Wed, 09 Feb 2011 00:16:12 -0800

Hi all,

 I am still having trouble figuring this out. I followed the instructions
at the following URL:

http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/

 In the end, the only search results I see are from the seed URLs that were
passed in. I think I am missing something here: nowhere in the tutorial is
the depth or the number of threads specified. I suspect that is why only the
seeds show up, and no other pages, when I search in the Solr admin screen.

 Could you please give me some pointers or advice on what I am missing?
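
 For reference, with the step-by-step commands the crawl "depth" usually
corresponds to the number of generate/fetch/updatedb rounds: one round fetches
only the injected seeds, and the links found in them are fetched in the next
round. Below is a minimal dry-run sketch that only prints the commands for a
given number of rounds and executes nothing. The names print_crawl_plan,
ROUNDS, TOPN and THREADS, the "urls" seed directory and the <newest-segment>
placeholder are assumptions for illustration. The Solr URL is the one used
earlier in this thread. Please check the sub-commands and flags against your
Nutch version before running any of them.

```shell
#!/bin/sh
# Dry-run sketch: print the Nutch 1.x commands for a crawl of several rounds.
# Nothing is executed; all paths and default values are illustrative assumptions.

print_crawl_plan() {
    rounds=$1    # number of generate/fetch/updatedb rounds (acts like "depth")
    topn=$2      # maximum number of URLs selected per round
    threads=$3   # number of fetcher threads

    # Seed the crawldb once from the directory holding the seed list.
    echo "bin/nutch inject crawl/crawldb urls"

    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "# round $i"
        echo "bin/nutch generate crawl/crawldb crawl/segments -topN $topn"
        # Depending on configuration, a separate parse step may be needed here.
        echo "bin/nutch fetch <newest-segment> -threads $threads"
        echo "bin/nutch updatedb crawl/crawldb <newest-segment>"
        i=$((i + 1))
    done

    # Build the linkdb and push everything to Solr once the rounds are done.
    echo "bin/nutch invertlinks crawl/linkdb -dir crawl/segments"
    echo "bin/nutch solrindex http://127.0.0.1:8080/solr/ crawl/crawldb crawl/linkdb crawl/segments/*"
}

print_crawl_plan "${ROUNDS:-3}" "${TOPN:-50}" "${THREADS:-10}"
```

 With a single round the plan contains one generate/fetch pair, which would
match the symptom of only the seed pages showing up in Solr.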

Thanks,
Abi

On Tue, Feb 1, 2011 at 6:25 PM, Markus Jelsma <[email protected]> wrote:

> Get your own fresh copy of Solr 1.4.1 (if you get one of the development
> versions you'll need to upgrade the Solr jars in Nutch's lib). Unpack it and
> find the example directory. In there, overwrite solr/conf/schema.xml with
> the one shipped with Nutch and you're good to go. Run java -jar start.jar
> and it's running. It might also be a good idea to follow the tutorial first.
>
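
 The setup described above might look like the following sketch. The names
install_nutch_schema, NUTCH_HOME and SOLR_EXAMPLE are hypothetical, and the
location of the Nutch schema (conf/schema.xml under the Nutch directory) is an
assumption to check against your download. The function only copies the
schema, keeping a backup of the one that ships with Solr; starting Solr is
left as a comment.

```shell
#!/bin/sh
# Sketch: replace the example Solr schema with the one shipped with Nutch.
# Assumed layout: $1/conf/schema.xml (Nutch) and $2/solr/conf/schema.xml
# (the Solr "example" directory).

install_nutch_schema() {
    nutch_home=$1
    solr_example=$2
    src="$nutch_home/conf/schema.xml"
    dst="$solr_example/solr/conf/schema.xml"

    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ ! -d "$solr_example/solr/conf" ]; then
        echo "missing $solr_example/solr/conf" >&2
        return 1
    fi

    # Keep the original Solr schema so the change is easy to undo.
    if [ -f "$dst" ]; then
        cp "$dst" "$dst.orig"
    fi
    cp "$src" "$dst"
}

# Example use (paths are placeholders):
#   install_nutch_schema "$NUTCH_HOME" "$SOLR_EXAMPLE"
#   cd "$SOLR_EXAMPLE" && java -jar start.jar
```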
> > Hi,
> >
> >  I am unable to start Solr for the currently running crawl. When I try
> > the command below, I get messages saying the linkdb and segments do not
> > exist in the file system, which is indeed the case.
> >
> >  So how do I run Solr in this case? Or do I have to run Solr separately
> > instead of starting it from Nutch itself?
> >
> > Thanks,
> > Abhi
> >
> > On Mon, Jan 31, 2011 at 11:51 PM, .: Abhishek :. <[email protected]> wrote:
> > > Hi Alexander,
> > >
> > >  Thanks for the response. So I should be starting solr as follows,
> > >
> > > bin/nutch solrindex http://127.0.0.1:8080/solr/ crawl/crawldb
> > > crawl/linkdb crawl/segments/*
> > >
> > >  But while fetching we won't have segments, right? So in this case how
> > > do I start Solr?
> > >
> > > Thanks,
> > > Abhi
> > >
> > >
> > > On Mon, Jan 31, 2011 at 7:30 PM, Alexander Aristov <[email protected]> wrote:
> > >> Yes, you can, but only if you use Nutch + Solr.
> > >>
> > >> If you use the old Nutch frontend then you might break the index and
> > >> searching after merging content or indexes.
> > >>
> > >> If you don't merge then search should work during crawling.
> > >>
> > >> But remember that results don't become available for searching
> > >> immediately after fetching. All pages must be fetched and then indexed
> > >> first to be searchable.
> > >>
> > >> Best Regards
> > >> Alexander Aristov
> > >>
> > >> On 31 January 2011 13:17, .: Abhishek :. <[email protected]> wrote:
> > >> > Hi folks,
> > >> >
> > >> >  I should thank you all for the great help you have been offering so
> > >> > far. I am learning about Nutch quite well.
> > >> >
> > >> >  One more beginner's question here: can I search for something while
> > >> > Nutch is still crawling a site? I believe this is not possible. However,
> > >> > the reason I am asking is that I am crawling a big site, and the site is
> > >> > also updated frequently with a lot of new pages, so I just wanted to get
> > >> > some quick results while it's on the go.
> > >> >
> > >> > Thanks,
> > >> > Abhi
>
