Nutch 1.4 comes with an indexchecker tool that shows you which fields are sent to Solr for a given URL.
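
A rough sketch of how that might look, assuming you run it from the Nutch bin directory (as with the solrindex command quoted in this thread) and that the URL is just a placeholder for a page you care about. As far as I know it fetches and parses the page, runs the indexing filters, and prints the resulting fields without writing anything to Solr:

```shell
# Placeholder URL (an assumption); replace it with a page from your crawl.
URL="http://www.example.com/"

# Assumes the current directory is Nutch's bin directory.
NUTCH="./nutch"
CMD="$NUTCH indexchecker $URL"

if [ -x "$NUTCH" ]; then
  # Prints the fields the indexing filters produce for this URL.
  $CMD
else
  # Not in a Nutch bin directory: just show the command that would be run.
  echo "Nutch launcher not found; would run: $CMD"
fi
```

The output format may differ between versions, so treat this as a starting point rather than a recipe.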

On Friday 30 September 2011 15:53:24 Bai Shen wrote:
> Ah. I was hoping to look at the created index before I sent it over to the
> solr server.
>
> On Fri, Sep 30, 2011 at 2:26 AM, Elisabeth Adler wrote:
> > Yep, after fetching and parsing the pages, you need to tell Nutch to
> > index the data in Solr, like:
> > ./nutch solrindex http://localhost:8080/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> >
> > It's all explained in the wiki:
> > http://wiki.apache.org/nutch/NutchTutorial
> >
> > Best,
> > Elisabeth
> >
> > On 27.09.2011 15:08, Bai Shen wrote:
> >> I'm using Luke 3.3 and Nutch 1.3
> >>
> >> I didn't see any fdt files. Are those created when you run the
> >> solrindex command?
> >>
> >> On Mon, Sep 26, 2011 at 10:11 AM, Elisabeth Adler wrote:
> >>> Which version of Luke and Nutch are you using? I had the same problem
> >>> with Luke 0.9 and Nutch 1.3 indices - I upgraded Luke to 3.3
> >>> (http://code.google.com/p/luke/) and it's working without problems now.
> >>> Btw, you need to select the directory "data/index" (containing .fdt
> >>> and more files).
> >>> Hope this helps,
> >>> Elisabeth
> >>>
> >>> On 26.09.2011 15:49, Bai Shen wrote:
> >>>> So I used the tutorial to do some crawling with Nutch and I've done
> >>>> all the way up to Step 4. I want to look at what I've indexed so far
> >>>> before I import it into Solr so I can make sure that everything is
> >>>> working correctly.
> >>>>
> >>>> But no matter which directory I use, Luke tells me that there's no
> >>>> valid index. Do I need to run the solrindex command? And is there a
> >>>> way to do it without pushing it to my solr install?
> >>>>
> >>>> Thanks.

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

