On Jan 27, 2007, at 1:34 PM, Gilbert Groenendijk wrote:

> Hello,
>
> Today I created a simple index with Nutch from the command line. After
> that I copied the index to the machine to use it in a Lucene
> environment, without Nutch. Fetching the URL and title works pretty
> well, but how can I get the content? If I take a look in Luke, the
> field "content" is not stored or tokenized, but in nutch-default.xml
> and nutch-site.xml I have defined:
>
> <property>
>   <name>fetcher.store.content</name>
>   <value>true</value>
>   <description>If true, fetcher will store content.</description>
> </property>
>
> It doesn't seem to work. Any ideas?
I'm pretty sure that setting only controls whether the fetcher stores the page content in Nutch's own crawl data (the segments), not in the Lucene index. The cached page view and the search summaries are built from that stored data, and Lucene cannot read it directly. You can write Java apps against the Nutch API to fetch the content per page as needed. Or you could use the OpenSearch servlet to pull out the summaries and cache per URI.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
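For the OpenSearch route, here is a minimal sketch. It assumes the servlet is deployed with the Nutch web app at http://localhost:8080/opensearch (a hypothetical location, so adjust host, port and path to your setup) and that it answers with RSS 2.0 containing one <item> per hit, where <link> is the page URL and <description> is the summary. It uses only the JDK's XML parser; the class name OpenSearchSummaries is made up for illustration. Note that this gets you summaries only, not the full stored content.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OpenSearchSummaries {

    /** One search hit: the page URL and the summary text from <description>. */
    public static final class Hit {
        public final String link;
        public final String summary;

        public Hit(String link, String summary) {
            this.link = link;
            this.summary = summary;
        }
    }

    /** Parses an RSS 2.0 response and returns link/description for each <item>. */
    public static List<Hit> parse(InputStream rss) throws Exception {
        // Default (non namespace-aware) parsing, so prefixed elements are just plain names.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(rss);
        NodeList items = doc.getElementsByTagName("item");
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            hits.add(new Hit(childText(item, "link"), childText(item, "description")));
        }
        return hits;
    }

    /** Text of the first descendant element with the given name, or "" if absent. */
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical servlet location; change it to match your deployment.
        String base = "http://localhost:8080/opensearch?query=";
        String query = args.length > 0 ? args[0] : "nutch";
        URL url = new URL(base + URLEncoder.encode(query, "UTF-8"));
        try (InputStream in = url.openStream()) {
            for (Hit hit : parse(in)) {
                System.out.println(hit.link + "\n  " + hit.summary);
            }
        }
    }
}
```

Run it as `java OpenSearchSummaries "your query"`. The parsing is kept in parse(InputStream) so it can be checked against a saved response without a running servlet.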
