Well, using the path ${nutch-crawl.dir}/index gives the following exception.
Why is this so?
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:326)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
    at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
    at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
    at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:86)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:45)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:112)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
    at org.apache.lucene.store.Lock$With.run(Lock.java:109)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)
    at LuceneSearch.main(LuceneSearch.java:11)
-----Original Message-----
From: Bruno Patini Furtado [mailto:[EMAIL PROTECTED]
Sent: 29 November 2005 12:36
To: [email protected]; [EMAIL PROTECTED]
Subject: Re: Standalone app
The segments dir is used only by Nutch and has nothing to do with the Lucene
index, which is found at ${nutch-crawl.dir}/index.
The following Lucene code works for me:
Searcher searcher = new IndexSearcher("${nutch-crawl.dir}/index");
I hope this helps.
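
A minimal sketch of how the two paths could be told apart before handing one
to IndexSearcher. It uses only the JDK, not Lucene. It rests on an assumption
that fits the FileNotFoundException quoted below but is not confirmed in this
thread: a Lucene index directory of this era holds a regular file named
"segments", while the Nutch crawl directory holds a subdirectory of the same
name. IndexPathCheck and its method names are hypothetical and are not part
of Nutch or Lucene.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Nutch or Lucene.
final class IndexPathCheck {

    // Assumption: a Lucene index directory contains a regular file named
    // "segments". A Nutch crawl directory has a "segments" subdirectory
    // instead, so it fails this check.
    static boolean looksLikeLuceneIndex(Path dir) {
        return Files.isDirectory(dir)
                && Files.isRegularFile(dir.resolve("segments"));
    }

    // Returns <crawlDir>/index, the location named in the reply above.
    static Path resolveIndexDir(Path crawlDir) {
        return crawlDir.resolve("index");
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java IndexPathCheck <nutch-crawl-dir>");
            return;
        }
        Path crawlDir = Paths.get(args[0]);
        Path indexDir = resolveIndexDir(crawlDir);
        System.out.println(crawlDir + " looks like a Lucene index: "
                + looksLikeLuceneIndex(crawlDir));
        System.out.println(indexDir + " looks like a Lucene index: "
                + looksLikeLuceneIndex(indexDir));
    }
}
```

This only checks the directory layout. It says nothing about whether the
index can actually be read by the Lucene version on the classpath.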
On 11/29/05, Kasper Hansen <[EMAIL PROTECTED]> wrote:
>
> I am using dir path: /home/kah/Downloads/nutch-0.7.1/crawl.pdf/
> and getting the following exception!
>
> Exception in thread "main"
> java.io.FileNotFoundException:
> /home/kah/Downloads/nutch-0.7.1/crawl.pdf/segments (Is a directory)
>
>
> On Tuesday 22 November 2005 13:12, Kasper Hansen wrote:
> > Hi,
> > I get an Exception when trying to search my Nutch crawl from a standalone
> > Java app. How do I search the Nutch crawl? Is the path to the index
> > wrong? When I remove /index from the path I get:
> > Exception in thread "main"
> > java.io.FileNotFoundException:
> > /home/kah/Downloads/nutch-0.7.1/crawl.pdf/segments (Is a directory)
> >
> > But I also get an Exception when using
> > /home/kah/Downloads/nutch-0.7.1/crawl.pdf/index
> > as path to the crawl
> >
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.queryParser.QueryParser;
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > import org.apache.lucene.search.Hits;
> > import org.apache.lucene.document.Document;
> >
> >
> > public class SearchCrawl {
> >     public static void main(String[] args) throws Exception {
> >
> >         IndexSearcher indexSearcher = new
> >             IndexSearcher("/home/kah/Downloads/nutch-0.7.1/crawl.pdf/index");
> >         Query query = QueryParser.parse("some search phrase", "content",
> >             new StandardAnalyzer());
> >
> >         Hits hits = indexSearcher.search(query);
> >
> >         // Print the title and content of every hit.
> >         for (int i = 0; i < hits.length(); i++) {
> >             Document doc = hits.doc(i);
> >             String title = doc.get(LuceneFieldValues.TITLE);
> >             String content = doc.get(LuceneFieldValues.CONTENT);
> >
> >             System.out.println(title + "\t" + content);
> >         }
> >         System.out.println("Search done..");
> >     }
> > }
>
--
"Minds are like parachutes, they work best when open."
Bruno Patini Furtado
Software Developer
webpage: www.bpfurtado.net
blog: http://www.livejournal.com/users/bpfurtado/
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general