lucene usage without website
I want to create a knowledge base, but it must not require a server that runs constantly (as JSP does). It just needs to run on the Windows platform. Lucene works well on Windows from an applet, right?

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Lucene and Mysql
You would just take the rows from the MySQL database and create a Document for each record, then index all the documents.

-----Original Message-----
From: Stefan Trcko [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 16, 2003 3:31 PM
To: [EMAIL PROTECTED]
Subject: Lucene and Mysql

Hello

I'm new to Lucene. I want users to be able to search text which is stored in a MySQL database. Is there any tutorial on how to implement this kind of search feature?

Best regards,
Stefan
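To make the "one document per record" idea concrete, here is a rough sketch in plain Java. It is a toy stand-in, not Lucene: the class name RowIndexSketch, its methods, and the sample rows are all made up for illustration, and a real program would read the rows from a JDBC ResultSet and hand them to Lucene's IndexWriter instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for "one document per database row". Hypothetical class, not part of Lucene.
class RowIndexSketch {
    // term -> ids of the rows whose text contains that term
    private final Map<String, Set<String>> postings = new LinkedHashMap<String, Set<String>>();

    // Index one row: split its text into lowercase words and file the row id under each word.
    void addRow(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue; // split() can yield an empty leading piece
            }
            Set<String> ids = postings.get(term);
            if (ids == null) {
                ids = new LinkedHashSet<String>();
                postings.put(term, ids);
            }
            ids.add(id);
        }
    }

    // Look up a single word; returns the matching row ids in the order they were added.
    List<String> search(String word) {
        Set<String> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<String>() : new ArrayList<String>(ids);
    }

    public static void main(String[] args) {
        RowIndexSketch index = new RowIndexSketch();
        // Made-up rows; in practice these would come from the database.
        index.addRow("1", "Installing Lucene on Windows");
        index.addRow("2", "Backing up a MySQL database");
        index.addRow("3", "Searching MySQL text with Lucene");
        System.out.println(index.search("lucene")); // [1, 3]
        System.out.println(index.search("mysql"));  // [2, 3]
    }
}
```

The point of keeping the row's primary key with each document is that a search hit can be mapped straight back to the database record.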
Word Documents
As a spinoff, I was wondering whether anyone has been happy with indexing and searching Word docs. What about reading the contents? Any problems?

-----Original Message-----
From: Ryan Ackley [mailto:[EMAIL PROTECTED]
Sent: Friday, December 12, 2003 5:59 PM
To: Zhou, Oliver; Lucene Users List
Subject: Re: textmining: document title

Check out Jakarta POI (http://jakarta.apache.org/poi), particularly the HPSF API. It allows you to extract metadata like Title, Author, etc. from OLE documents.

-Ryan

----- Original Message -----
From: "Zhou, Oliver" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 12, 2003 5:26 PM
Subject: textmining: document title

> Ryan,
>
> I'm using textmining and lucene to index word documents but don't know how
> to get word document title. Your advice on this matter is appreciated.
>
> Thanks,
> Oliver Zhou
RE: Unindexed fields
If you don't index a field, it can't be searched. Field.UnIndexed stores the value but does not index it.

-----Original Message-----
From: Chong, Herb [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08, 2003 11:14 AM
To: Lucene Users List
Subject: Unindexed fields

Is there a limit to the size of an UnIndexed field? I changed my code to increase the maximum string size per document from 300 bytes to 10,000, and although the index run completes without errors, I never find any documents while searching.

Herb
RE: Returning one result
Yes, it is in the array of fields that I want searched.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 3:32 PM
To: Lucene Users List
Subject: Re: Returning one result

On Fri, Dec 05, 2003 at 03:14:08PM -0500, Pleasant, Tracy wrote:
> What do you mean 'add' in MultiFieldQueryParser? I am using all the
> fields

Sorry, that was wrong. What I meant to say is: are you adding the field to the array of fields that need to be searched? You need to use a MultiFieldQueryParser and pass it the array of fields that you want searched.

Dror
RE: Returning one result
What do you mean by 'add' in MultiFieldQueryParser? I am using all the fields.

When I index, it does

    add(Field.Keyword(..,..))

But I don't want the user to have to type "ID:". It would be nice to just type the ID number. On your site, if you just put 11183 in the search box, there are no results.

Well, right now I'll just index it as text and query that field for the ID # to display the document. It can't hurt, right? :) Unless the Keyword is a better way.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 3:06 PM
To: Lucene Users List
Subject: Re: Returning one result

On Fri, Dec 05, 2003 at 02:45:34PM -0500, Pleasant, Tracy wrote:
> Maybe we are having some communication issues.
>
> At any rate, I did index it as a KEYWORD and when displaying used the
> TermQuery.
>
> The only problem with this though is by storing the ID (i.e. AR345) as a
> Keyword, if I search for AR345 no results are returned when I use the
> MultiFieldQueryParser.
>
> *sigh* *arg*

OK.

Go to http://www.fastbuzz.com/search/index.jsp and type "lucene" without the quotes and hit search. You get results from different channels/RSS feeds.

Now type "lucene channel:11183" without the quotes and hit search. You get results only from Java-Channel.

We're inserting the field channel as a keyword, and it does what I understand you want to do with AR345.

I would guess that in MultiFieldQueryParser you are not doing an add() of the field for AR345, which is why the search fails.

Regards,
Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
RE: Returning one result
Thanks, but when I index it as a Keyword, it does not get returned with my search results when I use MultiFieldQueryParser. If I could, I would just use parse(query), but that is not a static method; only parse(query, field, analyzer) is. So when I do that and use an analyzer, the keyword field isn't searched.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 2:14 PM
To: Lucene Users List
Subject: Re: Returning one result

On Fri, Dec 05, 2003 at 01:25:23PM -0500, Pleasant, Tracy wrote:
> What I meant is.
>
> Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
> order for it to be searchable then it would have to be indexed and not a
> keyword.

No. You should store it as a keyword. From the javadocs:

    Keyword(String name, String value)
        Constructs a String-valued Field that is not tokenized, but is indexed and stored.

Dror
RE: Returning one result
Maybe we are having some communication issues.

At any rate, I did index it as a KEYWORD and when displaying used the TermQuery.

The only problem with this, though, is that with the ID (e.g. AR345) stored as a Keyword, if I search for AR345 no results are returned when I use the MultiFieldQueryParser.

*sigh* *arg*

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 2:13 PM
To: Lucene Users List
Subject: Re: Returning one result

On Friday, December 5, 2003, at 01:25 PM, Pleasant, Tracy wrote:
> Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
> order for it to be searchable then it would have to be indexed and not
> a keyword.

*arg* - we're having a serious communication issue here. My advice to you is to actually write some simple tests (test-driven learning using JUnit is a wonderful way to experiment with Lucene, especially thanks to the RAMDirectory). Please refer to my articles at java.net as well as the other great Lucene articles out there.

Let me try again: a Field.Keyword *IS* indexed! Even Lucene's javadocs say this for this method:

    /** Constructs a String-valued Field that is not tokenized, but is >>>indexed<<<
        and stored. Useful for non-text fields, e.g. date or url. */

[I added the emphasis there]

> So after using
> TermQuery query = new TermQuery(new Term("id", term));
>
> How would I return the other fields in the document?
>
> For instance to display a record it would get the record with the id #
> and then display the title, contents, etc.

Umm, you'd use *exactly* the same way as if you had used QueryParser. QueryParser would create a TermQuery for you, in fact, except it would analyze your text first, which is what you want to avoid, right?

Hits.doc(n) gives you back a Document. And then Document.get("fieldName") gives you back the fields (as long as you >>> stored <<< them in the index too).

Again, please attempt some of these things in code. It is a trivial matter to index and search using RAMDirectory and experiment with TermQuery, QueryParser, Analyzers, etc.

Erik
RE: Returning one result
Also, what I am indexing is not a bunch of separate documents. If it were, it would be easy to simply have a field called "url", and the link would go directly to that document.

However, there is one text URL containing many records. During indexing, a function parses each record and puts each into a document with appropriate fields. When I go to display a particular Document (Lucene Document), I just query the index for that unique ID rather than parse through the URL with all the records.

Wouldn't querying the index for that unique ID be better than going through that entire page and parsing it? There is more room for error that way.

It's a long story why there isn't a database, but it can't be done (don't ask).
RE: Returning one result
What I meant is this.

Say the ID is Ar3453. The user may want to search for Ar3453, so in order for it to be searchable it would have to be indexed and not a keyword.

So after using

    TermQuery query = new TermQuery(new Term("id", term));

how would I return the other fields in the document?

For instance, to display a record it would get the record with the id # and then display the title, contents, etc.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 11:32 AM
To: Lucene Users List
Subject: Re: Returning one result

On Friday, December 5, 2003, at 10:41 AM, Pleasant, Tracy wrote:
> Maybe I should have been more clear.
>
> static Field Keyword(String name, String value)
>     Constructs a String-valued Field that is not tokenized, but is
>     indexed and stored.
>
> I need to have it tokenized because people will search for that also
> and it needs to be searchable.

Search for *what* also? Tokenized means that it is broken into pieces which will be separate terms. For example: "see spot" is tokenized into "see" and "spot", and searching for either of those terms will match.

Just try it and see, please! :)

> Should I have two fields - one as a keyword and one as text?

Depends on what you're doing... but an "id" field to me indicates Field.Keyword, only.

> How would I do that when I want to return search results..
>
> Searcher searcher = new IndexSearcher("index");
> String term = request.getParameter("id");
> Query query = QueryParser.parse(term, "id", new StandardAnalyzer());
>
> Hits hits = searcher.search(query);
>
> Would it have to be something like:
> TermQuery query = ???

Yes.

    TermQuery query = new TermQuery(new Term("id", term));

Use searcher.search exactly as you did before. Just don't use QueryParser to construct a query.

Erik
RE: Returning one result
Maybe I should have been more clear.

    static Field Keyword(String name, String value)
        Constructs a String-valued Field that is not tokenized, but is indexed and stored.

I need to have it tokenized because people will search for that also and it needs to be searchable.

Should I have two fields, one as a keyword and one as text?

How would I do that when I want to return search results? Right now, the results page will have something like

    Record AR334

Then in display_record.jsp:

    Searcher searcher = new IndexSearcher("index");
    String term = request.getParameter("id");
    Query query = QueryParser.parse(term, "id", new StandardAnalyzer());
    Hits hits = searcher.search(query);

Would it have to be something like:

    TermQuery query = ???

or

    Query query = QueryParser.Term("id");

???

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:18 PM
To: Lucene Users List
Subject: Re: Returning one result

You really should use a TermQuery in this case anyway, rather than using QueryParser. You wouldn't have to worry about the analyzer at that point anyway (and I assume you're using Field.Keyword during indexing).

Erik
RE: Returning one result
Actually, Erik, no, I'm using Field.Text. When I used Field.Keyword and tried to get the value back to return with the search results, it would not display correctly...
RE: Returning one result
OK, thanks, but I still can't use the SimpleAnalyzer, since it won't even index the whole ID. I'll give TermQuery a try. Thanks.
RE: Returning one result
OK, I realized the SimpleAnalyzer does not index numbers, so I switched back to StandardAnalyzer.
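For the archive, a likely reason the SimpleAnalyzer query matched 200 records instead of one: SimpleAnalyzer keeps only runs of letters and lowercases them, so the digits of an ID are thrown away and every ID with the same letter prefix collapses to the same term. The sketch below imitates that behaviour in plain Java. It is not Lucene's class; LetterTokens and tokenize are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Imitation of letter-only, lowercasing tokenization. Hypothetical helper, not Lucene's SimpleAnalyzer.
class LetterTokens {
    // Split text into maximal runs of letters, lowercased; digits and punctuation act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two different ids end up as the same single term, so a query for one matches both.
        System.out.println(tokenize("AR345")); // [ar]
        System.out.println(tokenize("AR334")); // [ar]
    }
}
```

Indexing the ID untokenized (Field.Keyword) and looking it up with a TermQuery, as suggested elsewhere in this thread, avoids the problem because the whole ID stays one term.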
Returning one result
I am indexing a group of items, and one field, id, is unique. When the user clicks on a result, I want just that one result to show.

I index and search using SimpleAnalyzer.

    Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());

It should return only one result but returns 200.
RE: FileDocument.java
Can you give some info on the file type and how you are printing the results? Does the contents field display correctly?

-----Original Message-----
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2003 10:59 AM
To: Lucene user list
Subject: FileDocument.java

Hi Lucene experts,

Can you help on this? I have included the following code in FileDocument to print out the summary, but I get funny output. After searching, the summary is displayed as below:

    ÐÏࡱá>þÿ UWþÿÿÿTÿ ÿ

    FileInputStream is = new FileInputStream(f);
    try {
        Reader reader = new BufferedReader(new InputStreamReader(is));
        char[] buf = new char[512];
        reader.read(buf);
        String a = new String(buf, 0, 510);
        doc.add(Field.Text("contents", reader));
        doc.add(Field.UnIndexed("summary", a));
        // return the document
    } catch (IOException e) {
        e.printStackTrace();
    }
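On the file-type question: the leading characters in that summary (Ð Ï ... à ¡ ± ... á) are what the bytes D0 CF 11 E0 A1 B1 1A E1 look like when decoded as Latin-1, and those eight bytes are the signature of an OLE2 compound file, the container used by binary Word documents. So the file is probably a .doc being read as plain text. Below is a small sketch of a check one could run before treating a file as text; OleSniffer and looksLikeOle2 are names invented for this example. Separately, note that the posted code reads 512 characters from the reader before handing the same reader to Field.Text, so those characters would presumably be missing from the indexed contents.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: detects the OLE2 compound-file signature at the start of a stream.
class OleSniffer {
    // First eight bytes of an OLE2 compound file (used by binary .doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Consumes up to eight bytes; returns true only if they match the signature exactly.
    static boolean looksLikeOle2(InputStream in) throws IOException {
        for (int expected : OLE2_MAGIC) {
            int actual = in.read(); // -1 at end of stream, so short inputs fail the check
            if (actual != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] wordLike = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                           (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1, 0x00};
        byte[] plainText = "Hello, Lucene".getBytes("US-ASCII");
        System.out.println(looksLikeOle2(new ByteArrayInputStream(wordLike)));  // true
        System.out.println(looksLikeOle2(new ByteArrayInputStream(plainText))); // false
    }
}
```

Files that pass this check need a real extractor (for example the textmining or POI libraries mentioned on this list) rather than an InputStreamReader.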
RE: Eliminating duplicate result
If you search for the same term in the same index twice, it will return the same results... I don't get what you are asking.

-----Original Message-----
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 3:19 AM
To: Lucene Users List
Subject: Re: Eliminating duplicate result

> When you are doing two searches are you searching for two different terms?

No, I am searching for the same term. What is the easiest way to eliminate duplicate documents if one is doing two searches on the same index? Has anybody done something similar?
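If the two result lists do have to be merged, one simple approach (a sketch only, and it assumes every document carries a unique stored key such as an id field) is to collect the keys from both searches into an insertion-ordered set. MergeHits and mergeUnique are hypothetical names; the lists of strings stand in for keys read out of each search's hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: merges two lists of unique document keys without duplicates.
class MergeHits {
    // Keeps the first occurrence of each key and preserves the order in which keys were first seen.
    static List<String> mergeUnique(List<String> first, List<String> second) {
        Set<String> seen = new LinkedHashSet<String>(first);
        seen.addAll(second);
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        List<String> searchOne = Arrays.asList("AR345", "AR100");
        List<String> searchTwo = Arrays.asList("AR100", "AR777");
        System.out.println(mergeUnique(searchOne, searchTwo)); // [AR345, AR100, AR777]
    }
}
```

This merges by key only; it does not re-rank the combined list by score.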
RE: Search Question - not returning desired results
Erik,

I think there may be a typo on the website. When I run the AnalyzerDemo:

    Analzying "xy&z corporation - [EMAIL PROTECTED]"
      org.apache.lucene.analysis.standard.StandardAnalyzer:
        [xy&z] [corporation] [EMAIL PROTECTED]

Your website says:

      org.apache.lucene.analysis.standard.StandardAnalyzer:
        [xy&z] [corporation] [EMAIL PROTECTED] [com]

When I run it, it keeps the entire email '[EMAIL PROTECTED]', but according to your website it separates the '[EMAIL PROTECTED]' from the 'com'.

Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Plus, I think what I want is a StandardAnalyzer with a little tweaking. The simple one was fine until I realized that it doesn't do numbers, which I need as part of my search since numbers are important for what I'm doing. The Standard does numbers, but I need it to be a little different, of course.

Thanks for the site.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results

On Tuesday, November 25, 2003, at 12:11 PM, Pleasant, Tracy wrote:
> The documents I have indexed contain information regarding file names
> also.
>
> For instance 'return_results.pl' or something like that may be in the
> document fields.
>
> I am not understanding Lucene's way of searching:
>
> 1. If I search for 'return_results', the search does not return
>    anything
> 2. If I search for 'results' or 'return', the search does not return
>    anything
> 3. If I search for 'results.pl', the search does return the document
>    containing 'return_results.pl'
> 4. If I search for 'results~', the search does return the document
>    containing 'return_results.pl'
> 5. If I search for 'return_results~', the search does not return
>    anything
>
> What is going on?
>
> I want it to return the document in all of the situations.
>
> I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :) Analysis!

Please refer to my article at java.net:

    http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code. Copy it over and try it out on the text you're using and the Analyzer you're using. The bracketed text that comes out are the "tokens" that you can search on. It is very, very important to understand this process and to really know what terms come out of the text you hand it. Otherwise it is a mystery why some things can be found and some things cannot, despite your expectations to the contrary.

A follow-up to the analysis is querying, and QueryParser has its own set of quirks and caveats related to how things are tokenized/analyzed. And I've got just the follow-up article for you handy...

    http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (the analysis one first, please), then I think a lot of questions that get asked on this list will be implicitly answered. Understanding analysis is key.

Erik
RE: Search Question - not returning desired results
It seems like what I should be using is something more like a SimpleAnalyzer or StopAnalyzer. I've changed my indexing code and the query to use SimpleAnalyzer.

But now I have another question. Let's say I have 'return_results.pl' in one of the document's fields. When I search for return_res* or return_res~ it won't return the document. But searching for any of these does return the document:

1. 'return_results'
2. 'results' or 'return'
3. 'results.pl'
4. 'results~'
5. 'return_results~'

I guess I have to read more about '*' and '~'?
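One way to reason about the return_res* case is to print the tokens the analyzer would have produced for the field. The sketch below is plain Java and does not call Lucene at all. It only imitates the behaviour SimpleAnalyzer is documented to have (break at non-letters, lowercase everything). The names `LetterTokens` and `tokenize` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: imitates a letter-based,
// lowercasing tokenizer so the resulting terms can be inspected.
class LetterTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letters extend the current token.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter ('_', '.', digits, spaces) ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("return_results.pl")); // [return, results, pl]
    }
}
```

With these tokens, no indexed term begins with return_res, which would explain the miss if the query parser hands wildcard terms to the index without analyzing them first. I believe that is how QueryParser treats wildcard and prefix terms, but treat it as an assumption and check it against the QueryParser article linked in this thread.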
RE: Search Question - not returning desired results
Thanks, this helps a lot :)
RE: Hits Highlighting
I have seen that one, but it doesn't include the source code, only the jar with classes. I need something that actually highlights, as if you took a yellow marker to the text, rather than putting the match in bold.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 12:29 PM
To: Lucene Users List
Subject: Re: Hits Highlighting

Hi,

The Lucene home page has a lot of resources, including the FAQs, articles, javadocs and contributions. For instance, there's a query highlighter on the contributions page.

On Tue, Nov 25, 2003 at 12:17:41PM -0500, Pleasant, Tracy wrote:
> Are there any hits highlighting functions?
>
> I have a simple one, but it gets complicated with searching multiple
> words, having tokens, etc.

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
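For the "yellow marker" effect on a web page, a minimal sketch might look like the following. It is plain Java and is not the highlighter from the contributions page. The class name `YellowHighlighter` is hypothetical, and the code assumes the text is already safe to embed in HTML and that every search term is a non-empty whole word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: wraps whole-word matches of the
// given terms in a span with a yellow background.
class YellowHighlighter {
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // Build one alternation so several terms are handled in a single pass.
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term)); // treat the term literally
        }
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        // $1 re-inserts the matched text with its original capitalisation.
        return matcher.replaceAll("<span style=\"background-color: yellow\">$1</span>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("The dog barks", Arrays.asList("dog")));
    }
}
```

This only matches the literal query words. It does not know about stemming, wildcards or whatever the analyzer did to the text, which is the part that gets complicated and where a highlighter built on the analyzer's tokens would do better.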
RE: Search Question
Searching 'red_*' also returns nothing.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 12:22 PM
To: Lucene Users List
Subject: Re: Search Question

No, but if you use the standard analyzer, searching "red*" will return documents with "red_car".

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> If I have words within a document like
>
> red_car
>
> If I search for 'red' would it return documents containing 'red_car'?

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
RE: Search Question
How come a search for 'red_car*' returns nothing? I am using the standard analyzer, too.
Hits Highlighting
Are there any hits highlighting functions? I have a simple one, but it gets complicated with searching multiple words, having tokens, etc.
Search Question - not returning desired results
The documents I have indexed also contain information about file names. For instance, 'return_results.pl' or something similar may be in the document fields.

I don't understand how Lucene is searching:

1. If I search for 'return_results', the search does not return anything.
2. If I search for 'results' or 'return', the search does not return anything.
3. If I search for 'results.pl', the search does return the document containing 'return_results.pl'.
4. If I search for 'results~', the search does return the document containing 'return_results.pl'.
5. If I search for 'return_results~', the search does not return anything.

What is going on? I want it to return the document in all of these situations, and I don't want to have to use '~' all the time.
Search Question
If I have words within a document like 'red_car', would a search for 'red' return documents containing 'red_car'?
Searching different types of words
If I search for "like", I want the search to return documents containing "like", "liked", "likes" and other variations of the word. Is there a way to tell Lucene to do this?
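Matching variations of a word is usually done by stemming: both the indexed text and the query are reduced to a common root, so "like", "liked" and "likes" all become the same term. I believe Lucene's PorterStemFilter is meant for exactly this when it is part of the analyzer used for indexing and searching. The toy stemmer below is not the Porter algorithm and does not use Lucene. It is a deliberately naive plain-Java sketch (the class name `ToyStemmer` is made up) that only shows the principle.

```java
// Hypothetical toy stemmer, NOT the Porter algorithm: strips a few common
// English suffixes so related word forms collapse to one root.
class ToyStemmer {
    static String stem(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ing") && w.length() > 5) {
            w = w.substring(0, w.length() - 3);
        } else if (w.endsWith("ed") && w.length() > 4) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("es") && w.length() > 4) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);
        }
        // Drop a trailing 'e' so "like" and "liked" end up with the same root.
        if (w.endsWith("e") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        // All four forms print the same root.
        for (String word : new String[] {"like", "liked", "likes", "liking"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

The root does not have to be a real word. What matters is that the same reduction is applied when indexing and when searching, so the terms line up.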
RE: Lucene refresh index function (incremental indexing).
I vaguely remember having a problem back when I used PDFBox 0.6.2. I reverted to 0.6.1 and haven't had any problems since.

-----Original Message-----
From: Zhou, Oliver [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:03 AM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).

I do have other problems with PDFBox-0.6.4. For one, it prints annoying debug information from the very low-level parsing process. For another, I got an infinite loop while indexing PDF files, although the release notes say the infinite loop bug has been fixed. Does anybody know what's going on?

Thanks,
Oliver

-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 9:45 AM
To: Lucene Users List
Subject: RE: Lucene refresh index function (incremental indexing).

Yes, just add the log4j configuration. The easiest way to do that is as a system parameter, like this:

java -Dlog4j.configuration=log4j.xml org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

where log4j.xml is the path to your log4j config. PDFBox has an example one you can use.

Ben
http://www.pdfbox.org

On Tue, 25 Nov 2003, Zhou, Oliver wrote:
> Lucene doesn't have a PDF parser. In order to index PDF files you have to add
> one yourself. PDFBox is a good choice. You may just ignore the warning
> for log4j or you can add log4j to your classpath.
>
> Oliver
>
> -----Original Message-----
> From: Tun Lin [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 24, 2003 10:07 PM
> To: 'Lucene Users List'
> Subject: RE: Lucene refresh index function (incremental indexing).
>
> Does it support indexing the contents of pdf files? I have found one project
> called PDFBox that can be integrated with Lucene to search inside of the pdf
> files. Currently, Lucene can only search for the pdf filename. I tried with
> PDFBox and I got the following message when I typed the command: java
> org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
>
> log4j:WARN No appenders could be found for logger
> (org.pdfbox.pdfparser.PDFParser).
> log4j:WARN Please initialize the log4j system properly.
>
> Can anyone advise?
>
> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 25, 2003 5:01 AM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> Tun Lin wrote:
> > These are the steps I took:
> >
> > 1) I compile all the files in a particular directory using the command:
> > java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> > , putting all the indexed files in c:\\index.
> > 2) Everytime, I added an additional file in that directory. I need to
> > reindex/recompile that directory to generate the indexes again. As the
> > directory gets larger, the indexing takes a longer time.
> >
> > My question is how do I generate the indexes automatically everytime a
> > new document is added in that directory without me recompiling everytime
> > manually?
>
> To update, try removing the '-create' from the command line. The demo code
> supports incremental updates. It will re-scan the directory and figure out
> which files have changed, what new files have appeared and which previously
> existing files have been removed.
>
> Doug
RE: Eliminating duplicate result
When you are doing the two searches, are you searching for two different terms?

-----Original Message-----
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:14 AM
To: Lucene Users List
Subject: Eliminating duplicate result

What is the easiest way to eliminate duplicate documents if one is doing two searches on the same index? Has anybody done something similar?
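If the two searches do have to stay separate, one simple approach is to merge the results by a unique identifier. The sketch below is plain Java and assumes each hit can be reduced to a unique string, for example the value of a stored "path" or primary-key field. That field and the name `ResultMerger` are assumptions for illustration, not something from the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: merges two lists of document
// identifiers, keeping first-seen order and dropping repeats.
class ResultMerger {
    static List<String> mergeUnique(List<String> first, List<String> second) {
        // LinkedHashSet remembers insertion order and ignores duplicates.
        Set<String> seen = new LinkedHashSet<>(first);
        seen.addAll(second);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("doc1", "doc2", "doc3");
        List<String> b = Arrays.asList("doc2", "doc4");
        System.out.println(mergeUnique(a, b)); // [doc1, doc2, doc3, doc4]
    }
}
```

Note that merging this way ignores the scores of the second search. If ranking matters, combining both queries into a single BooleanQuery and searching once may be the cleaner option, since each matching document then comes back only once.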
RE: XML support in Lucene
This may help you: http://www.jguru.com/faq/view.jsp?EID=1074235

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:07 AM
To: [EMAIL PROTECTED]
Subject: XML support in Lucene

Hello group,

Does Lucene offer an effective and flexible way to treat XML files? I know that as soon as an InputStream is provided, Lucene can basically index anything (possibly after some cleaning). How is it with XML files?

If there is a way, is it possible to have one big XML file with many individual parts in it? These parts should be treated as documents and the repeating XML tags as fields. Here is an example:

Tim How are you? Tom Linda bla bla bla

Has somebody already developed classes which go through this XML file, create TWO documents with the fields "From" and "Content", and fill in the text between the tags? The indexing itself should then be the same, since it works against the abstract Document object. The same goes for the search process. The search process, however, could be optimised with structural information (i.e. only search in "Content")...

Cheers,
Ralph
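The splitting step Ralph describes can be sketched with the DOM parser that ships with Java. The tag names in the posted example did not survive in the archive, so the element names `message`, `from` and `content` below are purely hypothetical, as is the class name `XmlRecords`. Each returned map stands in for one Lucene Document, with one entry per field.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns each <message> element of one big XML file
// into a field map ("From", "Content") that could feed one index document.
class XmlRecords {
    static List<Map<String, String>> parse(String xml) {
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList messages = dom.getElementsByTagName("message");
            List<Map<String, String>> records = new ArrayList<>();
            for (int i = 0; i < messages.getLength(); i++) {
                Element message = (Element) messages.item(i);
                Map<String, String> fields = new LinkedHashMap<>();
                fields.put("From", childText(message, "from"));
                fields.put("Content", childText(message, "content"));
                records.add(fields);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first matching child element, or "" if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<messages>"
                + "<message><from>Tim</from><content>How are you?</content></message>"
                + "<message><from>Tom</from><content>bla bla bla</content></message>"
                + "</messages>";
        System.out.println(parse(xml).size()); // 2
    }
}
```

For a very large file a streaming (SAX) parser would avoid holding the whole tree in memory, but the mapping from repeated elements to documents and from child tags to fields stays the same.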
RE: Tokenizing text custom way
Not exactly an answer to the question, but I haven't yet used the Token classes/functionality that come with Lucene. Can someone give me an idea of how and why one might use them?

-----Original Message-----
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 6:42 AM
To: Lucene Users List
Subject: Tokenizing text custom way

Hi.

I need to tokenize text while indexing, but I don't want space to be the delimiter. The delimiter should be my own custom character (for example, a comma). I understand that I would probably need to implement my own analyzer, but could someone help me with where to start? Is there any other way to do this without writing a custom analyzer?

This is what I want to achieve. If I have some text that will be indexed like the following:

man, people, time out, sun

and I enter 'time' as a search word, I don't want to get "time out" in the results. I need exact keyword matching. I would achieve this if I tokenized "time out" as one token while indexing.

Maybe someone has had a similar problem? If someone knows how to handle this, please help me.

Dragan Jotanovic
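The behaviour Dragan describes can be tried out in plain Java before writing an analyzer. The sketch below does not use Lucene. `CommaKeywords` and its methods are hypothetical names, and lowercasing the keywords is an extra assumption that was not part of the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: treats the comma as the only
// delimiter, so "time out" stays a single keyword.
class CommaKeywords {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(",")) {
            String token = part.trim().toLowerCase();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Exact keyword match: the whole token has to equal the search word.
    static boolean matches(String text, String keyword) {
        return tokenize(text).contains(keyword.trim().toLowerCase());
    }

    public static void main(String[] args) {
        String text = "man, people, time out, sun";
        System.out.println(tokenize(text));            // [man, people, time out, sun]
        System.out.println(matches(text, "time"));     // false
        System.out.println(matches(text, "time out")); // true
    }
}
```

In Lucene itself the same rule would live in a custom Tokenizer wrapped by an Analyzer. I believe extending CharTokenizer and returning false from isTokenChar only for the comma is the usual starting point, but treat that as an assumption and check the javadoc. The query side has to produce the same single token, which may mean building a TermQuery directly instead of going through QueryParser.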
RE: Score
Thanks for your input. I am using the standard analyzer for everything. I haven't created my own analyzer yet.

The documents I am using:

Plain text
PDF documents
(I have two indexes)

When I create my index:

IndexWriter writer = new IndexWriter(index_name, new StandardAnalyzer(), true);

When I search:

Analyzer analyzer = new StandardAnalyzer();
query = MultiFieldQueryParser.parse(queryString, fields, analyzer);

(where queryString is the term to search for and fields is the array of fields)

When searching, it searches one index and then the other.

When you say you use different analyzers for different fields in your index, how would you accomplish that? When I create the index, it takes a single analyzer parameter. Unless you create different indexes, how do you use two different ones?

-----Original Message-----
From: Gerret Apelt [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 3:25 PM
To: Lucene Users List
Subject: Re: Score

Tracy -- it would help if you could give more detail on the types of documents, fields and analyzers you're using. Also, what do you mean by "Multi Field Search"? I presume you're using the MultiFieldQueryParser to have query terms in a user-submitted query be searched for in each field in your index.

If I am understanding your problem, then it might be the same one I had a few weeks ago -- highly relevant matches would not receive a high ranking. (This paragraph will apply to you only if you use more than one Analyzer for the set of your fields.) I had six fields in my index, most of which were populated with a standard analyzer. I used self-made Analyzers for two of the fields. This turned out to be my problem when using MultiFieldQueryParser: I told my MultiFieldQueryParser instance to use only the standard analyzer. Instead I discovered that I needed to make use of org.apache.lucene.analysis.PerFieldAnalyzerWrapper and feed that to the MultiFieldQueryParser. Unless you do this, your problem is what's described here: http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15

Most likely, if your scoring is off, you're "doing something wrong" in the way you use the Lucene API -- at least, that's what I've discovered to be the case when my ranking is off.

If you're interested in the nitty-gritty of how scoring is done, check this FAQ entry: http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

cheers,
Gerret

Pleasant, Tracy wrote:
>Hi,
>
>I'm using the Multi Field Search to search all the fields of my
>documents during the search.
>
>When it returns results the scores are numerically low - .06, .17, etc.
>I would think if I searched for "Dog" and there was a doc with "Dog" in
>the title and several times in the contents of a document that it would
>receive a score more like 1.0 or close to it.
>
>Is there a way that I can tweak the score?
>
>I tried using Boost but that did absolutely nothing.
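On the question of how one index can use different analyzers for different fields: the idea behind the PerFieldAnalyzerWrapper that Gerret mentions is a lookup from field name to analyzer, with a default for every other field. The sketch below only illustrates that dispatch in plain Java. It is not Lucene's class, the name `PerFieldTokenizing` is made up, and a "tokenizer" here is just a function from text to a list of terms.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of per-field analysis: each field name may have
// its own tokenizer, and all other fields fall back to the default one.
class PerFieldTokenizing {
    private final Function<String, List<String>> defaultTokenizer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldTokenizing(Function<String, List<String>> defaultTokenizer) {
        this.defaultTokenizer = defaultTokenizer;
    }

    void register(String field, Function<String, List<String>> tokenizer) {
        perField.put(field, tokenizer);
    }

    List<String> tokenize(String field, String text) {
        return perField.getOrDefault(field, defaultTokenizer).apply(text);
    }

    public static void main(String[] args) {
        // Default: lowercase and split on whitespace.
        PerFieldTokenizing analysis =
                new PerFieldTokenizing(t -> Arrays.asList(t.toLowerCase().split("\\s+")));
        // The "filename" field is kept as one untouched term.
        analysis.register("filename", t -> Collections.singletonList(t));

        System.out.println(analysis.tokenize("contents", "Red Car"));           // [red, car]
        System.out.println(analysis.tokenize("filename", "return_results.pl")); // [return_results.pl]
    }
}
```

The point of the wrapper is that the same per-field rules are used both when writing the index and when parsing the query, so the terms produced at search time match the terms stored at index time.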
RE: Lucene refresh index function (incremental indexing).
I was able to get PDFBox to work with my JSP pages. I think you will have to write your own code to handle the PDF files (while still calling the Lucene functions):

doc = LucenePDFDocument.getDocument(file);
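Doug's description of the incremental update in this thread (find the files that changed, the new files and the files that were removed) boils down to comparing two snapshots. The sketch below is plain Java and is not the demo's actual code. `IndexDiff` is a hypothetical name, and it assumes the index remembers a path and a last-modified timestamp for every document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares what the index knows (path -> last modified)
// with what is on disk now and reports what needs re-indexing.
class IndexDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> changed = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();

    static IndexDiff compare(Map<String, Long> indexed, Map<String, Long> onDisk) {
        IndexDiff diff = new IndexDiff();
        for (Map.Entry<String, Long> file : onDisk.entrySet()) {
            Long indexedTime = indexed.get(file.getKey());
            if (indexedTime == null) {
                diff.added.add(file.getKey());   // never indexed before
            } else if (!indexedTime.equals(file.getValue())) {
                diff.changed.add(file.getKey()); // timestamp differs
            }
        }
        for (String path : indexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                diff.removed.add(path);          // file no longer exists
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<>();
        indexed.put("a.html", 1L);
        indexed.put("b.html", 2L);
        indexed.put("c.html", 3L);
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("a.html", 1L);
        onDisk.put("b.html", 5L);
        onDisk.put("d.html", 4L);
        IndexDiff diff = compare(indexed, onDisk);
        System.out.println(diff.added + " " + diff.changed + " " + diff.removed);
    }
}
```

Changed and removed files have their old documents deleted from the index, and changed and added files are then indexed again. Unchanged files are skipped, which is where the time saving over a full `-create` run comes from.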
Score
Hi, I'm using the Multi Field Search to search all the fields of my documents during the search. When it returns results the scores are numerically low - .06, .17, etc. I would think if I searched for "Dog" and there was a doc with "Dog" in the title and several times in the contents of a document that it would receive a score more like 1.0 or close to it. Is there a way that I can tweak the score? I tried using Boost but that did absolutely nothing.