Re: Tool for analyzing analyzers
Hi Erik, Thanks for your reply. Have you tried it on a collection yet? I'd love to get some of your feedback. I have limited knowledge of the underlying capabilities of the lucene library, which is a compliment to you, since it was extremely easy to integrate lucene. But I'd like to get more out of lucene, such as incremental indexing, to name one. On the other hand, I'm interested in general requirements and wishes for the app. Regards, Michael Franken Erik Hatcher wrote: On May 28, 2004, at 6:50 AM, Zilverline info wrote: But I'd love to build a Lucene demo application that is powerful enough to be used as a foundation for folks to use out-of-the-box. That's just what I thought. Here's one: http://www.zilverline.org Michael - zilverline is nicely done! I downloaded it and dropped it into Tomcat and it came right up. I did not actually configure a collection yet, but from the docs on the website it looks like you have built something quite nice. Maybe you could embed a built-in collection of the zilverline docs so something comes up right away and is searchable :) Nice work. I'll definitely stay tuned into your project. Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: a list of matching search term
On Jun 1, 2004, at 9:19 PM, Anson Lau wrote: Further to my previous email: The highlighter package should be able to pick up the matching search terms. Can some experienced highlighter package users tell me if I should look down that line? Yes, Highlighter (available in the sandbox) picks out matching terms. If you used a custom Formatter with Highlighter, you could pick out matching terms and have a list of them. This would not be something you do for every hit, though, as it would take a little time to do for each document. Erik
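The idea behind Erik's suggestion is that a custom Formatter gets called once per matching term, so it can record those terms instead of (or as well as) wrapping them in markup. The sketch below does not use the Highlighter or Formatter API (their exact signatures are not reproduced here); it only shows the bookkeeping such a Formatter would do, in plain Java. The tokenizer (lowercase, split on non-word characters) and the helper name `matchingTerms` are assumptions; real code should produce tokens with the same analyzer used at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MatchedTerms {
    // Hypothetical helper, not the Highlighter API: returns the query terms that
    // occur in the text, in order of first occurrence, without repeats.
    static List<String> matchingTerms(String text, Set<String> queryTerms) {
        Set<String> found = new LinkedHashSet<>();
        // Simplified tokenizer (assumption): lowercase, split on non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && queryTerms.contains(token)) {
                found.add(token);
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("button", "shirt", "red"));
        System.out.println(matchingTerms("Shirt with a blue button; shirt on sale", query));
        // prints [shirt, button]
    }
}
```

As Erik notes, this is per-document work, so it makes sense to run it only for the hits actually being displayed.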
Range Query Somebody HELP please
Hey Ype/Erick, Thanks in advance for helping me with the range queries. I was finally able to trace the faulty processes in my code and close them. I still have 3 small questions. 1) When creating the range query, is it possible for Lucene to do something like this: +(button AND shirt) +filename:[b10181_p100 TO b10181_p200] [Do you think this will work?] It's not returning hits, but it does return hits containing only one of them, shirt or button. 2) When the indexer starts indexing, does it go in alphabetical order or some other way? 3) The Keyword field type is not accepting file names as it indexes. [Try indexing file names and then searching on them; you will definitely get 0 hits, lucene 1.3-final version.] doc.add(Field.Text("filename", file.getName())) returns hits; doc.add(Field.Keyword("filename", file.getName())) does not return hits. Why??? With regards, Karthik On Monday 31 May 2004 13:47, Karthik N S wrote: Hey Ype... 1) I switched off the multi-search scenario. 2) Changing the field type from Text to Keyword fails when I search on the filename field, so I still kept it as Text. Just make sure the file name is indexed as you show it, i.e. the underscore should be in the indexed term. The best way to do that is to index the filename as keyword. Check the output of the analyzer, or use luke to see what is in the index for the filename field.
D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles Search Keyword : b10181_p388 Source path [ E:/po/ ] : e:/indexer3/b10181 Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_ Found document(s) that matched : 'b10181_p388' no of hits :'1' in query Field :'filename' File Name : B10181_P388 3) Searching for the range between 2 file names, B10181_P702 to B01081_P355, still returns 0 hits [I included a space before the 2nd '+']: D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles Search Keyword : +button +filename:[b10181_p702 TO b10181_p355] Could you try this: +button +filename:[b10181_p355 TO b10181_p702] ? If this does not work, please narrow your problem down to a java test program of 10-20 lines, and post the code. Regards, Ype
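Swapping the bounds, as Ype suggests, matters because a range query compares terms as strings: a term matches only if it sorts at or after the lower bound and at or before the upper bound (square brackets are inclusive). The plain-Java model below is a simplification of that comparison, not Lucene's RangeQuery code, and the helper name `inRange` is hypothetical. With the bounds reversed, no string can satisfy both conditions, which is consistent with the 0 hits above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TermRange {
    // Simplified model of an inclusive range over indexed terms:
    // plain String.compareTo ordering, lower <= term <= upper.
    static List<String> inRange(List<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("b10181_p100", "b10181_p355", "b10181_p388", "b10181_p702");
        System.out.println(inRange(terms, "b10181_p702", "b10181_p355")); // [] : lower bound sorts after upper
        System.out.println(inRange(terms, "b10181_p355", "b10181_p702")); // [b10181_p355, b10181_p388, b10181_p702]
    }
}
```

Because the ordering is lexicographic rather than numeric, a page such as p1000 would sort between p100 and p355; zero-padding page numbers when indexing (for example p0100, p0355, p1000) keeps string order and numeric order aligned.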
Re: optimize() is not merging into single file? !!!!!!
I rechecked the results. Here they are: IndexWriter compiled with v.1.4-rc2 generates after optimization: _36d.cfs 3779 kb. IndexWriter compiled with v.1.4-rc3 generates after optimization: _36d.cfs 3778 kb, _36c.cfs 31 kb, _35z.cfs 14 kb, _35o.cfs 14 kb, etc. In both cases the segments file contains _36d.cfs. Looks like the new version just forgets to clean up. Iouli Golovatyi/X/GP/[EMAIL PROTECTED] 01.06.2004 17:22 Please respond to Lucene Users List To: [EMAIL PROTECTED] cc: Subject: optimize() is not merging into single file? I optimize the index and close it after that, but I don't get just one .cfs file as promised in the docs. Instead I see several small segments and a couple of big ones. This weird behavior seems to have started when I changed from v1.4-rc2 to 1.4-rc3. Before, I got just one .cfs segment. Any ideas? Thanks in advance J.
Re: Range Query Somebody HELP please
On Jun 2, 2004, at 6:20 AM, Karthik N S wrote: Hey Ype/Erick If you're gonna ask for help, the least ya could do is spell my name correctly :) I still have 3 small questions. 1) When creating the range query, is it possible for Lucene to do something like this: +(button AND shirt) +filename:[b10181_p100 TO b10181_p200] [Do you think this will work?] It's not returning hits, but it does return hits containing only one of them, shirt or button. My guess is that none of your documents in that range have button AND shirt in them. 2) When the indexer starts indexing, does it go in alphabetical order or some other way? I don't understand the question, sorry. Terms in the index are ordered lexicographically, if that is what you mean. 3) The Keyword field type is not accepting file names as it indexes. [Try indexing file names and then searching on them; you will definitely get 0 hits, lucene 1.3-final version.] doc.add(Field.Text("filename", file.getName())) returns hits; doc.add(Field.Keyword("filename", file.getName())) does not return hits. Why??? Because of your analyzer. Try indexing as a Keyword and searching using a TermQuery. Don't use QueryParser at first - it gets in the way of understanding what is really going on. For fun, look at the .toString() of the Query generated by QueryParser if you like. Look at the AnalysisParalysis page on the wiki for more details. Read my java.net articles to get a better understanding. The short answer is that it is analysis that is bogging you down here. You need to decide how to index file names based on how you plan on querying for them. We cannot answer this for you. Erik
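Erik's point about analysis can be illustrated with a toy model. Karthik's search output shows the stored file name as B10181_P388, while his query text is lowercase. Assuming the analyzer lowercases terms (as StandardAnalyzer does) and that QueryParser runs the query text through that analyzer, a Keyword field holds the name verbatim, so the lowercased query term no longer equals it; a Text field was lowercased at index time too, so it matches. This is one plausible explanation rather than a diagnosis, and the helper names below are hypothetical; Luke will show the terms actually in the index.

```java
import java.util.Locale;

final class KeywordVsText {
    // Toy analyzer (assumption): only lowercases. A real analyzer also tokenizes
    // and may drop or split characters; this sketch ignores that.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Field.Keyword-style indexing: the value becomes a single term, unchanged.
    static String keywordTerm(String value) {
        return value;
    }

    // Field.Text-style indexing: the value passes through the analyzer first.
    static String textTerm(String value) {
        return analyze(value);
    }

    public static void main(String[] args) {
        String fileName = "B10181_P388";
        // QueryParser analyzes the query text with the analyzer it is given.
        String queryTerm = analyze("b10181_p388");
        System.out.println(keywordTerm(fileName).equals(queryTerm)); // false
        System.out.println(textTerm(fileName).equals(queryTerm));    // true
    }
}
```

This is why Erik suggests a TermQuery for Keyword fields: it compares the exact term you give it, with no analysis in between.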
RE : optimize() is not merging into single file? !!!!!!
Hello, I am running a two-week-old version of Lucene from CVS HEAD and seeing the same behavior. Regards, RBP
Re: Tool for analyzing analyzers
Zilverline [EMAIL PROTECTED] wrote: ...get more out of lucene, such as incremental indexing, to name one. Hello, as far as I know, incremental indexing could be a real bottleneck if you implement your system without some knowledge of Lucene internals. The relevant test is here: http://www.egothor.org/twiki/bin/view/Know/LuceneIssue Cheers, Leo
indexing French text with Lucene
Hi all, Lucene is a very powerful tool for indexing English documents. I wonder whether it is just as powerful for indexing French text. In fact, I need to compute the similarity between 2 French texts. So if somebody has already had experience indexing French text, your ideas and recommendations are most welcome. Thanks in advance. Uddam
Re: similarity of two texts
Erik, Could you expand on this just a wee bit, perhaps with an example of how to compute this vector angle? TIA, Terry - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, June 01, 2004 9:39 AM Subject: Re: similarity of two texts On Jun 1, 2004, at 9:24 AM, Grant Ingersoll wrote: Hey Eric, Eri*K* :) What did you do to calc similarity? I computed the angle between two vectors. The vectors are obtained from IndexReader.getTermFreqVector(docId, field). I haven't had time, but was thinking of ways to add the ability to get the similarity score (as calculated when doing a search) given a term vector (or just a document id). It would be quite compute-intensive to do something like this. This could be done through a custom sort as well, if applying it at the scoring level doesn't work. I haven't given any thought to how this could work for scoring or sorting before, but does sound quite interesting. Any ideas on how to approach this would be appreciated. The scoring in Lucene has always been a bit confusing to me, despite looking at the code several times, especially once you get into boolean queries, etc. No doubt that it is confusing - to me also. But Explanation is your friend. Erik
Can I prevent Sort fields from influencing score?
I have been using the new lucene 1.4 SortField implementation with some custom fields added to old indexes so that the results can be sorted by them. My problem is that some of the String fields that I add to the index match the search terms, so my results in sort-by-score order are different. Here's an example: I added the field AUTHOR_SORTABLE to most of the documents in the index. But if the AUTHOR_SORTABLE field in a document is set to "andy", and I search for "andy", this document gets a very different score than it used to. Since my added fields aren't set in stone, I'm interested in a general solution, where all fields containing the text SORTABLE in the name aren't considered for matches, only for sorting. Could I do this by overriding Similarity? I tried doing this to set the lengthNorm() for each of my sortable fields to 0, but it hasn't worked yet. Is there a different way to store the sortable fields that will prevent this? Any help would be greatly appreciated. - andy g
Re: similarity of two texts
Terry Steichen wrote: Erik, Could you expand on this just a wee bit, perhaps with an example of how to compute this vector angle? I'm tempted to write the code to see how it works, but FYI this doc seems to nicely explain the concepts: http://www.la2600.org/talks/files/20040102/Vector_Space_Search_Engine_Theory.pdf
Re: similarity of two texts
On Jun 2, 2004, at 1:39 PM, David Spencer wrote: Erik, Could you expand on this just a wee bit, perhaps with an example of how to compute this vector angle? I'm tempted to write the code to see how it works, but FYI this doc seems to nicely explain the concepts: http://www.la2600.org/talks/files/20040102/Vector_Space_Search_Engine_Theory.pdf This is, in fact, one of the documents I referenced to get a grasp on how to do it. My code has some built-in assumptions on parts of the equation that get short-circuited (there is only 1 of each term in my case, for example), so it would not be a general-purpose algorithm. It's basically just using the TermFreqVector information and plugging it into an equation like the one found in that PDF - nothing more than that, actually. Erik
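For Terry's question, the angle between two documents' term vectors comes from the cosine formula cos θ = (A·B) / (|A| |B|). A minimal, general-purpose sketch follows (not Erik's short-circuited code). It assumes each document's terms and counts have already been copied into parallel arrays; with Lucene 1.4 these would come from IndexReader.getTermFreqVector(docId, field) via TermFreqVector's getTerms() and getTermFrequencies(), provided the field was indexed with term vectors stored. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

final class TermVectorCosine {
    // Cosine of the angle between two term-frequency vectors given as parallel arrays:
    // terms[i] occurs freqs[i] times. Assumes each array holds distinct terms,
    // as a per-document term vector does.
    static double cosine(String[] termsA, int[] freqsA, String[] termsB, int[] freqsB) {
        Map<String, Integer> a = new HashMap<>();
        double normA = 0;
        for (int i = 0; i < termsA.length; i++) {
            a.put(termsA[i], freqsA[i]);
            normA += (double) freqsA[i] * freqsA[i];
        }
        double dot = 0, normB = 0;
        for (int i = 0; i < termsB.length; i++) {
            normB += (double) freqsB[i] * freqsB[i];
            Integer fa = a.get(termsB[i]);
            if (fa != null) {
                dot += (double) fa * freqsB[i]; // only shared terms contribute
            }
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an empty vector has no direction
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // The angle itself, in radians; clamped because rounding can push cos slightly above 1.
    static double angle(String[] termsA, int[] freqsA, String[] termsB, int[] freqsB) {
        return Math.acos(Math.min(1.0, cosine(termsA, freqsA, termsB, freqsB)));
    }

    public static void main(String[] args) {
        double c = cosine(new String[] {"lucene", "search"}, new int[] {2, 1},
                          new String[] {"lucene", "index"}, new int[] {1, 1});
        System.out.println(c); // 2 / sqrt(10), about 0.632
    }
}
```

A cosine of 1 (angle 0) means the two documents use terms in the same proportions; 0 (angle π/2) means they share no terms.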
Re: similarity of two texts - another question
Hmm, the term vector does not have to consist of only term frequencies, does it? To give weight to rare terms, could you create a term vector of (TF*IDF) values for each term? Then, a distance function would measure how many terms two vectors have in common, giving weight to how many rare terms two vectors have in common. David Spencer [EMAIL PROTECTED] 06/01/04 08:25PM Erik Hatcher wrote: On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote: Well, a question again, how does Lucene compute the score between a document and a query? And I might add, thus, this approach to similarity gives more weight to rare terms that match, which one might want for this kind of similarity measure.
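Gerard's idea can be sketched as a weighting step applied before the cosine comparison: multiply each term's frequency by an inverse-document-frequency factor so that shared rare terms count for more than shared common ones. The idf form used here, ln(numDocs / (docFreq + 1)) + 1, is an assumption modelled on Lucene's DefaultSimilarity; other idf variants work just as well. In a real index the document frequencies would come from something like IndexReader.docFreq(Term) and the document count from IndexReader.numDocs(). The statistics in the example are made up.

```java
import java.util.HashMap;
import java.util.Map;

final class TfIdf {
    // Weights each raw term frequency by an inverse document frequency.
    // Assumed idf form: ln(numDocs / (docFreq + 1)) + 1.
    static Map<String, Double> weigh(Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up statistics: 100 documents, "rare" in 1 of them, "common" in 99.
        Map<String, Integer> tf = new HashMap<>();
        tf.put("rare", 1);
        tf.put("common", 1);
        Map<String, Integer> df = new HashMap<>();
        df.put("rare", 1);
        df.put("common", 99);
        System.out.println(weigh(tf, df, 100)); // rare gets about 4.91, common exactly 1.0
    }
}
```

The weighted vectors can then be compared with the same cosine formula as raw frequency vectors, just with double values instead of counts.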
Re: similarity of two texts - another question
Gerard Sychay wrote: Hmm, the term vector does not have to consist of only term frequencies, does it? To give weight to rare terms, could you create a term vector of (TF*IDF) values for each term? Then, a distance function would measure how many terms two vectors have in common, giving weight to how many rare terms two vectors have in common. Yeah, but if you're gonna do that, why not just form a query with all the words in the source document and let the Lucene engine do the idf/tf calculations? I've done this and it seems to work fine. Here's code I've used. It could be done better by avoiding QueryParser, and odds are it could hit the too-many-clauses exception for a boolean query unless you raise Lucene's default limit, but this is the idea. srch is the entire body of the source document.
    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    // Forms an OR query from every analyzed token of srch (Lucene 1.x API).
    // DFields.CONTENTS is the poster's own constant naming the field to search.
    public static Query formSimilarQuery(String srch, Analyzer a)
            throws ParseException, IOException {
        StringBuffer sb = new StringBuffer();
        TokenStream ts = a.tokenStream("foo", new StringReader(srch));
        Token t;
        while ((t = ts.next()) != null) {
            sb.append(t.termText()).append(' '); // space-separated analyzed terms
        }
        ts.close();
        return QueryParser.parse(sb.toString(), DFields.CONTENTS, a);
    }
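One way to make this approach less likely to trip the clause limit (BooleanQuery allows 1024 clauses by default) is to drop repeated tokens and cap the number of distinct terms before building the query text. The sketch below is a hypothetical helper that operates on tokens already produced by the analyzer. Keeping the first `maxClauses` distinct terms is a simple choice rather than the best one; keeping the highest-weighted terms (for example by TF*IDF) would preserve more of the document's meaning. Dropping repeats also removes the implicit extra weight a repeated word gets when it appears as several clauses.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SimilarQueryText {
    // Hypothetical helper: joins already-analyzed tokens into query text,
    // keeping each term once and at most maxClauses distinct terms.
    static String build(List<String> tokens, int maxClauses) {
        Set<String> distinct = new LinkedHashSet<>(); // first-seen order, no repeats
        for (String t : tokens) {
            if (distinct.size() >= maxClauses) {
                break;
            }
            distinct.add(t);
        }
        return String.join(" ", distinct);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "search", "lucene", "index");
        System.out.println(build(tokens, 1024)); // lucene search index
        System.out.println(build(tokens, 2));    // lucene search
    }
}
```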
Re: Range Query Somebody HELP please
On Wednesday 02 June 2004 14:46, Erik Hatcher wrote: On Jun 2, 2004, at 6:20 AM, Karthik N S wrote: ... I still have 3 small questions. 1) When creating the range query, is it possible for Lucene to do something like this: +(button AND shirt) +filename:[b10181_p100 TO b10181_p200] [Do you think this will work?] It's not returning hits, but it does return hits containing only one of them, shirt or button. My guess is that none of your documents in that range have button AND shirt in them. You can also try this: +button +shirt +filename:[b10181_p100 TO b10181_p200] I never got to completely understand the way the query parser deals with AND and OR, so I prefer to avoid them. Regards, Ype
Re: Can I prevent Sort fields from influencing score?
This seems like it would be determined by how you generate your query - if your query doesn't search in the sorted fields, they shouldn't affect the scoring of your documents ...
Re: Can I prevent Sort fields from influencing score?
Thanks, that was my problem. I had code extending the search out to all the fields; now it only extends the search out to the fields I'm interested in. - andy g
RE: Can I prevent Sort fields from influencing score?
Just curious: are you building your query yourself or using a particular query parser? Which one? Are you using MultiFieldQueryParser? I had problems with MFQP before and was looking for other solutions besides dumping all the fields into one massive content field. TIA, -Gus
help needed in starting lucene
Hi, I am just a beginner. I installed lucene according to the instructions provided and made all the changes to the environment variables. When I try to run the test program for building indexes using the following command: java org.apache.lucene.demo.IndexFiles test/Doc I get the following exception: Exception in thread "main" class java.lang.ExceptionInInitializerError: java.lang.RuntimeException: java.security.NoSuchAlgorithmException: MD5: Class not found.
RE: help needed in starting lucene
It sounds to me like you need a newer version of Java.
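The NoSuchAlgorithmException in the trace means the JVM running the demo could not supply an MD5 MessageDigest, and the ExceptionInInitializerError shows that this happened inside some class's static initializer. That fits the suggestion to try a newer or more complete Java installation. A quick stand-alone check, run with the same `java` command and environment as the demo, is sketched below; it uses only the standard java.security API, and the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Check {
    // True if this JVM's security providers can supply an MD5 MessageDigest.
    static boolean md5Available() {
        try {
            MessageDigest.getInstance("MD5");
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    // Hex-encoded MD5 of a string; fails loudly if MD5 is missing.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("MD5 available: " + md5Available());
        if (md5Available()) {
            System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
        }
    }
}
```

If this prints `MD5 available: false`, the problem lies with the Java installation rather than with Lucene or the index path.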
Re: Can I prevent Sort fields from influencing score?
I build the query myself; it's really easy. I just use the normal query parser with IndexReader.getFieldNames(true) and loop through all of the fields to search everything at once. You can either make a really big BooleanQuery or make a bunch of small queries and merge the results, depending on what kind of results you are looking for. It's probably not as fast as the one-big-data-field method, but speed is not an issue yet for anything I've done, whereas code maintenance is a pain (witness my question that started this thread). - andy g
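Andy's fix, searching only the fields that should affect matching, can be sketched as a filter over the field names followed by one clause per remaining field. The helpers below are hypothetical. They take a plain collection of names (in Lucene 1.4, IndexReader.getFieldNames(true) returns the indexed field names as a Collection), skip anything containing SORTABLE, and build query text in which each `field:term` clause is optional under QueryParser's default OR operator. Terms containing QueryParser syntax characters would need escaping, which this sketch does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class SearchableFields {
    // Hypothetical helper: keeps the field names that should take part in matching,
    // skipping any whose name contains "SORTABLE" (the naming convention from this thread).
    static List<String> searchable(Collection<String> fieldNames) {
        List<String> result = new ArrayList<>();
        for (String name : fieldNames) {
            if (!name.contains("SORTABLE")) {
                result.add(name);
            }
        }
        return result;
    }

    // Builds "f1:term f2:term ..."; with QueryParser's default OR operator each clause
    // is optional. Assumes the term contains no characters that need escaping.
    static String multiFieldQuery(List<String> fields, String term) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(field + ":" + term);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> fields = searchable(Arrays.asList("title", "contents", "AUTHOR_SORTABLE"));
        System.out.println(multiFieldQuery(fields, "andy")); // title:andy contents:andy
    }
}
```

Because the SORTABLE fields never appear in the query, they can still be used by SortField but no longer contribute to the score.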
Marten Senkel/IS/EUROPE/SIALEUROPE is out of the office.
I will be out of the office starting 2004-06-02 and will not return until 2004-06-04. Please contact Nicolas Guala-Molino for any request. Thanks!
building custom-stemmer
Hi, I have a fairly decent idea of how to use Lucene. I need to use it with some non-European languages, such as Indian languages and CJK. Some of these languages do not currently have a stemmer (I've looked in Snowball). I was wondering how I could write my own stemmer, say for Hindi. Regards, Anil
RE: a list of matching search term
Thanks Erik I'll give that a try. Anson
RE: help needed in starting lucene
Hey, I think you have a file path problem there. Try giving the full path: java org.apache.lucene.demo.IndexFiles e:/lucene/../test/Doc Also set the classpath to include lucene-1.3-final.jar or lucene-1.4-rc2.jar before you start indexing. With regards, Karthik
problems with lucene in multithreaded environment
We recently tested lucene with an index size of 2 GB, which has about 1,500,000 documents, each document having about 25 fields. The search frequency was about 20 queries per second. This resulted in an average response time of approximately 20 seconds per search. What we observed was that lucene queues the queries and does not release them until the results are found, so the queries that come in later take about 500 seconds. Please let us know whether there is a technique to optimize lucene in such circumstances. Please note that we have created a single searcher object (IndexSearcher) and all queries are passed to this searcher only. We are using a dual-processor P4 machine with 6 GB of RAM. We need results at the rate of about 60 queries/second at peak load. Is there a way to optimize lucene to get this performance from this machine? In what other ways can I optimize lucene for this output? Regards Jayant
Re: problems with lucene in multithreaded environment
Jayant Kumar wrote: We recently tested lucene with an index size of 2 GB, which has about 1,500,000 documents, each document having about 25 fields. The search frequency was about 20 queries per second. This resulted in an average response time of approximately 20 seconds per search. That sounds slow, unless your queries are very complex. What are your queries like? What we observed was that lucene queues the queries and does not release them until the results are found, so the queries that come in later take about 500 seconds. Please let us know whether there is a technique to optimize lucene in such circumstances. Multiple queries executed from different threads using a single searcher should not queue, but should run in parallel. A technique to find out where threads are queueing is to get a thread dump and see where all of the threads are stuck. In Solaris and Linux, sending the JVM a SIGQUIT will give a thread dump. On Windows, use Control-Break. Doug
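Doug's thread-dump advice can also be followed from inside the application. On Java 5 or later (newer than the JVMs common when this thread was written), Thread.getAllStackTraces() returns a snapshot of every live thread's stack, so a sketch like the one below could be triggered from an admin page or a timer while the load test runs. The class name and output layout are illustrative. Many request threads in the BLOCKED state at the same frame would point to where searches are queueing.

```java
import java.util.Map;

final class ThreadDump {
    // In-process alternative to SIGQUIT / Control-Break: formats every live thread's
    // name, state and stack frames into one string.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```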
Range Query Somebody HELP please
Hey Ype, the range query +button +shirt +filename:[b10181_p100 TO b10181_p200] did not work for me, but the other way around, +(button OR shirt) +filename:[b10181_p100 TO b10181_p200], gave me 2 hits, each page containing only one of the terms button / shirt, not both. I found from the HTML files that both words are present in more than 2 files. Is there any other way of getting hits with both words? With regards, Karthik
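These results are consistent with Erik's guess. A query made only of required (+) clauses matches the intersection of the documents each clause matches, so +button +shirt +filename:[...] finds nothing unless some document inside the filename range contains both words, while +(button OR shirt) needs only one of them. Files outside b10181_p100 to b10181_p200 that contain both words do not count. The toy model below uses made-up document ids to show the difference; it models only which documents match, not Lucene's scoring, and the class name is illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RequiredClauses {
    // Documents matched by a query made only of required (+) clauses:
    // the intersection of the document-id sets matched by each clause.
    @SafeVarargs
    static Set<Integer> allRequired(Set<Integer>... clauseMatches) {
        if (clauseMatches.length == 0) {
            return new HashSet<>();
        }
        Set<Integer> result = new HashSet<>(clauseMatches[0]);
        for (int i = 1; i < clauseMatches.length; i++) {
            result.retainAll(clauseMatches[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document ids for illustration.
        Set<Integer> button = new HashSet<>(Arrays.asList(1, 7));
        Set<Integer> shirt = new HashSet<>(Arrays.asList(2, 8));
        Set<Integer> inRange = new HashSet<>(Arrays.asList(1, 2, 3));

        // +button +shirt +filename:[...]
        System.out.println(allRequired(button, shirt, inRange).isEmpty()); // true

        // +(button OR shirt) +filename:[...]
        Set<Integer> either = new HashSet<>(button);
        either.addAll(shirt);
        System.out.println(allRequired(either, inRange)); // docs 1 and 2
    }
}
```

To confirm this on the real index, running +button +shirt without the range clause shows whether the documents containing both words have file names inside the range at all.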