What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
See the migration guide:

"If you previously used Document.setBoost, you must now pre-multiply the document boost into each Field.setBoost. If you have a multi-valued field, you should do this only for the first Field instance (ie, subsequent Field instance sharing the same field name should only include their per-field boost and not the document level boost) as the boost for multi-valued field instances are multiplied together by Lucene."

-- Ian.

On Mon, Feb 18, 2013 at 12:17 PM, Paul Taylor wrote:
> What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
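The migration-guide rule can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Lucene code: `FieldSpec` and `migratedBoosts` are hypothetical helpers that compute the value you would then pass to `Field.setBoost(float)` for each Field instance, folding the document boost into only the first instance of each field name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of the migration-guide replacement for Document.setBoost.
 * FieldSpec and migratedBoosts are hypothetical helpers, not Lucene API;
 * each computed value would be passed to Field.setBoost(float).
 */
class DocBoostMigration {

    /** One Field instance to be added: its name and its own per-field boost. */
    static final class FieldSpec {
        final String name;
        final float fieldBoost;

        FieldSpec(String name, float fieldBoost) {
            this.name = name;
            this.fieldBoost = fieldBoost;
        }
    }

    /**
     * Returns the boost to set on each Field instance, in the order given.
     * The document boost goes into the first instance of each field name only,
     * because Lucene multiplies the boosts of multi-valued field instances together.
     */
    static List<Float> migratedBoosts(float docBoost, List<FieldSpec> fields) {
        Set<String> seenNames = new HashSet<>();
        List<Float> boosts = new ArrayList<>();
        for (FieldSpec f : fields) {
            boolean firstInstance = seenNames.add(f.name); // true only the first time
            boosts.add(firstInstance ? f.fieldBoost * docBoost : f.fieldBoost);
        }
        return boosts;
    }

    public static void main(String[] args) {
        List<FieldSpec> fields = List.of(
                new FieldSpec("title", 1.0f),
                new FieldSpec("author", 2.0f),
                new FieldSpec("author", 1.5f)); // second value of a multi-valued field
        // Document boost 3.0 is folded into "title" and the first "author" only.
        System.out.println(migratedBoosts(3.0f, fields)); // prints [3.0, 6.0, 1.5]
    }
}
```

For the multi-valued "author" field, the two instance boosts multiply to 6.0 × 1.5 = 9.0, which equals 3.0 × 2.0 × 1.5, so the document boost is counted exactly once.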
Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
On 18/02/2013 13:41, Ian Lea wrote:
> See the migration guide: "If you previously used Document.setBoost, you must now pre-multiply the document boost into each Field.setBoost. [...]"

Thanks. So it's more difficult now? That sounds like a regression to me.
RE: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
It is not a regression, as per-document boosts never worked correctly. If you want to boost documents in a consistent way (so that the scores in search results really use that factor), you should index a DocValues field and use it in a CustomScoreQuery to boost the results by that field's value. In Lucene 4.0 (together with other changes) we dropped this "old-style", confusing, and incorrect feature.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Paul Taylor [mailto:paul_t...@fastmail.fm]
> Sent: Monday, February 18, 2013 4:54 PM
> To: Ian Lea
> Cc: java-user@lucene.apache.org
> Subject: Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
On 18/02/2013 16:04, Uwe Schindler wrote:
> It is not a regression, as per-document boosts never worked correctly. [...]

Well, per-document boosts seemed to work for me in my tests. The new method you propose sounds more complex and is different from what the migration guide says, so I don't see that as an improvement.

Paul
Re: Grouping and tokens
Please clarify exactly what you want to group by: give a specific example that makes it clear which terms should affect grouping and which shouldn't.

-- Jack Krupansky

-Original Message-
From: Ramprakash Ramamoorthy
Sent: Monday, February 18, 2013 6:12 AM
To: java-user@lucene.apache.org
Subject: Grouping and tokens

Hello all,

From the grouping javadoc, I read that fields that are supposed to be grouped should not be tokenized. I have a use case where the user has the freedom to group by any field at search time.

Now that only untokenized fields are eligible for grouping, this is creating an issue with my search. For instance, the book "Fifty shades of grey", when tokenized, turns up in the results for a search on "shades". However, this is not the case when I have it as a non-tokenized field (using StandardAnalyzer, Version 4.1).

How do I go about this? Is indexing a tokenized and a non-tokenized version of the same field the only way? I am afraid it's way too costly! Thanks in advance for your valuable inputs.

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
India,
+91 9626975420
RE: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
The problem is: Lucene has never supported *real* per-document boosts. Those boosts were always per-field. Because everything works per field, how your results score depends on the query. If you have a TermQuery, the per-field boost is used (the one from the field queried), but with other queries (like MultiTermQuery) the boost is ignored completely. Since it is always per-field, the results of this per-document boosting differ depending on the number of terms in your query, so it is not easy to make it consistent.

To boost a document in Lucene 1.x, 2.x, 3.x, and also 4.x, you have to use a function query with a per-document value that you have indexed as a separate field (ideally as DocValues). In previous Lucene versions, FieldCache was the way to go. The code is a simple wrapper around your query: a CustomScoreQuery with a ValueSource referring to the DocValues field. That's 5 lines of code, and it will return consistent results!

Uwe

> -Original Message-
> From: Paul Taylor [mailto:paul_t...@fastmail.fm]
> Sent: Monday, February 18, 2013 5:08 PM
> To: Uwe Schindler
> Cc: java-user@lucene.apache.org
> Subject: Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
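A plain-Java model of the approach Uwe describes, assuming the combined score is the wrapped query's score multiplied by the per-document value (the default combination in CustomScoreQuery's CustomScoreProvider). The `float[]` stands in for the indexed DocValues column and `rank` for the wrapping query; none of these names are Lucene API, and the actual Lucene 4.x wiring of ValueSource and DocValues is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Plain-Java model of boosting every hit by a per-document value.
 * docBoosts stands in for a DocValues column indexed alongside each document;
 * rank() stands in for a CustomScoreQuery wrapping the user's query.
 * All names are hypothetical, not Lucene API.
 */
class PerDocBoostModel {

    /** Final score: the wrapped query's score times the document's own boost value. */
    static float boostedScore(float subQueryScore, float docBoost) {
        return subQueryScore * docBoost;
    }

    /** Returns matching doc ids ordered by boosted score, highest first. */
    static List<Integer> rank(float[] subQueryScores, float[] docBoosts) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < subQueryScores.length; doc++) {
            if (subQueryScores[doc] > 0f) { // in this model a zero score means "no match"
                docs.add(doc);
            }
        }
        docs.sort(Comparator.comparingDouble(
                (Integer doc) -> boostedScore(subQueryScores[doc], docBoosts[doc])).reversed());
        return docs;
    }

    public static void main(String[] args) {
        float[] scores = {1.0f, 0.8f, 0.0f, 0.5f}; // doc 2 does not match the query
        float[] boosts = {1.0f, 2.0f, 5.0f, 1.0f}; // one boost value per document
        System.out.println(rank(scores, boosts)); // prints [1, 0, 3]
    }
}
```

Because the boost is applied once to the final score of each hit, its relative effect does not depend on which fields or how many terms the query contains.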
Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?
On 18/02/2013 16:26, Uwe Schindler wrote:
> The problem is: Lucene has never supported *real* per-document boosts. [...]

Thanks, that's a bit clearer now, but a 5-line example would be nice. And if this is the way to do things, isn't the migration doc incorrect?

Paul
IndexSearcher.close() removed in 4.0
I understand from the JIRA ticket (LUCENE-3640) that IndexSearcher.close() is a no-op, but I'm not clear on why. Could someone shed some light on this? We were using this method in older versions; is it safe to remove the call now? I just want to understand the consequences before we make any changes. Is there an alternative we need to use here?

Thanks,
Sai

--
View this message in context: http://lucene.472066.n3.nabble.com/IndexSearcher-close-removed-in-4-0-tp4041177.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
CJK evaluation: StandardAnalyzer and query time
Hello community,

I am doing an evaluation in the context of CJK. I am comparing several indexing strategies: "unigram", "bigram", "unigram + bigram" and "word-based" indexing.

1. I used the StandardAnalyzer for "unigram". I think it works for Chinese, but it does something different for Japanese and Korean: in Japanese some characters get combined, and for Korean it behaves like a WhitespaceAnalyzer, right? Which analyzer would you prefer for "unigrams" in Japanese and Korean? Is there a flag in the CJKAnalyzer to output "unigrams" only?

2. I used the CJKAnalyzer for "bigrams" and "unigrams + bigrams". I think it works correctly, but I have some performance issues: the query time for "unigram + bigram" is about 8-20 times higher than for "bigram" only. Any ideas?

Thank you.

--
View this message in context: http://lucene.472066.n3.nabble.com/CJK-evaluation-Standardanalyzer-and-Querytime-tp4041190.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
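To make the three strategies concrete, here is a simplified plain-Java model that produces unigrams, overlapping bigrams, and both for a single run of CJK text. It is not what StandardAnalyzer or CJKAnalyzer emit exactly (real analyzers also handle mixed scripts, punctuation, lone characters, and token positions), and all names are hypothetical. For the query 東京都 it yields 3 unigrams (東, 京, 都), 2 bigrams (東京, 京都), and 5 terms for unigram + bigram. In general, n characters give n - 1 bigram query terms but 2n - 1 unigram + bigram terms, and single characters tend to be very frequent terms, which may contribute to the slower query time.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of three CJK indexing strategies for one run of CJK text.
 * Works on code points so characters outside the BMP stay whole.
 * Hypothetical helper, not a Lucene analyzer.
 */
class CjkGrams {

    /** One term per character ("unigram" indexing). */
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    /** Overlapping pairs of adjacent characters ("bigram" indexing). */
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            out.add(chars.get(i) + chars.get(i + 1));
        }
        return out;
    }

    /** Unigrams followed by bigrams ("unigram + bigram" indexing). */
    static List<String> unigramsAndBigrams(String text) {
        List<String> out = new ArrayList<>(unigrams(text));
        out.addAll(bigrams(text));
        return out;
    }

    public static void main(String[] args) {
        String query = "\u6771\u4eac\u90fd"; // a three-character query
        System.out.println(unigrams(query).size() + " " + bigrams(query).size() + " "
                + unigramsAndBigrams(query).size()); // prints 3 2 5
    }
}
```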
Re: IndexSearcher.close() removed in 4.0
On Mon, Feb 18, 2013 at 7:32 PM, saisantoshi wrote:
> I understand from the JIRA ticket (LUCENE-3640) that IndexSearcher.close() is a no-op, but I'm not clear on why. [...]

Hey, previous versions had a constructor that accepted a Directory [1]. If you used this constructor, IndexSearcher#close also closed the index reader that it had created. Since we removed this constructor, we also removed close, because it would be a no-op. IndexSearcher is just a wrapper that adds some functionality on top of the reader. You can ignore IS#close() as long as you close the IndexReader properly.

simon

[1] http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher(org.apache.lucene.store.Directory)
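A plain-Java model of the ownership rule Simon describes: the reader is the resource to close, and the searcher is a thin wrapper with nothing of its own to release. `FakeReader` and `FakeSearcher` are hypothetical stand-ins; in Lucene 4.x the corresponding objects would be a reader from `DirectoryReader.open(directory)` and a `new IndexSearcher(reader)`, and only the reader gets closed.

```java
import java.io.Closeable;

/**
 * Models "close the reader, not the searcher".
 * FakeReader and FakeSearcher are hypothetical stand-ins for
 * DirectoryReader and IndexSearcher.
 */
class CloseReaderNotSearcher {

    static final class FakeReader implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true; // a real reader would release files and memory here
        }
    }

    static final class FakeSearcher {
        final FakeReader reader;

        FakeSearcher(FakeReader reader) {
            this.reader = reader; // wraps the reader but does not own it
        }

        String search(String query) {
            if (reader.closed) {
                throw new IllegalStateException("reader already closed");
            }
            return "results for " + query;
        }
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader();
        try {
            FakeSearcher searcher = new FakeSearcher(reader);
            System.out.println(searcher.search("lucene")); // prints results for lucene
        } finally {
            reader.close(); // the only cleanup needed; the searcher has no close()
        }
        System.out.println(reader.closed); // prints true
    }
}
```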
Re: Need Help: How to Get the enumeration of Terms Ending with a given word
On Thu, Feb 14, 2013 at 11:42 AM, VIGNESH S wrote:
> Hi,
>
> I have two questions:
>
> 1. How do I get the enumeration of terms ending with a given word?
> I saw that we can get an enumeration of terms starting at a given term with the IndexReader.terms(Term) method.

Unless you want to iterate over all terms and check whether each one ends with the given string, I think you need to index the terms in reversed order and get a prefix terms enum from that field. Look at [1].

[1] http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html

>
> 2. Actually, I am doing a MultiPhraseQuery, in which I want to do a suffix query on the first word. How can I do this? Please kindly help.
>
> --
> Thanks and Regards
> Vignesh Srinivasan
> 9739135640
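The reversed-terms idea can be modeled in plain Java: reverse every term at index time (which is what ReverseStringFilter does to each token), reverse the suffix at query time, and walk the sorted term dictionary from that prefix. Here a `TreeSet` stands in for the field's sorted term dictionary and the loop stands in for a prefix terms enumeration; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/**
 * Plain-Java model of suffix lookup through a reversed-term field.
 * Hypothetical helper; the TreeSet plays the role of a sorted term dictionary.
 */
class SuffixViaReversedTerms {

    /** Reverses a term; StringBuilder.reverse keeps surrogate pairs intact. */
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Returns the original terms that end with suffix, in reversed-term order. */
    static List<String> termsEndingWith(Collection<String> terms, String suffix) {
        TreeSet<String> reversedDictionary = new TreeSet<>();
        for (String term : terms) {
            reversedDictionary.add(reverse(term)); // "index time": store reversed tokens
        }
        String prefix = reverse(suffix); // "query time": the suffix becomes a prefix
        List<String> matches = new ArrayList<>();
        for (String reversed : reversedDictionary.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break; // sorted order: once the prefix stops matching, nothing later can
            }
            matches.add(reverse(reversed));
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("walking", "talking", "walked", "king", "sing");
        System.out.println(termsEndingWith(terms, "king")); // prints [king, talking, walking]
    }
}
```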
Re: IndexSearcher.close() removed in 4.0
Hi,

Why not have IS#close() call the wrapped IR#close()? I would be happier dealing only with the Searcher once it is created and forgetting that it wraps a Reader: I create a Searcher, I close it.

Thx,
Eric

On 18/02/2013 22:20, Simon Willnauer wrote:
> Hey, previous versions had a constructor that accepted a Directory [...]
Re: Grouping and tokens
On Mon, Feb 18, 2013 at 9:47 PM, Jack Krupansky wrote:
> Please clarify exactly what you want to group by - give a specific example
> that makes it clear what terms should affect grouping and which shouldn't.

Assume I am indexing library data. Say there are the following fields for a particular book:

1. Published
2. Language
3. Genre
4. Author
5. Title
6. ISBN

At search time, the user can ask to group by any of the above fields, which means none of them can be tokenized. So, as I said earlier, there is a book titled "Fifty shades of grey" and the user searches for "shades". The result turns up when the field is tokenized, but here it doesn't, since the field isn't tokenized. I hope I am clear.

In a nutshell, how do I group by a field that is also tokenized?

--
With Thanks and Regards,
Ramprakash Ramamoorthy
Re: Grouping and tokens
Okay, so fields that would normally need to be tokenized must be indexed both as raw strings for grouping and as tokenized text for keyword search. Simply use copyField to copy from one to the other.

-- Jack Krupansky

-Original Message-
From: Ramprakash Ramamoorthy
Sent: Monday, February 18, 2013 11:13 PM
To: java-user@lucene.apache.org
Subject: Re: Grouping and tokens
Re: Grouping and tokens
Oops, sorry for the "Solr" answer. In Lucene, you simply need to index the same value twice: once as a raw string and a second time as a tokenized text field. Grouping would use the raw-string version of the data.

-- Jack Krupansky

-Original Message-
From: Jack Krupansky
Sent: Monday, February 18, 2013 11:21 PM
To: java-user@lucene.apache.org
Subject: Re: Grouping and tokens
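A plain-Java model of Jack's suggestion: each value is kept twice, a tokenized copy for keyword search and a raw copy used as the grouping key. In Lucene 4.x the two copies would typically be a `TextField` (tokenized) and a `StringField` (indexed as a single untokenized term) under two different field names; the `Book` class, the crude tokenizer, and `searchAndGroup` are hypothetical simplifications, not the grouping module's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Models indexing one value twice: tokenized for search, raw for grouping.
 * Hypothetical helper classes, not Lucene API.
 */
class SearchAndGroupModel {

    static final class Book {
        final String title;             // raw copy of the title, usable as a grouping key
        final String genre;             // raw value, usable as a grouping key
        final List<String> titleTokens; // tokenized copy of the title, used for search

        Book(String title, String genre) {
            this.title = title;
            this.genre = genre;
            this.titleTokens = tokenize(title);
        }
    }

    /** Very rough stand-in for an analyzer: lowercase and split on non-alphanumerics. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Searches the tokenized copy, then groups the hits by a raw (untokenized) value. */
    static Map<String, List<String>> searchAndGroup(
            List<Book> books, String word, Function<Book, String> groupKey) {
        String term = word.toLowerCase(Locale.ROOT);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Book b : books) {
            if (b.titleTokens.contains(term)) { // search hits the tokenized copy
                groups.computeIfAbsent(groupKey.apply(b), k -> new ArrayList<>()).add(b.title);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
                new Book("Fifty shades of grey", "Romance"),
                new Book("Shades of Light", "Poetry"),
                new Book("Grey Seas", "Romance"));
        // Grouping by the raw title keeps whole titles as keys, not the token "shades".
        System.out.println(searchAndGroup(books, "shades", b -> b.title));
        // prints {Fifty shades of grey=[Fifty shades of grey], Shades of Light=[Shades of Light]}
        System.out.println(searchAndGroup(books, "grey", b -> b.genre));
        // prints {Romance=[Fifty shades of grey, Grey Seas]}
    }
}
```

The extra cost is one additional indexed term per grouped value, since the raw copy is a single untokenized term.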