date:20130218

What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor


What is the equivalent of Document.setBoost() from Lucene 3.6 in Lucene 4.1?

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Ian Lea

See the migration guide:

"If you previously used Document.setBoost, you must now pre-multiply
the document boost into each Field.setBoost. If you have a
multi-valued field, you should do this only for the first Field
instance (ie, subsequent Field instance sharing the same field name
should only include their per-field boost and not the document level
boost) as the boost for multi-valued field instances are multiplied
together by Lucene."

--
Ian.
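
A rough illustration of the migration-guide rule quoted above, written as a stdlib-only Java model rather than Lucene code. The class and method names (BoostMigrationSketch, instanceBoosts, effectiveBoost) are invented for this sketch; the only thing taken from the guide is that the boosts of field instances sharing a name are multiplied together, so the former document boost is folded into the first instance only.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of the migration-guide rule; this is not Lucene API. */
class BoostMigrationSketch {

    /**
     * Returns the boost to set on each instance of one multi-valued field.
     * Only the first instance carries the former document boost.
     */
    static List<Float> instanceBoosts(float docBoost, float[] perFieldBoosts) {
        List<Float> result = new ArrayList<>();
        for (int i = 0; i < perFieldBoosts.length; i++) {
            float boost = perFieldBoosts[i];
            if (i == 0) {
                boost *= docBoost; // pre-multiply the document boost exactly once
            }
            result.add(boost);
        }
        return result;
    }

    /** Models the combined boost: the instance boosts multiplied together. */
    static float effectiveBoost(List<Float> boosts) {
        float product = 1.0f;
        for (float b : boosts) {
            product *= b;
        }
        return product;
    }

    public static void main(String[] args) {
        List<Float> boosts = instanceBoosts(2.0f, new float[] {1.5f, 1.0f, 1.0f});
        System.out.println(boosts + " -> " + effectiveBoost(boosts));
    }
}
```

With a document boost of 2.0 and per-field boosts of 1.5, 1.0 and 1.0, the instances get 3.0, 1.0 and 1.0, and the combined boost is 3.0. Applying the document boost to all three instances would instead give 2.0 * 2.0 * 2.0 * 1.5 = 12.0.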

On Mon, Feb 18, 2013 at 12:17 PM, Paul Taylor  wrote:
> What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor


On 18/02/2013 13:41, Ian Lea wrote:
> See the migration guide: [...]

Thanks. So it is more difficult now; that sounds like a regression to me.

RE: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Uwe Schindler

It is not a regression, because per-document boosts never worked correctly.
If you want to boost documents in a consistent way (so that their scores in
search results actually reflect that factor), you should index a DocValues
field and use it in a CustomScoreQuery to boost the results. In Lucene 4.0
(together with other changes) we dropped the "old-style", confusing, and
incorrect feature.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor


On 18/02/2013 16:04, Uwe Schindler wrote:
> It is not a regression [...]

Well, per-document boost seemed to work for me in my tests. The new
method you propose sounds more complex and differs from what the
migration guide says, so I don't see it as an improvement.

Paul

Re: Grouping and tokens

2013-02-18 Thread Jack Krupansky

Please clarify exactly what you want to group by - give a specific example 
that makes it clear what terms should affect grouping and which shouldn't.


-- Jack Krupansky

-Original Message- 
From: Ramprakash Ramamoorthy

Sent: Monday, February 18, 2013 6:12 AM
To: java-user@lucene.apache.org
Subject: Grouping and tokens

Hello all,

From the grouping javadoc, I read that fields that are to be grouped
should not be tokenized. I have a use case where the user is free to
group by any field at search time.

Since only non-tokenized fields are eligible for grouping, this creates
an issue with my search. For instance, the book "*Fifty shades of grey*",
when tokenized and searched for "*shades*", turns up in the results.
However, this is not the case when I index it as a non-tokenized field
(using StandardAnalyzer, version 4.1).

How do I go about this? Is indexing both a tokenized and a non-tokenized
version of the same field the only way to go? I am afraid it is far too
costly! Thanks in advance for your valuable input.

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
India,
+91 9626975420 



RE: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Uwe Schindler

The problem is:
Lucene has never supported *real* per-document boosts. Those boosts were
always per-field. Because we only work per-field, how your results score
depends on the query. If you have a TermQuery, the per-field boost is used
(the one from the field queried), but if you have another query (like
MultiTermQuery), the boost is ignored completely. As it is always per-field,
the results of this per-document boosting differ depending on the number of
terms in your query, so it is not easy to make it consistent.

To boost a document in Lucene 1.x, 2.x, 3.x, and also 4.x, you have to use a
function query with a per-document value that you have indexed as a separate
field (ideally as DocValues). In previous Lucene versions, FieldCache was the
way to go. The code is a simple wrapper around your query with
CustomScoreQuery and a ValueSource referring to the DocValues field: five
lines of code, and it will return consistent results!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
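
The arithmetic behind the approach described in this message can be sketched without Lucene. The following is a stdlib-only Java model, not the CustomScoreQuery API: the class DocBoostSketch and its methods are hypothetical, the map stands in for a per-document DocValues field, and combining the two values by multiplication is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Stdlib-only model of query-time document boosting; this is not Lucene API. */
class DocBoostSketch {

    // Hypothetical per-document values, standing in for a DocValues field.
    private final Map<Integer, Float> boostByDoc = new HashMap<>();

    void setBoost(int docId, float boost) {
        boostByDoc.put(docId, boost);
    }

    /** Final score = score of the wrapped query times the document's stored value. */
    float customScore(int docId, float subQueryScore) {
        // Documents without a stored value keep their original score.
        float boost = boostByDoc.getOrDefault(docId, 1.0f);
        return subQueryScore * boost;
    }

    public static void main(String[] args) {
        DocBoostSketch sketch = new DocBoostSketch();
        sketch.setBoost(7, 2.0f);
        System.out.println(sketch.customScore(7, 0.5f)); // boosted document
        System.out.println(sketch.customScore(8, 0.5f)); // unboosted document
    }
}
```

Because the factor is applied to the final score of the wrapped query, it does not depend on which query type produced that score or on how many terms the query has, which is the consistency argument made above.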


Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor


On 18/02/2013 16:26, Uwe Schindler wrote:
> [...] This code is a simple wrapper around your query with
> CustomScoreQuery and a ValueSource referring to the DocValues field,
> 5 lines of code -> and it will return consistent results!

Thanks, that is a bit clearer now, but a five-line example would be nice.
And if this is the way to do things, isn't the migration doc incorrect?

Paul

IndexSearcher.close() removed in 4.0

2013-02-18 Thread saisantoshi

I understand from the JIRA ticket (LUCENE-3640) that IndexSearcher.close()
is a no-op, but it is not clear to me why. Could someone shed some light on
this? We were using this method in older versions; is it safe to remove the
call now? I just want to understand the consequences before we make any
changes. Is there an alternative that we need to use here?

Thanks,
Sai



CJK evaluation. Standardanalyzer and Querytime.

2013-02-18 Thread Lucenius

Hello community,

I am doing an evaluation in the context of CJK. I am comparing indexing
strategies such as "unigram", "bigram", "unigram + bigram" and "word based"
indexing.

1.
I used the StandardAnalyzer for "unigram". I think it works for Chinese, but
it does something different for Japanese and Korean. In Japanese some
characters get combined, and for Korean it works like a WhitespaceAnalyzer,
right? Which analyzer would you prefer for "unigrams" in Japanese and
Korean? Is there a flag in the CJKAnalyzer to output "unigrams" only?

2.
I used the CJKAnalyzer for "bigrams" and "unigrams + bigrams". I think it
works correctly, but I have some performance issues. The query time for
"unigram + bigram" is about 8-20 times higher than for "bigram" only. Any
ideas?

Thank you.
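
For readers unfamiliar with the strategies being compared, here is a simplified stdlib-only Java sketch of character unigrams and bigrams. It is not how StandardAnalyzer or CJKAnalyzer are implemented, and it ignores token positions and script detection; the class NGramSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified character n-gram model; not the Lucene analyzers. */
class NGramSketch {

    /** Returns all overlapping n-grams of the text, counted in code points. */
    static List<String> ngrams(String text, int n) {
        int[] codePoints = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            out.add(new String(codePoints, i, n));
        }
        return out;
    }

    /** The "unigram + bigram" strategy: both token sets for the same text. */
    static List<String> unigramsAndBigrams(String text) {
        List<String> out = new ArrayList<>(ngrams(text, 1));
        out.addAll(ngrams(text, 2));
        return out;
    }
}
```

For a three-character input, `ngrams(text, 1)` yields three tokens, `ngrams(text, 2)` yields two, and `unigramsAndBigrams(text)` yields five. The combined strategy therefore roughly doubles the number of tokens per text; whether that accounts for the query-time difference reported above is not something this sketch can show.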




Re: IndexSearcher.close() removed in 4.0

2013-02-18 Thread Simon Willnauer

On Mon, Feb 18, 2013 at 7:32 PM, saisantoshi  wrote:
> I understand from the JIRA ticket(Lucene-3640) that the IndexSearcher.close()
> is no-op operation but not very clear on why it is a no-op? Could someone
> shed some light on this? We were using this method in the older versions and
> is it safe now to remove this call. Just want to understand the consequences
> before we make any changes? Is there any alternative that we need to use
> here?

Hey,

Previous versions had a constructor that accepted a Directory [1]. If you
used this constructor, IndexSearcher#close also closed the index reader
that had been created. Since we removed this constructor, we also removed
close, because it would be a no-op. IndexSearcher is just a wrapper that
adds some functionality on top of the reader. You can drop the IS#close()
call as long as you close the IndexReader properly.

simon
[1] 
http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher(org.apache.lucene.store.Directory)
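
The ownership pattern described above can be sketched with plain Java. This is a stdlib-only model, not Lucene code: Reader and Searcher here are hypothetical stand-ins for IndexReader and IndexSearcher, and the point is only that the caller who opens the reader is the one who closes it, while the wrapper has nothing of its own to release.

```java
import java.io.Closeable;

/** Stdlib-only model of reader/searcher ownership; class names are hypothetical. */
class SearcherLifecycleSketch {

    /** Stands in for the index reader, which holds the real resources. */
    static class Reader implements Closeable {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Stands in for the searcher: a thin wrapper with nothing of its own to close. */
    static class Searcher {
        private final Reader reader;

        Searcher(Reader reader) {
            this.reader = reader;
        }

        String search(String term) {
            if (!reader.isOpen()) {
                throw new IllegalStateException("reader is closed");
            }
            return "results for " + term;
        }
    }

    /** Searches through a wrapper, then reports whether the reader was closed. */
    static boolean searchAndClose(String term) {
        Reader reader = new Reader();
        try (Reader owned = reader) {
            // The searcher is created per use and simply discarded afterwards.
            new Searcher(owned).search(term);
        }
        return !reader.isOpen();
    }
}
```

In this model, removing a call to a searcher-level close changes nothing, provided the code that opened the reader closes it, for example with try-with-resources as shown.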

Re: Need Help:How to Get the enumeration of Terms Ending with a given word

2013-02-18 Thread Simon Willnauer

On Thu, Feb 14, 2013 at 11:42 AM, VIGNESH S  wrote:
> Hi,
>
> I have two questions.
>
> 1. How can I get an enumeration of terms ending with a given word?
> I saw that we can enumerate terms starting at a given term with the
> IndexReader.terms(Term) method.

Unless you want to iterate over all terms and check whether each one ends
with the given string, I think you need to index the terms in reversed
order and get a prefix terms enum from that field. Look at [1].

[1] 
http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html
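
The reversed-term idea can be sketched with the standard library alone. The following is a model under stated assumptions, not Lucene code: a sorted TreeSet stands in for the term dictionary of the reversed field, and SuffixLookupSketch with its methods are hypothetical names. A suffix lookup becomes a prefix scan over the reversed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Stdlib-only model of suffix lookup via reversed terms; not Lucene API. */
class SuffixLookupSketch {

    // Sorted set of reversed terms, standing in for a term dictionary.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    void addTerm(String term) {
        reversedTerms.add(reverse(term));
    }

    /** Returns every stored term that ends with the given suffix. */
    List<String> termsEndingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        // All strings sharing a prefix are contiguous in sorted order,
        // so start at the prefix and stop at the first non-match.
        for (String reversed : reversedTerms.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break;
            }
            matches.add(reverse(reversed));
        }
        return matches;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

After adding "indexing", "searching", "ranking" and "reader", `termsEndingWith("ing")` returns the three terms ending in "ing" without visiting "reader", which is the saving over iterating all terms.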
>
> 2. Actually, I am building a multi-phrase query in which I need a suffix
> match on the first word. How can I do that? Please help.
>
>
> --
> Thanks and Regards
> Vignesh Srinivasan
> 9739135640

Re: IndexSearcher.close() removed in 4.0

2013-02-18 Thread Eric Charles

Hi,
Why not have IS#close() call the wrapped IR#close()?

I would be happier dealing only with the Searcher once it is created
and forgetting that it wraps a Reader: I create a Searcher, I close it.

Thx, Eric

Re: Grouping and tokens

2013-02-18 Thread Ramprakash Ramamoorthy

On Mon, Feb 18, 2013 at 9:47 PM, Jack Krupansky wrote:

> Please clarify exactly what you want to group by - give a specific example
> that makes it clear what terms should affect grouping and which shouldn't.
>

Assume I am indexing library data. Say a particular book has the
following fields:
1. Published
2. Language
3. Genre
4. Author
5. Title
6. ISBN

At search time, the user can ask to group by any of the above fields,
which means none of them should be tokenized. As I said earlier, there is
a book titled "Fifty shades of grey" and the user searches for "shades".
The result turns up if the field is tokenized, but here it doesn't, since
the field isn't tokenized. I hope that is clear.

In a nutshell, how do I group by a field that is also tokenized?

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
India.
+91 9626975420

Re: Grouping and tokens

2013-02-18 Thread Jack Krupansky

Okay, so, fields that would normally need to be tokenized must be stored as 
both raw strings for grouping and tokenized text for keyword search. Simply 
use copyField to copy from one to the other.


-- Jack Krupansky

Re: Grouping and tokens

2013-02-18 Thread Jack Krupansky

Oops, sorry for the "Solr" answer. In Lucene you need to simply index the 
same value, once as a raw string and a second time as a tokenized text 
field. Grouping would use the raw string version of the data.


-- Jack Krupansky
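
The "index the same value twice" advice can be illustrated with a small stdlib-only Java model. It is not Lucene code: DualFieldSketch and its methods are hypothetical, whitespace splitting with lowercasing stands in for an analyzer, and the two lists stand in for a raw string field and a tokenized text field holding the same title.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Stdlib-only model of indexing one value as both raw and tokenized; not Lucene API. */
class DualFieldSketch {

    // Raw copy of each title, used only for grouping.
    private final List<String> rawTitles = new ArrayList<>();
    // Tokenized copy of each title, used only for keyword search.
    private final List<List<String>> tokenizedTitles = new ArrayList<>();

    void addBook(String title) {
        rawTitles.add(title);
        tokenizedTitles.add(Arrays.asList(title.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    /** Matches on the tokenized copy, then groups the hits by the raw copy. */
    Map<String, Integer> searchGroupedByTitle(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (int doc = 0; doc < rawTitles.size(); doc++) {
            if (tokenizedTitles.get(doc).contains(needle)) {
                groups.merge(rawTitles.get(doc), 1, Integer::sum);
            }
        }
        return groups;
    }
}
```

With two copies of "Fifty Shades of Grey" plus "Shades of Night" and "Grey Wolf" added, searching for "shades" returns two groups: the first title with a count of 2 and the second with a count of 1. The search sees the tokens, while the grouping key stays the whole untokenized title.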
