Get matching fields from a BooleanQuery

2017-06-19 Thread Frederik Van Hoyweghen
Hey everyone,

To start, we are using Lucene 4.3.

To search, we prepare several queries and combine these into a BooleanQuery.
What we are looking for is a way to determine on which specific fields a
certain document matched.
For example, I create 2 queries: one to search in the "Name" field, and
another to search in the "Description" field.

Combining these into a BooleanQuery and running it will return the matching
documents,
but we'd like to know for each document returned whether there was a match
in the Name field or in the Description field.

It seems to me that something like the highlighter would need to know this
too, but highlighting isn't a goal currently. I've also looked at
IndexSearcher.explain(), but the doc says that this is as expensive as
running the query against the entire index, so I'd obviously like to avoid
running the same queries multiple times :).

Kind regards,
Frederik


Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Joe Ye
Hi,

Could anyone help with my issue described below? If I'm not posting on the
right mailing list please direct me to the correct one.

Many thanks,
Joe


On Mon, Jun 12, 2017 at 3:05 PM, Joe Ye  wrote:

> Hi,
>
> I have a few NumericDocValuesField fields and also added separate
> StoredField fields to store the values so that I can access them in query
> results. I used IndexWriter.updateNumericDocValue to update the value of
> a DocValues field. I then called SearcherManager.maybeRefresh to ensure
> that SearcherManager.acquire returns a refreshed instance, and ran a
> DocValuesNumbersQuery with the updated value. I did get the matching
> document in the query result, but when I tried to access its value using
> Document.get, it was still the old value. It appears that updating the
> DocValues field doesn't update its associated StoredField value. What am
> I missing here?
>
>
> I would highly appreciate your help!
>
>
> Regards,
>
> Joe
>


Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Michael McCandless
Updating the doc value will not update the stored field (what document.get
returns).  If you need to change stored fields you have to use the
IW.updateDocuments API, where the old document is deleted and a new
document is indexed, atomically (to refresh).

But also see Erick's solr-specific response (to the list) a week ago.

Mike McCandless

http://blog.mikemccandless.com
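
To make the distinction in the reply above concrete, here is a minimal
plain-Java sketch. It is a toy model and not the Lucene API: the class
SeparateStoresSketch, its two maps, and the "price" value they hold are all
hypothetical names chosen for illustration. It assumes only what the reply
states: a doc-values update changes the doc-values data alone, while a full
document update (delete plus re-index) rewrites both.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT the Lucene API): stored fields and doc values are separate stores. */
final class SeparateStoresSketch {
    // Hypothetical stand-in for the stored field, i.e. what Document.get reads.
    private final Map<String, String> storedPrice = new HashMap<>();
    // Hypothetical stand-in for the numeric doc-values column.
    private final Map<String, Long> docValuePrice = new HashMap<>();

    /** Full update: the old document is replaced, so both stores are rewritten. */
    void updateDocument(String id, long price) {
        storedPrice.put(id, Long.toString(price));
        docValuePrice.put(id, price);
    }

    /** In-place doc-values update: only the doc-values column changes. */
    void updateNumericDocValue(String id, long price) {
        if (!docValuePrice.containsKey(id)) {
            throw new IllegalArgumentException("unknown document: " + id);
        }
        docValuePrice.put(id, price);
    }

    String storedValue(String id) {
        return storedPrice.get(id);
    }

    long docValue(String id) {
        return docValuePrice.get(id);
    }

    public static void main(String[] args) {
        SeparateStoresSketch index = new SeparateStoresSketch();
        index.updateDocument("doc1", 10L);
        index.updateNumericDocValue("doc1", 25L);
        // The doc value reflects the update, the stored value does not.
        System.out.println("doc value:    " + index.docValue("doc1"));
        System.out.println("stored value: " + index.storedValue("doc1"));
    }
}
```

In this model, the only way to bring the stored value in line with the doc
value is a second call to updateDocument, which mirrors the delete-and-reindex
path mentioned in the reply.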



Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Joe Ye
Thanks Mike! My colleague only forwarded Erick's Solr reply today as it
seems I didn't get any emails and may have been taken off the mailing list
for some reason?

We're using Lucene core only (version 6.2.1 at the moment). So there's no
link between the docValue and its associated stored field? Is there
anything similar/equivalent to useDocValuesAsStored in Lucene core? We're
trying to use docValues to avoid a full update (delete + create new)...
Yet, we still need to retrieve the updated values.

Regards,
Joe



Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Erick Erickson
Joe:

I have no reason to believe you were taken off the user's list
intentionally. Maybe your spam filter is over-zealous or something? Or
perhaps you registered with some no-longer-valid mail address and
could register again?

Erick


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



sub

2017-06-19 Thread kenny mcfarland



Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Michael McCandless
In pure Lucene you could just pull the doc values for the docIDs in your
set of search results; MultiDocValues can be helpful sugar here, unless you
need SORTED or SORTED_SET in which case it's best to go per-segment.

Or just track down where Solr does this and poach those sources.

Mike McCandless

http://blog.mikemccandless.com
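
As a rough illustration of what "pulling the doc values for the docIDs in
your search results" involves, here is a plain-Java sketch. It is a toy model
and not the Lucene API: SegmentLookupSketch, docBase, and the per-segment
long[] columns are hypothetical stand-ins. It assumes that each segment holds
its own column of values and that a top-level docID is resolved to a segment
plus a segment-local docID, which is the bookkeeping a helper such as
MultiDocValues would hide.

```java
import java.util.Arrays;

/** Toy model (NOT the Lucene API) of reading a numeric value for a top-level docID. */
final class SegmentLookupSketch {
    // docBase[i] is the first top-level docID that belongs to segment i.
    private final int[] docBase;
    // values[i][localDoc] is the value of a document inside segment i.
    private final long[][] values;
    private final int maxDoc;

    /** Assumes every segment is non-empty, so docBase is strictly increasing. */
    SegmentLookupSketch(long[][] perSegmentValues) {
        values = perSegmentValues;
        docBase = new int[perSegmentValues.length];
        int next = 0;
        for (int i = 0; i < perSegmentValues.length; i++) {
            docBase[i] = next;
            next += perSegmentValues[i].length;
        }
        maxDoc = next;
    }

    /** Resolve the segment with a binary search, then read the local entry. */
    long valueFor(int globalDocId) {
        if (globalDocId < 0 || globalDocId >= maxDoc) {
            throw new IndexOutOfBoundsException("docID out of range: " + globalDocId);
        }
        int idx = Arrays.binarySearch(docBase, globalDocId);
        // Not found: idx encodes the insertion point; the owning segment is one before it.
        int segment = idx >= 0 ? idx : -idx - 2;
        return values[segment][globalDocId - docBase[segment]];
    }

    public static void main(String[] args) {
        // Two hypothetical segments holding three and two documents.
        SegmentLookupSketch index = new SegmentLookupSketch(
                new long[][] { { 5L, 6L, 7L }, { 8L, 9L } });
        int[] hits = { 4, 0, 3 }; // top-level docIDs from a search result
        for (int docId : hits) {
            System.out.println("doc " + docId + " -> " + index.valueFor(docId));
        }
    }
}
```

Going per-segment, as the message suggests for SORTED and SORTED_SET, would
correspond to grouping the hits by segment first and reading each segment's
column directly instead of resolving every docID through the top-level view.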



Re: email field - analyzed and not analyzed in single field using custom analyzer

2017-06-19 Thread Kumaran Ramasubramanian
Hi Steve

Thanks for the input. How can I apply WordDelimiterGraphFilter /
WordDelimiterFilter to email tokens alone, using an email regex? For tokens
containing other types of special characters, I want only the analyzed
tokens.


--
Kumaran R






On Thu, Jun 15, 2017 at 7:43 PM, Steve Rowe  wrote:

> Hi Kumaran,
>
> WordDelimiterGraphFilter with PRESERVE_ORIGINAL should do what you want:
> <http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html>.
>
> Here’s a test I added to TestWordDelimiterGraphFilter.java that passed
> for me:
>
> -
> public void testEmail() throws Exception {
>   final int flags = GENERATE_WORD_PARTS | GENERATE_NUMBER_PARTS
>       | SPLIT_ON_CASE_CHANGE | SPLIT_ON_NUMERICS | PRESERVE_ORIGINAL;
>   Analyzer a = new Analyzer() {
>     @Override public TokenStreamComponents createComponents(String field) {
>       Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
>       return new TokenStreamComponents(tokenizer,
>           new WordDelimiterGraphFilter(tokenizer, flags, null));
>     }
>   };
>   assertAnalyzesTo(a, "will.sm...@yahoo.com",
>       new String[] { "will.sm...@yahoo.com", "will", "smith", "yahoo", "com" },
>       null, null, null,
>       new int[] { 1, 0, 1, 1, 1 },
>       null, false);
>   a.close();
> }
> -
>
> --
> Steve
> www.lucidworks.com
>
> > On Jun 15, 2017, at 8:53 AM, Kumaran Ramasubramanian wrote:
> >
> > Hi All,
> >
> > I want to index email fields as both analyzed and not analyzed using a
> > custom analyzer.
> >
> > For example,
> > sm...@yahoo.com
> > will.sm...@yahoo.com
> >
> > That is, indexing sm...@yahoo.com as a single token as well as analyzed
> > tokens in the same email field.
> >
> >
> > My existing custom analyzer,
> >
> > public class CustomSearchAnalyzer extends StopwordAnalyzerBase
> > {
> >
> >    public CustomSearchAnalyzer(Version matchVersion, Reader stopwords) throws Exception
> >    {
> >        super(matchVersion, loadStopwordSet(stopwords, matchVersion));
> >    }
> >
> >    @Override
> >    protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
> >    {
> >        final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
> >        src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> >        TokenStream tok = new ClassicFilter(src);
> >        tok = new LowerCaseFilter(getVersion(), tok);
> >        tok = new StopFilter(getVersion(), tok, stopwords);
> >        tok = new ASCIIFoldingFilter(tok); // to enable AccentInsensitive search
> >
> >        return new Analyzer.TokenStreamComponents(src, tok)
> >        {
> >            @Override
> >            protected void setReader(final Reader reader) throws IOException
> >            {
> >                src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> >                super.setReader(reader);
> >            }
> >        };
> >    }
> > }
> >
> >
> > What I want to achieve:
> >
> > 1. If I search using the query "sm...@yahoo.com", records with
> > will.sm...@yahoo.com should not be returned.
> > 2. I should also be able to search that field using the query "smith".
> > 3. If possible, email values in all other fields should be detected and
> > given the same type of tokenization.
> >
> > How can I achieve points 1 and 2 using UAX29URLEmailTokenizer? And how can
> > I add UAX29URLEmailTokenizer to my existing custom analyzer without using a
> > separate email analyzer (per-field analyzer) for the email field, so that
> > this tokenizer applies to email terms in all fields?
> >
> >
> >
> > -
> > Kumaran R


SpanNearQuery Class issue

2017-06-19 Thread Ranganath B N
Hi,

This is regarding the search limit of the SpanNearQuery class. I created a
Lucene index consisting of 2 billion documents and search it using a
SpanNearQuery object in Searcher.search(Query query, int n). But the search
method returns results only if the search terms are within the first 6 crore
(60 million) inserted documents. Am I missing anything during initialization
that restricts the search, or is this a limitation of the SpanNearQuery
class? I am using Apache Lucene version 6.5.0. Please let me know, since I am
using this for a critical project.
Thanks,
Ranganath B. N.