Re: Search only for single value of Solr multivalue field (part 2)

2018-12-17 Thread Nicolas Paris
On Sun, Dec 16, 2018 at 05:44:30PM -0800, Erick Erickson wrote:
> No, the idea is that you have N single valued fields, one for each of
> the MV entries you have. The copyField dest would be MV, and only used
> in those cases you wanted to match across values. Not saying that's a
> great solution, or if it would even necessarily work but thought it
> worth mentioning.

Ok, then my initial document with MV fields:
> "content_txt":["001 first","002 second"]
would become:
> "content1_t":"001 first"
> "content2_t":"002 second"
> "_copiedfield_":["001 first","002 second"]
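
A minimal sketch of that document transformation, done client-side before indexing. The helper name split_multivalued and its parameters are hypothetical; in a real setup the catch-all field would more likely be filled by a copyField rule in the schema than by the client.

```python
def split_multivalued(doc, field="content_txt", prefix="content",
                      suffix="_t", catch_all="_copiedfield_"):
    """Turn one multivalued field into N single-valued fields.

    Hypothetical helper: the field names follow the example above and
    are not mandated by Solr.
    """
    values = doc[field]
    # Keep every other field of the document untouched.
    out = {key: value for key, value in doc.items() if key != field}
    # One single-valued field per entry: content1_t, content2_t, ...
    for i, value in enumerate(values, start=1):
        out[f"{prefix}{i}{suffix}"] = value
    # Catch-all copy, still multivalued, for searches across values.
    out[catch_all] = list(values)
    return out


original = {"content_txt": ["001 first", "002 second"]}
transformed = split_multivalued(original)
```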

And then the initial user query:
> content_txt:(first AND second)
would become:
> content1_t:(first AND second) OR content2_t:(first AND second)


Depending on the length of the initial array, each document will have a
different number of contentx_t fields. This requires some management, such as
a layer between the user and the parser that expands the query over the
maximum number of contentx_t fields present in the collection (capped at, say,
max=100 for performance reasons?).
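
Such a layer could be sketched as below. This is only an illustration under the naming assumed above (content1_t, content2_t, ...); expand_clause is a hypothetical helper, and it treats the user's clause as an opaque string without parsing it.

```python
def expand_clause(clause, max_fields, prefix="content", suffix="_t"):
    """Repeat one user clause over content1_t .. content<max_fields>_t.

    Hypothetical helper: the clause is passed through as-is, so any
    escaping or validation has to happen before this point.
    """
    if max_fields < 1:
        raise ValueError("max_fields must be at least 1")
    return " OR ".join(
        f"{prefix}{i}{suffix}:({clause})" for i in range(1, max_fields + 1)
    )


expanded = expand_clause("first AND second", 2)
```

The query grows linearly with max_fields, which is where the cap mentioned above would matter.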


QUESTION:

Is the MV limitation a *Solr parser* limitation or a *Lucene* limitation? If
it is the latter, writing my own parser would be an option, wouldn't it?


-- 
nicolas


Re: Search only for single value of Solr multivalue field (part 2)

2018-12-16 Thread Erick Erickson
bq. multiple fields acts as a MV field

No, the idea is that you have N single valued fields, one for each of
the MV entries you have. The copyField dest would be MV, and only used
in those cases you wanted to match across values. Not saying that's a
great solution, or if it would even necessarily work but thought it
worth mentioning.


Best,
Erick


Re: Search only for single value of Solr multivalue field (part 2)

2018-12-16 Thread Nicolas Paris
On Sun, Dec 16, 2018 at 09:30:33AM -0800, Erick Erickson wrote:
> Have you looked at ComplexPhraseQueryParser here?
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html

Sure. However, I am using multi-word synonyms, and so far the
complexphrase parser does not handle them (maybe soon?).

> Depending on how many of these you have, you could do something with
> dynamic fields. Rather than use a single MV field, use N fields. You'd
> probably have to copyField or some such to a catch-all field for
> searches that you wanted to ignore the "mv nature" of the field. 

The problem is that a copyField fed from multiple fields itself acts as an MV
field. So the problem remains: dealing with MV fields, doesn't it?

Thanks

-- 
nicolas


Re: Search only for single value of Solr multivalue field (part 2)

2018-12-16 Thread Erick Erickson
Have you looked at ComplexPhraseQueryParser here?
https://lucene.apache.org/solr/guide/6_6/other-parsers.html

But no, there are no plans that I know of to include something that
has the notion of searching within MV fields.

Depending on how many of these you have, you could do something with
dynamic fields. Rather than use a single MV field, use N fields. You'd
probably have to copyField or some such to a catch-all field for
searches that you wanted to ignore the "mv nature" of the field. I'd be
nervous as the number of such fields got into the hundreds, however.

Best,
Erick


Search only for single value of Solr multivalue field (part 2)

2018-12-16 Thread Nicolas Paris
hi

This question is highly related to a previous one found on the
mailing-list archive [1].

I have this document:

"content_txt":["001 first","002 second"]
I'd like the query below to return nothing:
> q=content_txt:(first AND second)

The method proposed ([1]) by Erick works ok to look for a single value
having BOTH first AND second by setting the field positionIncrementGap
high enough:

This query returns nothing as expected:
> q=content_txt:("first second"~99)


However, this is based on *phrase search*. Phrase search rules out the
simple query parser features below. That's a _HUGE_ limitation!
- regexp
- fuzzy
- wildcard
- ranges

So the query below won't match the first value:
> q=content_txt:("[000 TO 001] first"~99)
While this one matches on the second value, and it shouldn't!
> q=content_txt:([000 TO 001] AND "second")
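
To make the difference concrete, here is a small simulation of the two semantics. It does not reproduce how Solr or Lucene evaluate queries; it only assumes whitespace tokenization and a lexicographic term range, and the function names are made up for the illustration.

```python
def tokens(value):
    """Assumed analysis: lowercase and split on whitespace."""
    return value.lower().split()


def matches_document(values, predicates):
    """Boolean AND over the whole document: values are pooled together."""
    all_tokens = [t for v in values for t in tokens(v)]
    return all(any(p(t) for t in all_tokens) for p in predicates)


def matches_single_value(values, predicates):
    """Wanted behaviour: every clause must hold inside the same value."""
    return any(
        all(any(p(t) for t in tokens(v)) for p in predicates)
        for v in values
    )


def in_range(token):
    # Stand-in for the range clause [000 TO 001], compared as strings.
    return "000" <= token <= "001"


def is_second(token):
    return token == "second"


doc = ["001 first", "002 second"]
document_level = matches_document(doc, [in_range, is_second])
value_level = matches_single_value(doc, [in_range, is_second])
```

With the pooled evaluation the range clause is satisfied by the first value and the term clause by the second, which is the unwanted match described above.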

QUESTION:

Is there a chance such a feature will be developed in a future Solr version?
I mean something that allows the values of a multivalued field to be
considered independently. Would a new field attribute such as
independentMultivalued=true be OK?

Thanks,


[1]: 
http://lucene.472066.n3.nabble.com/Search-only-for-single-value-of-Solr-multivalue-field-td4309850.html#a4309893

-- 
nicolas


Re: Search only for single value of Solr multivalue field

2016-12-16 Thread Leo BRUVRY-LAGADEC

Hi Dorian,

Firstly, thanks for your response, but it does not seem to work.

Here is another example: I want to search for documents whose affiliations
contain the NHM (Natural History Museum) of India. So I want to get only
the document with id=2:

Document id=1, affiliations:
  NHM, Austria
  Annamalai Univ, India

Document id=2, affiliations:
  NHM, India
  IRD, FRANCE

If I implement your solution, ((NHM in affilliation OR India in
affilliation) AND NOT (NHM in affilliation AND India in affilliation)), it
doesn't return any document. Have I missed something in your
explanation?
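
The empty result can be reproduced with a small simulation of that filter evaluated per document. This is only a sketch: the two documents are the ones from my example, the tokenization (lowercase, split on non-word characters) is an assumption, and xor_filter is a made-up name.

```python
import re

# The two example documents, keyed by id, each with its affiliation values.
docs = {
    1: ["NHM, Austria", "Annamalai Univ, India"],
    2: ["NHM, India", "IRD, FRANCE"],
}


def doc_terms(values):
    """All terms of a document, pooled across its values (assumed analysis)."""
    return {t for v in values for t in re.findall(r"\w+", v.lower())}


def xor_filter(values, word1, word2):
    """(word1 OR word2) AND NOT (word1 AND word2), at document level."""
    terms = doc_terms(values)
    has1 = word1.lower() in terms
    has2 = word2.lower() in terms
    return (has1 or has2) and not (has1 and has2)


hits = [doc_id for doc_id, values in docs.items()
        if xor_filter(values, "NHM", "India")]
```

Both documents contain both words somewhere among their values, so the NOT clause excludes them both, including the document with id=2 that I want.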


In the previous version of my application I had a solution for this with
Oracle Full Text; it seems weird that Solr cannot provide one.


Best regards,
Léo.






Re: Search only for single value of Solr multivalue field

2016-12-15 Thread Erick Erickson
Phrase queries and slop and positionIncrementGap ;)

The fieldType has a positionIncrementGap. This is the token delta
between the end token of one entry and the beginning of the next.

so the first entry: IFREMER, Ctr Brest, DRO Geosci Marines, F-29280
Plouzane, France
IFREMER would have a position of 1 and France would have a position of 9 or so.
If the positionIncrementGap was 100 then this entry:
Univ Lisbon, Ctr Geofis, P-1269102 Lisbon, Portugal.
Univ would have a position of 110.

Now if I search "IFREMER France"~99 it'd match the first one,
but searching "IFREMER Lisbon"~99 would not match since the
positions are > 99 apart.

So you configure the positionIncrementGap to be greater than the
longest number of tokens you ever expect to have in a single entry.
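
A rough simulation of the arithmetic above, not of Lucene itself. It assumes a simple tokenizer (lowercase, split on non-word characters), positions that start at 0, and in-order matching only; real sloppy phrase matching also handles reordered terms. The function names are made up for the illustration.

```python
import re


def index_positions(values, gap):
    """Assign a position to every token of a multivalued field.

    Between two values the position jumps by `gap`, mimicking what
    positionIncrementGap does (simplified model).
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in re.findall(r"\w+", value.lower()):
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first.lower(), [])
        for p2 in positions.get(second.lower(), [])
    )


values = [
    "IFREMER, Ctr Brest, DRO Geosci Marines, F-29280 Plouzane, France.",
    "Univ Lisbon, Ctr Geofis, P-1269102 Lisbon, Portugal.",
]
positions = index_positions(values, gap=100)
same_value = sloppy_match(positions, "IFREMER", "France", slop=99)
across_values = sloppy_match(positions, "IFREMER", "Lisbon", slop=99)
```

Under these assumptions the pair inside the first entry matches and the pair spanning two entries does not, as long as the gap exceeds the slop.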

HTH
Erick



Re: Search only for single value of Solr multivalue field

2016-12-15 Thread Dorian Hoxha
You should be able to filter "(word1 in field OR word2 in field) AND
NOT (word1 in field AND word2 in field)". Translate that into the right
syntax.
I don't know if Lucene is smart enough to execute the filter only once (it
should be, I guess).
Does that make sense?



Search only for single value of Solr multivalue field

2016-12-15 Thread Leo BRUVRY-LAGADEC

Hi,

I have a multivalued field in my schema called "idx_affilliation".

  IFREMER, Ctr Brest, DRO Geosci Marines, F-29280 Plouzane, France.
  Univ Lisbon, Ctr Geofis, P-1269102 Lisbon, Portugal.
  Univ Bretagne Occidentale, Inst Univ Europeen Mer, Lab Domaines Ocean, F-29280 Plouzane, France.
  Total Explorat Prod Geosci Projets Nouveaux Exper, F-92078 Paris, France.


I want to be able to do a query like: idx_affilliation:(IFREMER 
Portugal) and not have this document returned. In other words, I do not 
want queries to span individual values for the field.


---

Here are some further examples using the document above of how I want 
this to work:


idx_affilliation:(IFREMER France) --> Returns it.
idx_affilliation:(IFREMER Plouzane) --> Returns it.
idx_affilliation:("Univ Bretagne Occidentale") --> Returns it.
idx_affilliation:("Univ Lisbon" Portugal) --> Returns it.
idx_affilliation:(IFREMER Portugal) --> DOES NOT RETURN IT.

Does anyone know if it's possible to do this?

Best regards,
Leo.