RE: question regarding wildcard-searches

2018-03-19 Thread Paesen Roel
Hi,

The goal is to provide a Google-like search field for our databases (one 
simple search field on a web page). That is why we copy everything into the 
_text_ field, so that everything is searchable. (Is there a better way to 
achieve something like this?)

I should have been clearer before: the different numbers I gave as examples 
each belong to a different Solr document, with only one number per document, 
so this field does not need to be multi-valued. Sorry about that.

Here is my text_general definition (which is a direct copy from the DIH example 
that comes with Solr 7.2.1):
---8<

  




  
  




  

---8<

In the analysis screen, I see that the text does indeed get broken down into 
'EO' (alphanumeric) and '1954.53.1' (numeric).
Searching without a wildcard also returns zero results...

As I mentioned before, we are still testing all of this, so we are not yet up 
to speed on why things behave the way they do, although I am trying to learn.

Thanks for any other pointers you can provide.
Greetings,
Roel

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday 16 March 2018 17:08
To: solr-user
Subject: Re: question regarding wildcard-searches

If your goal is to search prefixes only, I'd move away from the _text_ field 
altogether and use a "string" type. This will mean you need to
1> make it multiValued=true
2> split this up (either on your client or use a
FieldMutatingUpdateProcessor, probably RegexReplaceProcessorFactory) into 
separate entries, i.e.
'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3'
becomes three separate entries in the field:
'EO.1954.53.1'
'EO.1954.53.2'
'EO.1954.53.3'

At that point, searches like 'EO.1954.53.*' will work just fine.

NOTE: String types do zero analysis, so you have to handle things like casing 
yourself. That is, 'eO.1954.53.*' would _not_ match. If you need 
case-insensitive matching, you can probably use a TextField type with 
KeywordTokenizerFactory + LowerCaseFilterFactory instead.
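
A minimal schema sketch of the suggestion above, assuming a hypothetical field 
type `string_lowercase` and a hypothetical field `inventarisnummer_prefix` 
(neither name appears in this thread; only `genormaliseerdInventarisnummer` is 
taken from the original question). It keeps each value as a single lowercased 
token so that a prefix wildcard can match the whole number:

```xml
<!-- Sketch only: "string_lowercase" and "inventarisnummer_prefix" are
     hypothetical names chosen for illustration. -->

<!-- KeywordTokenizerFactory emits the whole input as one token;
     LowerCaseFilterFactory then lowercases that token. -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- multiValued is only needed if one document can carry several numbers. -->
<field name="inventarisnummer_prefix" type="string_lowercase"
       indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw value so the existing field keeps working as before. -->
<copyField source="genormaliseerdInventarisnummer"
           dest="inventarisnummer_prefix"/>
```

After a reindex, a query such as inventarisnummer_prefix:EO.1954.53.* should 
then match regardless of case, because the lowercase filter is also applied to 
wildcard terms. The general _text_ field can stay as it is for free-text search.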

All this makes _much_ more sense if you use the admin UI>>analysis page 
(probably uncheck the "verbose" checkbox; there'll be less clutter).

Best,
Erick

On Fri, Mar 16, 2018 at 8:35 AM, Emir Arnautović <emir.arnauto...@sematext.com> 
wrote:
> Hi Roel,
> As mentioned, the _text_ field probably does not contain the complete 
> “EO.1954.53.1” but only its parts. You can verify that using the analysis 
> screen in the admin console. What you can try is searching for the phrase 
> “EO.1954.53” without a wildcard or, if you are using WordDelimiterTokenFilter 
> in your analysis chain, setting preserveOriginal="1" and reindexing.
>
> Can you share what your text_general looks like?
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/


RE: question regarding wildcard-searches

2018-03-16 Thread Paesen Roel
Hi,

Unfortunately, that also gives no results. It would not be practical either: 
in this example the numbering only goes up to 19, but others go up into the 
thousands.

Anybody with a pointer on this?

Thanks already,
Roel


-Original Message-
From: jagdish vasani [mailto:jagdisht.vas...@gmail.com] 
Sent: Friday 16 March 2018 12:41
To: solr-user@lucene.apache.org
Subject: Re: question regarding wildcard-searches

Hi Paesen,

The value EO.1954.53.1 is indexed as the tokens below:
EO
1954
53
1
The dots are removed. Try the single-character wildcard '?', for example 
EO.1954.53.?? if the last part only ever has two digits.

I have not tried this myself, so please check it.
Hope it solves your problem.

Thanks,
Jagdish
On 16-Mar-2018 3:51 pm, "Paesen Roel" <roel.pae...@africamuseum.be> wrote:

> Hi everybody,
>
> We are experimenting with Solr, and I have a (I think) basic-level
> question. We have multiple fields, all copied into a generic field so we 
> can search everything at once.
> However, we see behaviour that is strange to us when doing wildcard 
> searches on the contents of one specific field.
>
> Given in the schema:
>
>  multiValued="true"/>
>
>  stored="true"/>
>  
> and a lot of other fields exactly like 'genormaliseerdInventarisnummer'.
>
>
> Now, we are certain that the field 'genormaliseerdInventarisnummer'
> contains entries like 'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3', 
> all the way up to '.19'. We can query these directly by passing these 
> exact texts to the query on field '_text_' (our default search field).
> The problem is that wildcard searches for these don't work: 
> 'EO.1954.53.*', for example, returns zero results.
>
> Why is that?
> What needs to be adjusted? (and how?)
>
> Thanks already,
> Roel
>
>


question regarding wildcard-searches

2018-03-16 Thread Paesen Roel
Hi everybody,

We are experimenting with Solr, and I have a (I think) basic-level question. 
We have multiple fields, all copied into a generic field so we can search 
everything at once.
However, we see behaviour that is strange to us when doing wildcard searches 
on the contents of one specific field.

Given in the schema:





and a lot of other fields exactly like 'genormaliseerdInventarisnummer'.


Now, we are certain that the field 'genormaliseerdInventarisnummer' contains 
entries like 'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3', all the way up to 
'.19'. We can query these directly by passing these exact texts to the query on 
field '_text_' (our default search field).
The problem is that wildcard searches for these don't work: 'EO.1954.53.*', 
for example, returns zero results.

Why is that?
What needs to be adjusted? (and how?)

Thanks already,
Roel