On 11.04.2012 16:35, Ludmila Marian wrote:

Hello Ludmila!


perform_request_search(p1='plo', m1='r', f1='author', cc='People')

This is indeed more restrictive, since it searches only the author index,
but it is broader because it does a REGEXP search.

Though I do not use any regexp syntax here.

m1='r' means regular expression search, so the Invenio
search engine automatically transforms it into a REGEXP
search (even if you are not using any regexp syntax).

I understood that. The 'r' was mainly there as a precaution, in
the sense of "it will probably be necessary at some point". I just
wasn't aware that I also get substring matching.

The query in this case would be: select <something> from <author_index>
where value REGEXP 'plo';

That's what I understood. My gut feeling would have been that this is
the same as searching for 'plo' in the first case...

and this will also match the words that contain 'plo' as a substring (so
'fooplobar' would be a match), as when doing a substring/phrase search.
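A minimal Python sketch of why an unanchored REGEXP behaves like a substring search: re.search, like MySQL's REGEXP operator, reports a hit if the pattern occurs anywhere in the value, unless you anchor it with ^ or $. The sample values are made up, and the case-insensitive matching is an assumption (in MySQL it may depend on the column's character set and collation).

```python
import re

# Made-up values standing in for entries of an author index.
values = ["Plonka, A.", "fooplobar", "Smith, J."]

def regexp_hits(pattern, values, case_sensitive=False):
    """Mimic an unanchored REGEXP test: a value is a hit if the pattern occurs anywhere in it."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [v for v in values if re.search(pattern, v, flags)]

print(regexp_hits("plo", values))   # ['Plonka, A.', 'fooplobar']  (substring behaviour)
print(regexp_hits("^plo", values))  # ['Plonka, A.']  (anchoring restores "starts with")
```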

... as I did NOT search for .*plo.*

I understand it works like the match operator, right? Something like

hit = 1 if str =~ m/plo/;

in Perl-speak. So, by selecting regexp search I automagically get
"left/right truncation" of the search string, which itself is handled as
a phrase, right? Something like "a b c" in regexp search would search for
.*a b c.* (again something like =~ m/a b c/) and not for "a or b or c"
as in simple search?

Exactly. This is the default behavior of the REGEXP operator
in MySQL. About the phrase vs. string search you are again
right: all the m='r' searches are done on the phrase index
(and not the word index).
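To illustrate the phrase-index point, here is a small sketch contrasting a regexp over whole field values (a contiguous "a b c" is required) with a word-level check. The "all words present" rule on the word side is chosen purely for contrast; it is not a statement of how Invenio combines words in simple search. The data is made up.

```python
import re

# Made-up field values as they might sit in a phrase index (whole values, not words).
phrases = ["x a b c y", "c b a"]

def phrase_regexp_hits(pattern, phrases):
    # Regexp on the phrase index: the pattern must occur as one contiguous piece.
    return [p for p in phrases if re.search(pattern, p)]

def all_words_present(query, phrases):
    # Word-level contrast: each query word just has to appear somewhere as a token.
    words = query.split()
    return [p for p in phrases if all(w in p.split() for w in words)]

print(phrase_regexp_hits("a b c", phrases))  # ['x a b c y']
print(all_words_present("a b c", phrases))   # ['x a b c y', 'c b a']
```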

Ok.

(FYI, in connection with the Perl syntax: the syntax for
instructing the search engine to perform a regular
expression search when using the simple search interface
is /search_query/.)

Ah. Thanks. That might come in handy at times.

Or the other way round: to mimic simple search via regexp
in my first example, I would have had to search for \bplo\b?

This might be a bit of overkill for the search engine,
since a regexp search is quite heavy on the system.
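A rough sketch of both points: \bplo\b does reproduce the word-search hits, but a regexp cannot use the word index, so every stored value has to be tested, whereas the word lookup is a single dictionary access. The records and the whitespace tokenizer are made up; Python's \b is used here, and in older MySQL versions the word-boundary markers were written [[:<:]] and [[:>:]] instead.

```python
import re
from collections import defaultdict

# Made-up records: record id -> field value.
records = {1: "plo bar", 2: "fooplobar", 3: "the plo", 4: "plot"}

# Word index, built once: word -> set of record ids.
word_index = defaultdict(set)
for rec_id, value in records.items():
    for word in value.lower().split():
        word_index[word].add(rec_id)

def word_search(word):
    # One dictionary lookup, independent of how many records there are.
    return sorted(word_index.get(word.lower(), set()))

def regexp_scan(pattern):
    # A regexp cannot use the word index: every stored value is tested in turn.
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(rid for rid, value in records.items() if rx.search(value))

print(word_search("plo"))       # [1, 3]
print(regexp_scan(r"\bplo\b"))  # [1, 3]  same hits, but via a full scan
print(regexp_scan("plo"))       # [1, 2, 3, 4]  plain substring behaviour
```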

That was my initial reason for not giving the 'r' in my example. The
queries done there might be "a bunch" in a row and I wanted
to keep the footprint low.

The way I see it (but this is my personal opinion), regex
search should be used in cases where one needs very
complicated queries.

Would be my feeling as well.

[...]
For retrieving all the words that contain 'plo' as a substring, the
possibilities are:

perform_request_search(p=" 'plo' ")  # simple search

Encapsulating your search query in single quotes (' ') means substring
match, while double quotes (" ") mean exact phrase search.
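To summarize the simple-search conventions mentioned in this thread (/.../ for regexp, '...' for substring, "..." for exact phrase, otherwise words), here is a hypothetical classifier. classify_query is an invented helper for illustration only; it is not Invenio's query parser, and the single-quote behaviour may be dropped in future versions.

```python
def classify_query(p):
    """Hypothetical sketch: map a simple-search pattern to the match type
    described in this thread. Not Invenio's actual parser."""
    q = p.strip()
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "/'\"":
        kind = {"/": "regexp", "'": "substring", '"': "exact phrase"}[q[0]]
        return kind, q[1:-1]
    return "words", q

print(classify_query(" 'plo' "))  # ('substring', 'plo')
print(classify_query('"plo"'))    # ('exact phrase', 'plo')
print(classify_query("/pl.o/"))   # ('regexp', 'pl.o')
print(classify_query("plo"))      # ('words', 'plo')
```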

Well, I need a substring search in the case in question, but
no fancy stuff like the full regexp machinery.

But this is in general a bit confusing for people, so we
will probably drop this behavior in the near future.

Ok. And I agree about the confusion. Quoting of whatever kind
carries some notion of "literal search".

So...

[...]
perform_request_search(p1="plo", m1='p')  # advanced search + Partial Phrase (substring match)

This seems the way to go, right? That is, substring matching
with a minimal footprint. I tried it and it seems to work for
our use case.
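Why the partial-phrase route can be cheaper: for a literal term, a plain substring test gives the same hits as a regexp without invoking a regex engine. The sketch assumes m1='p' behaves like a SQL LIKE '%term%' test on the phrase values; I have not checked how Invenio implements it. The data is made up.

```python
import re

# Made-up phrase-index values.
values = ["plo bar", "fooplobar", "plot", "smith"]

def partial_phrase_hits(term, values):
    # Plain substring test (roughly LIKE '%term%'), case-insensitive here by assumption.
    t = term.lower()
    return [v for v in values if t in v.lower()]

def regexp_hits(term, values):
    # Same result through the regex engine; re.escape keeps the term literal.
    rx = re.compile(re.escape(term), re.IGNORECASE)
    return [v for v in values if rx.search(v)]

print(partial_phrase_hits("plo", values))  # ['plo bar', 'fooplobar', 'plot']
```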

I wonder if this is intuitive from an end user's perspective. Going to
[...]

My impression is that Google by default does word search
and not substring search. I see on their advanced search
page that they have 'all these words' as the default
option, but I can't find anything clear on this.

Google is indeed a bit "unclear" on the documentation end.
It just struck me that, once I give a (in a way) more
precise query, I end up with more results.

I think having by default magic truncation would introduce
a lot of noise.

By default probably yes.

However, we stumbled upon Invenio's behaviour some time ago
already, and now I understand that it is a similar context:
I have a list of journals and one is named "J. Theor. Phys.".
There, searching for "Theo" didn't yield any result either.
Unfortunately, many people don't care about the "proper"
abbreviation and use "J. Theo. Phys." Even if they spell it
out on the phone, you'll likely not notice the missing "r",
and it took me some time to figure out what was going on there
(knowing that there has to be a hit).
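The journal case in a few lines: once the title is split into words, "theor" is a token and "theo" is not, so a word search misses it while a substring test finds it. The tokenizer below is a toy (lower-case, split on non-alphanumerics) and is not Invenio's actual one.

```python
import re

title = "J. Theor. Phys."

def tokens(text):
    # Toy tokenizer: lower-case, split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

print(tokens(title))            # ['j', 'theor', 'phys']
print("theo" in tokens(title))  # False: a word search for "Theo" finds nothing
print("theo" in title.lower())  # True: a substring search does find it
```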

In discussing this issue some time ago, Tibor opened ticket
#916, and I think its implementation might avoid the clutter
while helping with the problem at hand.

I now understand that it is actually the very same behaviour
we stumble upon here again. Even the functionality we're
implementing is basically the same, just working on different
collections and returning different formats.

We have a bit of right truncation (because we are using
stemming), so when searching for something you will also find
the plural and other forms, but for anything more than that,
the user needs to specify it in the query.
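A toy illustration of the difference between stemming and explicit truncation: a crude suffix stripper lets "journal" find "journals", but "theo" still misses "theor"; an explicit prefix request finds it, together with the noise ("theology") mentioned above. Both toy_stem and the trailing-'*' syntax are hypothetical stand-ins; Invenio's real stemmer and truncation syntax may differ.

```python
def toy_stem(word):
    """Very crude suffix stripping for illustration only; NOT Invenio's stemmer."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Made-up words from a word index.
index_words = ["journal", "journals", "theor", "theory", "theology"]

def match(query, words):
    # Hypothetical syntax: a trailing '*' asks for explicit right truncation.
    if query.endswith("*"):
        prefix = query[:-1]
        return [w for w in words if w.startswith(prefix)]
    # Otherwise compare stems, so plural forms still match.
    return [w for w in words if toy_stem(w) == toy_stem(query)]

print(match("journal", index_words))  # ['journal', 'journals']
print(match("theo", index_words))     # []
print(match("theo*", index_words))    # ['theor', 'theory', 'theology']
```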

My initial problem was that I didn't understand why I had to
specify it the regexp way I finally did. It's all backend
stuff here. If you're interested, I could probably show you
at the upcoming workshop at CERN. It's all related to this
"authority thing" we need, and it would probably get a bit
lengthy to explain by mail due to the rather complex
surroundings where it lives (though it's simple in essence).

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : [email protected]
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


