Jason, I would really like to leave the series numbering in the search index.
It would be nice if staff and customers could do a series search like "Harry
Potter 1" to get all the titles for the first Harry Potter book.

It seems like the issue is that a single config.metabib_field entry for Series
Title has both search_field and facet_field set. If I turn off the facet_field
flag for that entry, create a new entry for a series title facet, and apply the
normalizer only to that new field, I wonder if that would do it? The facet
entries would get cleaned up, but the search index entries would be left alone.
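A rough sketch of that split in PostgreSQL, against the Evergreen config tables named in this thread. It assumes field 1 is Series Title and normalizer 18 is the regexp_replace one (both taken from the listing further down), that config.metabib_field.id is filled in by a sequence default, and it uses a hypothetical field name, seriestitle_facet. It is untested, and existing records would still need a reingest before anything changes.

```sql
-- Sketch only: split Series Title (field 1) into a search-only entry and a
-- facet-only entry, then move the regexp_replace mapping to the facet entry.
BEGIN;

-- 1. Stop field 1 from producing facet entries.
UPDATE config.metabib_field SET facet_field = FALSE WHERE id = 1;

-- 2. Copy field 1 as a facet-only field. 'seriestitle_facet' and its label
--    are hypothetical; id is assumed to come from the column's sequence default.
INSERT INTO config.metabib_field
    (field_class, name, label, xpath, weight, format,
     search_field, facet_field, browse_field, authority_xpath)
SELECT field_class, 'seriestitle_facet', 'Series Title (Facet)', xpath,
       weight, format, FALSE, TRUE, FALSE, authority_xpath
FROM config.metabib_field
WHERE id = 1;

-- 3. Re-point the regexp_replace mapping (assumed norm 18) at the new field,
--    so only facet values lose the series numbering.
UPDATE config.metabib_field_index_norm_map
SET field = (SELECT id FROM config.metabib_field
             WHERE field_class = 'series' AND name = 'seriestitle_facet')
WHERE field = 1 AND norm = 18;

COMMIT;
```

The "[" and "]" replace mappings (61 and 62 in the listing) could be moved the same way if they are only wanted on the facet.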

Josh Stompro - LARL IT Director

From: Open-ils-general 
[mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Boyer, 
Jason A
Sent: Wednesday, March 01, 2017 10:22 AM
To: Evergreen Discussion Group
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Thanks for figuring this out, Josh. I was able to modify our normalizer like so 
to continue removing the $v:
BEGIN;
-- Let regexp_replace normalizers accept a third (flags) parameter.
UPDATE config.index_normalizer SET param_count = 3
 WHERE func = 'regexp_replace';
-- On field 1 (Series Title), strip only "; <number>", globally ("g").
UPDATE config.metabib_field_index_norm_map SET params = '["; *[0-9]*","","g"]'
 WHERE field = 1 AND norm IN (SELECT id FROM config.index_normalizer
                              WHERE func = 'regexp_replace');
COMMIT;

If you have more than one normalizer that uses regexp_replace, or are using it
on more than one field, you won't want to use this as-is; but if you have only
the one and are currently using it only on your series titles, it's good to go.
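For a database with other regexp_replace normalizers or mappings, a narrower variant can target single rows. This sketch assumes the IDs from Josh's listing (normalizer 18, map row 64) match your database, which should be checked first; param_count still applies to every field mapped to normalizer 18.

```sql
-- Sketch: same change as above, limited to the rows from the original post.
BEGIN;

-- Allow a third (flags) parameter on this one normalizer only.
UPDATE config.index_normalizer
SET param_count = 3
WHERE id = 18 AND func = 'regexp_replace';

-- Remove only ';' plus following spaces and digits, everywhere ('g'),
-- on the Series Title mapping only.
UPDATE config.metabib_field_index_norm_map
SET params = '["; *[0-9]*","","g"]'
WHERE id = 64 AND field = 1 AND norm = 18;

COMMIT;
```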

Jason

--
Jason Boyer
MIS Supervisor
Indiana State Library
http://library.in.gov/

From: Open-ils-general 
[mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Josh 
Stompro
Sent: Wednesday, March 01, 2017 10:41 AM
To: Evergreen Discussion Group <open-ils-general@list.georgialibraries.org>
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Removing the regex replace normalizer did take care of it; sorry I didn't try
that before posting. I think my regex will have to be more selective, removing
only the ';' and the number so it doesn't clear out too much data.
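The difference between the current pattern and a more selective one can be checked directly in psql. The input string is hypothetical: two series titles joined by a space, the first ending in a "; 3" volume number.

```sql
-- Hypothetical input: two series titles, the first with a "; 3" volume number.
SELECT regexp_replace('American Girl Beforever Felicity ; 3 Felicity classic',
                      ' *;.*', '')          AS current_pattern,
       -- current_pattern: 'American Girl Beforever Felicity' (second title lost)
       regexp_replace('American Girl Beforever Felicity ; 3 Felicity classic',
                      '; *[0-9]*', '', 'g') AS selective_pattern;
       -- selective_pattern: 'American Girl Beforever Felicity  Felicity classic'
       -- (both titles kept; a double space remains where "; 3" was)
```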

Josh Stompro - LARL IT Director

From: Open-ils-general 
[mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Josh 
Stompro
Sent: Wednesday, March 01, 2017 9:19 AM
To: open-ils-general@list.georgialibraries.org
Subject: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Hello, we have noticed that only the first 490 gets indexed for our series
search index, but all 490s get added to the series facet entries.

For example, here is a title with two 490s in mods32 format:
https://egcatalog.larl.org/opac/extras/unapi?id=tag::U2@bre/237592&format=mods32

The second 490, "Felicity classic", isn't searchable.

When I look at the metabib.combined_series_field_entry I see the following for 
this record.
record  metabib_field  index_vector
237592  (null)         'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'
237592  1              'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'


metabib.series_field_entry:
  id:           430451
  source:       237592
  field:        1
  value:        American Girl Beforever Felicity
  index_vector: 'american':1A,5C 'beforev':7C 'beforever':3A 'felic':8C
                'felicity':4A 'girl':2A,6C


metabib.facet_entry:

value                             count  bibid
American Girl Beforever Felicity  1      237592
Felicity classic                  1      237592
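Queries along these lines should reproduce that comparison for a single record. The metabib.series_field_entry columns are the ones shown above; the source, field and value columns on metabib.facet_entry are assumed from stock Evergreen.

```sql
-- What was indexed for series searching on record 237592.
SELECT id, source, field, value, index_vector
FROM metabib.series_field_entry
WHERE source = 237592;

-- What became facet values for the same record (columns assumed).
SELECT field, value
FROM metabib.facet_entry
WHERE source = 237592
ORDER BY field, value;
```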



The one thing I have changed is adding a regexp_replace normalizer to strip the
series numbering from the facet entry. Unfortunately I don't remember whether
this issue came up before I added the normalizer. Maybe, when applied to the
search index version, the regex is acting on all of the 490 data concatenated
together, so by removing everything after the first ' ;' I'm clearing out the
second 490's data? It does seem to work correctly on the facet data, though.
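If the values are concatenated before the normalizer runs, PostgreSQL's regex defaults would explain the result: '.' matches newlines too unless newline-sensitive matching (the 'n' flag) is turned on, so ' *;.*' removes everything after the first ';' whatever separator sits between the entries. A quick check with a hypothetical newline-joined value:

```sql
-- Hypothetical concatenated value; the separator Evergreen actually uses may differ.
SELECT regexp_replace(E'American Girl Beforever Felicity ; 3\nFelicity classic',
                      ' *;.*', '')      AS default_flags,
       -- default_flags: 'American Girl Beforever Felicity' (second title gone)
       regexp_replace(E'American Girl Beforever Felicity ; 3\nFelicity classic',
                      ' *;.*', '', 'n') AS newline_sensitive;
       -- newline_sensitive: 'American Girl Beforever Felicity', a newline,
       -- then 'Felicity classic' ('.' no longer crosses the newline)
```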

There is a note on
https://wiki.evergreen-ils.org/doku.php?id=documentation:indexing#field_normalization_settings:
"Note: Only normalizations with a negative pos value are applied to the facet
version of indexed terms!" But I take it that doesn't mean a normalizer with a
negative pos value acts only on the facet?
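To see which Series Title normalizers fall under that note, the mappings can be listed in pos order. The column names come from the listing below; field 1 is assumed to be Series Title.

```sql
-- Every normalizer mapped to field 1, negative-pos (facet-affecting) rows first.
SELECT m.id, m.norm, n.name, n.func, m.params, m.pos
FROM config.metabib_field_index_norm_map AS m
JOIN config.index_normalizer AS n ON n.id = m.norm
WHERE m.field = 1
ORDER BY m.pos, m.id;
```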

This is going to be wide, but here is our normalizer setup and our series 
metabib field info.

config.metabib_field_index_norm_map:

id  field  norm  params        pos
51  32     2                   0
1   1      2                   0
62  1      13    ["[",""]      -1
61  1      13    ["]",""]      -1
52  32     17                  0
2   1      17                  0
64  1      18    [" *;.*",""]  -1

config.metabib_field:

id 1
  field_class:     series
  name:            seriestitle
  label:           Series Title
  xpath:           //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")]
  weight:          1
  format:          mods32
  search_field:    true
  facet_field:     true
  browse_field:    false
  authority_xpath: //@xlink:href
  restrict:        false

id 32
  field_class:     series
  name:            browse
  label:           Series Title (Browse)
  xpath:           //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[@type="nfi"]
  weight:          1
  format:          mods32
  search_field:    false
  facet_field:     false
  browse_field:    true
  browse_xpath:    *[local-name() != "nonSort"]
  authority_xpath: //@xlink:href
  restrict:        false

config.index_normalizer:

id  name                           func              param_count
2   Normalize date range           split_date_range  0
13  Replace                        replace           2
17  Search Normalize               search_normalize  0
18  Replace by regular expression  regexp_replace    2

Descriptions:
  2:  Split date ranges in the form of "XXXX-YYYY" into "XXXX YYYY" for proper index.
  13: Replace all occurences of first parameter in the string with the second parameter.
  17: Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.
  18: (none)


Thanks for any ideas you might have.
Josh

Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro     | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110
