Jason, on its own this seems to leave trailing spaces in the facet entry table, because the space before the semicolon is kept. That space is needed so the series index doesn't concatenate the last word of one 490 with the first word of the next 490.
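To make the trailing space concrete, here is a minimal sketch that can be run directly in PostgreSQL, outside Evergreen. The sample value 'Felicity classic ; 1' is made up (the real volume numbering is not shown in this thread). The first pattern is the one from Jason's message quoted below; the second is a ' *$' pattern that strips trailing spaces.

```sql
-- Hypothetical 490 value; the volume number is invented for illustration.
-- The pattern removes '; 1' but leaves the space that preceded the semicolon.
SELECT '[' || regexp_replace('Felicity classic ; 1', '; *[0-9]*', '', 'g') || ']' AS facet_value;
-- [Felicity classic ]

-- Stripping trailing spaces afterwards with ' *$' gives a clean value.
SELECT '[' || regexp_replace(
                  regexp_replace('Felicity classic ; 1', '; *[0-9]*', '', 'g'),
                  ' *$', '', ''
              ) || ']' AS facet_value;
-- [Felicity classic]
```

The brackets are only there to make the trailing space visible in the output.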
I tried adding a second normalizer that just strips trailing spaces, and that seems to take care of it.

    insert into config.metabib_field_index_norm_map (field,norm,params,pos) values (1,18,'[" *$","",""]',-1);
    -- Change the first normalizer position to -2.

There is also the btrim normalizer; I don't know whether that would be better/faster than using another regexp_replace.

Josh Stompro - LARL IT Director

From: Open-ils-general [mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Boyer, Jason A
Sent: Wednesday, March 01, 2017 10:22 AM
To: Evergreen Discussion Group
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Thanks for figuring this out, Josh. I was able to modify our normalizer like so to continue removing the $v:

    BEGIN;
    UPDATE config.index_normalizer SET param_count = 3 WHERE id IN (SELECT id FROM config.index_normalizer WHERE func = 'regexp_replace');
    UPDATE config.metabib_field_index_norm_map SET params='["; *[0-9]*","","g"]' WHERE field = 1 and norm in (SELECT id FROM config.index_normalizer WHERE func = 'regexp_replace');
    COMMIT;

If you have more than one normalizer that uses regexp_replace, or are using it on more than one field, you won't want to use this as-is. If you only have the one and currently use it only on your series titles, it's good to go.

Jason

--
Jason Boyer
MIS Supervisor
Indiana State Library
http://library.in.gov/

From: Open-ils-general [mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 10:41 AM
To: Evergreen Discussion Group <open-ils-general@list.georgialibraries.org>
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Removing the regex replace normalizer did take care of it; sorry I didn't try that before posting. I think my regex will have to be more selective, only getting rid of the number and the ';' so it doesn't clear out too much data.

Josh Stompro - LARL IT Director

From: Open-ils-general [mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 9:19 AM
To: open-ils-general@list.georgialibraries.org
Subject: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Hello, we have noticed that only the first 490 gets indexed for our series search index, but all 490s get added to the series facet entry. For example, here is a title with two 490s in mods32 format:

https://egcatalog.larl.org/opac/extras/unapi?id=tag::U2@bre/237592&format=mods32

The second 490, "Felicity classic", isn't searchable. When I look at metabib.combined_series_field_entry I see the following for this record.

record  metabib_field  index_vector
237592                 'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'
237592  1              'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'

metabib.series_field_entry

id      source  field  Value                             index_vector
430451  237592  1      American Girl Beforever Felicity  'american':1A,5C 'beforev':7C 'beforever':3A 'felic':8C 'felicity':4A 'girl':2A,6C

metabib.facet_entry

value                             count  bibid
American Girl Beforever Felicity  1      237592
Felicity classic                  1      237592

The one thing that I have done is add a search normalizer to get rid of the series numbering from the facet entry. Unfortunately, I don't remember whether this issue came up before I added the normalizer. Maybe when used on the index version the regex replace is actually acting on all the 490 info concatenated together, so by getting rid of everything after the first ' ;' I'm clearing the second 490 entry data? But it does work correctly on the facet data?
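
The hypothesis in the last paragraph can be checked in plain PostgreSQL, outside Evergreen. This is a minimal sketch, assuming the index version of the field is the two 490 values joined with a space, and using invented volume numbers; neither detail is confirmed in this message. The pattern ' *;.*' is the one from the normalizer setup for field 1.

```sql
-- Assumed input: both 490 values joined with a space; volume numbers are invented.
-- Without the 'g' flag only the first match is replaced, but '.*' is greedy,
-- so everything from the first ' ;' to the end of the string is removed.
SELECT regexp_replace(
           'American Girl Beforever Felicity ; 1 Felicity classic ; 1',
           ' *;.*', ''
       ) AS index_text;
-- American Girl Beforever Felicity

-- Applied to each 490 value separately, the same pattern keeps both titles.
SELECT regexp_replace(v, ' *;.*', '') AS facet_value
FROM (VALUES ('American Girl Beforever Felicity ; 1'),
             ('Felicity classic ; 1')) AS t(v);
-- American Girl Beforever Felicity
-- Felicity classic
```

The first result matches the value stored in metabib.series_field_entry, and the second matches the two facet entries. That is consistent with the hypothesis, though it does not show how Evergreen actually assembles the text before normalizing it.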
There is a note on https://wiki.evergreen-ils.org/doku.php?id=documentation:indexing#field_normalization_settings

"Note: Only normalizations with a negative pos value are applied to the facet version of indexed terms!"

But that must not mean that the normalizer only acts on the facet when there is a negative pos value?

This is going to be wide, but here is our normalizer setup and our series metabib field info.

id field norm params pos | id field_class name label xpath weight format search_field facet_field browse_field browse_xpath browse_sort_xpath facet_xpath authority_xpath joiner restrict | id name description func param_count

51 32 2 0 | 32 series browse Series Title (Browse) //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[@type="nfi"] 1 mods32 false false true *[local-name() != "nonSort"] //@xlink:href false | 2 Normalize date range Split date ranges in the form of "XXXX-YYYY" into "XXXX YYYY" for proper index. split_date_range 0

1 1 2 0 | 1 series seriestitle Series Title //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")] 1 mods32 true true false //@xlink:href false | 2 Normalize date range Split date ranges in the form of "XXXX-YYYY" into "XXXX YYYY" for proper index. split_date_range 0

62 1 13 ["[",""] -1 | 1 series seriestitle Series Title //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")] 1 mods32 true true false //@xlink:href false | 13 Replace Replace all occurences of first parameter in the string with the second parameter. replace 2

61 1 13 ["]",""] -1 | 1 series seriestitle Series Title //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")] 1 mods32 true true false //@xlink:href false | 13 Replace Replace all occurences of first parameter in the string with the second parameter. replace 2

52 32 17 0 | 32 series browse Series Title (Browse) //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[@type="nfi"] 1 mods32 false false true *[local-name() != "nonSort"] //@xlink:href false | 17 Search Normalize Apply search normalization rules to the extracted text. A less extreme version of NACO normalization. search_normalize 0

2 1 17 0 | 1 series seriestitle Series Title //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")] 1 mods32 true true false //@xlink:href false | 17 Search Normalize Apply search normalization rules to the extracted text. A less extreme version of NACO normalization. search_normalize 0

64 1 18 [" *;.*",""] -1 | 1 series seriestitle Series Title //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")] 1 mods32 true true false //@xlink:href false | 18 Replace by regular expression regexp_replace 2

Thanks for any ideas you might have.

Josh

Lake Agassiz Regional Library - Moorhead MN
larl.org
Josh Stompro | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110
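
For anyone wanting to compare their own setup, a narrower version of the normalizer listing in this message could be produced with a join like the following. This is a minimal sketch: the join keys (field to config.metabib_field.id, norm to config.index_normalizer.id) are inferred from the matching ids in the rows of the listing, not stated in the thread, and the column selection and ordering are arbitrary choices.

```sql
-- Assumed join keys: m.field -> f.id and m.norm -> n.id (inferred from the listing).
SELECT m.id,
       m.field,
       m.norm,
       m.params,
       m.pos,
       f.field_class,
       f.name        AS field_name,
       f.search_field,
       f.facet_field,
       n.func,
       n.param_count
FROM config.metabib_field_index_norm_map AS m
JOIN config.metabib_field   AS f ON f.id = m.field
JOIN config.index_normalizer AS n ON n.id = m.norm
WHERE f.field_class = 'series'
ORDER BY m.field, m.pos, m.id;
```

Leaving out the xpath and description columns keeps each row short enough to read, while still showing which normalizer runs on which field and at what pos.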