http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064
Bug ID: 13064 Summary: Indexing problem with ICU on control characters Change sponsored?: --- Product: Koha Version: master Hardware: All OS: All Status: NEW Severity: normal Priority: P5 - low Component: Searching Assignee: gmcha...@gmail.com Reporter: fridolyn.som...@biblibre.com QA Contact: testo...@bugs.koha-community.org The ICU configuration files contains a rule to remove control characters : <transform rule="[:Control:] Any-Remove"/> This rule is before tokenization. The problem is that "[:Control:]" regex contains linefeed, carriage return and tab. See http://www.regular-expressions.info/posixbrackets.html. So when several lines are indexed, last word of line is joined with first line of next line. Thoses words are then not searchable. For example : First line Second line This will become "First lineSecond line", tokenized as "First", "lineSecond" and "line". -- You are receiving this mail because: You are watching all bug changes. _______________________________________________ Koha-bugs mailing list Koha-bugs@lists.koha-community.org http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs website : http://www.koha-community.org/ git : http://git.koha-community.org/ bugs : http://bugs.koha-community.org/