[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2023-12-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

Nick Clemens  changed:

   What|Removed |Added

   See Also||https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=35621

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-21 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #10 from David Cook  ---
(In reply to Katrin Fischer from comment #8)
> You have to look at the full example in the links I posted. 3 lines:
> [three-line rule example not preserved in the archived message]
> So yes, but then it uses that form to remove the diacritics:
> https://www.compart.com/en/unicode/category/Mn

Ahhh right. I should've been more thorough.

I was thinking recently about how, on the listserv, Zebra ICU has been seen as
inferior to Elasticsearch ICU.

From
ftp://ftp.software.ibm.com/software/globalization/icu/3.6/icu-3_6-userguide.pdf,
it looks like ICU actually originated in Java (ICU4J) and was later ported to
C++ and C (ICU4C).

According to
https://wiki.koha-community.org/wiki/Record_Indexing_and_Retrieval_Options_for_Koha,
Zebra's use of libicu is inferior to Lucene's ICU support, which uses ICU4J.
There's no evidence given for the claim, but it seems believable (especially
considering the global prominence of Solr and Elasticsearch).

From https://lucene.apache.org/core/4_4_0/analyzers-icu/index.html, it seems
that some writing systems (Thai script, Chinese, etc.) can be segmented with
dictionary-based algorithms. That explains a lot. I know a bit of Chinese, and
I've wondered how indexers could handle such a context-dependent language...



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-21 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

Nick Clemens  changed:

   What|Removed |Added

 Status|Failed QA   |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #9 from Nick Clemens  ---
You are correct, Katrin: it looks like there was confusion about whether a site
was using ICU when we wrote these patches. Testing on master, everything works
correctly under ICU without this patch.



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #8 from Katrin Fischer  ---
(In reply to David Cook from comment #7)
> (In reply to Katrin Fischer from comment #6)
> > It means, the change here should not be necessary... Nick, can you please
> > double check?
> 
> Although wouldn't that NFD change make Žižek into something like...
> Zizek?

You have to look at the full example in the links I posted. 3 lines:
[three-line rule example not preserved in the archived message]
So yes, but then it uses that form to remove the diacritics:
https://www.compart.com/en/unicode/category/Mn
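Python's standard `unicodedata` module makes both halves of that answer visible. This is only an illustration, not Koha or Zebra code, and `show_nfd` is a hypothetical helper name. NFD by itself does not turn Žižek into Zizek. It splits each Ž/ž into a base letter plus U+030C COMBINING CARON, whose general category is Mn (Nonspacing Mark). The carons disappear only in the later step that removes Mn characters.

```python
import unicodedata

def show_nfd(text: str) -> list[tuple[str, str]]:
    """Return (code point, general category) for each character of NFD(text)."""
    nfd = unicodedata.normalize("NFD", text)
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in nfd]

# NFD keeps the carons, but as separate Mn code points after Z and z.
for code_point, category in show_nfd("Žižek"):
    print(code_point, category)
```

The five-letter name becomes seven code points: Z, caron, i, z, caron, e, k.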



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

David Cook  changed:

   What|Removed |Added

 CC||dc...@prosentient.com.au

--- Comment #7 from David Cook  ---
(In reply to Katrin Fischer from comment #6)
> It means, the change here should not be necessary... Nick, can you please
> double check?

Although wouldn't that NFD change make Žižek into something like...
Zizek?



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

Katrin Fischer  changed:

   What|Removed |Added

 Status|Needs Signoff   |Failed QA

--- Comment #6 from Katrin Fischer  ---
It means, the change here should not be necessary... Nick, can you please
double check?



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #5 from Katrin Fischer  ---
(In reply to Katrin Fischer from comment #4)
> I wonder if adding the rules is the best way of achieving this. You can add
> a general rule for using the 'base letter'. We have been doing this, I think.
> Found a hint about the rule here:
> 
> https://wiki.koha-community.org/wiki/ICU_do_not_undiacritic

Also see the documentation here:
http://userguide.icu-project.org/transforms/general

And our sample files using it:
https://wiki.koha-community.org/wiki/ICU_Chains_Library

This makes it unnecessary to add transliteration rules for every
character/diacritic combination.
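The Python sketch below shows roughly why one general rule is enough. It uses the standard `unicodedata` module to approximate an ICU transform along the lines of `NFD; [:Nonspacing Mark:] Remove; NFC`: decompose, drop nonspacing marks, recompose. It is not the ICU implementation itself, and `strip_diacritics` is a hypothetical name.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose, drop nonspacing marks (category Mn), then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    base_letters = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", base_letters)

# One rule covers every letter that decomposes into base letter + mark.
print(strip_diacritics("Slavoj Žižek"))    # Slavoj Zizek
print(strip_diacritics("Antonín Dvořák"))  # Antonin Dvorak
```

Nothing in the function refers to Ž specifically, which is the point of using the general rule.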



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #4 from Katrin Fischer  ---
I wonder if adding the rules is the best way of achieving this. You can add a
general rule for using the 'base letter'. We have been doing this, I think.
Found a hint about the rule here:

https://wiki.koha-community.org/wiki/ICU_do_not_undiacritic



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #3 from Michal Denar  ---
Sorry, I forgot some. Here are all of them:
[ICU rule lines not preserved in the archived message]
I'm ready for test :-)



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-20 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #2 from Michal Denar  ---
Hi Nick,
can we add some other Czech and Slavic letters to the ICU chains too?
[ICU rule lines not preserved in the archived message]
I'm ready for test :-)
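The same decompose-and-strip idea can show whether letters like these need individual rules. The Python sketch below uses only the standard `unicodedata` module, not ICU, and the sample strings are illustrative. Every Czech letter with háček, čárka or kroužek folds to plain ASCII under the general rule. Some other Slavic letters, such as Polish ł or Croatian đ, have no canonical decomposition in Unicode. A rule that only removes combining marks leaves those unchanged, so they would still need explicit transliteration. The `EXTRA` table is a hypothetical example of such explicit rules.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose, drop nonspacing marks (category Mn), then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize(
        "NFC", "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    )

czech_lower = "áčďéěíňóřšťúůýž"
czech_upper = "ÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"
no_decomposition = "łŁđĐ"  # Polish ł/Ł, Croatian đ/Đ

print(strip_diacritics(czech_lower))       # acdeeinorstuuyz
print(strip_diacritics(czech_upper))       # ACDEEINORSTUUYZ
print(strip_diacritics(no_decomposition))  # łŁđĐ (unchanged)

# Hypothetical explicit fallback for letters that do not decompose.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "đ": "d", "Đ": "D"})
print(strip_diacritics(no_decomposition).translate(EXTRA))  # lLdD
```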



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-17 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

Michal Denar  changed:

   What|Removed |Added

 CC||blac...@gmail.com



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-04 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

Nick Clemens  changed:

   What|Removed |Added

 Status|NEW |Needs Signoff



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-04 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

--- Comment #1 from Nick Clemens  ---
Created attachment 109669
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=109669&action=edit
Bug 26390: Add transliteration for Z with caron in ICU chains

https://en.wikipedia.org/wiki/Caron

From RT 52831. Under ICU chains, most patrons cannot search for Slavoj
Žižek (a search for 'Zizek' finds nothing).

To test:
 1 - Add a record with Slavoj Žižek as author
 2 - Enable ICU chains
 https://wiki.koha-community.org/wiki/ICU_chains_configuration
 3 - Ensure Koha is using Zebra
 4 - Restart all the things and reindex
 5 - Try to search for 'Zizek'
 6 - Not found
 7 - Apply patch
 8 - Restart all the things and reindex
 9 - Try to search for Zizek
10 - It works!
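The test plan works because folding must happen on both sides: when the record is indexed and when the query is processed. That is presumably why steps 4 and 8 include a reindex. The Python sketch below models this with a hypothetical in-memory token index. It is not how Zebra stores or searches data, and `fold`, `build_index` and `search` are illustrative names.

```python
import unicodedata
from collections.abc import Callable

def fold(text: str) -> str:
    """Case-fold, then strip nonspacing marks (roughly what an ICU chain does)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return unicodedata.normalize(
        "NFC", "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    )

def build_index(records: list[str], normalize: Callable[[str], str]) -> set[str]:
    """Hypothetical toy index: the set of normalized tokens from all records."""
    return {normalize(token) for record in records for token in record.split()}

def search(index: set[str], query: str, normalize: Callable[[str], str]) -> bool:
    """Normalize the query, then look it up in the index."""
    return normalize(query) in index

records = ["Slavoj Žižek"]
before = build_index(records, str.casefold)  # no diacritic folding
after = build_index(records, fold)           # folding applied at (re)index time

print(search(before, "Zizek", str.casefold))  # False, as in step 6
print(search(after, "Zizek", fold))           # True, as in step 10
print(search(before, "Zizek", fold))          # False: the old index was not refolded
```

The last line shows that applying the new rules only to queries, without reindexing, would still miss the record.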



[Koha-bugs] [Bug 26390] Add transliteration of Ž in ICU chains

2020-09-04 bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26390

Nick Clemens  changed:

   What|Removed |Added

   Assignee|koha-b...@lists.koha-community.org |n...@bywatersolutions.com
