[Bug 73605] No normalization for ancient greek accents in searches

2014-11-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

Andre Klapper aklap...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aklap...@wikimedia.org
            Summary|Ancient greek accents       |No normalization for
                   |problem in searches         |ancient greek accents in
                   |                            |searches

--- Comment #1 from Andre Klapper aklap...@wikimedia.org ---
Thanks for taking the time to report this!

I tried the search on https://el.wikipedia.org (which uses the CirrusSearch
extension) and αλφα finds άλφα but ἄλφα only seems to find ἄλφα.
Which search backend/extension do you use? Which MediaWiki version is this?
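
As an illustration of the difference between the two spellings (this sketch is
not part of the original report and makes no claim about how CirrusSearch or
Lucene handle the text internally), the snippet below decomposes both words
with Python's standard unicodedata module. The variable names are
hypothetical. The monotonic word carries a single accent on its first letter,
while the polytonic word carries a breathing mark plus an accent, so a
normalizer that only knows the monotonic forms would leave the second word
untouched.

```python
import unicodedata


def describe(word):
    """Return the Unicode names of the code points in the NFD form of word."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", word)]


# Modern (monotonic) spelling: alpha with tonos, then lambda, phi, alpha.
monotonic = "\u03ac\u03bb\u03c6\u03b1"
# Ancient (polytonic) spelling: alpha with psili and oxia, then lambda, phi, alpha.
polytonic = "\u1f04\u03bb\u03c6\u03b1"

# Show how the first letter of each word decomposes.
print(describe(monotonic)[:2])
print(describe(polytonic)[:3])
```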

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


Nik Everett neverett+bugzi...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #2 from Nik Everett neverett+bugzi...@wikimedia.org ---
Cirrus uses Elasticsearch for the analysis, which in turn uses Apache Lucene.  I
imagine the right place to implement this is there.

It looks like
https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/el/GreekLowerCaseFilter.java
implements the normalization.  I'd file a bug over there.  It doesn't _look_
like adding the extra normalization would be that hard.  I suppose you'd have
to decide with them whether they should be enabled by default (so you could
just add them to that file) or optional.  If optional you'd just make a new
filter I believe.

After it's released in Lucene and Elasticsearch we could enable it by default
for Greek across the site, I think.
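
To make the idea of "extra normalization" concrete, here is a minimal sketch
in Python. It is not the Lucene filter and not a proposed patch: it assumes
that folding means lowercasing, stripping every combining mark (accents,
breathings, iota subscript) after canonical decomposition, and mapping final
sigma to regular sigma. The function name fold_greek is hypothetical.

```python
import unicodedata


def fold_greek(text):
    """Lowercase text and strip all combining marks from it.

    Assumption for this sketch: accents, breathings and the iota subscript
    are all dropped, and final sigma is mapped to regular sigma.
    """
    # NFD splits precomposed letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # unicodedata.combining() returns a non-zero class for combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose what is left and fold final sigma (U+03C2) to sigma (U+03C3).
    return unicodedata.normalize("NFC", stripped).replace("\u03c2", "\u03c3")


# Monotonic, polytonic and capitalized polytonic spellings of the same word.
for word in ("\u03ac\u03bb\u03c6\u03b1", "\u1f04\u03bb\u03c6\u03b1", "\u1f0c\u03bb\u03c6\u03b1"):
    print(word, "->", fold_greek(word))
```

Whether a production filter should drop every mark, or only some of them, is
exactly the kind of decision that would have to be made upstream.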

Nik Everett neverett+bugzi...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |upstream

--- Comment #3 from paolo anghileri panghil...@digitalminds.it ---
(In reply to Andre Klapper from comment #1)
> Thanks for taking the time to report this!
> 
> I tried the search on https://el.wikipedia.org (which uses the CirrusSearch
> extension) and αλφα finds άλφα but ἄλφα only seems to find ἄλφα.
> Which search backend/extension do you use? Which MediaWiki version is this?

Thank you, Andre, for the reply.
This is the same situation I have found in my searches.


I need to be able to search for and retrieve ancient Greek words both with the
orthographic details of the vowels specified (the search string άλφα retrieves
άλφα, αλφα and άλφα) and without them (the search string αλφα retrieves άλφα,
αλφα and άλφα).

The fact that it works for modern Greek but not for ancient Greek suggests to
me that ancient Greek is not supported in this case, while modern Greek, which
has different orthographic details, works.
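
The requested behaviour can be illustrated with a toy index. This is only a
sketch of the general technique of applying the same folding at index time
and at query time. It is not how CirrusSearch is implemented, and the helper
names fold, build_index and search are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold(term):
    """Lowercase a term and drop its combining marks (sketch assumption)."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(titles):
    """Map each folded form to the set of original titles that share it."""
    index = defaultdict(set)
    for title in titles:
        index[fold(title)].add(title)
    return index


def search(index, query):
    """Fold the query the same way as the titles, then look it up."""
    return sorted(index.get(fold(query), set()))


# Unaccented, monotonic and polytonic spellings of the same word.
titles = ["\u03b1\u03bb\u03c6\u03b1", "\u03ac\u03bb\u03c6\u03b1", "\u1f04\u03bb\u03c6\u03b1"]
index = build_index(titles)

# Each spelling used as a query should retrieve all three titles.
for query in titles:
    print(query, "->", search(index, query))
```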

--- Comment #4 from paolo anghileri panghil...@digitalminds.it ---
(In reply to Andre Klapper from comment #1)
> Thanks for taking the time to report this!
> 
> I tried the search on https://el.wikipedia.org (which uses the CirrusSearch
> extension) and αλφα finds άλφα but ἄλφα only seems to find ἄλφα.
> Which search backend/extension do you use? Which MediaWiki version is this?

About the second part of the question: I am at a first, preliminary step of
this project and have not installed a MediaWiki for it yet, so for the moment
I have only tested on public MediaWiki instances, for instance
el.wiktionary.org.

I will do local tests in the next few days. Do you have any suggestions about
search backends or extensions?

Thanks again

Paolo

--- Comment #5 from paolo anghileri panghil...@digitalminds.it ---
(In reply to Nik Everett from comment #2)

Thank you, Nik, I had a look at that file.
I am not an experienced MediaWiki developer, but if the problem is really
related to that, maybe I can help by adding the extra normalization.

Thanks

Paolo

--- Comment #6 from Nik Everett neverett+bugzi...@wikimedia.org ---
(In reply to paolo anghileri from comment #5)


If you want to propose a change to implement it in Lucene then link it here and
I'll jump over there and help.  I'm not a Lucene committer but I can certainly
review it and prod a committer.

(In reply to paolo anghileri from comment #4)
> I will do local tests in the next few days. Do you have any suggestions
> about search backends or extensions?

Use CirrusSearch.  It's the search backend that we use on all of our wikis.
It's better than the built-in MySQL search in just about every way.  It's also
the only option to get that normalization from Lucene to take effect.

--- Comment #7 from paolo anghileri panghil...@digitalminds.it ---

(In reply to Nik Everett from comment #6)

Given that I am not a Wikimedia expert and have not yet explored the
CirrusSearch code: as a CirrusSearch developer, do you think this
normalization should go through Lucene, or is it possible to implement it
directly in the CirrusSearch extension, or maybe in its dependency
Elasticsearch?

Otherwise, if this can only be done through Lucene, I'll try adding the extra
normalization in Lucene and propose a patch for that.

--- Comment #8 from Nik Everett neverett+bugzi...@wikimedia.org ---
(In reply to paolo anghileri from comment #7)
> (In reply to Nik Everett from comment #6)
> 
> Given that I am not a Wikimedia expert and have not yet explored the
> CirrusSearch code: as a CirrusSearch developer, do you think this
> normalization should go through Lucene, or is it possible to implement it
> directly in the CirrusSearch extension, or maybe in its dependency
> Elasticsearch?
> 
> Otherwise, if this can only be done through Lucene, I'll try adding the
> extra normalization in Lucene and propose a patch for that.

Try getting it in Lucene.  Anything in Cirrus would be a nasty hack.

--- Comment #9 from paolo anghileri panghil...@digitalminds.it ---
(In reply to Nik Everett from comment #8)

Thanks, Nik, I'll go this way.
As you suggested, I'll post a link to the Lucene patch here soon so you can
review it.

Thanks for your suggestions

Paolo
