[Bug 70873] Cirrus unable to find insource:"mazovia.pl" on pl.wp where the phrase occurs in a URL

bugzilla-daemon Tue, 16 Sep 2014 08:50:41 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=70873


--- Comment #3 from Nik Everett <[email protected]> ---
(In reply to Bartosz Dziewoński from comment #2)
> Oh, so URLs are one "segment", and this doesn't find "substrings"? That
> makes sense.
> 
> Splitting on these characters sounds reasonable to me. There are some cases
> like "AC/DC", but that shouldn't cause any problems, right?

You've got it.  The way search works is that all the words are segmented
(tokenized), then normalized, and then indexed for quick lookup.  The trick
is that each language is subtly different, and I only speak English, so I can
only validate that the choices make sense there.  And it's hard to propose
changes that cross many languages.
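
A minimal Python sketch of the tokenize, normalize, index pipeline described
above may help show why a URL kept as one segment hides "mazovia.pl". This is
not Cirrus's actual analysis chain: both tokenizers, the sample documents, and
every function name here are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize_whole_url(text):
    # Hypothetical tokenizer: split on whitespace only, so a URL is one token.
    return text.split()


def tokenize_split_punct(text):
    # Hypothetical alternative: also split on '/', ':' and '.'.
    return [t for t in re.split(r"[\s/:.]+", text) if t]


def normalize(token):
    # Toy normalization: lowercase only.
    return token.lower()


def build_index(docs, tokenizer):
    # Inverted index: normalized token -> set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenizer(text):
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, tokenizer):
    # The query goes through the same tokenizer and normalizer as the text.
    # Every query token must occur in a document (token positions are ignored).
    terms = [normalize(t) for t in tokenizer(query)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents.
docs = {
    1: "See http://www.mazovia.pl/turystyka for details",
    2: "AC/DC is a band",
}

whole = build_index(docs, tokenize_whole_url)
split = build_index(docs, tokenize_split_punct)

print(search(whole, "mazovia.pl", tokenize_whole_url))    # no match
print(search(split, "mazovia.pl", tokenize_split_punct))  # matches doc 1
print(search(split, "AC/DC", tokenize_split_punct))       # matches doc 2
```

With the whitespace-only tokenizer the whole URL is a single index entry, so
the query token "mazovia.pl" matches nothing.  Splitting on the extra
characters indexes "mazovia" and "pl" separately, and "AC/DC" is still found
because the query is split the same way as the text.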

Anyway, I'll see if I can make a tool to easily look at how words are segmented
in your language.  And I'll see if I can make it easy to experiment a bit with
stuff.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
