Re: [Wikidata-tech] Wikidata fulltext search results output

2017-11-12 Thread Lydia Pintscher
On Nov 3, 2017 08:39, "Stas Malyshev"  wrote:

Hi!

> When showing labels from fallback languages we do have little language
> indicators in other places. I believe we should have this here as

Makes sense. I'll look into how to get those. Is the language code OK, or do we
need the full language name (uk vs. Ukrainian)?


In the other places we show the language name. I'd say we should do the same
here if possible.


One thing to note here is that secondary languages have no order. For example,
if your display language is German and there's no matching German label, but there
are 10 other language labels that are all the same (this happens a lot for names &
places), which language gets selected is anybody's guess. In theory we could
add a rule that says "look at English as secondary first", but I'm not sure
we should: after all, besides English having the most labels (and us speaking
it :), there's not much special about it.


Uhhh yeah. I don't have a better idea either TBH.


> I'm slightly leaning toward showing both.

OK.

> I'd say in this case we could get rid of the word/byte count. To get a
> good glimpse of the quality of the item I'd say we'd want to show
> count of statements (excluding identifier statements), identifiers and
> sitelinks.

OK, I'll try to do this.

>> 5. Display format for Wikidata and for other Wikipedia sites is different:
>> Wikipedia:
>>
>> Title
>> Snippet
>>
>> Wikidata:
>>
>> Title: Description
>>
>> I.e. Wikipedia puts title on a separate line, while Wikidata keeps it on
>> the same line, separated by colon. Is there any reason for this
>> difference? Do we want to go back to the common format?
>
> Not sure if we had a reason tbh.

OK then, I'll feel free to shuffle things around :) Having more
freedom in the title line is good because we can then display both the label
& aliases.

Thanks!
--
Stas Malyshev
smalys...@wikimedia.org
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Wikidata fulltext search results output / Fallback languages

2017-11-07 Thread Stas Malyshev
Hi!

> Two quick thoughts on fallback languages: One general rule could perhaps
> be to not select a language with a different script when one in
> the same script is available. (At least for users with a preference
> for a Latin-script language; this may be different the other way around.)

This is a good idea, except that I don't really have the tools to do this
right now, I think, at least not without adding a lot to the Language
classes (maybe I'm wrong, I'll check). So for now I think I am going to
trust whoever composed the fallback chains and use those. So basically
the idea would be to display:

1. Title & description in the user's display language. If the match happened
in those, it will be highlighted.
1.1 Failing to find those, walk along the fallback chain until some
title/description is found and display that one. Again, if the match
happened there, it will be highlighted.
1.2 If we still fail to find any label/description, just display the Q-id
as the label and nothing as the description.

2. If we still have not displayed the actual string that was matched (e.g.
it is an alias, a label in a non-main language, etc.), display it too, in
addition to what we already displayed.

3. If any string is displayed in a language different from the user's current
display language, it will have a mark that says which language it is in
(except for the Q-id string, obviously).
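A minimal Python sketch of steps 1 to 3, assuming labels and descriptions are plain dicts keyed by language code and that the fallback chain is already available as a list (in reality it would come from MediaWiki's language fallback data). Entity, pick_term, mark and result_parts are hypothetical names; the optional language-name map stands in for whatever source of language names the UI actually uses, and highlighting is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    qid: str
    labels: dict[str, str] = field(default_factory=dict)        # language code -> label
    descriptions: dict[str, str] = field(default_factory=dict)  # language code -> description


def pick_term(terms: dict[str, str], chain: list[str]) -> tuple[str, str] | None:
    """Steps 1 and 1.1: the first (language, text) found walking along the chain."""
    for lang in chain:
        if lang in terms:
            return lang, terms[lang]
    return None


def mark(text: str, lang: str | None, display_lang: str, names: dict[str, str]) -> str:
    """Step 3: add a language indicator unless the text is in the display language.

    lang=None is used for the bare Q-id, which never gets an indicator.
    """
    if lang is None or lang == display_lang:
        return text
    return f"{text} ({names.get(lang, lang)})"


def result_parts(entity: Entity, display_lang: str, fallback_chain: list[str],
                 matched: tuple[str, str] | None = None,
                 names: dict[str, str] | None = None) -> list[str]:
    names = names or {}
    chain = [display_lang] + [lang for lang in fallback_chain if lang != display_lang]
    label = pick_term(entity.labels, chain)
    desc = pick_term(entity.descriptions, chain)

    # Step 1.2: no label anywhere along the chain, so the Q-id is shown instead;
    # a missing description is simply omitted.
    parts = [entity.qid] if label is None else [mark(label[1], label[0], display_lang, names)]
    if desc is not None:
        parts.append(mark(desc[1], desc[0], display_lang, names))

    # Step 2: also show the matched string (alias, other-language label, ...)
    # if it is not one of the strings already displayed.
    if matched is not None:
        m_lang, m_text = matched
        shown = {term[1] for term in (label, desc) if term is not None}
        if m_text not in shown:
            parts.append(mark(m_text, m_lang, display_lang, names))
    return parts
```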

I think this should do for now, but we can always improve it later :)
Thanks,
-- 
Stas Malyshev
smalys...@wikimedia.org


Re: [Wikidata-tech] Wikidata fulltext search results output / Fallback languages

2017-11-07 Thread Neubert, Joachim
Two quick thoughts on fallback languages: One general rule could perhaps be to 
not select a language with a different script when one in the same script is 
available. (At least for users with a preference for a Latin-script language; this 
may be different the other way around.)
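One way to approximate the same-script rule without extending the Language classes is to guess each candidate label's script from Unicode character names and stable-sort same-script candidates first. This is a heuristic sketch, not an existing MediaWiki facility; dominant_script and prefer_same_script are hypothetical helpers.

```python
from __future__ import annotations

import unicodedata
from collections import Counter


def dominant_script(text: str) -> str | None:
    """Guess the main script of a string from the Unicode names of its letters.

    Heuristic only: the first word of a letter's name (e.g. "LATIN SMALL LETTER A",
    "CYRILLIC CAPITAL LETTER BE") is taken as its script. Python's unicodedata
    module does not expose the real Unicode Script property.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def prefer_same_script(candidates: list[tuple[str, str]], user_script: str) -> list[tuple[str, str]]:
    """Reorder (language, label) fallback candidates so same-script labels come first.

    sorted() is stable, so the original fallback order is kept within each group.
    """
    return sorted(candidates, key=lambda cand: dominant_script(cand[1]) != user_script)
```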

A broader solution could be offering the user an individual setting for the 
sequence of fallback languages (as used, e.g., in HTTP content negotiation) 
across all Wikidata interfaces. But of course that's a much larger effort, and 
has perhaps been discussed elsewhere already.
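If such a per-user setting reused the Accept-Language syntax (comma-separated language tags with optional q weights, as in HTTP), turning it into an ordered fallback list could look like the sketch below. parse_accept_language is a hypothetical helper; mapping HTTP tags to Wikidata language codes is not handled beyond lowercasing.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[str]:
    """Turn an Accept-Language style string into language tags, highest weight first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable"; the "*" wildcard is not a usable fallback language.
        if q > 0 and tag != "*":
            entries.append((-q, position, tag.lower()))
    # Sort by descending weight; equal weights keep the order the user wrote them in.
    return [tag for _, _, tag in sorted(entries)]
```

For example, "de-CH, de;q=0.9, en;q=0.8" yields ["de-ch", "de", "en"].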

Cheers, Joachim

> -----Original Message-----
> From: Wikidata-tech [mailto:wikidata-tech-boun...@lists.wikimedia.org] On
> Behalf Of Stas Malyshev
> Sent: Friday, 3 November 2017 01:40
> To: Wikidata technical discussion; Lydia Pintscher
> Cc: Internal communications for WMF search and discovery team
> Subject: Re: [Wikidata-tech] Wikidata fulltext search results output


Re: [Wikidata-tech] Wikidata fulltext search results output

2017-11-02 Thread Stas Malyshev
Hi!

> When showing labels from fallback languages we do have little language
> indicators in other places. I believe we should have this here as

Makes sense. I'll look into how to get those. Is the language code OK, or do we
need the full language name (uk vs. Ukrainian)?

One thing to note here is that secondary languages have no order. For example,
if your display language is German and there's no matching German label, but there
are 10 other language labels that are all the same (this happens a lot for names &
places), which language gets selected is anybody's guess. In theory we could
add a rule that says "look at English as secondary first", but I'm not sure
we should: after all, besides English having the most labels (and us speaking
it :), there's not much special about it.
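Whatever rule is chosen, a deterministic tie-break at least makes the pick stable between searches. A sketch, assuming the candidate labels are a dict keyed by language code; pick_secondary is hypothetical, and the English-first default is only the option under discussion, exposed as a parameter so it can be dropped.

```python
from __future__ import annotations


def pick_secondary(labels: dict[str, str],
                   preferred: tuple[str, ...] = ("en",)) -> tuple[str, str] | None:
    """Deterministically pick one (language, label) when the display language has none.

    Languages in `preferred` win in the given order; every other language ranks
    after them, alphabetically by code, so repeated searches show the same label.
    """
    if not labels:
        return None
    rank = {lang: i for i, lang in enumerate(preferred)}
    lang = min(labels, key=lambda code: (rank.get(code, len(rank)), code))
    return lang, labels[lang]
```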

> I'm slightly leaning toward showing both.

OK.

> I'd say in this case we could get rid of the word/byte count. To get a
> good glimpse of the quality of the item I'd say we'd want to show
> count of statements (excluding identifier statements), identifiers and
> sitelinks.

OK, I'll try to do this.

>> 5. Display format for Wikidata and for other Wikipedia sites is different:
>> Wikipedia:
>>
>> Title
>> Snippet
>>
>> Wikidata:
>>
>> Title: Description
>>
>> I.e. Wikipedia puts title on a separate line, while Wikidata keeps it on
>> the same line, separated by colon. Is there any reason for this
>> difference? Do we want to go back to the common format?
> 
> Not sure if we had a reason tbh.

OK then, I'll feel free to shuffle things around :) Having more
freedom in the title line is good because we can then display both the label
& aliases.
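A sketch of the Wikipedia-style two-line layout, with the label and a matched alias on the title line and the description as the snippet. highlight and render_result are hypothetical helpers; matching here is a plain case-insensitive substring match (real search highlighting works on analyzed tokens), and the searchmatch class name is an assumption.

```python
from __future__ import annotations

import html
import re


def highlight(text: str, query: str) -> str:
    """HTML-escape `text`, wrapping case-insensitive occurrences of `query` in a span."""
    if not query:
        return html.escape(text)
    # A capturing group makes re.split keep the matches, at the odd indices.
    pieces = re.split(f"({re.escape(query)})", text, flags=re.IGNORECASE)
    return "".join(
        f'<span class="searchmatch">{html.escape(piece)}</span>' if i % 2 else html.escape(piece)
        for i, piece in enumerate(pieces)
    )


def render_result(label: str, description: str, query: str,
                  matched_alias: str | None = None) -> list[str]:
    """Return the two lines of a result: the title line, then the snippet line."""
    title = highlight(label, query)
    if matched_alias and matched_alias != label:
        # Show both the label and the alias that actually matched.
        title += " (" + highlight(matched_alias, query) + ")"
    return [title, highlight(description, query)]
```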

Thanks!
-- 
Stas Malyshev
smalys...@wikimedia.org


Re: [Wikidata-tech] Wikidata fulltext search results output

2017-10-26 Thread Lydia Pintscher
Hey :)

Thanks for getting this started.

On Wed, Oct 25, 2017 at 2:49 AM, Stas Malyshev  wrote:
> Hi!
>
> As I am working on improving Wikidata fulltext search[1], I'd like to
> talk about the search results page. Right now the search results page for
> Wikidata is less than ideal; here are the issues I see with it:
>
> - No match highlighting
> - Meaningless data, like word count (does anybody care to guess what it is
> counting? Has anybody ever used it?) and byte count (more useful than word
> count but not by much)
> - Obviously, search quality is not super high, but that should be
> improved with proper description indexing
>
> While working on improving the situation, I would like to solicit
> opinions on a set of questions about what the search results page
> should look like. Namely:
>
> 1. If the match is made on a label/description in a language other than
> the current display language, we could opt for:
> a) Displaying the description that matched, highlighted. Optionally,
> also display the language of the match (in the display language?)
> b) Displaying the description in the display language, un-highlighted.
> Which option is preferable?

When showing labels from fallback languages we do have little language
indicators in other places. I believe we should have this here as
well. Otherwise I believe it is confusing where certain labels
suddenly come from because you might not see them when going to the
actual item.

> 2. What do we do if the match is on an alias? Do we display the matching
> alias, the original label, or both? The question above also applies if the
> match is on an alias in another language.

I'm slightly leaning toward showing both.

> 3. It looks clear to me that word count is useless. Is byte count
> useful and does it need to be kept?

It helps in the cases where you want to get an understanding about how
large an item is and if it is worth your attention. If people actually
use it... Not sure. They definitely do in recent changes and history.

> 4. Do we want to display any other parameters of the entity? E.g. we
> have in the index: statement_count, sitelink_count, label_count,
> incoming_links, etc. Do we want to display any?

I'd say in this case we could get rid of the word/byte count. To get a
good glimpse of the quality of the item I'd say we'd want to show
count of statements (excluding identifier statements), identifiers and
sitelinks.
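A sketch of computing those three counts from an item's JSON, assuming the Wikibase JSON format (as served by Special:EntityData), where each statement's mainsnak carries a datatype and identifier properties have the datatype external-id. quality_counts is a hypothetical helper; in the search index these numbers would presumably be precomputed fields, alongside the statement_count and sitelink_count fields listed in question 4.

```python
from __future__ import annotations


def quality_counts(entity: dict) -> dict[str, int]:
    """Count non-identifier statements, identifier statements and sitelinks."""
    statements = identifiers = 0
    # "claims" maps property IDs to lists of statements.
    for statement_list in entity.get("claims", {}).values():
        for statement in statement_list:
            if statement.get("mainsnak", {}).get("datatype") == "external-id":
                identifiers += 1
            else:
                statements += 1
    return {
        "statements": statements,
        "identifiers": identifiers,
        "sitelinks": len(entity.get("sitelinks", {})),
    }
```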

> 5. Display format for Wikidata and for other Wikipedia sites is different:
> Wikipedia:
>
> Title
> Snippet
>
> Wikidata:
>
> Title: Description
>
> I.e. Wikipedia puts title on a separate line, while Wikidata keeps it on
> the same line, separated by colon. Is there any reason for this
> difference? Do we want to go back to the common format?

Not sure if we had a reason tbh.

> Also, if you have any other things/ideas/comments about what the fulltext
> search output for Wikidata should be like, please tell me.
>
> I am sending this to wikidata-tech and the discovery team list only for now,
> since it's still a work in progress and half-baked; we could open this up for
> wider discussion later if necessary.
>
> [1] https://phabricator.wikimedia.org/T178851
>
> Thanks,
> --
> Stas Malyshev
> smalys...@wikimedia.org
>



-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg
under number 23855 Nz. Recognized as a non-profit organization by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.


Re: [Wikidata-tech] Wikidata fulltext search results output

2017-10-25 Thread Stas Malyshev
Hi!

> while you are at it, some things would be very useful to be searchable
> (maybe some are already by now):
> * "primary" (not references/qualifiers) years, for birth/death/floruit etc.
> * "primary" string/monolingual values (title, taxon name, etc.)
> * "primary" IDs, e.g. VIAF (might cause confusion with years, so maybe
> only add numerical IDs if 5+ digits?)

We have the code to index statements already, and we're already indexing
P31 and P279. We could index more properties. However, we don't yet have
syntax, or any other way, to actually use those in search, except for
boosting (see https://gerrit.wikimedia.org/r/#/c/384632/).

We're looking at which properties to add (nominations welcome, probably
in the form of a Phabricator ticket?). Since adding them requires a full
reindex of Wikidata (a couple of days), we probably don't want to add them
one by one; rather, we want to collect a set and then do it in one go.

We also do not have syntax for searching by statement values (as in match,
instead of boost), but it should not be hard: we just need to design a
proper syntax and implement it (syntaxes are now pluggable, so this should
not be too big of a problem).
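To illustrate the boost-versus-match distinction, here is a sketch of an Elasticsearch query body in which indexed statement values either only add to the score (should clauses) or restrict the results (filter clauses). The field names all and statement_keywords and the "P31=Q5" value format are assumptions for illustration, not a description of the actual CirrusSearch mapping; statement_query is hypothetical.

```python
from __future__ import annotations


def statement_query(text: str, boosts: dict[str, float] | None = None,
                    required: tuple[str, ...] = ()) -> dict:
    """Build a hypothetical Elasticsearch query body for a fulltext search.

    boosts:   statement values that only raise the score of items having them.
    required: statement values an item must have to match at all.
    """
    bool_query: dict = {"must": [{"match": {"all": {"query": text}}}]}
    if boosts:
        # With a "must" clause present, "should" clauses are optional and only add score.
        bool_query["should"] = [
            {"term": {"statement_keywords": {"value": value, "boost": weight}}}
            for value, weight in boosts.items()
        ]
    if required:
        # "filter" clauses must match but do not contribute to the score.
        bool_query["filter"] = [
            {"term": {"statement_keywords": value}} for value in required
        ]
    return {"query": {"bool": bool_query}}
```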

-- 
Stas Malyshev
smalys...@wikimedia.org


Re: [Wikidata-tech] Wikidata fulltext search results output

2017-10-25 Thread Magnus Manske
Hi Stas,

while you are at it, some things would be very useful to be searchable
(maybe some are already by now):
* "primary" (not references/qualifiers) years, for birth/death/floruit etc.
* "primary" string/monolingual values (title, taxon name, etc.)
* "primary" IDs, e.g. VIAF (might cause confusion with years, so maybe only
add numerical IDs if 5+ digits?)
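A sketch of extracting such "primary" values, assuming Wikibase JSON where qualifiers and references are stored outside the mainsnak, so reading only the mainsnak skips them. The 5+ digit rule for numeric identifiers is a parameter; primary_tokens and TIME_YEAR are hypothetical names.

```python
from __future__ import annotations

import re

# Wikibase time values look like "+1952-03-11T00:00:00Z"; capture the sign and the year.
TIME_YEAR = re.compile(r"^([+-])0*(\d+)-")


def primary_tokens(entity: dict, min_numeric_id_digits: int = 5) -> set[str]:
    """Collect searchable tokens from main snaks only (qualifiers/references are ignored).

    Years come from time values, texts from monolingual-text and string values,
    identifiers from external-id values; purely numeric identifiers shorter than
    `min_numeric_id_digits` are skipped so they are not confused with years.
    """
    tokens: set[str] = set()
    for statement_list in entity.get("claims", {}).values():
        for statement in statement_list:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":  # skip "somevalue" / "novalue"
                continue
            datatype = snak.get("datatype")
            value = snak.get("datavalue", {}).get("value")
            if datatype == "time":
                match = TIME_YEAR.match(value.get("time", ""))
                if match:
                    sign, year = match.groups()
                    tokens.add(("-" if sign == "-" else "") + year)
            elif datatype == "monolingualtext":
                tokens.add(value["text"])
            elif datatype in ("string", "external-id"):
                if datatype == "external-id" and value.isdigit() and len(value) < min_numeric_id_digits:
                    continue
                tokens.add(value)
    return tokens
```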

Cheers,
Magnus

On Wed, Oct 25, 2017 at 1:50 AM Stas Malyshev  wrote:

> Hi!
>
> As I am working on improving Wikidata fulltext search[1], I'd like to
> talk about the search results page. Right now the search results page for
> Wikidata is less than ideal; here are the issues I see with it:
>
> - No match highlighting
> - Meaningless data, like word count (does anybody care to guess what it is
> counting? Has anybody ever used it?) and byte count (more useful than word
> count but not by much)
> - Obviously, search quality is not super high, but that should be
> improved with proper description indexing
>
> While working on improving the situation, I would like to solicit
> opinions on a set of questions about what the search results page
> should look like. Namely:
>
> 1. If the match is made on a label/description in a language other than
> the current display language, we could opt for:
> a) Displaying the description that matched, highlighted. Optionally,
> also display the language of the match (in the display language?)
> b) Displaying the description in the display language, un-highlighted.
> Which option is preferable?
>
> 2. What do we do if the match is on an alias? Do we display the matching
> alias, the original label, or both? The question above also applies if the
> match is on an alias in another language.
>
> 3. It looks clear to me that word count is useless. Is byte count
> useful and does it need to be kept?
>
> 4. Do we want to display any other parameters of the entity? E.g. we
> have in the index: statement_count, sitelink_count, label_count,
> incoming_links, etc. Do we want to display any?
>
> 5. Display format for Wikidata and for other Wikipedia sites is different:
> Wikipedia:
>
> Title
> Snippet
>
> Wikidata:
>
> Title: Description
>
> I.e. Wikipedia puts title on a separate line, while Wikidata keeps it on
> the same line, separated by colon. Is there any reason for this
> difference? Do we want to go back to the common format?
>
> Also, if you have any other things/ideas/comments about what the fulltext
> search output for Wikidata should be like, please tell me.
>
> I am sending this to wikidata-tech and the discovery team list only for now,
> since it's still a work in progress and half-baked; we could open this up for
> wider discussion later if necessary.
>
> [1] https://phabricator.wikimedia.org/T178851
>
> Thanks,
> --
> Stas Malyshev
> smalys...@wikimedia.org
>