Re: [Wikidata] Wikidata fulltext search prototype

2017-12-28 Thread Riccardo Tasso
Hi, it seems very good.
What about suggestions?

Riccardo

2017-12-18 22:31 GMT+01:00 Stas Malyshev :

> Hi!
>
> > I guess it's using an older index from a few weeks ago?  Doesn't seem to
> > have the latest properties that have landed, but that's ok if the ES
> > index isn't current yet and you're just experimenting and getting feedback.
>
> Yes, exactly. The Wikidata index is big, and we cannot use the main index
> since we're experimenting on it, so we make a copy and use that. Of course,
> the copy gets out of date :) This one is a couple of weeks old.
>
> >
> > http://wikidata-wdsearch.wmflabs.org/w/index.php?search=partition=Special:Search=advanced=1=1=25rdek6vt4n1ekkk5ht0ew0vv
> >
> > Didn't see
> > https://www.wikidata.org/wiki/Property:P4653
>
> yes, too recent :)
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata fulltext search prototype

2017-12-18 Thread Stas Malyshev
Hi!

> I guess it's using an older index from a few weeks ago?  Doesn't seem to
> have the latest properties that have landed, but that's ok if the ES
> index isn't current yet and you're just experimenting and getting feedback.

Yes, exactly. The Wikidata index is big, and we cannot use the main index
since we're experimenting on it, so we make a copy and use that. Of course,
the copy gets out of date :) This one is a couple of weeks old.

> 
> http://wikidata-wdsearch.wmflabs.org/w/index.php?search=partition=Special:Search=advanced=1=1=25rdek6vt4n1ekkk5ht0ew0vv
> 
> Didn't see
> https://www.wikidata.org/wiki/Property:P4653

yes, too recent :)

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Wikidata fulltext search prototype

2017-12-18 Thread Thad Guidry
Hmm, it seems hit or miss.  Perhaps your sitelinks scoring algorithm is
having too much of an impact here?  Several times, a nearly full phrase
is ranked much lower than an incomplete or partial phrase.

For example, Cart World Series is ranked lower in this query, where
I expected it to be nearly first, given "cart+world":

http://wikidata-wdsearch.wmflabs.org/w/index.php?search=cart+world=Special:Search=default=1=cnkg3v57431l8qwwwt8caqtj0

-Thad
+ThadGuidry 


Re: [Wikidata] Wikidata fulltext search prototype

2017-12-18 Thread Thad Guidry
Hi Stas,

I guess it's using an older index from a few weeks ago?  Doesn't seem to
have the latest properties that have landed, but that's ok if the ES index
isn't current yet and you're just experimenting and getting feedback.

http://wikidata-wdsearch.wmflabs.org/w/index.php?search=partition=Special:Search=advanced=1=1=25rdek6vt4n1ekkk5ht0ew0vv

Didn't see
https://www.wikidata.org/wiki/Property:P4653


On Mon, Dec 18, 2017 at 1:20 PM Stas Malyshev wrote:

> Hi!
>
> > Where can I learn about the internals of this jewel? (which search
> > engine, what metrics are used to rank items, and so on).
>
> Thanks for your kind words. You can track it here:
>
> https://phabricator.wikimedia.org/T125500
>
> and associated tasks like this one:
> https://phabricator.wikimedia.org/T178851
>
> which contain links to the patches. The search runs on the same
> ElasticSearch we use for search on other sites, but the prototype has
> specific code to deal with the Wikidata-specific data structure and the
> fact that it is, unlike most other Wikimedia sites, multilingual by design.
>
> The rankings are currently hand-tuned and kind of hard to read
> (we're working on improving this); they are contained here:
> https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/config/
> and specific functions we're using here:
>
> https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/config/ElasticSearchRescoreFunctions.php;4c6aa54e56c68ebd3543b23c88f52ae6f176a079$25
>
> Basically it's a combination of match score (how well the string matches
> the query), incoming link count, sitelink count and special boosts like
> demoting the disambiguation pages.
> --
> Stas Malyshev
> smalys...@wikimedia.org
>


Re: [Wikidata] Wikidata fulltext search prototype

2017-12-18 Thread Stas Malyshev
Hi!

> Where can I learn about the internals of this jewel? (which search
> engine, what metrics are used to rank items, and so on).

Thanks for your kind words. You can track it here:

https://phabricator.wikimedia.org/T125500

and associated tasks like this one:
https://phabricator.wikimedia.org/T178851

which contain links to the patches. The search runs on the same
ElasticSearch we use for search on other sites, but the prototype has
specific code to deal with the Wikidata-specific data structure and the
fact that it is, unlike most other Wikimedia sites, multilingual by design.

The rankings are currently hand-tuned and kind of hard to read
(we're working on improving this); they are contained here:
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/config/
and specific functions we're using here:
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/config/ElasticSearchRescoreFunctions.php;4c6aa54e56c68ebd3543b23c88f52ae6f176a079$25

Basically it's a combination of match score (how well the string matches
the query), incoming link count, sitelink count and special boosts like
demoting the disambiguation pages.
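
To make the combination described above concrete, here is a minimal
sketch in Python. It is not the code from ElasticSearchRescoreFunctions.php:
the Candidate fields, the saturate helper, and every weight and constant
are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    match_score: float      # how well the string matches the query, in [0, 1]
    incoming_links: int
    sitelinks: int
    is_disambiguation: bool = False


def saturate(count: int, k: float) -> float:
    """Map a non-negative count into [0, 1) with diminishing returns."""
    return count / (count + k)


def rescore(c: Candidate) -> float:
    # All weights and constants here are made up for illustration.
    score = (
        0.6 * c.match_score
        + 0.2 * saturate(c.incoming_links, k=100.0)
        + 0.2 * saturate(c.sitelinks, k=20.0)
    )
    if c.is_disambiguation:
        score *= 0.5  # special boost: demote disambiguation pages
    return score


candidates = [
    Candidate("near-exact match, obscure item", 0.95, 10, 2),
    Candidate("partial match, popular item", 0.60, 5000, 150),
    Candidate("disambiguation page", 0.95, 10, 2, is_disambiguation=True),
]
ranked = sorted(candidates, key=rescore, reverse=True)
for c in ranked:
    print(f"{rescore(c):.3f}  {c.label}")
```

With these made-up numbers, the popular item with a partial match ends up
above the obscure item with a near-exact match, and the disambiguation
page drops to the bottom, which is the kind of trade-off hand-tuned
weights have to balance.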
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Wikidata fulltext search prototype

2017-12-18 Thread Antonin Delpeuch (lists)
Awww… this is awesome! It works really well, I can't wait to see this
deployed.

This is going to give a huge boost to the OpenRefine reconciliation service.

Where can I learn about the internals of this jewel? (which search
engine, what metrics are used to rank items, and so on).

Antonin

On 18/12/2017 18:23, Stas Malyshev wrote:
> Hi!
> 
> The Search Platform team would like to present a prototype test site for
> the new and improved Wikidata fulltext search:
> 
> http://wikidata-wdsearch.wmflabs.org/wiki/Special:Search
> 
> Please try your favorite searches on it and report whether it looks good
> and which problems you notice.
> 
> Important to note for this prototype:
> 
> - The data in the search is imported from the Wikidata index but is not
> updated after the import, so it may be slightly out of date
> 
> - The search is in English by default but you can try other languages by
> using the uselang parameter, e.g.:
> http://wikidata-wdsearch.wmflabs.org/w/index.php?search=Wien=Special:Search=advanced=1=1=de
> Note that since it's a test site, this is probably the best way to test
> non-English searches as logins etc. may not work there properly.
> 
> - Search works properly only for the main and Property namespaces (0 and
> 120).
> 
> What kind of problems are we looking for?
> 
> - Ranking and retrieval problems, i.e. result X appears too low or too
> high in a specific search, or does not appear at all (please tell us
> the specific search query and the expected result)
> 
> - UI problems, i.e. the ranking is fine but the highlighting, label or
> description is broken or looks bad, or the result that should be
> highlighted is not
> 
> Of course, if some search result worked spectacularly better for you, it
> would be nice to know too :)
> 
> What should work?
> 
> Any search in Special:Search in the main and Property namespaces
> should produce sensible results. Searches without advanced syntax should
> have better results than before, and searches with advanced syntax (+, -,
> *, quotes, etc.) should work no worse than before.
> 
> Please note that this is a test wiki, so nothing else but search is
> expected to work, including clicking on other links, editing, browsing
> to other pages, etc. This is also a test site, so short disruptions
> might be possible when we update or change things or fix bugs reported
> by you :)
> 
> How to provide feedback?
> 
> Several ways are possible:
> - Reply to this list or personally to me if you prefer
> - On-wiki message on my talk page:
> https://www.wikidata.org/wiki/User_talk:Smalyshev_(WMF)
> - Talk to us on IRC: #wikimedia-discovery
> 
> Thanks!
> 




[Wikidata] Wikidata fulltext search prototype

2017-12-18 Thread Stas Malyshev
Hi!

The Search Platform team would like to present a prototype test site for
the new and improved Wikidata fulltext search:

http://wikidata-wdsearch.wmflabs.org/wiki/Special:Search

Please try your favorite searches on it and report whether it looks good
and which problems you notice.

Important to note for this prototype:

- The data in the search is imported from the Wikidata index but is not
updated after the import, so it may be slightly out of date

- The search is in English by default but you can try other languages by
using the uselang parameter, e.g.:
http://wikidata-wdsearch.wmflabs.org/w/index.php?search=Wien=Special:Search=advanced=1=1=de
Note that since it's a test site, this is probably the best way to test
non-English searches as logins etc. may not work there properly.

- Search works properly only for the main and Property namespaces (0 and
120).
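
As a rough sketch of the uselang tip above, the snippet below builds such
a search URL in Python. The parameter names other than uselang (search,
title, profile, fulltext, ns0, ns120) are assumed from the usual MediaWiki
Special:Search conventions, since the example URL as archived no longer
shows them; search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Prototype test site from the announcement.
BASE = "http://wikidata-wdsearch.wmflabs.org/w/index.php"


def search_url(query: str, lang: str = "en", namespaces=(0, 120)) -> str:
    """Build a fulltext search URL for the given query and interface language."""
    params = {
        "search": query,
        "title": "Special:Search",
        "profile": "advanced",   # assumed parameter name
        "fulltext": "1",         # assumed parameter name
    }
    # Restrict to the main (0) and Property (120) namespaces by default.
    for ns in namespaces:
        params[f"ns{ns}"] = "1"  # assumed parameter name pattern
    params["uselang"] = lang
    return f"{BASE}?{urlencode(params)}"


url = search_url("Wien", lang="de")
print(url)
```

Passing the language in the URL avoids relying on login-based language
preferences, which may not work on the test site.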

What kind of problems are we looking for?

- Ranking and retrieval problems, i.e. result X appears too low or too
high in a specific search, or does not appear at all (please tell us
the specific search query and the expected result)

- UI problems, i.e. the ranking is fine but the highlighting, label or
description is broken or looks bad, or the result that should be
highlighted is not

Of course, if some search result worked spectacularly better for you, it
would be nice to know too :)

What should work?

Any search in Special:Search in the main and Property namespaces
should produce sensible results. Searches without advanced syntax should
have better results than before, and searches with advanced syntax (+, -,
*, quotes, etc.) should work no worse than before.

Please note that this is a test wiki, so nothing else but search is
expected to work, including clicking on other links, editing, browsing
to other pages, etc. This is also a test site, so short disruptions
might be possible when we update or change things or fix bugs reported
by you :)

How to provide feedback?

Several ways are possible:
- Reply to this list or personally to me if you prefer
- On-wiki message on my talk page:
https://www.wikidata.org/wiki/User_talk:Smalyshev_(WMF)
- Talk to us on IRC: #wikimedia-discovery

Thanks!
-- 
Stas Malyshev
smalys...@wikimedia.org
