On Sat, Jul 28, 2018 at 2:02 AM Stas Malyshev wrote:
> Hi!
>
> > The top 1000 is:
> > https://docs.google.com/spreadsheets/d/1E58W_t_o6vTNUAx_TG3ifW6-eZE4KJ2VGEaBX_74YkY/edit?usp=sharing
>
> This one is pretty interesting, how do I extract this data? It may be
> useful independently of what
Hi!
> The top 1000 is:
> https://docs.google.com/spreadsheets/d/1E58W_t_o6vTNUAx_TG3ifW6-eZE4KJ2VGEaBX_74YkY/edit?usp=sharing
This one is pretty interesting, how do I extract this data? It may be
useful independently of what we're discussing here.
--
Stas Malyshev
smalys...@wikimedia.org
Hi!
> I think we already index way more than P31 and P279.
Oh yes, all the string properties.
> So I think that the increase is smaller than what you anticipate.
> What I'd try to avoid in general is indexing terms that have only doc
> since they are pretty useless.
For unique string
On Fri, Jul 27, 2018 at 3:31 PM David Causse wrote:
> What I'd try to avoid in general is indexing terms that have only doc
> since they are pretty useless.
>
I meant: that have only *one* doc
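
To make the "only one doc" point concrete, here is a minimal sketch of how such terms could be spotted offline. It assumes each document is simply a list of statement terms in a "P31=Q5"-like form; the function names and the sample data are made up for illustration and are not taken from the actual index.

```python
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # set() so a term counts once per document
    return df

def singleton_terms(docs):
    """Return the terms that occur in exactly one document."""
    return {term for term, n in document_frequencies(docs).items() if n == 1}

# Hypothetical sample: three documents with their statement terms.
docs = [
    ["P31=Q5", "P21=Q6581097"],
    ["P31=Q5", "P214=113230702"],
    ["P31=Q5", "P21=Q6581097"],
]
print(sorted(singleton_terms(docs)))
```

A term found in a single document cannot narrow a search any further than the document itself, which is why indexing it buys very little.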
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
Hi,
I think we already index way more than P31 and P279.
For instance, we have approximately 102,301,706 distinct values in the
term lexicon for statement_keywords.
Sadly I can't extract the list of unique PIDs used (we'd have to enable
field_data on statement_keywords.property).
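
If a dump of the statement_keywords values were available outside the index, the unique PIDs could be counted with something like the sketch below. This is only an illustration: it assumes the values look like "P31=Q5" (property ID, "=", value), and the helper name and sample list are hypothetical.

```python
from collections import Counter

def property_counts(statement_keywords):
    """Count how often each property ID appears in 'PID=value' strings."""
    counts = Counter()
    for value in statement_keywords:
        pid, sep, _ = value.partition("=")
        # Keep only well-formed entries such as "P31=Q5".
        if sep and pid.startswith("P") and pid[1:].isdigit():
            counts[pid] += 1
    return counts

# Hypothetical sample of statement_keywords values.
sample = ["P31=Q5", "P279=Q35120", "P31=Q5", "P214=113230702", "garbage"]
counts = property_counts(sample)
print(len(counts))            # number of distinct PIDs in the sample
print(counts.most_common(2))  # most frequently used PIDs
```

This does in memory what enabling field_data on statement_keywords.property would let the cluster do, so it is only practical on a sample or an external dump.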
The top 1000