Re: [Wikidata] [discovery-private] Indexing all item properties in ElasticSearch

2018-07-28 Thread David Causse
On Sat, Jul 28, 2018 at 2:02 AM Stas Malyshev wrote:
> Hi!
>
> > The top 1000
> > is:
> > https://docs.google.com/spreadsheets/d/1E58W_t_o6vTNUAx_TG3ifW6-eZE4KJ2VGEaBX_74YkY/edit?usp=sharing
>
> This one is pretty interesting, how do I extract this data? It may be
> useful independently of what

Re: [Wikidata] [discovery-private] Indexing all item properties in ElasticSearch

2018-07-27 Thread Stas Malyshev
Hi!

> The top 1000
> is:
> https://docs.google.com/spreadsheets/d/1E58W_t_o6vTNUAx_TG3ifW6-eZE4KJ2VGEaBX_74YkY/edit?usp=sharing

This one is pretty interesting, how do I extract this data? It may be useful independently of what we're discussing here.

--
Stas Malyshev
smalys...@wikimedia.org

Re: [Wikidata] [discovery-private] Indexing all item properties in ElasticSearch

2018-07-27 Thread Stas Malyshev
Hi!

> I think we already index way more than P31 and P279.

Oh yes, all the string properties.

> So I think that the increase is smaller than what you anticipate.
> What I'd try to avoid in general is indexing terms that have only doc
> since they are pretty useless.

For unique string

Re: [Wikidata] [discovery-private] Indexing all item properties in ElasticSearch

2018-07-27 Thread David Causse
On Fri, Jul 27, 2018 at 3:31 PM David Causse wrote:
> What I'd try to avoid in general is indexing terms that have only doc
> since they are pretty useless.

I meant: that have only *one* doc
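To make the "only one doc" point concrete, here is a minimal sketch of how one might spot such terms offline. It is not how CirrusSearch or Elasticsearch does it; the function name `singleton_terms`, the sample documents, and the `PID=value` term format are all assumptions made for illustration.

```python
from collections import Counter


def singleton_terms(docs):
    """Return the terms that occur in exactly one document.

    `docs` maps a document ID to the list of terms indexed for it
    (hypothetical input shape, chosen only for this sketch).
    """
    doc_freq = Counter()
    for terms in docs.values():
        # Count each term once per document, i.e. document frequency.
        doc_freq.update(set(terms))
    return {term for term, freq in doc_freq.items() if freq == 1}


# Made-up sample: the shared term survives, the unique identifiers do not.
docs = {
    "Q1": ["P31=Q5", "P213=0000 0001"],
    "Q2": ["P31=Q5", "P213=0000 0002"],
}
print(sorted(singleton_terms(docs)))
```

A term that matches a single document cannot help rank or group results, which is why unique identifiers are the obvious candidates to leave out.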

Re: [Wikidata] [discovery-private] Indexing all item properties in ElasticSearch

2018-07-27 Thread David Causse
Hi,

I think we already index way more than P31 and P279. For instance, we have approximately 102,301,706 distinct values in the term lexicon for statement_keywords. Sadly, I can't extract the list of unique PIDs used (we'd have to enable fielddata on statement_keywords.property).

The top 1000
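If the statement_keywords terms can be dumped some other way, the list of unique PIDs could be derived client-side. The sketch below assumes each term looks like `P31=Q5` (property ID, `=`, value); that format, the function name `property_counts`, and the sample terms are assumptions, not something confirmed in this thread.

```python
from collections import Counter


def property_counts(statement_keywords):
    """Count how many terms use each property ID.

    Assumes each term has the form "<PID>=<value>", e.g. "P31=Q5".
    Terms that do not match this shape are skipped.
    """
    counts = Counter()
    for keyword in statement_keywords:
        pid, sep, _value = keyword.partition("=")
        if sep and pid.startswith("P") and pid[1:].isdigit():
            counts[pid] += 1
    return counts


# Made-up sample terms, for illustration only.
terms = ["P31=Q5", "P31=Q6", "P279=Q1", "not-a-statement"]
counts = property_counts(terms)
print(sorted(counts))          # unique PIDs
print(counts.most_common(2))   # most frequently used properties first
```

`Counter.most_common(n)` gives a "top n" ranking directly, which is the same shape as the top 1000 list mentioned above.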