Your approach is technically valid. It is also, in part, obviously the
wrong approach. When you say that we have to consider whether Wikidata is
the best store for all kinds of data, you may be pointing at the
inadequacies of Wikidata for particular kinds of data that we already
store and want to use. The fact is that this is what Wikidata is being
used for. In addition, there is more data that people want to include in
Wikidata that will provide a real service, a service that blends in really
well with our mission.

For me it does not really matter how and where the data is stored. What
matters in this thread is an answer to the question of how we will scale:
how we will serve the needs that are now served by Wikidata and the needs
that are not yet served by Wikidata. Wikidata is the project, and as long
as the data comes together to be manipulated or queried in a consistent
manner, the store may be Wikibase or whatever.

The issue is how we scale, not why we should make do with too few
resources by restricting the functionality of Wikidata.

On Sat, 4 May 2019 at 09:38, Stas Malyshev <smalys...@wikimedia.org> wrote:

> Hi!
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider an example:
> https://www.wikidata.org/w/index.php?title=Q57009452
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, is still 780K and will need to be updated on each edit.
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will still have to run within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
> Any thoughts on this?
> --
> Stas Malyshev
> smalys...@wikimedia.org
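
A rough check of the estimate in the quoted message, as a minimal sketch
only. It assumes that "2M" means about 2 MB per entity and takes the quoted
figures at face value. The function and variable names are illustrative and
do not come from the thread, and the revision count is a hypothetical
parameter.

```python
# Back-of-envelope check of the figures in the quoted message.
ARTICLES = 21_000_000          # scientific articles in the DB (quoted)
LARGE_PERCENT = 2              # "if even 2% of them would be like this"
ENTITY_BYTES = 2_000_000       # "almost 2M in size", assumed to mean ~2 MB
STATEMENTS_PER_ENTITY = 3000   # "almost 3000 statements"


def estimate(articles, large_percent, entity_bytes, statements, revisions=1):
    """Return (large entities, stored bytes, statements) for the scenario.

    `revisions` is a hypothetical average number of stored revisions per
    entity; each revision is assumed to keep a full copy of the entity.
    """
    large = articles * large_percent // 100   # integer math, no rounding
    stored_bytes = large * entity_bytes * revisions
    total_statements = large * statements
    return large, stored_bytes, total_statements


large, stored_bytes, total_statements = estimate(
    ARTICLES, LARGE_PERCENT, ENTITY_BYTES, STATEMENTS_PER_ENTITY
)
print(f"large entities:   {large:,}")
print(f"stored data (TB): {stored_bytes / 1e12:.2f} per revision")
print(f"statements:       {total_statements:,}")
```

With one revision per entity this gives 420,000 large entities, 0.84 TB
and 1.26 billion statements, which is in line with "almost a terabyte" and
"billions of statements". Passing a larger `revisions` value scales the
storage figure linearly.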