Re: [Wikidata] Are we ready for our future

2019-05-12 Thread Gerard Meijssen
Hoi,
When you consider splitting things up into parts, the question becomes how we
bring them back together. Where a person has publications and awards, and
awards live in a separate area, how do you find that person in either
environment? It is one thing to consider splitting up because it may help
with "operational" issues; the key question is what the interaction will look
like. Without a practical way to mix and match, we will end up with towers of
knowledge that hardly interact.
Thanks,
  GerardM

On Sun, 12 May 2019 at 09:20, Federico Leva (Nemo) 
wrote:

> Erik Paulson, 12/05/19 01:54:
> > It's probably less about splitting the dumps up and more about starting
> > to split the main wikidata namespace into more discrete areas [...]
>
> In fact that was one of the proposals in "The future of bibliographic
> data in Wikidata: 4 possible federation scenarios".
> https://www.wikidata.org/wiki/Wikidata:WikiCite/Roadmap
>
> Federico
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-12 Thread Federico Leva (Nemo)

Erik Paulson, 12/05/19 01:54:
It's probably less about splitting the dumps up and more about starting 
to split the main wikidata namespace into more discrete areas [...]


In fact that was one of the proposals in "The future of bibliographic 
data in Wikidata: 4 possible federation scenarios".

https://www.wikidata.org/wiki/Wikidata:WikiCite/Roadmap

Federico

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-11 Thread Erik Paulson
I hope that splitting the wikidata dump into smaller, more functional
chunks is something the wikidata project considers.

It's probably less about splitting the dumps up and more about starting to
split the main wikidata namespace into more discrete areas, because without
that the full wikidata graph is hard to partition and the dumps are hard to
split up into anything functional. For example, the latest wikidata news was "The
sixty-three millionth item, about a protein, is created." (yay!) - but
there are lots and lots of proteins. If someone is mirroring wikidata
locally to speed up their queries for say an astronomy use case, having to
download, store, and process a bunch of triples about a huge collection of
proteins is only making their life harder. Maybe some of these specialized
collections should go into their own namespace, like "wikidata-proteins" or
"wikidata-biology". The project can have some guidelines about how
"notable" an item has to be before it gets moved into "wikidata-core".
Hemoglobin, yeah, that probably belongs in "wikidata-core".
"MGG_03181-t26_1" aka Q6300 (which is some protein that's been found in
rice blast fungus) - well, maybe that's not quite notable enough just yet,
but is certainly still valuable to some subset of the community.

Federated queries mean that this isn't too much harder to manage from a
usability standpoint. If my local graph query processor/database knows that
it has large chunks of wikidata mirrored into it, it doesn't need to use
federated SPARQL to make remote network calls to wikidata.org's WDQS to
resolve my query - but if it stumbles across a graph item that it needs to
follow back across the network to wikidata.org, it can.
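
To make that concrete, here is a rough sketch - the local endpoint URL and the
astronomy-flavoured items are only illustrative, but the SERVICE clause is
standard SPARQL 1.1 federation:

    import requests

    # Hypothetical local mirror holding an astronomy subset of Wikidata.
    LOCAL_ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

    QUERY = """
    PREFIX wd:  <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?star ?paper WHERE {
      ?star wdt:P31 wd:Q523 .                    # stars, answered from the local mirror
      SERVICE <https://query.wikidata.org/sparql> {
        ?paper wdt:P921 ?star .                  # papers about them, resolved remotely at WDQS
      }
    } LIMIT 10
    """

    rows = requests.get(LOCAL_ENDPOINT, params={"query": QUERY},
                        headers={"Accept": "application/sparql-results+json"}
                        ).json()["results"]["bindings"]
    for row in rows:
        print(row["star"]["value"], row["paper"]["value"])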

And wikidata.org could and still should strive to manage as many entities
in its knowledge base as possible, and load as many of these different
datasets into its local graph database to feed the WDQS, potentially even
knowledge bases that aren't from wikidata.org. That way, federated queries
that previously would have had to make network calls can instead be
integrated into the local query plan and hopefully run much faster.

-Erik


On Fri, May 3, 2019 at 9:50 AM Darren Cook  wrote:

> > Wikidata grows like mad. This is something we all experience in the
> really bad
> > response times we are suffering. It is so bad that people are asked what
> kind of
> > updates they are running because it makes a difference in the lag times
> there are.
> >
> > Given that Wikidata is growing like a weed, ...
>
> As I've delved deeper into Wikidata I get the feeling it is being
> developed with the assumptions of infinite resources, and no strong
> guidelines of exactly what the scope is (i.e. where you draw the line
> between what belongs in Wikidata and what does not).
>
> This (and concerns of it being open to data vandalism) has personally
> made me back-off a bit. I'd originally planned to have Wikidata be the
> primary data source, but I'm now leaning towards keeping data tables and
> graphs outside, with scheduled scripts to import into Wikidata, and
> export from Wikidata.
>
> > For the technical guys, consider our growth and plan for at least one
> year.
>
> The 37GB (json, bz2) data dump file (it was already 33GB, twice the size
> of the English wikipedia dump, when I grabbed it last November) is
> unwieldy. And, as there is no incremental changes being published, it is
> hard to create a mirror.
>
> Can that dump file be split up in some functional way, I wonder?
>
> Darren
>
>
> --
> Darren Cook, Software Researcher/Developer
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-06 Thread Sebastian Hellmann

Hi all,

I would like to throw in a slightly different angle here. The 
GlobalFactSync Project 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE 
will start in June.


As a preparation we wrote this paper describing the engine behind it: 
https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf


There have already been very constructive comments at
https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncRE#Interfacing_with_Wikidata's_data_quality_issues_in_certain_areas
which led us to focus on syncing music (bands, singles, albums) as one of
the ten sync targets. Other proposals for domains are very welcome.


The rationale behind GlobalFactSync is this:

Managing data quality follows the Pareto principle: the first 80% is easy
to achieve, and each percentage point after that gets much more expensive,
following the law of diminishing returns. As a consequence for Wikidata:
WD is probably at 80% now, so maintaining it gets harder because you
need to micro-optimize to find the remaining errors and fill in missing
information. This is compounded by growing Wikidata further in terms
of entities.


GlobalFactSync does not overcome this Pareto effect, but it works around it:
we hope that it will pool the manpower of Wikipedia editors and
Wikidata editors and also mobilize DBpedia users to edit either in WP or
WD.


In general, Wikimedia runs the 6th largest website in the world. They
are in the same league as Google or Facebook, and I have absolutely no
doubt that they have ample expertise in tackling the scalability of hosting,
e.g. by doubling the number of servers or by web caching. The problem I see
is that you cannot easily double the editor manpower or bot edits.
Hence the GlobalFactSync grant.


We will send out an announcement in a week or two. Feel free to suggest
sync targets. We are still looking into the complexity of managing
references, as this is bread and butter for the project.


All the best,

Sebastian



On 05.05.19 18:07, Yaroslav Blanter wrote:
Indeed, these collaborations in high-energy physics are not static
quantities; they change essentially every day (people get hired and
contracts expire), and most likely any two papers have a slightly
different author list.


Cheers
Yaroslav

On Sun, May 5, 2019 at 5:58 PM Darren Cook wrote:

> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000 statements ...

A paper with 2884 authors! arxiv.org deals with it by calling them the
"Atlas Collaboration": https://arxiv.org/abs/1403.0489
The actual paper does the same (with the full list of names and
affiliations in the Appendix).

The nice thing about graph databases is we should be able to set author
to point to an "Atlas Collaboration" node, and then have that node point
to the 2884 individual author nodes (and each of those nodes point to
their affiliation).

What are the reasons to not re-organize it that way?

My first thought was that who is in the collaboration changes over time.
But does it change day to day, or only each academic year?

Either way, maybe we need to point the author field to something like
"Atlas Collaboration 2014a", and clone-and-modify that node each time we
come to a paper that describes a different membership?

Or is it better to record each person's membership of such a group with a
start and end date?

(BTW, arxiv.org tells me there are 1059 results for ATLAS Collaboration;
don't know if one "result" corresponds to one "paper", though.)

> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database.

It feels like it belongs in "core" Wikidata. Being able to ask "which
papers has this researcher written?" seems like a good example of a
Wikidata query. Similarly, "which papers has the ATLAS Collaboration
worked on?"

But, also, are queries like "Which authors of Physics papers went to a
high school that had more than 1000 students?" part of the goal of
Wikidata? If so, Wikidata needs optimizing in such a way that makes such
queries both possible and tractable.

Darren

___
Wikidata mailing list
Wikidata@lists.wikimedia.org 
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

--
All the best,
Sebastian Hellmann

Director of Knowledge 

Re: [Wikidata] Are we ready for our future

2019-05-05 Thread Yaroslav Blanter
Indeed, these collaborations in high-energy physics are not static
quantities; they change essentially every day (people get hired and
contracts expire), and most likely any two papers have a slightly
different author list.

Cheers
Yaroslav

On Sun, May 5, 2019 at 5:58 PM Darren Cook  wrote:

> > We may also want to consider if Wikidata is actually the best store for
> > all kinds of data. Let's consider example:
> >
> > https://www.wikidata.org/w/index.php?title=Q57009452
> >
> > This is an entity that is almost 2M in size, almost 3000 statements ...
>
> A paper with 2884 authors! arxiv.org deals with it by calling them the
> "Atlas Collaboration": https://arxiv.org/abs/1403.0489
> The actual paper does the same (with the full list of names and
> affiliations in the Appendix).
>
> The nice thing about graph databases is we should be able to set author
> to point to an "Atlas Collaboration" node, and then have that node point
> to the 2884 individual author nodes (and each of those nodes point to
> their affiliation).
>
> What are the reasons to not re-organize it that way?
>
> My first thought was that who is in the collaboration changes over time?
> But does it change day to day, or only change each academic year?
>
> Either way, maybe we need to point the author field to something like
> "Atlas Collaboration 2014a", and clone-and-modify that node each time we
> come to a paper that describes a different membership?
>
> Or is it better to do each persons membership of such a group with a
> start and end date?
>
> (BTW, arxiv.org tells me there are 1059 results for ATLAS Collaboration;
> don't know if one "result" corresponds to one "paper", though.)
>
> > While I am not against storing this as such, I do wonder if it's
> > sustainable to keep such kind of data together with other Wikidata data
> > in a single database.
>
> It feels like it belongs in "core" Wikidata. Being able to ask "which
> papers has this researcher written?" seems like a good example of a
> Wikidata query. Similarly,  "which papers have The ATLAS Collaboration"
> worked on?"
>
> But, also, are queries like "Which authors of Physics papers went to a
> high school that had more than 1000 students?" part of the goal of
> Wikidata? If so, Wikidata needs optimizing in such a way that makes such
> queries both possible and tractable.
>
> Darren
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-05 Thread Gerard Meijssen
Hoi,
For your information, these huge numbers of authors are particularly
noticeable when organisations like CERN are involved. Those people all have
an ORCID identifier, and slowly but surely more authors are being associated
with publications. As a consequence, papers are becoming complete with respect
to their authors. As more authors become available, it will be possible to
get more initial value in the first version of a paper.

Given that the SOURCEMD jobs run in a narrow batch mode, more jobs running
concurrently offset the impact of "CERN" jobs. Over time, fewer edits
associated with big articles with large numbers of co-authors will need to be
processed.

NB: this is an answer to off-topic issues raised. This is only one instance
of the functionality that we support.
Thanks,
   GerardM

On Sun, 5 May 2019 at 17:06, Andrew Gray  wrote:

> So, I'm not particularly involved with the scholarly-papers work, but
> with my day-job bibliographic analysis hat on...
>
> Papers like this are a *remarkable* anomaly - hyperauthorship like
> this is confined to some quite specific areas of physics, and is still
> relatively uncommon even in those. I don't think we have to worry
> about it approaching anything like 2% of papers any time soon :-)
>
> For 2018 publications, the global mean number of authors/paper is
> slightly under five (all disciplines). Over all time, allowing for
> there being more new papers than old ones, I'd guess it's something
> like three.
>
> Andrew.
>
>
>
> On Sat, 4 May 2019 at 08:58, Stas Malyshev 
> wrote:
> >
> > Hi!
> >
> > > For the technical guys, consider our growth and plan for at least one
> > > year. When the impression exists that the current architecture will not
> > > scale beyond two years, start a project to future proof Wikidata.
> >
> > We may also want to consider if Wikidata is actually the best store for
> > all kinds of data. Let's consider example:
> >
> > https://www.wikidata.org/w/index.php?title=Q57009452
> >
> > This is an entity that is almost 2M in size, almost 3000 statements and
> > each edit to it produces another 2M data structure. And its dump, albeit
> > slightly smaller, still 780K and will need to be updated on each edit.
> >
> > Our database is obviously not optimized for such entities, and they
> > won't perform very well. We have 21 million scientific articles in the
> > DB, and if even 2% of them would be like this, it's almost a terabyte of
> > data (multiplied by number of revisions) and billions of statements.
> >
> > While I am not against storing this as such, I do wonder if it's
> > sustainable to keep such kind of data together with other Wikidata data
> > in a single database. After all, each query that you run - even if not
> > related to that 21 million in any way - will have to still run in within
> > the same enormous database and be hosted on the same hardware. This is
> > especially important for services like Wikidata Query Service where all
> > data (at least currently) occupies a shared space and can not be easily
> > separated.
> >
> > Any thoughts on this?
> >
> > --
> > Stas Malyshev
> > smalys...@wikimedia.org
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> - Andrew Gray
>   and...@generalist.org.uk
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-05 Thread Darren Cook
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
> 
> https://www.wikidata.org/w/index.php?title=Q57009452
> 
> This is an entity that is almost 2M in size, almost 3000 statements ...

A paper with 2884 authors! arxiv.org deals with it by calling them the
"Atlas Collaboration": https://arxiv.org/abs/1403.0489
The actual paper does the same (with the full list of names and
affiliations in the Appendix).

The nice thing about graph databases is we should be able to set author
to point to an "Atlas Collaboration" node, and then have that node point
to the 2884 individual author nodes (and each of those nodes point to
their affiliation).

What are the reasons to not re-organize it that way?

My first thought was that who is in the collaboration changes over time.
But does it change day to day, or only each academic year?

Either way, maybe we need to point the author field to something like
"Atlas Collaboration 2014a", and clone-and-modify that node each time we
come to a paper that describes a different membership?

Or is it better to record each person's membership of such a group with a
start and end date?
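
A toy sketch of that last option, with the dated memberships held on the
collaboration rather than on each paper - the IDs are placeholders, and the
dict only loosely mimics how start/end qualifiers would encode this in
Wikidata:

    # Placeholder collaboration with dated memberships; ids are illustrative only.
    atlas = {
        "id": "Q_ATLAS_PLACEHOLDER",
        "members": [
            {"person": "Q_AUTHOR_1", "start": "2012-01-01", "end": None},
            {"person": "Q_AUTHOR_2", "start": "2013-09-01", "end": "2015-08-31"},
        ],
    }

    def members_on(collab, date):
        """People who were members on the given ISO date (a paper carries its date)."""
        return [m["person"] for m in collab["members"]
                if m["start"] <= date and (m["end"] is None or date <= m["end"])]

    print(members_on(atlas, "2014-03-01"))   # -> ['Q_AUTHOR_1', 'Q_AUTHOR_2']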

(BTW, arxiv.org tells me there are 1059 results for ATLAS Collaboration;
don't know if one "result" corresponds to one "paper", though.)

> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database.

It feels like it belongs in "core" Wikidata. Being able to ask "which
papers has this researcher written?" seems like a good example of a
Wikidata query. Similarly, "which papers has the ATLAS Collaboration
worked on?"

But, also, are queries like "Which authors of Physics papers went to a
high school that had more than 1000 students?" part of the goal of
Wikidata? If so, Wikidata needs optimizing in such a way that makes such
queries both possible and tractable.
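
For what it's worth, that last question can at least be written down against
WDQS today. A sketch, using my best guesses for the IDs involved (Q13442814
scholarly article, Q413 physics, P921 main subject, P50 author, P69 educated
at, P2196 students count), ignoring the "high school" restriction, and very
likely to hit the query timeout at current scale:

    import requests

    QUERY = """
    PREFIX wd:  <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT DISTINCT ?author WHERE {
      ?paper  wdt:P31   wd:Q13442814 ;   # scholarly article
              wdt:P921  wd:Q413 ;        # main subject: physics
              wdt:P50   ?author .        # author
      ?author wdt:P69   ?school .        # educated at
      ?school wdt:P2196 ?students .      # students count
      FILTER(?students > 1000)
    } LIMIT 100
    """

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY},
                     headers={"Accept": "application/sparql-results+json"})
    print(len(r.json()["results"]["bindings"]), "authors in the first batch")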

Darren

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-05 Thread Andrew Gray
So, I'm not particularly involved with the scholarly-papers work, but
with my day-job bibliographic analysis hat on...

Papers like this are a *remarkable* anomaly - hyperauthorship like
this is confined to some quite specific areas of physics, and is still
relatively uncommon even in those. I don't think we have to worry
about it approaching anything like 2% of papers any time soon :-)

For 2018 publications, the global mean number of authors/paper is
slightly under five (all disciplines). Over all time, allowing for
there being more new papers than old ones, I'd guess it's something
like three.

Andrew.



On Sat, 4 May 2019 at 08:58, Stas Malyshev  wrote:
>
> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will have to still run in within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



--
- Andrew Gray
  and...@generalist.org.uk

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-05 Thread Gerard Meijssen
Hoi,
Yes we could do that. What follows is that functionality of Wikidata is
killed. Completely dead.

Also, this thread is about us being ready for what we could be, not for
what we are. Even at that, we are not as good as we could be. On top of your
suggestions for inclusion, we could have the "concept cloud" for Wikipedia
articles, defined by their wikilinks, and define those concepts in Wikidata.
We don't. We could have a usable user interface like Reasonator. We don't.

The reason for this thread is: what does it take for us to have a performing
system, because we don't have one. Our growth is less than what it could be.
Our functionality is less than what it could be. At the same time we are
restricted, it seems, by annual budgets that do not take into account what
functionality we provide and could provide. The notion that we should
restrict our content for performance's sake... from an organisation as rich
as the Wikimedia Foundation. REALLY!!
Thanks,
   GerardM

On Sat, 4 May 2019 at 20:27, Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu> wrote:

> Hi Stas,
>
> Many thanks for writing this down! It is very useful to have a clear
> statement like this from the dev team.
>
> Given the sustainability concerns that you mention, I think the way
> forward for the community could be to hold a RFC to determine a stricter
> admissibility criterion for scholarly articles.
>
> It could be one of (or a boolean combination of) these:
> - having a site link;
> - being used as a reference for a statement on Wikidata;
> - being cited in a sister project;
> - being cited in a sister project using a template that fetches the
> metadata from Wikidata such as {{cite Q}};
> - being authored by someone with Wikipedia page about them;
> - … any other criterion that comes to mind.
>
> This way, the size of the corpus could be kept in control, and the
> criterion could be loosened later if the scalability concerns are
> addressed.
>
> Cheers,
> Antonin
>
> On 5/4/19 8:37 AM, Stas Malyshev wrote:
> > Hi!
> >
> >> For the technical guys, consider our growth and plan for at least one
> >> year. When the impression exists that the current architecture will not
> >> scale beyond two years, start a project to future proof Wikidata.
> >
> > We may also want to consider if Wikidata is actually the best store for
> > all kinds of data. Let's consider example:
> >
> > https://www.wikidata.org/w/index.php?title=Q57009452
> >
> > This is an entity that is almost 2M in size, almost 3000 statements and
> > each edit to it produces another 2M data structure. And its dump, albeit
> > slightly smaller, still 780K and will need to be updated on each edit.
> >
> > Our database is obviously not optimized for such entities, and they
> > won't perform very well. We have 21 million scientific articles in the
> > DB, and if even 2% of them would be like this, it's almost a terabyte of
> > data (multiplied by number of revisions) and billions of statements.
> >
> > While I am not against storing this as such, I do wonder if it's
> > sustainable to keep such kind of data together with other Wikidata data
> > in a single database. After all, each query that you run - even if not
> > related to that 21 million in any way - will have to still run in within
> > the same enormous database and be hosted on the same hardware. This is
> > especially important for services like Wikidata Query Service where all
> > data (at least currently) occupies a shared space and can not be easily
> > separated.
> >
> > Any thoughts on this?
> >
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Marco Neumann
Maybe it would be a good idea to run SPARQL updates directly against the
endpoint, rather than taking the detour via SQL blobs here.

How large is the RDF TTL of the page?
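
For reference, a rough way to check (sketch only; the exact size depends on
which RDF flavour is requested):

    import requests

    url = "https://www.wikidata.org/wiki/Special:EntityData/Q57009452.ttl"
    resp = requests.get(url)
    print(f"{len(resp.content) / 1e6:.1f} MB of Turtle for Q57009452")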

On Sat, May 4, 2019 at 7:37 PM Stas Malyshev 
wrote:

> Hi!
>
> > WQS data doesn't have versions, it doesn't have to be in one space and
> > can easily be separated. The whole point of LOD is to decentralize your
> > data. But I understand that Wikidata/WQS is currently designend as a
> > centralized closed shop service for several reasons granted.
>
> True, WDQS does not have versions. But each time the edit is made, we
> now have to download and work through the whole 2M... It wasn't a
> problem when we were dealing with regular-sized entities, but current
> system certainly is not good for such giant ones.
>
> As for decentralizing, WDQS supports federation, but for obvious reasons
> federated queries are slower and less efficient. That said, if there
> were separate store for such kind of data, it might work as
> cross-querying against other Wikidata data wouldn't be very frequent.
> But this is something that Wikidata community needs to figure out how to
> do.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Stas Malyshev
Hi!

> WQS data doesn't have versions, it doesn't have to be in one space and
> can easily be separated. The whole point of LOD is to decentralize your
> data. But I understand that Wikidata/WQS is currently designend as a
> centralized closed shop service for several reasons granted.

True, WDQS does not have versions. But each time an edit is made, we
now have to download and work through the whole 2M... It wasn't a
problem when we were dealing with regular-sized entities, but the current
system certainly is not good for such giant ones.

As for decentralizing, WDQS supports federation, but for obvious reasons
federated queries are slower and less efficient. That said, if there
were a separate store for this kind of data, it might work, as
cross-querying against other Wikidata data wouldn't be very frequent.
But this is something that the Wikidata community needs to figure out how to do.

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Antonin Delpeuch (lists)
Hi Stas,

Many thanks for writing this down! It is very useful to have a clear
statement like this from the dev team.

Given the sustainability concerns that you mention, I think the way
forward for the community could be to hold an RFC to determine a stricter
admissibility criterion for scholarly articles.

It could be one of (or a boolean combination of) these:
- having a site link;
- being used as a reference for a statement on Wikidata;
- being cited in a sister project;
- being cited in a sister project using a template that fetches the
metadata from Wikidata such as {{cite Q}};
- being authored by someone with Wikipedia page about them;
- … any other criterion that comes to mind.

This way, the size of the corpus could be kept under control, and the
criterion could be loosened later if the scalability concerns are addressed.
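
For instance, the first criterion could already be estimated with a WDQS query
along these lines (a sketch only - Q13442814 is what I would assume for
"scholarly article", and a full count over ~21 million items may well time
out):

    import requests

    # Count scholarly articles that have at least one site link; schema:about
    # is how sitelinks are exposed in the WDQS RDF mapping.
    QUERY = """
    PREFIX wd:     <http://www.wikidata.org/entity/>
    PREFIX wdt:    <http://www.wikidata.org/prop/direct/>
    PREFIX schema: <http://schema.org/>
    SELECT (COUNT(DISTINCT ?item) AS ?withSitelink) WHERE {
      ?item wdt:P31 wd:Q13442814 .
      ?page schema:about ?item .
    }
    """

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY},
                     headers={"Accept": "application/sparql-results+json"})
    print(r.json()["results"]["bindings"][0]["withSitelink"]["value"])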

Cheers,
Antonin

On 5/4/19 8:37 AM, Stas Malyshev wrote:
> Hi!
> 
>> For the technical guys, consider our growth and plan for at least one
>> year. When the impression exists that the current architecture will not
>> scale beyond two years, start a project to future proof Wikidata.
> 
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
> 
> https://www.wikidata.org/w/index.php?title=Q57009452
> 
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, still 780K and will need to be updated on each edit.
> 
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
> 
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will have to still run in within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
> 
> Any thoughts on this?
> 


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Gerard Meijssen
Hoi,
Your approach is technically valid. It is equally obviously, in part, the wrong
approach. Where you say we have to consider whether Wikidata is the best store
for all kinds of data, you may be pointing at the inadequacies of Wikidata in
relation to particular kinds of data that we already store and want to use. The
fact is that this is what Wikidata is being used for. In addition, there is
more data that people want to include in Wikidata that will provide a real
service, a service that blends in really well with our mission.

For me it does not really matter how and where things are stored. In this
thread it is relevant to pursue an answer to the question of how we will
scale, how we will serve the needs that are now served by Wikidata and
the needs that are not yet served by Wikidata. Wikidata is the project, and
as long as the data comes together to be manipulated or queried in a
consistent manner, it may be Wikibase or whatever.

The issue is how we scale, not why we should accept too few resources
by restricting the functionality of Wikidata.
Thanks,
  GerardM

On Sat, 4 May 2019 at 09:38, Stas Malyshev  wrote:

> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will have to still run in within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Marco Neumann
Yeah, the Wikibase storage doesn't sound right here, but these are two
different issues: one with Wikibase (SQL) and one with the Wikidata Query
Service (Blazegraph).

Is that 2M footprint the SQL DB blob? And each additional 2M per edit goes
into the version history, correct?

So the issue you are referring to here is in the design of the SQL-based
"Wikibase Repository"? How do the 2M footprint and its versions compare
to a large Wikipedia blob?

WQS data doesn't have versions, it doesn't have to be in one space, and it can
easily be separated. The whole point of LOD is to decentralize your data.
But I understand that Wikidata/WQS is currently designed as a centralized
closed-shop service, for several reasons, granted.




On Sat, May 4, 2019 at 8:57 AM Stas Malyshev 
wrote:

> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will have to still run in within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Stas Malyshev
Hi!

> For the technical guys, consider our growth and plan for at least one
> year. When the impression exists that the current architecture will not
> scale beyond two years, start a project to future proof Wikidata.

We may also want to consider if Wikidata is actually the best store for
all kinds of data. Let's consider example:

https://www.wikidata.org/w/index.php?title=Q57009452

This is an entity that is almost 2M in size, with almost 3000 statements, and
each edit to it produces another 2M data structure. And its dump, albeit
slightly smaller, is still 780K and will need to be updated on each edit.

Our database is obviously not optimized for such entities, and they
won't perform very well. We have 21 million scientific articles in the
DB, and if even 2% of them were like this, that's almost a terabyte of
data (multiplied by the number of revisions) and billions of statements.
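
(As a rough, hedged check of both figures: the canonical per-entity JSON is
available from Special:EntityData, and the last line simply redoes the
back-of-the-envelope arithmetic above; sizes will drift as the item is
edited.)

    import requests

    url = "https://www.wikidata.org/wiki/Special:EntityData/Q57009452.json"
    resp = requests.get(url)
    entity = resp.json()["entities"]["Q57009452"]
    n_statements = sum(len(claims) for claims in entity["claims"].values())
    print(f"{len(resp.content) / 1e6:.1f} MB of JSON, {n_statements} statements")

    # Back-of-the-envelope: 2% of 21 million articles at ~2 MB each.
    print(f"{0.02 * 21e6 * 2e6 / 1e12:.2f} TB before counting revisions")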

While I am not against storing this as such, I do wonder if it's
sustainable to keep this kind of data together with other Wikidata data
in a single database. After all, each query that you run - even if not
related to those 21 million items in any way - will still have to run within
the same enormous database and be hosted on the same hardware. This is
especially important for services like the Wikidata Query Service, where all
data (at least currently) occupies a shared space and cannot be easily
separated.

Any thoughts on this?

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Marco Neumann
Looks like you are ready for the weekend, Gerard :-) I don't see a scale
issue at the moment for the type of Wikidata use cases I come across. Even
the total number of triples is plateauing at 7.6bn*. (Of course, it's easy to
write "bad" queries that bring down the server.) Allowing people to set up
their own local instances with their own triple stores in the future is a
good approach for distributed and decentralized data management
here.

That said, a faster and better Wikidata instance is always appreciated, and
can certainly be provided. What's the current cost of running/hosting the
service with Wikibase + Blazegraph per month?

Marco

*
https://grafana.wikimedia.org/d/00489/wikidata-query-service?refresh=1m=1=now-1y=now

On Fri, May 3, 2019 at 4:28 PM Gerard Meijssen 
wrote:

> Hoi,
> Lies, damned lies and statistics. The quality of Wikidata suffers, it
> could be so much better if we truly wanted Wikidata to grow. Your numbers
> only show growth within the limits of what has been made possible. Traffic
> and numbers could be much more.
> Thanks,
> GerardM
>
> On Fri, 3 May 2019 at 17:17, Marco Neumann 
> wrote:
>
>> Gerard, I like wikidata a lot, kudos to the community for keeping it
>> going. But keep it real, there is no exponential growth here.
>>
>> We are looking at a slow and sustainable growth at the moment with
>> possibly a plateauing of number of users and when it comes to total number
>> of wikidata items. just take a look at the statistics.
>>
>> Date | Content pages | Page edits since Wikidata was set up | Registered
>> users | Active users
>>
>> 4/2015  | 13,911,417  | 213,027,375 | 1,913,828 | 15,168
>> 5/2016  | 17,432,789  | 328,781,525 | 2,688,788 | 16,833
>> 7/2017  | 28,037,196  | 514,252,789 | 2,835,219 | 18,081
>> 7/2018  | 49,081,962  | 701,319,718 | 2,970,150 | 18,578
>> 4/2019  | 56,377,647  | 931,449,205 | 3,236,569 | 20,857
>>
>> When you refer to "growing like a weed". What's that page views? queries
>> per day? Mentions in the media?
>>
>> Best,
>> Marco
>>
>>
>>
>>
>> On Fri, May 3, 2019 at 3:36 PM Gerard Meijssen 
>> wrote:
>>
>>> Hoi,
>>> This mail thread is NOT about the issues that I or others face at this
>>> time. They are serious enough but that is not for this thread. People are
>>> working hard to find a solution for now.  That is cool.
>>>
>>> What I want to know is are we technically and financially ready for a
>>> continued exponential growth. If so, what are the plans and what if those
>>> plans are needed in half the time expected. Are we ready for a continued
>>> growth. When we hesitate we will lose the opportunities that are currently
>>> open to us.
>>> Thanks,
>>>GerardM
>>>
>>> On Fri, 3 May 2019 at 16:24, Thad Guidry  wrote:
>>>
 Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly

 >we all experience in the really bad response times we are suffering.
 It is so bad that people are asked what kind of updates they are running
 because it makes a difference in the lag times there are.

 The response times are typically attributed to SPARQL queries from what
 I have seen, as well as applying multiple edits with scripts or mass
 operations. Although I recall there is a light queue mechanism inherent in
 the Blazegraph architecture that contributes to this, and I am fine with
 slower writes.

 What most users are not comfortable with is the slower reads in
 different areas of Wikidata.
 We need to identify those slow read areas or figure out a way to get
 consensus on what parts of Wikidata reading affect our users the most.

 So let's be constructive here:
 Gerard - did you have specific areas that affect your daily work, and
 what from of work is that (reading/writing , which areas) ?

 Thad
 https://www.linkedin.com/in/thadguidry/
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>
>>
>> --
>>
>>
>> ---
>> Marco Neumann
>> KONA
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Gerard Meijssen
Hoi,
Lies, damned lies and statistics. The quality of Wikidata suffers; it could
be so much better if we truly wanted Wikidata to grow. Your numbers only
show growth within the limits of what has been made possible. Traffic and
numbers could be much higher.
Thanks,
GerardM

On Fri, 3 May 2019 at 17:17, Marco Neumann  wrote:

> Gerard, I like wikidata a lot, kudos to the community for keeping it
> going. But keep it real, there is no exponential growth here.
>
> We are looking at a slow and sustainable growth at the moment with
> possibly a plateauing of number of users and when it comes to total number
> of wikidata items. just take a look at the statistics.
>
> Date | Content pages | Page edits since Wikidata was set up | Registered
> users | Active users
>
> 4/2015  | 13,911,417  | 213,027,375 | 1,913,828 | 15,168
> 5/2016  | 17,432,789  | 328,781,525 | 2,688,788 | 16,833
> 7/2017  | 28,037,196  | 514,252,789 | 2,835,219 | 18,081
> 7/2018  | 49,081,962  | 701,319,718 | 2,970,150 | 18,578
> 4/2019  | 56,377,647  | 931,449,205 | 3,236,569 | 20,857
>
> When you refer to "growing like a weed". What's that page views? queries
> per day? Mentions in the media?
>
> Best,
> Marco
>
>
>
>
> On Fri, May 3, 2019 at 3:36 PM Gerard Meijssen 
> wrote:
>
>> Hoi,
>> This mail thread is NOT about the issues that I or others face at this
>> time. They are serious enough but that is not for this thread. People are
>> working hard to find a solution for now.  That is cool.
>>
>> What I want to know is are we technically and financially ready for a
>> continued exponential growth. If so, what are the plans and what if those
>> plans are needed in half the time expected. Are we ready for a continued
>> growth. When we hesitate we will lose the opportunities that are currently
>> open to us.
>> Thanks,
>>GerardM
>>
>> On Fri, 3 May 2019 at 16:24, Thad Guidry  wrote:
>>
>>> Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly
>>>
>>> >we all experience in the really bad response times we are suffering.
>>> It is so bad that people are asked what kind of updates they are running
>>> because it makes a difference in the lag times there are.
>>>
>>> The response times are typically attributed to SPARQL queries from what
>>> I have seen, as well as applying multiple edits with scripts or mass
>>> operations. Although I recall there is a light queue mechanism inherent in
>>> the Blazegraph architecture that contributes to this, and I am fine with
>>> slower writes.
>>>
>>> What most users are not comfortable with is the slower reads in
>>> different areas of Wikidata.
>>> We need to identify those slow read areas or figure out a way to get
>>> consensus on what parts of Wikidata reading affect our users the most.
>>>
>>> So let's be constructive here:
>>> Gerard - did you have specific areas that affect your daily work, and
>>> what from of work is that (reading/writing , which areas) ?
>>>
>>> Thad
>>> https://www.linkedin.com/in/thadguidry/
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Marco Neumann
Gerard, I like wikidata a lot, kudos to the community for keeping it going.
But keep it real, there is no exponential growth here.

We are looking at slow and sustainable growth at the moment, with a possible
plateauing in the number of users and in the total number of
Wikidata items. Just take a look at the statistics.

Date | Content pages | Page edits since Wikidata was set up | Registered
users | Active users

4/2015  | 13,911,417  | 213,027,375 | 1,913,828 | 15,168
5/2016  | 17,432,789  | 328,781,525 | 2,688,788 | 16,833
7/2017  | 28,037,196  | 514,252,789 | 2,835,219 | 18,081
7/2018  | 49,081,962  | 701,319,718 | 2,970,150 | 18,578
4/2019  | 56,377,647  | 931,449,205 | 3,236,569 | 20,857

When you refer to "growing like a weed", what is that: page views? Queries
per day? Mentions in the media?

Best,
Marco




On Fri, May 3, 2019 at 3:36 PM Gerard Meijssen 
wrote:

> Hoi,
> This mail thread is NOT about the issues that I or others face at this
> time. They are serious enough but that is not for this thread. People are
> working hard to find a solution for now.  That is cool.
>
> What I want to know is are we technically and financially ready for a
> continued exponential growth. If so, what are the plans and what if those
> plans are needed in half the time expected. Are we ready for a continued
> growth. When we hesitate we will lose the opportunities that are currently
> open to us.
> Thanks,
>GerardM
>
> On Fri, 3 May 2019 at 16:24, Thad Guidry  wrote:
>
>> Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly
>>
>> >we all experience in the really bad response times we are suffering. It
>> is so bad that people are asked what kind of updates they are running
>> because it makes a difference in the lag times there are.
>>
>> The response times are typically attributed to SPARQL queries from what I
>> have seen, as well as applying multiple edits with scripts or mass
>> operations. Although I recall there is a light queue mechanism inherent in
>> the Blazegraph architecture that contributes to this, and I am fine with
>> slower writes.
>>
>> What most users are not comfortable with is the slower reads in different
>> areas of Wikidata.
>> We need to identify those slow read areas or figure out a way to get
>> consensus on what parts of Wikidata reading affect our users the most.
>>
>> So let's be constructive here:
>> Gerard - did you have specific areas that affect your daily work, and
>> what from of work is that (reading/writing , which areas) ?
>>
>> Thad
>> https://www.linkedin.com/in/thadguidry/
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Darren Cook
> Wikidata grows like mad. This is something we all experience in the really 
> bad 
> response times we are suffering. It is so bad that people are asked what kind 
> of 
> updates they are running because it makes a difference in the lag times there 
> are.
> 
> Given that Wikidata is growing like a weed, ...

As I've delved deeper into Wikidata I get the feeling it is being
developed with the assumption of infinite resources, and no strong
guidelines on exactly what the scope is (i.e. where you draw the line
between what belongs in Wikidata and what does not).

This (and concerns about it being open to data vandalism) has personally
made me back off a bit. I'd originally planned to have Wikidata be the
primary data source, but I'm now leaning towards keeping data tables and
graphs outside, with scheduled scripts to import into Wikidata and
export from Wikidata.

> For the technical guys, consider our growth and plan for at least one year.

The 37GB (json, bz2) data dump file (it was already 33GB, twice the size
of the English wikipedia dump, when I grabbed it last November) is
unwieldy. And, as there are no incremental changes being published, it is
hard to create a mirror.

Can that dump file be split up in some functional way, I wonder?
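
One low-tech option is a functional split done client-side while streaming the
dump. A sketch, assuming the all-entities JSON dump layout (a JSON array with
one entity per line); the file names are placeholders, and Q8054 ("protein")
is only an example of a class one might split off:

    import bz2
    import json

    EXCLUDE_CLASS = "Q8054"           # e.g. protein; pick the domain to split off
    SRC = "wikidata-all.json.bz2"     # placeholder path to the full dump
    DST = "wikidata-subset.json"      # placeholder path for the filtered slice

    def is_instance_of(entity, qid):
        for claim in entity.get("claims", {}).get("P31", []):
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
            if isinstance(value, dict) and value.get("id") == qid:
                return True
        return False

    # The dump is a JSON array with one entity per line, so it can be streamed
    # line by line without loading the whole file into memory.
    with bz2.open(SRC, "rt", encoding="utf-8") as src, open(DST, "w") as dst:
        for line in src:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            if not is_instance_of(entity, EXCLUDE_CLASS):
                dst.write(json.dumps(entity) + "\n")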

Darren


-- 
Darren Cook, Software Researcher/Developer

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Gerard Meijssen
Hoi,
This mail thread is NOT about the issues that I or others face at this
time. They are serious enough but that is not for this thread. People are
working hard to find a solution for now.  That is cool.

What I want to know is: are we technically and financially ready for
continued exponential growth? If so, what are the plans, and what if those
plans are needed in half the expected time? Are we ready for continued
growth? When we hesitate, we will lose the opportunities that are currently
open to us.
Thanks,
   GerardM

On Fri, 3 May 2019 at 16:24, Thad Guidry  wrote:

> Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly
>
> >we all experience in the really bad response times we are suffering. It
> is so bad that people are asked what kind of updates they are running
> because it makes a difference in the lag times there are.
>
> The response times are typically attributed to SPARQL queries from what I
> have seen, as well as applying multiple edits with scripts or mass
> operations. Although I recall there is a light queue mechanism inherent in
> the Blazegraph architecture that contributes to this, and I am fine with
> slower writes.
>
> What most users are not comfortable with is the slower reads in different
> areas of Wikidata.
> We need to identify those slow read areas or figure out a way to get
> consensus on what parts of Wikidata reading affect our users the most.
>
> So let's be constructive here:
> Gerard - did you have specific areas that affect your daily work, and what
> from of work is that (reading/writing , which areas) ?
>
> Thad
> https://www.linkedin.com/in/thadguidry/
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Thad Guidry
Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly

>we all experience in the really bad response times we are suffering. It is
so bad that people are asked what kind of updates they are running because
it makes a difference in the lag times there are.

The response times are typically attributed to SPARQL queries, from what I
have seen, as well as to applying multiple edits with scripts or mass
operations. I do recall there is a light queue mechanism inherent in
the Blazegraph architecture that contributes to this, and I am fine with
slower writes.

What most users are not comfortable with is the slower reads in different
areas of Wikidata.
We need to identify those slow read areas or figure out a way to get
consensus on what parts of Wikidata reading affect our users the most.

So let's be constructive here:
Gerard - do you have specific areas that affect your daily work, and what
form of work is that (reading/writing, which areas)?

Thad
https://www.linkedin.com/in/thadguidry/
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Andra Waagmeester
I agree it is not clear what is being discussed here. It is growing, but
(in my opinion) in a positive way, i.e. it is being accepted as a viable
knowledge graph.

Regards,

Andra

On Fri, May 3, 2019 at 3:27 PM David Abián  wrote:

> Hi!
>
> Indeed, Wikidata grows and will continue growing. But I don't see
> clearly what the purpose of this thread is. Is it to propose possible
> technical and financial improvements?
>
> Regards,
> David
>
>
> On 5/3/19 14:24, Gerard Meijssen wrote:
> > Hoi,
> > Wikidata grows like mad. This is something we all experience in the
> > really bad response times we are suffering. It is so bad that people are
> > asked what kind of updates they are running because it makes a
> > difference in the lag times there are.
> >
> > Given that Wikidata is growing like a weed, it follows that there are
> > two issues. Technical - what is the maximum that the current approach
> > supports - how long will this last us. Fundamental - what funding is
> > available to sustain Wikidata.
> >
> > For the financial guys, growth like Wikidata is experiencing is not
> > something you can reliably forecast. As an organisation we have more
> > money than we need to spend, so there is no credible reason to be stingy.
> >
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
> >
> > It will grow and the situation will get worse before it gets better.
> > Thanks,
> >   GerardM
> >
> > PS I know about phabricator tickets, they do not give the answers to the
> > questions we need to address.
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
> --
> David Abián
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread David Abián
Hi!

Indeed, Wikidata grows and will continue growing. But I don't see
clearly what the purpose of this thread is. Is it to propose possible
technical and financial improvements?

Regards,
David


On 5/3/19 14:24, Gerard Meijssen wrote:
> Hoi,
> Wikidata grows like mad. This is something we all experience in the
> really bad response times we are suffering. It is so bad that people are
> asked what kind of updates they are running because it makes a
> difference in the lag times there are.
> 
> Given that Wikidata is growing like a weed, it follows that there are
> two issues. Technical - what is the maximum that the current approach
> supports - how long will this last us. Fundamental - what funding is
> available to sustain Wikidata.
> 
> For the financial guys, growth like Wikidata is experiencing is not
> something you can reliably forecast. As an organisation we have more
> money than we need to spend, so there is no credible reason to be stingy.
> 
> For the technical guys, consider our growth and plan for at least one
> year. When the impression exists that the current architecture will not
> scale beyond two years, start a project to future proof Wikidata.
> 
> It will grow and the situation will get worse before it gets better.
> Thanks,
>       GerardM
> 
> PS I know about phabricator tickets, they do not give the answers to the
> questions we need to address.
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 

-- 
David Abián

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Are we ready for our future

2019-05-03 Thread Gerard Meijssen
Hoi,
Wikidata grows like mad. This is something we all experience in the really
bad response times we are suffering. It is so bad that people are asked
what kind of updates they are running because it makes a difference in the
lag times there are.

Given that Wikidata is growing like a weed, it follows that there are two
issues. Technical: what is the maximum that the current approach supports,
and how long will it last us? Financial: what funding is available to
sustain Wikidata.

For the financial guys, growth like Wikidata is experiencing is not
something you can reliably forecast. As an organisation we have more money
than we need to spend, so there is no credible reason to be stingy.

For the technical guys, consider our growth and plan for at least one year.
When the impression exists that the current architecture will not scale
beyond two years, start a project to future proof Wikidata.

It will grow and the situation will get worse before it gets better.
Thanks,
  GerardM

PS: I know about the Phabricator tickets; they do not give the answers to the
questions we need to address.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata