[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-04-06 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  >> **Task description**
  >>
  >>> The special page currently always re-runs the constraint checks via WDQS, 
it does not get or set any cache.
  >>
  >> Why not?
  >
  > Sorry, it does set the cache, but will not get the cache.
  
  I don’t think it sets it either – as far as I’m aware the special page is 
completely oblivious to the cache. As to the “why”, it seemed useful at the 
time to have a way for users to get guaranteed-fresh constraint check results. 
(We could still set the cache in that case, of course, it’s just not 
implemented – and since the special page is very rarely used, it wouldn’t make 
much of a difference.)
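  To make the "bypass the cache read, but still write" option concrete, here is a minimal Python sketch. Everything in it is hypothetical: `ConstraintResultCache` is an in-memory stand-in for the Memcached-backed cache, `run_checks` stands in for actually running the checks, and the one-day TTL is taken from the description of the cache elsewhere in this thread. It is not WBQC's actual code.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumption: one day, matching the Memcached TTL mentioned elsewhere in this thread.
CACHE_TTL_SECONDS = 24 * 60 * 60


class ConstraintResultCache:
    """Hypothetical in-memory stand-in for the Memcached-backed result cache."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, entity_id: str) -> Optional[dict]:
        entry = self._entries.get(entity_id)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[entity_id]  # expired entries count as misses
            return None
        return result

    def set(self, entity_id: str, result: dict) -> None:
        self._entries[entity_id] = (self._clock() + CACHE_TTL_SECONDS, result)


def get_constraint_results(cache: ConstraintResultCache, entity_id: str,
                           run_checks: Callable[[str], dict],
                           fresh: bool = False) -> dict:
    """fresh=False: normal read-through (API-style).
    fresh=True: skip the cache read (special-page-style) but still write back."""
    if not fresh:
        cached = cache.get(entity_id)
        if cached is not None:
            return cached
    result = run_checks(entity_id)
    cache.set(entity_id, result)
    return result
```

  With `fresh=True` the caller always gets newly computed results, and the cache is refreshed as a side effect, so later reads through the normal path benefit from that work.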

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: eprodromou, CCicalese_WMF, kchapman, Krinkle, mobrovac, abian, 
Lydia_Pintscher, Lucas_Werkmeister_WMDE, Marostegui, Joe, daniel, Agabi10, 
Aklapper, Addshore, darthmon_wmde, WDoranWMF, holger.knust, EvanProdromou, 
DannyS712, Nandana, kostajh, Lahi, Gq86, GoranSMilovanovic, RazeSoldier, 
QZanden, merbst, LawExplorer, _jensen, rosalieper, D3r1ck01, Scott_WUaS, 
Pchelolo, Izno, SBisson, Perhelion, Wikidata-bugs, Base, aude, GWicke, Bawolff, 
jayvdb, fbstj, santhosh, Jdforrester-WMF, Ladsgroup, Mbch331, Rxy, Jay8g, 
Ltrlg, bd808, Legoktm
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-04-06 Thread Addshore
Addshore added a comment.


  In T214362#6028271, @Krinkle wrote:
  
  > **Task description**
  >
  >> The constraint checks are accessible via 3 methods:
  >>
  >> - RDF action
  >> - Special page
  >> - API
  >>
  >> […] The RDF page-action exists for use by the WDQS and will not run the 
constraint check itself, it only exposes an RDF description of the currently 
stored constraints that apply to this entity.
  >
  > If I understand correctly then, the RDF action is not a way to access the 
result of, nor to trigger, a WDQS request. Is that right?
  
  No
  
  Part of the thing being stored may have been generated using data from a 
query to WDQS however.
  
  > **Task description**
  >
  >> The special page currently always re-runs the constraint checks via WDQS, 
it does not get or set any cache.
  >
  > Why not?
  
  Sorry, it does set the cache, but will not get the cache.
  
  > **Task description**
  >
  >> The Query service performs checks on Wikidata entities on-demand from 
users. Results of these constraint checks are cached by MediaWiki (WBQC) in 
Memcached. […] The API only makes an internal request to WDQS if the constraint 
checks data is out of date, absent, or expired […].
  >> […] We could make the Job […] that informs the Query service to pull the 
API to ingest the new data.
  >>
  >> - […] we don't currently have the result of all constraints checks for all 
Wikidata items stored anywhere.
  >>
  > From T201147:
  >>
  >> At the moment constraints violations are only imported to WDQS if they are 
cached the moment WDQS pulls the rdfs for constraint violations for an item. 
There is a race condition between the WDQS poller and the constraints check 
execution and this is why only a fraction of constraint violations are imported.
  >
  > The above sounds contradictory to me, but I assume that must be because I'm 
misunderstanding something. 
  > If I understand correctly, the authoritative source for describing items is 
Wikidata.org.
  
  Yes
  
  > The RDF Action on Wikidata.org exposes information relevant to constraint 
checks.
  
  Yes
  For example https://www.wikidata.org/wiki/Q64?action=constraintsrdf
  If this page appears blank then you'll need to hit 
https://www.wikidata.org/wiki/Special:ConstraintReport/Q64 first to generate 
and cache the results
  
  > The way we actually execute those constraint checks is by submitting a query 
to the Query service (WDQS), which has a nice relational model of all the 
relationships and metadata etc.
  
  Only some checks result in queries to the WDQS.
  Other checks are done purely in PHP.
  
  > The thing that executes these checks is the MediaWiki 
WikibaseQualityConstraints extension (WBQC), and it caches the result for a day 
in Memcached.
  
  Yes
  
  > So far so good, I think. But then I also read that Query service (WDQS) 
ingests the result of these checks (which it executed itself?), and that we 
want the Job to notify WDQS when it is best to poll for that so that it is 
likely a Memcached cache-hit.
  > I don't know why the result of this is stored in WDQS. But, that sounds to 
me like you already have a place to store them all?
  
  WDQS is not really a store; for us, it is currently 16 or so stores.
  On top of that, it isn't a store but a query service, and it has all kinds of 
baggage attached to it because of that.
  
  Anyway, these results need to be accessible in MediaWiki PHP code.
  WDQS instances should also not be seen as persistent or consistent.
  Part of the requirement of this task is that we have a copy of constraints 
for entities so that we can dump them and reload a WDQS instance.
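  A minimal Python sketch of the ordering this implies, assuming hypothetical names (`PersistentConstraintStore`, `run_check_job`, `notify_wdqs`) and an in-memory dict in place of a real store: the job persists the result before telling WDQS to fetch it, which avoids the poller-versus-check race described in T201147, and `dump()` provides the full copy needed to reload a WDQS instance.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple


class PersistentConstraintStore:
    """Hypothetical stand-in for a persistent store of check results."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, entity_id: str, result: dict) -> None:
        self._data[entity_id] = result

    def get(self, entity_id: str) -> Optional[dict]:
        return self._data.get(entity_id)

    def dump(self) -> Iterator[Tuple[str, dict]]:
        """Yield every stored result, e.g. to reload a fresh WDQS instance."""
        for entity_id in sorted(self._data):
            yield entity_id, self._data[entity_id]


def run_check_job(entity_id: str, store: PersistentConstraintStore,
                  run_checks: Callable[[str], dict],
                  notify_wdqs: Callable[[str], None]) -> None:
    # Persist first, notify second: by the time the WDQS updater is told to
    # fetch, the result is already stored, so there is no race with the check.
    store.put(entity_id, run_checks(entity_id))
    notify_wdqs(entity_id)
```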



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-04-03 Thread Krinkle
Krinkle added a comment.


  **Task description**
  
  > The constraint checks are accessible via 3 methods:
  >
  > - RDF action
  > - Special page
  > - API
  >
  > […] The RDF page-action exists for use by the WDQS and will not run the 
constraint check itself, it only exposes an RDF description of the currently 
stored constraints that apply to this entity.
  
  If I understand correctly then, the RDF action is not a way to access the 
result of, nor to trigger, a WDQS request. Is that right?
  
  **Task description**
  
  > The special page currently always re-runs the constraint checks via WDQS, 
it does not get or set any cache.
  
  Why not?
  
  **Task description**
  
  > The Query service performs checks on Wikidata entities on-demand from 
users. Results of these constraint checks are cached by MediaWiki (WBQC) in 
Memcached. […] The API only makes an internal request to WDQS if the constraint 
checks data is out of date, absent, or expired […].
  > […] We could make the Job […] that informs the Query service to pull the 
API to ingest the new data.
  >
  > - […] we don't currently have the result of all constraints checks for all 
Wikidata items stored anywhere.
  
  
  
  From T201147 :
  
  > At the moment constraints violations are only imported to WDQS if they are 
cached the moment WDQS pulls the rdfs for constraint violations for an item. 
There is a race condition between the WDQS poller and the constraints check 
execution and this is why only a fraction of constraint violations are imported.
  
  The above sounds contradictory to me, but I assume that must be because I'm 
misunderstanding something.
  
  If I understand correctly, the authoritative source for describing items is 
Wikidata.org. The RDF Action on Wikidata.org exposes information relevant to 
constraint checks. The way we actually execute those constraint checks is by 
submitting a query to the Query service (WDQS), which has a nice relational 
model of all the relationships and metadata etc. The thing that executes these 
checks is the MediaWiki WikibaseQualityConstraints extension (WBQC), and it 
caches the result for a day in Memcached.
  
  So far so good, I think. But then I also read that Query service (WDQS) 
ingests the result of these checks (which it executed itself?), and that we 
want the Job to notify WDQS when it is best to poll for that so that it is 
likely a Memcached cache-hit.
  
  I don't know why the result of this is stored in WDQS. But, that sounds to me 
like you already have a place to store them all?



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-01-22 Thread daniel
daniel added a comment.


  In T214362#5814559, @Addshore wrote:
  
  > Yes, we need this!
  > This decision is one of the 2 RFCs blocking continued work on constraint 
checks for Wikidata that was started in either late 2018 or early 2019. :)
  
  Tagging #core_platform_team for consideration of a future initiative.



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-01-18 Thread Addshore
Addshore added a comment.


  Yes, we need this!
  This decision is one of the 2 RFCs blocking continued work on constraint 
checks for Wikidata that was started in either late 2018 or early 2019. :)



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-01-16 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  We need this! :D (Seriously!)



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2020-01-16 Thread Addshore
Addshore added a comment.


  I see that T227776 has had little 
activity since November; is there any way to move this forward?



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-08-09 Thread Addshore
Addshore added a comment.


  In T214362#5363005, @daniel wrote:
  
  > I'd be interested in your thoughts on T227776: Generalize ParserCache into 
a generic service class for large "current" page-derived data.
  
  Without digging into it too deeply that sounds like exactly what we need.
  
  In T214362#5400959, @daniel wrote:
  
  > Moving to under discussion. @mobrovac and @Joe said they had further 
questions. @Krinkle has some ideas as well.
  
  I'm keen to answer more questions!



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-07-24 Thread daniel
daniel added a comment.


  In T214362#5338030, @Addshore wrote:
  
  > In T214362#5324623, @daniel wrote:
  >
  >> Most importantly, the proposal assumes the existence of a "more permanent 
storage solution" which is not readily available. This would have to be created.
  >
  > I guess the closest thing we have like it right now would be the parser 
cache system backed by MySQL.
  
  I'd be interested in your thoughts on T227776: Generalize ParserCache into a 
generic service class for large "current" page-derived data.



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-07-16 Thread Addshore
Addshore added a comment.


  In T214362#5324623, @daniel wrote:
  
  > Moved to the RFC backlog for improvement after discussion at the TechCom 
meeting. The proposed functionality seems sensible enough, but this ticket is 
lacking information about system design that is needed to make this viable as 
an RFC.
  > Most importantly, the proposal assumes the existence of a "more permanent 
storage solution" which is not readily available. This would have to be created.
  
  I guess the closest thing we have like it right now would be the parser cache 
system backed by MySQL.
  
  > Which raises a number of questions, like:
  >
  > - what volume of data do you expect that store to hold?
  
  I can't talk in terms of bytes right now, but we can add a bit of tracking 
to our current cache to measure an average entry size and estimate a rough 
total size from that, if that's what we want.
  If we are talking about the number of entries, this would roughly line up with 
the number of Wikidata entities, which is currently 58 million.
  
  > - should data ever be evicted? Does it *have* to be evicted?
  
  It does not *have* to be "evicted", but there will be situations where it is 
detected to be out of date and thus regenerated.
  
  > - how bad is it if we lose some data unexpectedly?
  
  Not very; everything can and will be regenerated, but that takes time.
  
  > How bad is it for all the data to become unavailable?
  
  Unavailable or totally lost?
  Unavailable for a short period of time would not be critical.
  Unavailable for longer periods of time could have knock-on effects on other 
services, such as WDQS not being able to update fully once T201147 is complete, 
but I'm sure whatever update code is created would be able to handle such a 
situation.
  
  Totally losing all of the data would be pretty bad; it would probably take 
an extreme amount of time to regenerate at a regular pace for all entities.
  
  > - what's the read/write load?
  
  Write load once the job is fully deployed would be roughly the Wikidata edit 
rate, but limited / controlled by the job queue rate for "constraintsRunCheck".
  This can be guesstimated at 250-750 per minute max, though de-duplication of 
jobs for repeated edits to the same page also has to be accounted for.
  If more exact numbers are required we can have a go at figuring that out.
  Currently the job is only running on 25% of edits.
  
  Read rate can currently be seen at 
https://grafana.wikimedia.org/d/00344/wikidata-quality?panelId=6=1
  On top of this the WDQS updaters would also be needing this data once 
generated.
  This would either be via an HTTP API request, which would likely hit the 
storage, or the data could possibly be sent via some event queue?
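  As a rough worked example using the numbers in this comment (about 58 million entities, 250-750 writes per minute), and assuming, purely for illustration, that a full regeneration would be throttled to that same job-queue rate:

```python
# Figures from this comment: ~58 million entities and a write rate of roughly
# 250-750 per minute. Assumption: a full rebuild runs at that same rate.
ENTITY_COUNT = 58_000_000


def regeneration_days(entities: int, writes_per_minute: float) -> float:
    """Days needed to regenerate `entities` results at a fixed write rate."""
    minutes = entities / writes_per_minute
    return minutes / (60 * 24)


for rate in (250, 750):
    print(f"{rate}/min = {rate / 60:.1f} writes/s, "
          f"full rebuild ~ {regeneration_days(ENTITY_COUNT, rate):.0f} days")
```

  That works out to roughly 54 days at 750 per minute and about 161 days at 250 per minute, which puts a number on the "extreme amount of time" a total loss would cost.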
  
  > - what are the requirements for cross-DC replication?
  
  Having the data accessible from both DCs (for the DC failover case) should be 
a requirement.
  
  > - what transactional consistency requirements exist?
  
  No particularly strict requirements here.
  If we write to the store we would love for it to be written and readable in 
the next second.
  Writes for a single key will not really happen too close together, probably 
multiple seconds between them.
  Interaction between keys and order of writes being committed to the store 
isn't really important.
  
  > - what's the access pattern? Is a plain K/V store sufficient, or are other 
kinds of indexes/queries needed?
  
  Just a plain K/V store.
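  A sketch of how narrow that interface is, in Python: point reads, writes, and deletes by entity ID, with values stored as opaque JSON blobs. `KeyValueStore`, `InMemoryStore`, `save_result`, and `load_result` are illustrative names only, and the dict-backed class stands in for whatever backend is eventually chosen.

```python
import json
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The whole access pattern: point reads and writes by key, no indexes."""

    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Dict-backed stand-in; the real backend is still undecided here."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def save_result(store: KeyValueStore, entity_id: str, result: dict) -> None:
    # Values are opaque blobs to the store; serialization is the caller's job.
    store.set(entity_id, json.dumps(result, sort_keys=True).encode("utf-8"))


def load_result(store: KeyValueStore, entity_id: str) -> Optional[dict]:
    raw = store.get(entity_id)
    return None if raw is None else json.loads(raw.decode("utf-8"))
```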
  
  > Also, do you have a specific storage technology in mind? In discussions 
about this, Cassandra seems to regularly pop up, but it's not in the proposal. 
As far as I know, there is currently no good way to access Cassandra directly 
from MW core (no abstraction layer, but apparently also no decent PHP driver 
at all, and IIRC there are also issues with network topology).
  
  For technology we don't have any particular preferences; whatever works for 
the WMF, Ops, and TechCom.
  Ideally something that we would be able to get access to and start working 
with sooner rather than later.
  
  > I was hoping for @Joe and @mobrovac to ask more specific questions, but 
they are both on vacation right now. Perhaps get together with them to hash out 
a proposal when they are back.
  
  More than happy to try and hash this out a bit more in this ticket before 
passing it back to a TechCom meeting again.
  It'd be great to try and make some progress here in the coming month.


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-06-26 Thread daniel
daniel added a comment.


  In T214362#5284886, @Marostegui wrote:
  
  > Just a quick question: "this would fit a generalized parser cache 
mechanism" meaning it would fit into the existing parsercache mechanism (and 
infrastructure) or is that still to be defined?
  
  
  It doesn't fit the current mechanism, since we would need an additional key 
(or key suffix).
  
  The infrastructure for the new generalized cache is not yet defined. It would 
cover the functionality of the current parser cache (Memcached+SQL) and the 
Parsoid cache (currently Cassandra). It's not yet clear which of the two the 
unified mechanism would use, or if it should use something else entirely. So 
far, the generalized parser cache is just an idea. There is no plan yet.
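  To illustrate the "additional key (or key suffix)" point, here is a hypothetical key scheme in Python: a generalized page-derived data cache could keep parser output and constraint check results for the same page side by side by adding a kind/variant suffix. This is not the real ParserCache key format; the names and layout are assumptions.

```python
def page_cache_key(page_id: int, kind: str, variant: str = "") -> str:
    """Hypothetical key scheme for a generalized page-derived data cache.

    `kind` separates different derived data (e.g. "parser-output" vs
    "constraint-checks"); `variant` stands in for option-dependent variants.
    """
    key = f"page:{page_id}:{kind}"
    return f"{key}:{variant}" if variant else key


# Two kinds of derived data for the same page get distinct keys.
html_key = page_cache_key(64, "parser-output", "canonical")
checks_key = page_cache_key(64, "constraint-checks")
```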



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-06-25 Thread Marostegui
Marostegui added a comment.


  Just a quick question: "this would fit a generalized parser cache mechanism" 
meaning it would fit into the existing parsercache mechanism (and 
infrastructure) or is that still to be defined?
  Thanks!



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-06-25 Thread daniel
daniel added a comment.


  Side note: we discussed this use case at the CPT offsite. It seems like this 
would fit a generalized parser cache mechanism. This is something we will have 
to look into for the integration of Parsoid in MW core anyway, but it's at 
least half a year out, still.



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-06-16 Thread Addshore
Addshore added a comment.


  In T214362#5259988, @daniel wrote:
  
  > @Addshore if you think no further discussion is needed and this is ripe for 
a decision, I'll propose to move this to last call at our next meeting. I'll 
put it in the RFC inbox for that purpose.
  
  
  Yup, it seems like it is ready.
  
  > In general, if you want TechCom to make a call on the status of an RFC or 
propose for it to move to a different stage in the process, drop it in the 
inbox, with a comment.
  
  Gotcha



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-06-14 Thread Addshore
Addshore added a comment.


  Just poking this a few months down the line; as far as I know this still 
rests with #techcom for a decision if discussion has finished?



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-03-19 Thread Addshore
Addshore added a comment.


  Just to be clear on this RFC regarding my above comment, we are not waiting 
for a reply from @Lydia_Pintscher here. The decision is to keep the behaviour 
for the user the same. This still allows us to only need to write to storage 
during POSTs and via the job queue.



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-25 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  > and once we have data for all items persistently stored, in theory the user 
would never ask for an item's constraint check and it not be there (thus no 
writing to the storage on request)
  
  
  
  > Once the storage is fully populated this isn't even a case we need to think 
about.
  
  I don’t think these are true – I think it will still be possible that we 
realize after retrieving the cached result that it is stale, because some 
referenced page has been edited in the meantime, or because a certain point in 
time has passed.
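  One way to picture this, as a Python sketch under stated assumptions: each stored result records the revision IDs of the pages its checks read plus an optional expiry time, and a read treats the entry as stale if either has moved on. The field names and the `current_revision` lookup are hypothetical; WBQC's actual dependency tracking may differ.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class StoredCheckResult:
    result: dict
    # Revision IDs of the other pages the checks read (hypothetical field).
    dependency_revisions: Dict[str, int] = field(default_factory=dict)
    # Unix time after which the result must be re-checked, e.g. for
    # time-dependent constraints (hypothetical field; None = no expiry).
    valid_until: Optional[float] = None


def is_stale(stored: StoredCheckResult,
             current_revision: Callable[[str], int],
             now: Optional[float] = None) -> bool:
    """True if the expiry has passed or any referenced page was edited."""
    now = time.time() if now is None else now
    if stored.valid_until is not None and now >= stored.valid_until:
        return True
    return any(current_revision(page) != rev
               for page, rev in stored.dependency_revisions.items())
```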



[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-21 Thread Addshore
Addshore added a comment.


  In T214362#4970952, @daniel wrote:
  
  > In T214362#4970942, @Addshore wrote:
  >
  > > We can continue to generate constraint check data on the fly when missing 
in GETs and simply not put it in the storage.
  >
  >
  > You could trigger a job in that case. The job may even contain the 
generated data, though that may get too big in some cases.
  
  
  Yes we could still trigger a job on the GET :)
  Probably cleaner to just have it run again; this won't be a high-traffic use 
case and will slowly vanish, so there is no need to worry about the duplicated 
effort / wasted cycles.

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: Marostegui, Joe, daniel, Agabi10, Aklapper, Addshore, Nandana, kostajh, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, merbst, 
LawExplorer, _jensen, D3r1ck01, SBisson, Eevans, mobrovac, Hardikj, 
Wikidata-bugs, aude, GWicke, jayvdb, fbstj, santhosh, Jdforrester-WMF, Mbch331, 
Rxy, Jay8g, Ltrlg, bd808, Legoktm


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-21 Thread daniel
daniel added a comment.


  In T214362#4970942, @Addshore wrote:
  
  > We can continue to generate constraint check data on the fly when missing in GETs and simply not put it in the storage.
  
  
  You could trigger a job in that case. The job may even contain the generated data, though that may get too big in some cases.
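One way to handle the "may get too big" case is sketched below: embed the freshly computed result in the job only when its serialized size is under a limit, and otherwise enqueue the job without it so it recomputes. This is a hypothetical sketch; the job type, the parameter names and `MAX_JOB_PAYLOAD_BYTES` are illustrative assumptions, not the MediaWiki job queue's actual parameters or limits.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical size limit; the thread only says the data "may get too big".
MAX_JOB_PAYLOAD_BYTES = 64 * 1024


def make_persist_job(entity_id: str,
                     result: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build parameters for a hypothetical 'persist constraint check' job.

    If a freshly computed result is small enough, it travels with the job so
    the job only has to write it; otherwise the job must recompute it.
    """
    job: Dict[str, Any] = {"type": "persistConstraintCheck", "entityId": entity_id}
    if result is not None:
        encoded = json.dumps(result, separators=(",", ":"))
        if len(encoded.encode("utf-8")) <= MAX_JOB_PAYLOAD_BYTES:
            job["result"] = encoded
    return job


print(make_persist_job("Q42", {"Q42": []}))
# {'type': 'persistConstraintCheck', 'entityId': 'Q42', 'result': '{"Q42":[]}'}
```

Falling back to recomputation keeps the job queue's payloads bounded at the cost of running the check twice for large entities.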

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

To: daniel
Cc: Marostegui, Joe, daniel, Agabi10, Aklapper, Addshore, Nandana, kostajh, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, merbst, 
LawExplorer, _jensen, D3r1ck01, SBisson, Eevans, mobrovac, Hardikj, 
Wikidata-bugs, aude, GWicke, jayvdb, fbstj, santhosh, Jdforrester-WMF, Mbch331, 
Rxy, Jay8g, Ltrlg, bd808, Legoktm


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-21 Thread Addshore
Addshore added a comment.


  I think we can easily have this persist to the store only via the job or a POST.
  
  We can continue to generate constraint check data on the fly when it is missing in GETs and simply not put it in the storage.
  Once the storage is fully populated, this isn't even a case we need to think about.
  Purges of the stored data would then happen via a POST (an interface similar to page purges).
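The read/write split proposed above can be sketched as follows: GET serves stored data when present and otherwise computes a result without persisting it, while only the job and POST (purge) paths write to the store. This is a hypothetical sketch; `ConstraintCheckService`, its methods and the in-memory dict standing in for the persistent store are illustrative assumptions, not the extension's classes.

```python
from typing import Callable, Dict, List


class ConstraintCheckService:
    """Hypothetical service illustrating the proposed GET / job / POST split."""

    def __init__(self, run_checks: Callable[[str], List[dict]]):
        self.run_checks = run_checks              # the expensive constraint check
        self.store: Dict[str, List[dict]] = {}    # stand-in for persistent storage

    def get(self, entity_id: str) -> List[dict]:
        # GET never writes: serve stored data, or compute on the fly unstored.
        if entity_id in self.store:
            return self.store[entity_id]
        return self.run_checks(entity_id)

    def _persist(self, entity_id: str) -> List[dict]:
        result = self.run_checks(entity_id)
        self.store[entity_id] = result
        return result

    def run_job(self, entity_id: str) -> None:
        # Write path 1: a job, e.g. enqueued after the entity is edited.
        self._persist(entity_id)

    def post_purge(self, entity_id: str) -> List[dict]:
        # Write path 2: a purge via POST, analogous to a page purge.
        return self._persist(entity_id)


svc = ConstraintCheckService(lambda eid: [{"entity": eid, "status": "ok"}])
svc.get("Q1")        # computed, not stored
svc.run_job("Q1")    # now stored
print(sorted(svc.store))  # ['Q1']
```

Because only jobs and POSTs write, the store never receives writes from GET traffic, which is the property that makes a single-datacenter-writable store an option.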

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

To: Addshore
Cc: Marostegui, Joe, daniel, Agabi10, Aklapper, Addshore, Nandana, kostajh, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, merbst, 
LawExplorer, _jensen, D3r1ck01, SBisson, Eevans, mobrovac, Hardikj, 
Wikidata-bugs, aude, GWicke, jayvdb, fbstj, santhosh, Jdforrester-WMF, Mbch331, 
Rxy, Jay8g, Ltrlg, bd808, Legoktm


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-20 Thread Joe
Joe added a comment.


  In T214362#4967944, @Addshore wrote:
  
  > We currently still want to be able to compute the check on demand, either because the user wants to purge the current constraint check data, or if the check data does not already exist / is outdated.
  >  It could be possible that later down the line we put the purging of this data into the job queue too, and once we have data for all items persistently stored, in theory the user would never ask for an item's constraint check and find it missing (thus no writing to the storage on request).
  >  But that is not the immediate plan.
  
  My point here is quite subtle but fundamental: if we can split reads and writes to this datastore based on the HTTP verb, so that constraints would be persisted only via either
  a - a specific job enqueued (by user request or by
  b - a POST request
  it would be possible to store these data in the cheapest k-v storage we have, the ParserCache. That would typically be cheaper and faster than using a distributed k-v storage like Cassandra, which I'd reserve for things that need to be written to from multiple datacenters.

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

To: Joe
Cc: Marostegui, Joe, daniel, Agabi10, Aklapper, Addshore, Nandana, kostajh, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, merbst, 
LawExplorer, _jensen, D3r1ck01, SBisson, Eevans, mobrovac, Hardikj, 
Wikidata-bugs, aude, GWicke, jayvdb, fbstj, santhosh, Jdforrester-WMF, Mbch331, 
Rxy, Jay8g, Ltrlg, bd808, Legoktm


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-20 Thread Addshore
Addshore added a comment.

  In T214362#4954428, @Joe wrote:
  
  > In order to better understand your needs, let me ask you a few questions:
  >
  > - Do we need/want just the constraint check for the latest version of the item, or one for each revision?
  
  Currently there is only the need to store the latest constraint check data for an item.
  
  > - How will we access such constraints? Always by key and/or full dump, or can other access patterns become useful/interesting in the future?
  
  There are currently no other access patterns on the horizon.
  (Storing the data like this will allow us to load it into the WDQS and query it from there.)
  
  > - Given that those values will only be updated via the job queue, we don't need active-active write capabilities in the storage; or do you still want to be able to compute the check on demand, in which case active-active (a/a) storage would be advisable?
  
  We currently still want to be able to compute the check on demand, either because the user wants to purge the current constraint check data, or if the check data does not already exist / is outdated.
  It could be possible that later down the line we put the purging of this data into the job queue too, and once we have data for all items persistently stored, in theory the user would never ask for an item's constraint check and find it missing (thus no writing to the storage on request).
  But that is not the immediate plan.
  
  > I'm not sure how the "full dump" would need to work, but it would seem natural that such data would be fed to WDQS with the same mechanism that updates the items.
  
  The main regular updates for the WDQS will come via events in Kafka, with the WDQS then retrieving the constraint check data from an MW API in the same way that it retrieves changes to entities.
  The "full dump" is needed when starting a WDQS server from scratch.
  The full dump could just be a PHP script that iterates through the storage for all entities and slowly dumps the data to disk (similar to our dumpJson or dumpRdf scripts, and similar to a regular MW dump).

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

To: Addshore
Cc: Marostegui, Joe, daniel, Agabi10, Aklapper, Addshore, Nandana, kostajh,
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, merbst,
LawExplorer, _jensen, D3r1ck01, SBisson, Eevans, mobrovac, Hardikj,
Wikidata-bugs, aude, GWicke, jayvdb, fbstj, santhosh, Jdforrester-WMF, Mbch331,
Rxy, Jay8g, Ltrlg, bd808, Legoktm


[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-14 Thread Joe
Joe added a comment.


  In order to better understand your needs, let me ask you a few questions:
  
  - Do we need/want just the constraint check for the latest version of the item, or one for each revision?
  - How will we access such constraints? Always by key and/or full dump, or can other access patterns become useful/interesting in the future?
  - Given that those values will only be updated via the job queue, we don't need active-active write capabilities in the storage; or do you still want to be able to compute the check on demand, in which case active-active (a/a) storage would be advisable?
  
  I'm not sure how the "full dump" would need to work, but it would seem natural that such data would be fed to WDQS with the same mechanism that updates the items.

TASK DETAIL
  https://phabricator.wikimedia.org/T214362

To: Joe
Cc: Joe, daniel, Agabi10, Aklapper, Addshore, Nandana, kostajh, Lahi, Gq86,
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, merbst, LawExplorer,
_jensen, D3r1ck01, SBisson, Eevans, mobrovac, Hardikj, Wikidata-bugs, aude,
GWicke, jayvdb, fbstj, santhosh, Jdforrester-WMF, Mbch331, Rxy, Jay8g, Ltrlg,
bd808, Legoktm