[Wikidata-bugs] [Maniphest] T288266: Better understand the makeup of specific Wikidata object types that probably can't be dropped

2023-09-24 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T288266

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Esc3300, Manuel, MPhamWMF, me, 
Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, 
karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T288259: Get estimates for how many Wikidata items don't have at least 3 backlinks

2023-09-18 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T288259

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288260: Get estimates for size of non-normalized values in Wikidata

2023-09-18 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T288260

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288261: Determine if there are consistently used top ranked Wikidata statements, and how many of them are there

2023-09-18 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T288261

To: AKhatun_WMF
Cc: Lydia_Pintscher, Aklapper, AKhatun_WMF, Manuel, MPhamWMF, 
Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288264: Get estimates for all Wikidata statements of a specific datatype

2023-09-18 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T288264

To: AKhatun_WMF
Cc: RShigapov, Lydia_Pintscher, Aklapper, AKhatun_WMF, Manuel, MPhamWMF, 
Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288265: Get estimates for Wikidata items without hot properties that are being queried

2023-09-18 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T288265

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T303831#8063021 <https://phabricator.wikimedia.org/T303831#8063021>, 
@EBernhardson wrote:
  
  > In terms of the exact code causing this, spark is terrible at telling us 
exactly where but trying to infer from the SparkUI output i think it's this 
join:
  >
  >   def getTopSubgraphItems(topSubgraphs: DataFrame): DataFrame = {
  >     wikidataTriples
  >       .filter(s"predicate='<$p31>'")
  >       .selectExpr("object as subgraph", "subject as item")
  >       .join(topSubgraphs.select("subgraph"), Seq("subgraph"), "right")
  
  This is exactly the code that finds the top subgraphs. And yes, the data is
definitely heavily skewed; that is the nature of Wikidata, and anything we do
on Wikidata by subgraph is going to run into similar issues. For reference,
half of Wikidata falls under a single subgraph, and the other half is spread
across hundreds of subgraphs. We might need to start considering Spark 3.
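  
  To illustrate the skew, here is a minimal sketch of key salting, one common mitigation for skewed joins. It is not something this thread adopts. It uses plain Scala collections rather than Spark, and `SaltingSketch`, `Row`, `saltedKey`, the bucket count of 8, and the toy identifiers are all hypothetical.

```scala
object SaltingSketch {
  // One (subgraph, item) pair per row, mirroring the columns selected in the quoted join.
  final case class Row(subgraph: String, item: String)

  // Extend the join key with a salt in [0, buckets) derived from the item,
  // so a single hot subgraph is split across up to `buckets` groups.
  def saltedKey(row: Row, buckets: Int): (String, Int) =
    (row.subgraph, Math.floorMod(row.item.hashCode, buckets))

  // Size of the largest group: a rough stand-in for the slowest task of a join.
  def largestGroup[K](keys: Seq[K]): Int =
    keys.groupBy(identity).values.map(_.size).max

  // Toy data (made-up identifiers): one dominant subgraph and one small one.
  val rows: Seq[Row] =
    (1 to 1000).map(i => Row("bigSubgraph", s"item$i")) ++
      (1 to 10).map(i => Row("smallSubgraph", s"other$i"))

  val unsalted: Int = largestGroup(rows.map(_.subgraph))
  val salted: Int = largestGroup(rows.map(saltedKey(_, 8)))
}
```

  In a real join, the smaller side would also have to be replicated once per salt value so that every salted key still finds its match.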
  
  > And i suppose this is also only the first skewed join in the execution, 
there may be more later in the computations.
  
  Unfortunately, yes. I believe `subgraph_query_mapping` is going to be another
big challenge: it has similar joins and writes data daily. But we will see.
  
  In T303831#8064293 <https://phabricator.wikimedia.org/T303831#8064293>, 
@EBernhardson wrote:
  
  > - Enabled subgraph_query_mapping_daily. This started waiting for 
snapshot=20220613 (last monday) with an execution_date of 20220620 (also a 
monday). I suspect we should adjust this to target snapshot=20220620, but 
waiting for confirmation. Turned back off so it doesn't timeout and complain.
  
  It is correct to look for data from last Monday, because the data for 20220620
was actually populated the following Friday. So if the job runs on current
data, it won't find Monday's data on the same day. All of this maneuvering is
needed because the input data is both weekly and daily, so every day the job
looks for data from the last Monday.
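  
  A minimal sketch of that date arithmetic, assuming the snapshot to wait for is always the Monday strictly before the execution date. That matches the 20220620 to 20220613 case above, but how mid-week dates should resolve is not stated here. `SnapshotSketch` and `snapshotFor` are hypothetical names.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.format.DateTimeFormatter
import java.time.temporal.TemporalAdjusters

object SnapshotSketch {
  // Assumption: wait for the Monday strictly before the execution date,
  // formatted as yyyyMMdd like the snapshot partition values in this thread.
  def snapshotFor(executionDate: LocalDate): String =
    executionDate
      .`with`(TemporalAdjusters.previous(DayOfWeek.MONDAY))
      .format(DateTimeFormatter.BASIC_ISO_DATE)
}
```

  For example, `SnapshotSketch.snapshotFor(LocalDate.of(2022, 6, 20))` yields `20220613`.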
  
  This makes me wonder whether the same should be done for
`subgraph_mapping_weekly`, as it looks for 20220620 on the same day, even though
that snapshot will only be populated the following Friday. This job runs
weekly, the same as its input data.
  
  > - Enabled subgraph_query_metrics_daily.  This is waiting for 
`event.wdqs_external_sparql_query/datacenter=eqiad/year=2022/month=6/day=20` 
(and same for codfw) but it needs to be waiting on the individual hourly 
partitions.  I hadn't thought this fully through when reviewing the patch, we 
will need to adjust the sensor to use HivePartitionRangeSensor which can 
generate all the intermediate hourly named partitions. Turned back off as it's 
also waiting for outputs of subgraph_query_mapping_daily (iiuc) which is turned 
off currently.
  
  Attempting this.
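  
  For reference, the hourly partitions that would need to exist for one day can be enumerated roughly as below. This is only a sketch of the partition names, not of the `HivePartitionRangeSensor` API. The trailing `hour=` component and the names `HourlyPartitionsSketch`, `hourlyPartitions`, and `allPartitions` are assumptions.

```scala
object HourlyPartitionsSketch {
  // Table name and day-level layout are copied from the quoted comment;
  // the `hour=` suffix is an assumed naming for the hourly partitions.
  def hourlyPartitions(datacenter: String, year: Int, month: Int, day: Int): Seq[String] =
    (0 until 24).map { hour =>
      s"event.wdqs_external_sparql_query/datacenter=$datacenter/year=$year/month=$month/day=$day/hour=$hour"
    }

  // Both datacenters mentioned in the comment, 24 hours each.
  def allPartitions(year: Int, month: Int, day: Int): Seq[String] =
    Seq("eqiad", "codfw").flatMap(dc => hourlyPartitions(dc, year, month, day))
}
```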

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, 
Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Update:
  I tested a few options on the statbox. I am not sure how well this
represents the prod environment, but here goes:
  
  - coalesce + 8G driver memory = failed, as identified by Erik
(SparkOutOfMemoryError at topSubgraphItems, application_1655808530211_109990)
  - coalesce + 16G driver memory = failed (SparkOutOfMemoryError at
topSubgraphItems, application_1655808530211_110190)
  - repartition + 8G driver memory = failed (Reason: Executor heartbeat timed
out after 176110 ms, application_1655808530211_110236)
  - repartition + 16G driver memory = failed (Reason: Executor heartbeat timed
out after 159925 ms, application_1655808530211_110343)
  - repartition + 16G driver memory + 16G executor memory = failed (Reason:
Executor heartbeat timed out after 145549 ms, application_1655808530211_110430)
  
  I still need to figure out the exact place that causes the OOM.

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, 
Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T303831#8058159 <https://phabricator.wikimedia.org/T303831#8058159>, 
@EBernhardson wrote:
  
  > the airflow patch is deployed but i only turned on *_init dags and 
subgraph_mapping_weekly today (ran out of time, will do rest tomorrow).
  >
  > subgraph_mapping_weekly failed the first time through. I updated executor 
memory from 8g to 12g but the second execution is still failing. something is 
quite unbalanced about the topSubgraphItems, of the 8 shards they have inputs 
varying from 100MB to 450MB giving executions times of ~30s on the small ones 
and ~8m before the final one fails.
  >
  > Not specifically related to this patch, but i wonder if we could change up 
the `SparkUtils.saveTables`  method to somehow take parameters in the path to 
specify coalesce vs repartition and the number of partitions to save by, so we 
only have to update the airflow invocation and not the jar as well to test 
variations there.
  
  Should we have params called `coalesce` and `repartition` that default to
false and, when true, use `num_partitions` to coalesce or repartition
accordingly?
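  
  A rough sketch of how those three params could be resolved into one decision. The type and function names (`SavePartitioningSketch`, `Partitioning`, `choose`) are hypothetical, and rejecting the case where both flags are set is my assumption. In the Spark job the result would map onto `df.coalesce(n)` or `df.repartition(n)` before writing.

```scala
object SavePartitioningSketch {
  sealed trait Partitioning
  case object KeepAsIs extends Partitioning
  final case class Coalesce(numPartitions: Int) extends Partitioning
  final case class Repartition(numPartitions: Int) extends Partitioning

  // Both flags default to false at the call site; num_partitions only matters
  // when one of them is true. Setting both is rejected (an assumption).
  def choose(coalesce: Boolean, repartition: Boolean, numPartitions: Int): Partitioning = {
    require(!(coalesce && repartition), "set at most one of coalesce and repartition")
    if (coalesce || repartition) require(numPartitions > 0, "num_partitions must be positive")
    if (coalesce) Coalesce(numPartitions)
    else if (repartition) Repartition(numPartitions)
    else KeepAsIs
  }
}
```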

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, 
Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-03-15 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Discovery-Search (Current work), 
Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a Data Analyst for Wikidata/WDQS, I would like for the metrics from 
subgraph analysis done in T293628 <https://phabricator.wikimedia.org/T293628> 
to be periodically evaluated and stored over a period of time for further 
analysis and also so that anyone can access the analysis results without having 
to do all analysis from scratch.
  
  This ticket covers productionizing:
  
  - subgraph mapping to items and triples
  - subgraph metrics: subgraph size, number of items, predicate usage etc
  - query mapping to subgraph
  - subgraph query metrics: queries per subgraph, UA distribution, query time 
distribution, items/predicates usage etc
  
  List of all possible metrics: metrics-list 
<https://docs.google.com/spreadsheets/d/1G9WBUIXwkDiVvgK9shOvehJJp4fZzftkGYnom4HtDKU/edit?usp=sharing>

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T299921: Estimate benefits of splitting and federating Wikidata subgraphs

2022-03-15 Thread AKhatun_WMF
AKhatun_WMF removed AKhatun_WMF as the assignee of this task.
AKhatun_WMF moved this task from Current work to Analysis on the 
Wikidata-Query-Service board.

TASK DETAIL
  https://phabricator.wikimedia.org/T299921

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

To: AKhatun_WMF
Cc: AKhatun_WMF, MPhamWMF, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score

2022-01-20 Thread AKhatun_WMF
AKhatun_WMF moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
AKhatun_WMF added a comment.


  The analysis is done here (for Q-ids): Wikidata_Item_ORES_Score_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Item_ORES_Score_Analysis>

TASK DETAIL
  https://phabricator.wikimedia.org/T288262

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: AKhatun_WMF
Cc: Lydia_Pintscher, JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, 
Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, 
calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, 
EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, 
Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Alchimista, Mbch331


[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score

2022-01-18 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T288262#7629267 <https://phabricator.wikimedia.org/T288262#7629267>, 
@Lydia_Pintscher wrote:
  
  > @AKhatun_WMF: You mention on the wiki that some Items don't have an ORES 
score. All Items should have one 😬 Do you have an example of one that does not?
  
  Oh, it's not that they don't have a score per se. They're just not in the 
event data table, so I could not get a score for them to analyze. I will 
clarify that!
  If we could run an event for all existing items, we could get scores for all 
items. The way the table is populated at present, it only produces scores for 
the latest revisions I believe.

TASK DETAIL
  https://phabricator.wikimedia.org/T288262

To: AKhatun_WMF
Cc: Lydia_Pintscher, JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, 
Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, 
calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, 
EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, 
Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Alchimista, Mbch331


[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score

2022-01-18 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T288262#7628599 <https://phabricator.wikimedia.org/T288262#7628599>, 
@MPhamWMF wrote:
  
  > @AKhatun_WMF , sorry, it's been a while since I wrote this, but I think 
what I meant when I wrote the question about "optimal separation" is given some 
distribution of ORES scores (e.g. a normal distribution), is it clear what the 
threshold is for what qualifies as a "high" vs "low" score: e.g. anything over 
.75 is a high score. But that's assuming the scores are continuous. I guess 
it's moot if they're binary (I don't actually know).
  >
  > If this isn't a sensible way of thinking about the issue, let me know if 
there's a better way.
  
  Ah, that I believe is already solved by the output of the model. Basically, 
we get probabilities for 5 classes (A to E) determining how good an item is, 
where A is the best and E is the worst. And then the score is calculated as 
`5*ProbabilityOfClassA + 4*ProbabilityOfClassB + 3*ProbabilityOfClassC + 
2*ProbabilityOfClassD + 1*ProbabilityOfClassE`. But we can definitely define 
our own thresholds as well.
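  
  The weighted sum above can be sketched as follows. `OresScoreSketch` and `score` are hypothetical names, and treating a missing class as probability 0 is my assumption.

```scala
object OresScoreSketch {
  // Weights from the formula above: class A counts 5, down to class E counting 1.
  private val weights: Map[Char, Double] =
    Map('A' -> 5.0, 'B' -> 4.0, 'C' -> 3.0, 'D' -> 2.0, 'E' -> 1.0)

  // Classes missing from the input are treated as probability 0 (an assumption).
  def score(probabilities: Map[Char, Double]): Double =
    weights.map { case (cls, w) => w * probabilities.getOrElse(cls, 0.0) }.sum
}
```

  With valid probabilities the score ranges from 1 (all mass on class E) to 5 (all mass on class A).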
  
  The analysis is done here: Wikidata_Item_ORES_Score_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Item_ORES_Score_Analysis>
  
  I will be doing a bit more to get the scores per subgraph and will add it 
here as well.

TASK DETAIL
  https://phabricator.wikimedia.org/T288262

To: AKhatun_WMF
Cc: JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, 
MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, 
LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, 
notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Alchimista, Mbch331


[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score

2022-01-17 Thread AKhatun_WMF
AKhatun_WMF added subscribers: dcausse, JAllemandou.
AKhatun_WMF added a comment.


  @MPhamWMF Hi, could you please clarify the question `Is there an optimal 
separation between high/low ORES scores?`. Separation in what respect? What
comes to my mind is separating items by the subgraph they are part of.
  
  cc: @JAllemandou @dcausse

TASK DETAIL
  https://phabricator.wikimedia.org/T288262

To: AKhatun_WMF
Cc: JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, 
MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, 
LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, 
notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Alchimista, Mbch331


[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score

2022-01-11 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  @ACraze Indeed! I was confusing the models for revision (item quality) with 
edits (damaging/good faith). The latest revision is all I will need. Thank you!

TASK DETAIL
  https://phabricator.wikimedia.org/T288262

To: AKhatun_WMF
Cc: ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Gethan, 
Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, LawExplorer, elukey, 
_jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, notconfusing, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Alchimista, Mbch331


[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score

2022-01-09 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T288262

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF moved this task from Incoming to Needs Reporting on the 
Discovery-Search (Current work) board.
AKhatun_WMF added a comment.


  Counts of queries and triples for astronomical objects were done here,
along with the top ~300 large subgraphs: Wikidata_Subgraph_Query_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis>
  For the specific case of astronomical objects (and only astronomical
objects), a list of all their subclasses was obtained and manually inspected
for relevance to astronomical objects. The subclass list also includes
`subclasses of subclasses` and so on.
  
  - Percent of triples: 8.7%
  - Percent of entities: 8.9%
  - Days to recover: 245
  - Query count: 2.5M
  - Percent of queries: 1.3%
  - Percent time of all queries: 0.5%

TASK DETAIL
  https://phabricator.wikimedia.org/T288257

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
AKhatun_WMF added a comment.


  Details can be found here: Wikidata_Subgraph_Query_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis>

TASK DETAIL
  https://phabricator.wikimedia.org/T295188

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T288257

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293631: Get estimates for splitting other large subgraphs from Wikidata

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF added a project: Discovery-Search (Current work).
AKhatun_WMF added a comment.


  With the completion of T293632 <https://phabricator.wikimedia.org/T293632> 
and T293636 <https://phabricator.wikimedia.org/T293636>, this task is complete.

TASK DETAIL
  https://phabricator.wikimedia.org/T293631

To: AKhatun_WMF
Cc: Aklapper, MPhamWMF, JAllemandou, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF moved this task from incoming to in progress on the Wikidata board.
AKhatun_WMF added a comment.


  With the completion of T293632 <https://phabricator.wikimedia.org/T293632> 
and T293636 <https://phabricator.wikimedia.org/T293636>, this task is complete.

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

WORKBOARD
  https://phabricator.wikimedia.org/project/board/71/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293636: Identify and analyze queries that touch on various large subgraphs

2022-01-05 Thread AKhatun_WMF
AKhatun_WMF moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
AKhatun_WMF added a comment.


  The analysis was completed and documented here: 
Wikidata_Subgraph_Query_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis>

TASK DETAIL
  https://phabricator.wikimedia.org/T293636

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2021-11-14 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, EChetty, 
toberto, ldelench_wmf, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, 
4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, Manybubbles, Mbch331, 
jeremyb


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2021-11-11 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, EChetty, 
toberto, ldelench_wmf, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, 
4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, Manybubbles, Mbch331, 
jeremyb


[Wikidata-bugs] [Maniphest] T293636: Identify and analyze queries that touch on various large subgraphs

2021-11-11 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T293636

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31

2021-11-08 Thread AKhatun_WMF
AKhatun_WMF added a project: Discovery-Search (Current work).
AKhatun_WMF added a comment.


  Some analysis was done here:
  
  - Property usage across subgraphs: Predicates_across_subgraphs 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis#Predicates_across_subgraphs>
  - Top predicates also used in scholarly articles: 
Top_properties_used_in_other_subgraphs 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis#Top_properties_used_in_other_subgraphs>
  
  Suggested analysis:
  
  - Categorize usage type of properties:
    - Similar distribution of use across subgraphs
    - Have X% usage in Y subgraphs
    - Used in lots of small subgraphs, used in small quantity in all subgraphs
    - Entropy over the power-law distribution of the property across subgraphs 
(spark udf entropy)
      - This will give us a single number to represent the distribution of a 
property
      - Will incorporate the distribution as well as the variability of 
property usage
      - The entropy distribution will tell us what kinds of properties we have 
on hand
  
  The suggested analysis could be done through a new ticket if required later 
on.
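
  As a rough illustration of the entropy idea above, here is a minimal sketch in plain Python (not the Spark UDF mentioned in the comment). The function name `usage_entropy` and the sample counts are hypothetical; the input is assumed to be a mapping from subgraph name to the number of times one property is used in that subgraph.

```python
import math

def usage_entropy(counts):
    """Shannon entropy (in bits) of one property's usage across subgraphs.

    `counts` maps a subgraph name to how often the property is used there.
    The mapping and its values are illustrative assumptions, not real data.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical example: a property used evenly in four subgraphs
# versus a property used in only one subgraph.
even_usage = {"sg_a": 25, "sg_b": 25, "sg_c": 25, "sg_d": 25}
single_usage = {"sg_a": 100}
print(usage_entropy(even_usage), usage_entropy(single_usage))
```

  A property concentrated in one subgraph scores 0, and a property spread evenly over N subgraphs scores log2(N), so the single number separates subgraph-specific properties from widely shared ones.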

TASK DETAIL
  https://phabricator.wikimedia.org/T291205

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Esc3300, Aklapper, Jmixter87, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata

2021-11-08 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  The analysis was completed and documented here: 
https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis

TASK DETAIL
  https://phabricator.wikimedia.org/T293632

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure

2021-11-08 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Sources:
  
  - T275068 <https://phabricator.wikimedia.org/T275068>
  - T293632 <https://phabricator.wikimedia.org/T293632> 
Wikidata_Subgraph_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis>
  - T281854 <https://phabricator.wikimedia.org/T281854> 
Wikidata_Scholarly_Articles_Subgraph_Analysis 
<https://wikitech.wikimedia.org/w/index.php?title=User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis>
  - T293636 <https://phabricator.wikimedia.org/T293636> TODO: Query count 
analysis for the subgraphs
  
  
  
  | Name                          | % of entities | % of triples | number of days for Blazegraph to recover at current rate of growth | % of queries potentially affected (monthly) |
  | ----------------------------- | ------------- | ------------ | --- | --- |
  | description                   |               | 20           | 518 | 12 |
  | external id                   |               | 9            | 239 | 30 |
  | label                         |               | 4            | 104 | 48 |
  | altLabel                      |               | 0.8          | 21 | 16 |
  | name                          |               | 0.6          | 16 | 8 |
  | lexicographical entities      | 8             |              | 10 | 0.09 |
  | scholarly article             | 40            | 50           | 1370 | 2 |
  | astronomical object           | 9             | 9            | 238 | |
  | human                         | 10            | 7            | 200 | |
  | Wikimedia category            | 5             | 6            | 157 | |
  | taxon                         | 3.4           | 3            | 77 | |
  | family name                   | 0.5           | 1.4          | 40 | |
  | Wikimedia disambiguation page | 1.5           | 1.4          | 37 | |
  | gene                          | 1.3           | 0.9          | 25 | |
  | Wikimedia template            | 0.9           | 0.9          | 23 | |
  | chemical compound             | 1.3           | 0.7          | 19 | |
  
  The numbers were rounded. Only the top 10 subgraphs were listed. More can be 
found here: Table_of_top_50_subgraph_information 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis#Table_of_top_50_subgraph_information>
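
  The comment does not say how the "days to recover" column was computed. One plausible reading, shown here only as a sketch, is that it is the number of triples freed by dropping a subgraph divided by a constant daily growth in triples. The function name, the linear-growth assumption, and all input values below are hypothetical and are not taken from the analysis.

```python
def days_to_recover(pct_of_triples, total_triples, triples_added_per_day):
    """Days of growth needed to refill the space freed by dropping a subgraph.

    Assumes linear growth (a constant number of triples added per day),
    which is an assumption of this sketch, not a documented method.
    """
    if triples_added_per_day <= 0:
        raise ValueError("growth rate must be positive")
    freed_triples = total_triples * pct_of_triples / 100.0
    return freed_triples / triples_added_per_day

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(days_to_recover(pct_of_triples=20,
                      total_triples=1_000_000,
                      triples_added_per_day=1_000))
```

  Under this reading the number of days scales linearly with the share of triples removed, which is roughly the pattern in the rounded figures of the table.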

TASK DETAIL
  https://phabricator.wikimedia.org/T295188

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure

2021-11-08 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Sources:
  
  - T275068 <https://phabricator.wikimedia.org/T275068>
  - Wikidata_Subgraph_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis>
  - Wikidata_Scholarly_Articles_Subgraph_Analysis 
<https://wikitech.wikimedia.org/w/index.php?title=User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis>
  
  
  
  | number/% of entities | number/% of triples | number of days for Blazegraph to recover at current rate of growth | number/% of queries potentially affected |
  | --- | --- | --- | --- |
  | ok | nai | ok | ai |
  | ok | nai | ok | ai |
  | ok | nai | ok | ai |
  | ok | nai | ok | ai |

TASK DETAIL
  https://phabricator.wikimedia.org/T295188

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288264: Get estimates for all Wikidata statements of a specific datatype

2021-10-19 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  > Basically Wikidata's Properties have a datatype.
  
  Ah, datatype of properties.
  
  > I am not seeing that in the analysis you linked but maybe I am overlooking 
something.
  
  The one I listed is for datatype of objects, so you didn't miss anything. 
  Thank you for clarifying! It should be fairly easy to find out as well :)

TASK DETAIL
  https://phabricator.wikimedia.org/T288264

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Lydia_Pintscher, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, 
Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288264: Get estimates for all Wikidata statements of a specific datatype

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a subscriber: Lydia_Pintscher.
AKhatun_WMF added a comment.


  @Lydia_Pintscher 
  Is this ticket asking for counts of the various datatypes used in Wikidata, 
both URIs and literals?
  Does wikitech:User:AKhatun/Wikidata_Basic_Analysis#Object 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Basic_Analysis#Object>
 help?

TASK DETAIL
  https://phabricator.wikimedia.org/T288264

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Lydia_Pintscher, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, 
Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T293632

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293636: Identify and analyze queries that touch on various large subgraphs

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.

TASK DESCRIPTION
  As a Data Analyst for Wikidata and WDQS, I would like to know how often the 
large subgraphs in Wikidata are queried. The aim is to get an estimate of the 
gain (or loss) of splitting them from Wikidata.
  
  Questions:
  
  - How many queries touch the large subgraphs of Wikidata?
  - Analysis of those queries in terms of query time, user agent, etc.
  - How many queries span multiple subgraphs (to estimate how much query 
federation might be required)?

TASK DETAIL
  https://phabricator.wikimedia.org/T293636

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T293632

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.

TASK DESCRIPTION
  As a Data Analyst for Wikidata and WDQS, I would like to know what the other 
large subgraphs in Wikidata are (besides scholarly articles and astronomical 
objects) and how they are connected. The aim is to get an estimate of the gain 
(or loss) of splitting them from Wikidata.
  
  Subgraphs in Wikidata
  
  - What are the various large subgraphs (found using P31 
<https://phabricator.wikimedia.org/P31> and possibly merging obviously similar 
groups)
  - What are their sizes, i.e. how many items they have
  - Connectivity among these subgraphs
    - What properties do these subgraphs commonly use and what properties 
overlap among them
    - What items overlap
    - How many triples connect multiple subgraphs (through items, e.g. 
`?item_of_subgraph1 Pxx ?item_of_subgraph2`)
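
  The last question above (triples of the form `?item_of_subgraph1 Pxx ?item_of_subgraph2`) can be sketched in plain Python as follows. The function name, the item identifiers, and the subgraph labels are hypothetical. For simplicity the sketch assumes each item belongs to exactly one subgraph, although real items can have several P31 values.

```python
from collections import Counter

def count_cross_subgraph_triples(triples, subgraph_of):
    """Count triples whose subject and object items lie in different subgraphs.

    `triples` is an iterable of (subject, predicate, object) tuples and
    `subgraph_of` maps an item to its subgraph. Both are illustrative inputs.
    Triples whose subject or object has no known subgraph are skipped.
    """
    counts = Counter()
    for subject, _predicate, obj in triples:
        source = subgraph_of.get(subject)
        target = subgraph_of.get(obj)
        if source is None or target is None or source == target:
            continue
        counts[(source, target)] += 1
    return counts

# Hypothetical toy data.
subgraph_of = {"item_a": "subgraph1", "item_b": "subgraph2", "item_c": "subgraph1"}
triples = [
    ("item_a", "Pxx", "item_b"),         # crosses subgraph1 -> subgraph2
    ("item_a", "Pxx", "item_c"),         # stays inside subgraph1
    ("item_b", "Pxx", "item_a"),         # crosses subgraph2 -> subgraph1
    ("item_a", "Pxx", "some literal"),   # object is not an item with a subgraph
]
print(count_cross_subgraph_triples(triples, subgraph_of))
```

  Keeping the counts per ordered pair of subgraphs shows which direction the links go, which matters when estimating how much federation a split would require.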

TASK DETAIL
  https://phabricator.wikimedia.org/T293632

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293631: Get estimates for splitting other large subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.

TASK DESCRIPTION
  As a Data Analyst for Wikidata and WDQS, I would like to know what the other 
large subgraphs in Wikidata are (besides scholarly articles and astronomical 
objects) and how often they are queried. The aim is to get an estimate of the 
gain (or loss) of splitting them off from Wikidata.
  
  This task has 2 parts:
  
  - Identifying and analyzing the subgraphs themselves
  - Query analysis of the queries that touch on these subgraphs

TASK DETAIL
  https://phabricator.wikimedia.org/T293631

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, MPhamWMF, JAllemandou, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF removed a subtask: T288257: Get estimates for size of astronomical 
objects and queries in Wikidata graph.

TASK DETAIL
  https://phabricator.wikimedia.org/T282790

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: So9q, Manuel, Esc3300, Addshore, AKhatun_WMF, MPhamWMF, Aklapper, Suran38, 
Invadibot, maantietaja, Peteosx1x, NavinRizwi, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF removed a parent task: T282790: [EPIC] Get estimates for dropping 
data from Wikidata in case of Blazegraph catastrophic failure.

TASK DETAIL
  https://phabricator.wikimedia.org/T288257

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF removed a subtask: T281854: Get baseline measurements/expectations 
for splitting scholarly articles from Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T282790

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: So9q, Manuel, Esc3300, Addshore, AKhatun_WMF, MPhamWMF, Aklapper, Suran38, 
Invadibot, maantietaja, Peteosx1x, NavinRizwi, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF removed a parent task: T282790: [EPIC] Get estimates for dropping 
data from Wikidata in case of Blazegraph catastrophic failure.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Gehel, Csisc, So9q, AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, 
Lydia_Pintscher, Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, 
Darwinius, Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, 
LWyatt, Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a subtask: T288257: Get estimates for size of astronomical 
objects and queries in Wikidata graph.

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T293628: Get baseline 
measurements/expectations for splitting various subgraphs from Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T288257

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a subtask: T281854: Get baseline measurements/expectations 
for splitting scholarly articles from Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T293628: Get baseline 
measurements/expectations for splitting various subgraphs from Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Gehel, Csisc, So9q, AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, 
Lydia_Pintscher, Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, 
Darwinius, Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, 
LWyatt, Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a subtask: T291205: Analysis: Property usage by items' P31.

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T293628: Get baseline 
measurements/expectations for splitting various subgraphs from Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T291205

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Esc3300, Aklapper, Jmixter87, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a subtask: T293628: Get baseline measurements/expectations 
for splitting various subgraphs from Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T282790

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: So9q, Manuel, Esc3300, Addshore, AKhatun_WMF, MPhamWMF, Aklapper, Suran38, 
Invadibot, maantietaja, Peteosx1x, NavinRizwi, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T282790: [EPIC] Get estimates for dropping 
data from Wikidata in case of Blazegraph catastrophic failure.

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a Data Analyst for Wikidata and WDQS, I would like to know what the various 
large subgraphs in Wikidata are and what the benefits/losses of splitting them 
off from Wikidata would be. The aim is to identify large subgraphs besides 
those already known (scholarly articles, astronomical objects) and find out how 
often these subgraphs are queried. The benefits/losses can be estimated from:
  
  - The subgraph sizes
  - Connections between subgraphs
  - Number of queries that touch each subgraph
  - Number of queries that span multiple subgraphs (to estimate the federation 
load)

TASK DETAIL
  https://phabricator.wikimedia.org/T293628

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31

2021-09-27 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T291205

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Esc3300, Aklapper, Jmixter87, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Astronomical objects are structured hierarchically, so not every one is a 
direct `instance of` Q6999 <https://www.wikidata.org/wiki/Q6999> (unlike 
scholarly articles).
  
  Counting instances of Q6999 <https://www.wikidata.org/wiki/Q6999> and of all 
its subclasses, astronomical objects make up ~9% of all Wikidata entities (sparql 
query 
<https://query.wikidata.org/#SELECT%20%28count%28%2a%29%20as%20%3Fcount%29%0AWHERE%0A%7B%0A%20%20%5B%5D%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ6999.%0A%7D>).
  The triples 'related to' these entities are approximately 7.5% (~1B) of all 
Wikidata triples. This is approximated from the top 10 subclasses (which 
account for 7% of all entities).
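  
  The query behind the link above is URL-encoded in the fragment. A small 
sketch that decodes it with the Python standard library, so the property path 
(`wdt:P31/wdt:P279*`, i.e. instance of any subclass of Q6999) can be read 
directly:

```python
from urllib.parse import unquote, urlsplit

# The link exactly as it appears in the comment above.
url = (
    "https://query.wikidata.org/#SELECT%20%28count%28%2a%29%20as%20%3Fcount%29"
    "%0AWHERE%0A%7B%0A%20%20%5B%5D%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ6999.%0A%7D"
)

# WDQS keeps the query text in the URL fragment; unquote() reverses the %-encoding.
query = unquote(urlsplit(url).fragment)
print(query)
```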

TASK DETAIL
  https://phabricator.wikimedia.org/T288257

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T291205

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Esc3300, Aklapper, Jmixter87, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF edited projects, added Discovery-Search (Current work); removed 
Discovery-Search.

TASK DETAIL
  https://phabricator.wikimedia.org/T291190

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF added a project: Discovery-Search.

TASK DETAIL
  https://phabricator.wikimedia.org/T291190

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Query analysis report for some vertical slices of Wikidata: 
Wikidata_Vertical_Analysis#Query_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#Query_Analysis>
  Summary: Wikidata_Vertical_Analysis#TL;DR 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#TL;DR>

TASK DETAIL
  https://phabricator.wikimedia.org/T291190

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Here is the analysis done on scholarly articles in Wikidata and WDQS queries 
related to them: 
https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Csisc, So9q, AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, 
Lydia_Pintscher, Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, 
Darwinius, Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, 
LWyatt, Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Csisc, So9q, AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, 
Lydia_Pintscher, Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, 
Darwinius, Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, 
LWyatt, Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure

2021-09-17 Thread AKhatun_WMF
AKhatun_WMF added a subtask: T291190: Determine cost-benefit of doing vertical 
data slicing on WDQS.

TASK DETAIL
  https://phabricator.wikimedia.org/T282790

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: So9q, Manuel, Esc3300, Addshore, AKhatun_WMF, MPhamWMF, Aklapper, Suran38, 
Invadibot, maantietaja, NavinRizwi, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS

2021-09-17 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T282790: [EPIC] Get estimates for dropping 
data from Wikidata in case of Blazegraph catastrophic failure.

TASK DETAIL
  https://phabricator.wikimedia.org/T291190

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T289754: Triple level deduplication

2021-08-25 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata-Query-Service, Wikidata.

TASK DESCRIPTION
  The deduplication of Wikibase RDF dumps currently happens at the quad level, 
i.e. on (context, subject, predicate, object). Therefore, even after 
deduplication, some triples appear under several different contexts, which 
creates duplicates at the triple level (subject, predicate, object). It may 
become necessary down the line for the dump to contain only distinct triples, 
but since all of these duplicates are related to wiki pages, this is not 
required for the analysis at present.
  
  - Number of duplicate triples = ~170K (Total triples: 12.9B)
  - Number of distinct triples that have duplicates: 5K
  
  A snippet of the duplicate triples:
  
  | Subject | Predicate | Object | Number of different Contexts |
  | https://zh.wikiquote.org/ | http://wikiba.se/ontology#wikiGroup | "wikiquote" | 306 |
  | https://ta.wikinews.org/ | http://wikiba.se/ontology#wikiGroup | "wikinews" | 302 |
  | https://ps.wikipedia.org/ | http://wikiba.se/ontology#wikiGroup | "wikipedia" | 301 |
  | https://fo.wikipedia.org/ | http://wikiba.se/ontology#wikiGroup | "wikipedia" | 301 |
  | https://am.wiktionary.org/ | http://wikiba.se/ontology#wikiGroup | "wiktionary" | 37 |
  | <https://nl.wikipedia.org/wiki/Sjabloon:Naviga... | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://schema.org/Article | 2 |
  | <https://nl.wikipedia.org/wiki/Sjabloon:Naviga... | http://schema.org/name | "Sjabloon:Navigatie voetbal Nederland Derde di... | 3 |
  | <https://nl.wikipedia.org/wiki/Sjabloon:Naviga... | http://schema.org/name | "Sjabloon:Navigatie voetbalclubs Nijkerk"@nl | 2 |
  
  All predicates involved among the duplicate triples:
  
  | Predicate   | Number of occurrences |
  | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | 1049  |
  | http://schema.org/inLanguage| 1049  |
  | http://schema.org/name  | 1049  |
  | http://schema.org/isPartOf  | 1049  |
  | http://wikiba.se/ontology#wikiGroup | 868   |
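  
  The difference between quad-level and triple-level deduplication described 
above can be sketched in plain Python. This is only an illustration on a few 
made-up rows (the context names `ctx_a` and `ctx_b` are hypothetical); the 
actual job runs on Spark over the full dump:

```python
from collections import Counter

WIKI_GROUP = "http://wikiba.se/ontology#wikiGroup"

# Each row is a quad: (context, subject, predicate, object).
quads = [
    ("ctx_a", "https://zh.wikiquote.org/", WIKI_GROUP, '"wikiquote"'),
    ("ctx_a", "https://zh.wikiquote.org/", WIKI_GROUP, '"wikiquote"'),  # exact duplicate quad
    ("ctx_b", "https://zh.wikiquote.org/", WIKI_GROUP, '"wikiquote"'),  # same triple, other context
    ("ctx_a", "https://ta.wikinews.org/", WIKI_GROUP, '"wikinews"'),
]

# Quad-level deduplication: only rows identical in all four fields collapse.
distinct_quads = set(quads)

# Triple-level view: drop the context and count how many contexts share each triple.
contexts_per_triple = Counter(quad[1:] for quad in distinct_quads)
distinct_triples = set(contexts_per_triple)
still_duplicated = {t: n for t, n in contexts_per_triple.items() if n > 1}

print(len(quads), len(distinct_quads), len(distinct_triples))
```

  Here the quad-level pass removes only the exact repeat, while the triple 
that appears under two contexts survives it and shows up in `still_duplicated`.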

TASK DETAIL
  https://phabricator.wikimedia.org/T289754

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, Aklapper, dcausse, AKhatun_WMF, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T289753: Optimize deduplication of triples when loading into wikibase RDF dumps

2021-08-25 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata-Query-Service, Wikidata.

TASK DESCRIPTION
  The deduplication of triples is not yet optimized. The process now takes 
~3hrs, up from ~1hr without deduplication, but it works nonetheless.
  
  @JAllemandou suggested that a few optimizations may be possible for the 
deduplication process. This task tracks those possible optimizations.

TASK DETAIL
  https://phabricator.wikimedia.org/T289753

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, dcausse, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T287225: Add all prefixes defined in Blazegraph

2021-08-10 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Hi, thanks for the deploy! 
  Can we re-run the previous jobs? Preferably all of them, since the analysis 
will require the earlier data.

TASK DETAIL
  https://phabricator.wikimedia.org/T287225

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: EBernhardson, JAllemandou, Aklapper, dcausse, Lucas_Werkmeister_WMDE, 
MPhamWMF, Gehel, AKhatun_WMF, Biggs657, Invadibot, Lalamarie69, maantietaja, 
Juan90264, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, Lydia_Pintscher, 
Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, Darwinius, 
Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, LWyatt, 
Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T281854#7266495 <https://phabricator.wikimedia.org/T281854#7266495>, 
@EgonWillighagen wrote:
  
  > @AKhatun_WMF, when you write "authors connected to other subgraphs", do you 
mean subgraphs within Wikidata (so, excluding external identifiers), or also 
graphs from other resources part of, for example, the Linked Open Data Cloud?
  
  I mean subgraphs within Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, Lydia_Pintscher, 
Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, Darwinius, 
Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, LWyatt, 
Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, Lydia_Pintscher, 
Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, Darwinius, 
Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, LWyatt, 
Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T286436: Deduplicate triples when loading the wikibase RDF dumps into hive

2021-07-26 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T286436

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, dcausse, Aklapper, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T286436: Deduplicate triples when loading the wikibase RDF dumps into hive

2021-07-26 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Joseph will suggest an optimization for this task when he is back. For now, 
a simple `.distinct()` has been applied to the Spark DataFrame to facilitate 
analysis of the Wikidata dumps.

TASK DETAIL
  https://phabricator.wikimedia.org/T286436

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, dcausse, Aklapper, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T281854#7062631 <https://phabricator.wikimedia.org/T281854#7062631>, 
@Fnielsen wrote:
  
  > Some of the statistics that is wanted are listed on Scholia, currently on 
the frontpage: https://scholia.toolforge.org/ (UPDATE: now here: 
https://scholia.toolforge.org/statistics)
  >
  > "percentage, number of Wikidata entities that are scholarly article": 
  > 37.246.721  Scholarly articles, so 37/97 ~ 40% are scholarly articles.
  
  Could you tell me what the 97 refers to and where that number is listed?

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, Lydia_Pintscher, 
Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, Darwinius, 
Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, LWyatt, 
Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T287225: Add all prefixes defined in Blazegraph

2021-07-22 Thread AKhatun_WMF
AKhatun_WMF created this task.
AKhatun_WMF added projects: Wikidata-Query-Service, Wikidata, Discovery-Search 
(Current work).

TASK DESCRIPTION
  As of now, the Jena parser fails if it cannot find some prefix definitions.
  
  We would like to include a list of all prefixes defined in Blazegraph by 
reusing those declared in other parts of the code, instead of listing them 
separately for the parser.
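  
  A rough sketch of the idea, assuming the shared prefix list is available as 
a name-to-IRI mapping and is prepended to each query before it reaches the 
parser. The helper name and the mapping below are hypothetical, and the IRIs 
are assumptions for illustration; the authoritative list is the one defined 
for Blazegraph in the WDQS code:

```python
import re

# Assumed prefix IRIs, for illustration only.
KNOWN_PREFIXES = {
    "mwapi": "https://www.mediawiki.org/ontology#API/",
    "geof": "http://www.opengis.net/def/geosparql/function/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gas": "http://www.bigdata.com/rdf/gas#",
}

# Matches "PREFIX name:" at the start of a line, case-insensitive on the keyword.
DECLARED = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w-]*):", re.IGNORECASE | re.MULTILINE)


def add_missing_prefixes(query, prefixes=KNOWN_PREFIXES):
    """Prepend PREFIX declarations for known prefixes the query does not declare."""
    declared = set(DECLARED.findall(query))
    header = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in prefixes.items()
        if name not in declared
    ]
    return "\n".join(header + [query])


query = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\nSELECT ?n WHERE { ?p foaf:name ?n }"
patched = add_missing_prefixes(query)
print(patched)
```

  Keeping a single mapping and deriving the header from it is what avoids 
listing the prefixes separately for the parser.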

TASK DETAIL
  https://phabricator.wikimedia.org/T287225

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: JAllemandou, Aklapper, dcausse, Esc3300, Lucas_Werkmeister_WMDE, MPhamWMF, 
Gehel, AKhatun_WMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-07-19 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  @dcausse: Yes, just adding the prefix declarations to the Jena parser is what 
we want to do.
  @JAllemandou: Should I add the other prefixes as well?

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Gehel, MPhamWMF, Lucas_Werkmeister_WMDE, Esc3300, dcausse, Aklapper, 
AKhatun_WMF, JAllemandou, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-07-16 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  - For June, the average daily successful parsing rate was **~85%**, ranging 
from 75% to 90%. Note that this only includes queries with status 200 and 500.
  - 11% of the distinct queries ran into errors related to prefixes. The number 
of distinct queries failing on each prefix is shown below. After adding the 
first 4 prefixes (mwapi, geof, foaf, gas) to the query processors' prefix list, 
the average daily successful parsing rate rose to >96%. A few prefixes were 
slightly off (data instead of wdata, ref instead of wdref). These account for 
very few queries, but I fixed them nevertheless.
  
  | **prefix_name** | **count** |
  | mwapi           | 7419357   |
  | geof            | 54183     |
  | foaf            | 17198     |
  | gas             | 13753     |
  | wds             | 2761      |
  | wdv             | 216       |
  | fn              | 62        |
  | dc              | 50        |
  | mediawiki       | 23        |
  | wdref           | 22        |
  | wdata           | 3         |
  
  Total distinct queries: 67467327
  
  - Other errors included:
    - `Variable used when already in-scope`. This happens when the same 
variable is reused in a query. Running such queries in WDQS returns results 
without problems. These form 2% of the errors in distinct queries.
    - Another notable error involves the `WITH` clause. Although it runs well 
in WDQS, the parser doesn't accept it. These form 2.5% of the distinct queries.
  
  It seems that including the prefixes should fix most of the failures, but 
should we also think about fixing the other two errors (although they are small 
in number)? I am not sure why Jena cannot parse them, though.
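  
  As a quick check of the figures above, the per-prefix counts can be summed 
and compared with the total number of distinct queries. The snippet only 
re-does the arithmetic on the numbers from the table:

```python
# Distinct queries failing on each missing prefix (from the table above).
prefix_error_counts = {
    "mwapi": 7419357,
    "geof": 54183,
    "foaf": 17198,
    "gas": 13753,
    "wds": 2761,
    "wdv": 216,
    "fn": 62,
    "dc": 50,
    "mediawiki": 23,
    "wdref": 22,
    "wdata": 3,
}
TOTAL_DISTINCT_QUERIES = 67467327

total_prefix_errors = sum(prefix_error_counts.values())
share_of_all_queries = total_prefix_errors / TOTAL_DISTINCT_QUERIES

# Share of the prefix-related failures covered by the first four prefixes.
top_four = sum(prefix_error_counts[p] for p in ("mwapi", "geof", "foaf", "gas"))
top_four_share = top_four / total_prefix_errors

print(f"prefix errors: {total_prefix_errors} ({share_of_all_queries:.1%} of distinct queries)")
print(f"top four prefixes cover {top_four_share:.2%} of prefix errors")
```

  The sum is consistent with the ~11% quoted above, and the first four 
prefixes account for more than 99.9% of the prefix-related failures, which is 
why adding just those four has most of the effect.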

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-07-13 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-07-13 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Aklapper, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T282790: Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure

2021-06-23 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Some of the vertical analyses were done as part of familiarizing myself with 
Wikidata. See the findings in Wikidata_Vertical_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis>. 
I will get back to this ticket when I am done with T282139 
<https://phabricator.wikimedia.org/T282139>.

TASK DETAIL
  https://phabricator.wikimedia.org/T282790

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Addshore, AKhatun_WMF, MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-22 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T282139

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Esc3300, GoranSMilovanovic, CBogen, AKhatun_WMF, Aklapper, JAllemandou, 
Invadibot, MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-22 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T282139

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Esc3300, GoranSMilovanovic, CBogen, AKhatun_WMF, Aklapper, JAllemandou, 
Invadibot, MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries

2021-06-04 Thread AKhatun_WMF
AKhatun_WMF triaged this task as "Low" priority.

TASK DETAIL
  https://phabricator.wikimedia.org/T283256

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction

2021-06-04 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T273854

To: AKhatun_WMF
Cc: dcausse, Aklapper, JAllemandou, Invadibot, MPhamWMF, maantietaja, CBogen, 
Akuckartz, 4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T283255: Create CLI job extracting info from wdqs queries

2021-06-04 Thread AKhatun_WMF
AKhatun_WMF closed this task as "Resolved".
AKhatun_WMF removed a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T283255

To: AKhatun_WMF
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, Lalamarie69, 
Alter-paule, Beast1978, Un1tY, Hook696, Kent7301, joker88john, CucyNoiD, 
Gaboe420, Giuliamocci, Cpaulf30, Af420, Bsandipan, Lewizho99, Maathavan


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-06-04 Thread AKhatun_WMF
AKhatun_WMF closed subtask T283255: Create CLI job extracting info from wdqs 
queries as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-02 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Some of the information suggested for analysis or extraction:
  
  - Top items
  - Top properties
  - Top subject and object types
  - Top property types
  - Top Wikidata vs. other predicates
  - Number of S, P, O that don't involve Wikidata
    - The aim is to find the size of the subgraph that does not concern 
Wikidata, i.e. the size of the leaves. They are leaves because once they point 
to something outside of Wikidata, they are not expanded within Wikidata. Some 
things, like literals, are not expandable at all. If we have too many leaves, 
we may consider using property graphs (where leaves are listed as properties 
of a node).
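
  The leaf idea above can be sketched roughly as follows. This is only an 
illustration: the `Triple` case class, the `WikidataPrefix` test, and the 
sample data are assumptions made for the example, not the schema or method 
used in the actual analysis.

```scala
// Sketch only: Triple, WikidataPrefix and the sample data are illustrative
// assumptions, not the schema used in the actual analysis.
object LeafSketch {
  final case class Triple(subject: String, predicate: String, obj: String)

  // Assumption: anything under this prefix counts as "inside Wikidata".
  val WikidataPrefix = "http://www.wikidata.org/"

  // A leaf object is a literal or an IRI that points outside Wikidata.
  def isLeaf(obj: String): Boolean = !obj.startsWith(WikidataPrefix)

  // Size of the leaf part of the graph, counted in triples.
  def leafCount(triples: Seq[Triple]): Int = triples.count(t => isLeaf(t.obj))

  // Property-graph view: leaves become (predicate, value) properties of their subject.
  def asNodeProperties(triples: Seq[Triple]): Map[String, Seq[(String, String)]] =
    triples
      .filter(t => isLeaf(t.obj))
      .groupBy(_.subject)
      .map { case (s, ts) => s -> ts.map(t => (t.predicate, t.obj)) }

  val sample: Seq[Triple] = Seq(
    Triple("http://www.wikidata.org/entity/Q42",
           "http://www.wikidata.org/prop/direct/P31",
           "http://www.wikidata.org/entity/Q5"),
    Triple("http://www.wikidata.org/entity/Q42",
           "http://www.w3.org/2000/01/rdf-schema#label",
           "\"Douglas Adams\"@en"),
    Triple("http://www.wikidata.org/entity/Q42",
           "http://www.w3.org/2002/07/owl#sameAs",
           "http://example.org/external/1")
  )

  def main(args: Array[String]): Unit = {
    println(s"leaf triples: ${leafCount(sample)} of ${sample.size}")
    asNodeProperties(sample).foreach { case (s, props) =>
      println(s"$s has ${props.size} leaf properties")
    }
  }
}
```

  In this toy sample, two of the three triples end in a leaf (one literal, one 
external IRI), and both would become properties of the Q42 node in a 
property-graph view.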

TASK DETAIL
  https://phabricator.wikimedia.org/T282139

To: AKhatun_WMF
Cc: GoranSMilovanovic, CBogen, AKhatun_WMF, Aklapper, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries

2021-06-01 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Update 1 June 2021:
  
  Had a chat with @JAllemandou. Based on the Wikidata Checkpoint Meeting of 
27/5/2021, we will take up this ticket later, as required. For now, we are 
focusing on productionizing the existing data extracted from SPARQL queries and 
getting the data flowing (T273854 <https://phabricator.wikimedia.org/T273854>).
  
  We will need more info on how to flatten the AST, but so far we have talked 
about making a simple list of tuples. The order of the list reflects how the AST 
was traversed, and each element in the list is a tuple of Type and Value,
  e.g. (operator, join), (filter, ?x+?y = ?z), (node_var, x), (extend, ?x+?y as 
?z), etc.
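
  A minimal sketch of the "list of tuples" idea, assuming a pre-order traversal 
and a heavily simplified, hypothetical algebra tree (`Op`, `Join`, `Filter`, 
`Extend` and `Bgp` are made up for this example and are not the parser's real 
classes):

```scala
// Sketch only: the Op hierarchy below is a hypothetical stand-in for the
// real SPARQL algebra tree, reduced to four node kinds.
object FlattenSketch {
  sealed trait Op
  final case class Join(left: Op, right: Op) extends Op
  final case class Filter(expr: String, child: Op) extends Op
  final case class Extend(expr: String, variable: String, child: Op) extends Op
  final case class Bgp(vars: List[String]) extends Op

  // Pre-order traversal: emit the current node first, then its children
  // from left to right. Each list element is a (Type, Value) tuple.
  def flatten(op: Op): List[(String, String)] = op match {
    case Join(l, r)      => ("operator", "join") :: flatten(l) ::: flatten(r)
    case Filter(e, c)    => ("filter", e) :: flatten(c)
    case Extend(e, v, c) => ("extend", s"$e as ?$v") :: flatten(c)
    case Bgp(vs)         => vs.map(v => ("node_var", v))
  }

  val example: Op =
    Filter("?x+?y = ?z",
      Join(Bgp(List("x")), Extend("?x+?y", "z", Bgp(List("y")))))

  def main(args: Array[String]): Unit =
    flatten(example).foreach(println)
}
```

  Because the list order follows the traversal, position carries the 
structure. Whether that is enough to rebuild the tree depends on also recording 
how many children each operator has, which this sketch does not do.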

TASK DETAIL
  https://phabricator.wikimedia.org/T283256

To: AKhatun_WMF
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-25 Thread AKhatun_WMF
AKhatun_WMF removed a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, Lalamarie69, 
Alter-paule, Beast1978, Un1tY, Hook696, Kent7301, joker88john, CucyNoiD, 
Gaboe420, Giuliamocci, Cpaulf30, Af420, Bsandipan, Lewizho99, Maathavan


[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries

2021-05-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Idea on how to store the SPARQL query as a list:
  Let's make a list of a generic custom class, **QueryElem[T]**. QueryElem 
contains `elemType: String` and `elem: T`.
  
  A class needs to be created for each element type, e.g. `NodeClass extends 
QueryElem`. Class definitions for all elements are given below:
  
  For nodes:
  `elemType = "Node", elem: NodeInfo = some_node` (NodeInfo is a case class 
containing NodeType and NodeValue like `("NODE_VAR", "x")`  )
  
  For expressions:
  `elemType = "expression"`
  Expressions can get quite convoluted, with 1 variable, 2 variables, or n 
variables, like BIND("AK" as ?x), (?x+?y as ?z), (REGEX("[abc]*") as ?x) 
respectively. Moreover, they can nest very deeply, like 
FILTER(?x = 1 || ?y = 2 || ?z = 3).
  **I am not entirely sure how to represent expressions.**
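
  One possible representation, offered only as a sketch: a small recursive tree 
in which every operator or function call holds a list of sub-expressions, so 
the number of variables and the nesting depth are both unbounded. The names 
`Expr`, `Var`, `Const` and `Call` are hypothetical and not taken from the 
existing code.

```scala
// Sketch only: Expr, Var, Const and Call are hypothetical names.
object ExprSketch {
  sealed trait Expr
  final case class Var(name: String) extends Expr
  final case class Const(value: String) extends Expr
  // Operators and functions alike: a name plus any number of arguments.
  final case class Call(function: String, args: List[Expr]) extends Expr

  // All variables mentioned in an expression, in left-to-right order.
  def variables(e: Expr): List[String] = e match {
    case Var(n)        => List(n)
    case Const(_)      => Nil
    case Call(_, args) => args.flatMap(variables)
  }

  // Nesting depth; a bare variable or constant has depth 1.
  def depth(e: Expr): Int = e match {
    case Call(_, args) => 1 + args.map(depth).foldLeft(0)((a, b) => math.max(a, b))
    case _             => 1
  }

  private def eq(v: String, c: String): Expr = Call("=", List(Var(v), Const(c)))

  // The nested three-way "or" filter from the comment above.
  val nestedFilter: Expr =
    Call("||", List(Call("||", List(eq("x", "1"), eq("y", "2"))), eq("z", "3")))

  def main(args: Array[String]): Unit = {
    println(variables(nestedFilter))
    println(depth(nestedFilter))
  }
}
```

  A tree like this would not fit directly into a flat list element, so it would 
either have to be stored as a nested structure inside `elem` or be flattened in 
the same way as the rest of the query.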
  
  For BGP:
  `elemType = "BGP", elem: List[TripleInfo] = List(triple1, triple2, triple3, 
triple4, ...)` (TripleInfo contains NodeInfo for Sub, Pred and Obj)
  
  For services:
  `elemType = "service", serviceName:"service_name", elem: BGP`  (service_name 
like wikibase:label)
  
  For tables:
  `elemType = "table", elem: TableData`
  
  TableData is: `tableVars: List[NodeInfo], tableRows: List[Row]`
  Row is: `List[NodeInfo]`
  
  For paths (sub path obj) :
  A path predicate is identified as `PATH` in NodeType anyway, so we can 
consider paths to be ordinary triples, or create a special `pathTriple`:
  `elemType = "pathTriple", elem: TripleInfo`
  
  For filters:
  `elemType = "filter", elem: Expression`  (Expression class as described above)
  
  For extends:
  `elemType = "extend", elem: Expression, expVar: NodeInfo`  (Expression class 
as described above)
  e.g `(?x+?y as ?z)`, here `?z` is the expVar and elem is `?x+?y`
  elem can be a single Node as well: `BIND ("AK" as ?x)`
  
  Could it be anything else? **This requires more thought; I am not sure what 
to put in `elem` for extends.**
  
  Op Names:
  `elemType = "operations", elem = "join"` (elem can be join, optional, project, 
etc. Sometimes elem will be redundant, as with BGP, path, table, etc., which 
have their own classes)
  
  Let me know if I am missing anything. How else could we represent a query as 
a list?
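
  The class layout described above might look roughly like this in Scala. It is 
a sketch under assumptions, not the implemented design: the subclass names 
(`NodeElem`, `BgpElem`, `ServiceElem`, `OpElem`) and the `NODE_URI` node type 
are hypothetical, and only a few element kinds are shown.

```scala
// Sketch only: subclass names and the "NODE_URI" tag are hypothetical.
object QueryElemSketch {
  final case class NodeInfo(nodeType: String, nodeValue: String)
  final case class TripleInfo(sub: NodeInfo, pred: NodeInfo, obj: NodeInfo)

  // Generic element: a type tag plus a payload of type T.
  sealed abstract class QueryElem[T](val elemType: String) {
    def elem: T
  }

  final case class NodeElem(elem: NodeInfo)
      extends QueryElem[NodeInfo]("Node")
  final case class BgpElem(elem: List[TripleInfo])
      extends QueryElem[List[TripleInfo]]("BGP")
  final case class ServiceElem(serviceName: String, elem: BgpElem)
      extends QueryElem[BgpElem]("service")
  final case class OpElem(elem: String)
      extends QueryElem[String]("operations")

  val triple: TripleInfo = TripleInfo(
    NodeInfo("NODE_VAR", "x"),
    NodeInfo("NODE_URI", "http://www.wikidata.org/prop/direct/P31"),
    NodeInfo("NODE_URI", "http://www.wikidata.org/entity/Q5")
  )

  // A query as a flat list of heterogeneous elements.
  val query: List[QueryElem[_]] = List(
    OpElem("join"),
    BgpElem(List(triple)),
    ServiceElem("wikibase:label", BgpElem(Nil))
  )

  def elemTypes(q: List[QueryElem[_]]): List[String] = q.map(_.elemType)

  def main(args: Array[String]): Unit =
    println(elemTypes(query))
}
```

  Sealing the base class lets the compiler check that a pattern match over 
elements covers every kind, which may matter once filters, extends and tables 
are added.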

TASK DETAIL
  https://phabricator.wikimedia.org/T283256

To: AKhatun_WMF
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-20 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T282130#7100051 <https://phabricator.wikimedia.org/T282130#7100051>, 
@JAllemandou wrote:
  
  > @AKhatun_WMF That's great! could you please provide some info on expected 
data-size in parquet (for daily data for instance)? Many thanks.
  
  @JAllemandou Added estimate of daily data size.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: AKhatun_WMF
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-20 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: AKhatun_WMF
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-19 Thread AKhatun_WMF
AKhatun_WMF claimed this task.
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: AKhatun_WMF
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282129: Test triple-analysis functions over a large dataset with Spark

2021-05-19 Thread AKhatun_WMF
AKhatun_WMF claimed this task.
AKhatun_WMF updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T282129

To: AKhatun_WMF
Cc: CBogen, AKhatun_WMF, Aklapper, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-19 Thread AKhatun_WMF
AKhatun_WMF closed subtask T282127: Add unit-tests to WDQS analysis toolkit as 
"Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit

2021-05-19 Thread AKhatun_WMF
AKhatun_WMF closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T282127

To: AKhatun_WMF
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit

2021-05-19 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Unit tests done, patch merged!
  
  - Created a file containing queries that pass and a file containing queries 
that don't pass. Both are checked for correctness in the unit tests.
  - Checked the correctness of the extracted nodes for 2 example queries written 
inline in the code.

TASK DETAIL
  https://phabricator.wikimedia.org/T282127

To: AKhatun_WMF
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit

2021-05-07 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T282127

To: AKhatun_WMF
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-04-23 Thread AKhatun_WMF
AKhatun_WMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs