[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2023-04-01 Thread Pppery
Pppery edited projects, added Patch-Needs-Improvement; removed Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T94019

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Pppery
Cc: dcausse, Addshore, toan, Tonina_Zhelyazkova_WMDE, JAllemandou, Pintoch, 
Smalyshev, hoo, Liuxinyu970226, mkroetzsch, Aklapper, daniel, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331, Themindcoder, Adamm71, Jersione, Hellket777, 
LisafBia6531, 786, Biggs657, Juan90264, Alter-paule, Beast1978, Un1tY, Hook696, 
Kent7301, joker88john, CucyNoiD, Gaboe420, Giuliamocci, Cpaulf30, Af420, 
Bsandipan, Lewizho99, Maathavan, Neuronton
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-07-20 Thread Addshore
Addshore added a comment.


  When T120242: Consistent MediaWiki state change events | MediaWiki events as 
source of truth is ready, we could probably change some of the architecture and 
process around dumping for Wikidata.org.
  
  We would likely keep the existing scripts as they are for the Wikibase 
use cases, and may still want to create a script that generates TTL/RDF from a 
JSON dump.
  
  For Wikidata.org we could move towards
  `edit --> kafka --> streaming job --> WMF-API (get content) --> store on HDFS`
  and then generate the JSON, RDF and TTL dumps from HDFS much faster and more 
consistently.
  
  This will likely tie into the ongoing Wikidata / Wikibase subsetting 
discussions too, as subsetting dumps from HDFS will be much easier than with 
the existing systems.
  See T46581: Partial dumps etc.
  
  But most of this probably lives in separate tickets.
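
  A minimal sketch of what a script generating RDF from a JSON dump could look 
like. It is not the Wikibase implementation: it assumes the usual dump layout 
(one entity per line inside a JSON array), covers only labels and item-valued 
best-rank statements, and emits N-Triples rather than the full TTL output. The 
function names and `SAMPLE_LINE` are made up for illustration.

```python
import json

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical one-line excerpt in the shape of a JSON dump line.
SAMPLE_LINE = (
    '{"id":"Q42","labels":{"en":{"language":"en","value":"Douglas Adams"}},'
    '"claims":{"P31":[{"mainsnak":{"snaktype":"value","property":"P31",'
    '"datavalue":{"value":{"entity-type":"item","numeric-id":5,"id":"Q5"},'
    '"type":"wikibase-entityid"}},"rank":"normal"}]}},'
)


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def parse_dump_line(line):
    """Parse one dump line; return None for the array brackets and blank lines."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    return json.loads(line)


def entity_to_ntriples(entity):
    """Yield N-Triples for labels and item-valued best-rank statements only."""
    subject = f"<{WD}{entity['id']}>"
    for lang, label in sorted(entity.get("labels", {}).items()):
        yield f'{subject} <{RDFS_LABEL}> "{escape_literal(label["value"])}"@{lang} .'
    for prop, statements in sorted(entity.get("claims", {}).items()):
        # Best rank: "preferred" if any statement has it, otherwise "normal".
        ranks = {s.get("rank") for s in statements}
        best = "preferred" if "preferred" in ranks else "normal"
        for s in statements:
            snak = s["mainsnak"]
            if s.get("rank") != best or snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]
            # Assumption: entity-id values carry an "id" field.
            if value["type"] == "wikibase-entityid":
                yield f"{subject} <{WDT}{prop}> <{WD}{value['value']['id']}> ."


if __name__ == "__main__":
    for triple in entity_to_ntriples(parse_dump_line(SAMPLE_LINE)):
        print(triple)
```

  Because each dump line is converted on its own, the same function could be 
applied line by line to a dump stream without loading the whole file.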



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-04-22 Thread Addshore
Addshore removed a project: Wikidata-Campsite.



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-04-19 Thread dcausse
dcausse added a comment.


  Indeed, the RDF data is available in the Hive table `discovery.wikibase_rdf`, 
but it is generated by reading the TTL dumps, so it might not help for this 
particular task.
  Using Hadoop would indeed allow processing the JSON efficiently, but it has 
drawbacks, as already pointed out:
  
  - it requires maintaining the Wikibase -> RDF projection in multiple codebases 
(PHP in Wikibase and Spark)
  - once created on the Hadoop cluster, the output would have to be pushed back 
to the labstore machine for public consumption, which might add extra delay



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-04-19 Thread JAllemandou
JAllemandou added a subscriber: dcausse.
JAllemandou added a comment.


  Info: There is already a job in the cluster doing `TTL -> RDF` conversion. 
The TTL dumps are imported weekly and converted to Blazegraph RDF once 
available.
  The job is maintained by the Search Platform team (ping @dcausse :).



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-04-14 Thread Addshore
Addshore added a comment.


  In T94019#5131531, @JAllemandou wrote:
  
  > The analytics hadoop cluster could also be of use here: the task can easily 
take advantage of parallelization.
  
  Indeed, and it already gets the JSON dumps loaded into it.
  However, this would mean keeping the JSON -> RDF mapping in multiple places 
(both in PHP in Wikibase and elsewhere to interface with Hadoop).
  Though that sounds like something that we could deal with in some way?
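
  A rough sketch of why the task parallelizes easily: every entity sits on its 
own line of the JSON dump, so the lines can be split into chunks and each chunk 
converted independently. This is not the Hadoop or Spark job itself; 
`chunked`, `convert_chunk` and `convert_parallel` are hypothetical names, and 
the per-entity conversion is reduced to emitting the entity IRI as a 
placeholder for the real JSON -> RDF mapping.

```python
import json
from itertools import islice

WD = "http://www.wikidata.org/entity/"


def chunked(lines, size):
    """Split an iterable into lists of at most `size` items."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def convert_chunk(chunk):
    """Convert one chunk on its own; no state is shared between chunks."""
    out = []
    for line in chunk:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the enclosing JSON array brackets
        entity = json.loads(line)
        # Placeholder for the real mapping: only the entity IRI is produced.
        out.append(f"<{WD}{entity['id']}>")
    return out


def convert_parallel(lines, chunk_size=1000, map_fn=map):
    """Apply convert_chunk to every chunk via map_fn and flatten the results.

    map_fn defaults to the builtin map (sequential); any order-preserving
    parallel map with the same call shape can be passed instead.
    """
    results = map_fn(convert_chunk, chunked(lines, chunk_size))
    return [item for chunk_result in results for item in chunk_result]
```

  With the standard library, `multiprocessing.Pool().imap` could be passed as 
`map_fn` to spread the chunks over several processes; on a cluster, the same 
split-then-map shape is what a distributed job would follow.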



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-04-14 Thread Addshore
Addshore added a project: wdwb-tech.



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2020-08-03 Thread toan
toan added a project: Wikidata-Campsite.



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2020-07-29 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2020-07-29 Thread gerritbot
gerritbot added a comment.


  Change 617153 had a related patch set uploaded (by Hoo man; owner: Hoo man):
  [mediawiki/extensions/Wikibase@master] Experimental support for creating 
dumps from JSON dumps
  
  https://gerrit.wikimedia.org/r/617153
