[Wikidata-bugs] [Maniphest] T179681: Add HDT dump of Wikidata

2021-03-25 Thread jjkoehorst
jjkoehorst added a comment.


  Small update from my side. After downloading the latest TTL file from 
Wikidata, I get no errors but also no output. I tried the exact same command 
with a small dataset and that worked.
  
time sudo docker run -v `pwd`:/wikidata rdfhdt/hdt-cpp:v1.3.3 \
    rdf2hdt -f turtle -p -i wikidata/latest-all.ttl.gz wikidata/latest-all.hdt

sudo docker run -v `pwd`:/wikidata rdfhdt/hdt-cpp:v1.3.3 rdf2hdt -f turtle -p  19.75s user 13.90s system 0% cpu 50:21:55.81 total
  
  So I am not exactly sure what is happening. Here is the same command run on 
a temporary file containing the first 103 lines of the Turtle file.
  
time sudo docker run -v `pwd`:/wikidata rdfhdt/hdt-cpp:v1.3.3 \
    rdf2hdt -f turtle -p -i wikidata/tmp.ttl.gz wikidata/tmp.hdt
Predicate Bitmap in 21 us
Count predicates in 17 us
Count Objects in 8 us Max was: 8
Bitmap in 9 us
Bitmap bits: 56 Ones: 38
Object references in 23 us
Sort lists in 17 us
Index generated in 119 us
sudo docker run -v `pwd`:/wikidata rdfhdt/hdt-cpp:v1.3.3 rdf2hdt -f turtle -p  0.04s user 0.03s system 1% cpu 4.868 total
  
  and then I can access the turtle file on the local drive.
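
  A small test file like the one used above can be cut from the compressed 
dump without decompressing it fully. This is only a minimal sketch, assuming 
gzip and head are available; the function name head_gz is hypothetical and 
not part of hdt-cpp.

```shell
# head_gz INPUT.gz N OUTPUT.gz
# Write the first N lines of a gzip-compressed file to a new gzip file.
# Hypothetical helper for illustration; not part of hdt-cpp.
head_gz() {
    # gzip -dc streams the decompressed data to stdout; head stops reading
    # after N lines, so the rest of the dump is never decompressed.
    gzip -dc "$1" | head -n "$2" | gzip > "$3"
}
```

  For example: head_gz wikidata/latest-all.ttl.gz 103 wikidata/tmp.ttl.gz. 
Cutting at an arbitrary line can split a Turtle statement in the middle, so 
the line count may need adjusting until the sample parses.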

TASK DETAIL
  https://phabricator.wikimedia.org/T179681

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jjkoehorst
Cc: jjkoehorst, MPhamWMF, Daniel_Mietchen, hoo, Addshore, Smalyshev, Ladsgroup, 
Arkanosis, Tarrow, Lucas_Werkmeister_WMDE, Aklapper, Invadibot, maantietaja, 
Akuckartz, Dinadineke, DannyS712, Nandana, tabish.shaikh91, Lahi, Gq86, 
GoranSMilovanovic, Soteriaspace, Jayprakash12345, JakeTheDeveloper, QZanden, 
merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
TheDJ, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T179681: Add HDT dump of Wikidata

2021-03-22 Thread Lydia_Pintscher
Lydia_Pintscher closed subtask T277662: latest all rdf dump: bad IRI scheme as 
"Resolved".



[Wikidata-bugs] [Maniphest] T179681: Add HDT dump of Wikidata

2021-03-17 Thread Andrawaag
Andrawaag added a subtask: T277662: latest all rdf dump: bad IRI scheme.



[Wikidata-bugs] [Maniphest] T179681: Add HDT dump of Wikidata

2021-03-16 Thread jjkoehorst
jjkoehorst added a comment.


  As I was having some issues compiling the code, I used a Docker image 
directly for the conversion. Unfortunately, it failed with an RDF syntax error 
on the latest dump. As I didn't time it, I cannot give any details about 
performance yet.
  
  sudo docker run -v `pwd`:/wikidata rdfhdt/hdt-cpp:v1.3.3 \
      rdf2hdt -p -i wikidata/latest-all.nt.gz wikidata/latest-all.hdt
  error: wikidata/latest-all.nt.gz:604276348:139: bad IRI scheme char `2F'
  Catch exception load: Error parsing input.
  ERROR: Error parsing input.
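
  The error message reports a position that appears to be a line and column 
(604276348:139). To look at the offending triple, one option is to print 
just that line from the compressed dump. This is a minimal sketch, assuming 
gzip and awk are available; the function name show_line_gz is hypothetical.

```shell
# show_line_gz INPUT.gz LINE
# Print a single line (1-based) of a gzip-compressed file.
# Hypothetical helper for illustration; not part of hdt-cpp.
show_line_gz() {
    # awk prints the requested line and exits, so nothing after it is processed.
    gzip -dc "$1" | awk -v n="$2" 'NR == n { print; exit }'
}
```

  For example: show_line_gz wikidata/latest-all.nt.gz 604276348. On a dump of 
this size the scan still has to decompress everything before that line, so it 
can take a long time.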



[Wikidata-bugs] [Maniphest] T179681: Add HDT dump of Wikidata

2021-01-15 Thread Daniel_Mietchen
Daniel_Mietchen added a comment.


  The 32-bit issue at https://github.com/rdfhdt/hdt-cpp/issues/135 that was 
mentioned above seems to be resolved, so perhaps this can be revisited now?
