[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2019-11-05 Thread Addshore
Addshore added a comment.


  Relevant to this ticket.
  I just wrote a blog post that walks through the process of changing the concept URI 
of a Wikibase and reloading data into a fresh query service.
  
https://addshore.com/2019/11/changing-the-concept-uri-of-an-existing-wikibase-with-data/
  
  @despens do you see any more actionables here?

TASK DETAIL
  https://phabricator.wikimedia.org/T197658

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: Lucas_Werkmeister_WMDE, CXuesong, dbs, Tarrow, Smalyshev, tk, Aklapper, 
Addshore, despens, darthmon_wmde, Jelabra, ET4Eva, DannyS712, Nandana, Lahi, 
Gq86, Darkminds3113, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, 
Avner, Gehel, _jensen, rosalieper, Jonas, FloNight, Xmlizer, Asahiko, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2019-10-31 Thread Smalyshev
Smalyshev added a comment.


  In T197658#5619433, @Lucas_Werkmeister_WMDE wrote:
  
  > I believe `munge.sh` applies the WDQS data differences documented on the 
  > RDF Dump Format page (e.g. merge `wdata:` and `wd:`).
  
  That is correct. It also historically cleaned up some bad data coming from 
Wikidata (e.g. broken dates) due to bugs, etc. but not sure how much of that 
still applies.



[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2019-10-31 Thread Addshore
Addshore added a comment.


  In T197658#5619433, @Lucas_Werkmeister_WMDE wrote:
  
  > I believe `munge.sh` applies the WDQS data differences documented on the 
  > RDF Dump Format page (e.g. merge `wdata:` and `wd:`).
  
  That's a great list; I was unaware of those docs.



[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2019-10-30 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  I believe `munge.sh` applies the WDQS data differences documented on the 
RDF Dump Format page (e.g. merge `wdata:` and `wd:`).



[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2019-10-30 Thread Addshore
Addshore added a comment.


  If you feel that it would benefit others, do you think you could submit some 
changes to docs or add a script?
  
  In T197658#4593841, @despens wrote:
  
  > **Big Issue**
  > None of the query building helpers in WDQS work. The interface doesn't know 
  > about any properties or objects.
  
  That sounds separate from this task?
  
  > **Questions**
  >
  > - What is required to make the query building functions in WDQS work?
  
  I believe you just need to point things to the correct MediaWiki APIs.
  
  > - What do `munge.sh` and `loadData.sh` do apart from splitting up a 
  > potentially large TTL file into smaller chunks? (Since Rhizome's data is quite 
  > small at the moment, we wouldn't really need to split up the data.)
  
  `munge.sh` processes the triples.
  I can't say exactly what it does without digging through the Java code.



[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-18 Thread despens
despens added a comment.
We have been able to put the ttl-dump into Blazegraph now with the following process:

In the wdqs container, install curl:

# apk add --no-cache curl

Export the ttl dump into a directory shared between containers; command on the docker host:

# docker exec dockercomposefiles_wikibase_1 php extensions/Wikibase/repo/maintenance/dumpRdf.php > dumps/ttl-20180917.ttl

Inside the wdqs container, import the ttl file by directly instructing Blazegraph to load it:

# curl "http://localhost:/bigdata/namespace/wdq/sparql"  --data-urlencode "update=DROP ALL; LOAD ;"

Queries are now possible via the query service, example query
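
For readers who would rather script the import step than type the curl call, here is a minimal Python sketch of the form-encoded body such a request carries. It assumes the argument to LOAD is an IRI in angle brackets (standard SPARQL Update syntax); the archived message does not show that argument, so `dump_uri` and the example path are placeholders, and `build_reload_body` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode


def build_reload_body(dump_uri: str) -> str:
    """Return a form-encoded body for a 'DROP ALL; LOAD <...>;' update.

    dump_uri is a placeholder for wherever the TTL dump is reachable
    from Blazegraph; the original message does not show this value.
    """
    update = f"DROP ALL; LOAD <{dump_uri}>;"
    # urlencode percent-encodes ';', '<', '>' and turns spaces into '+',
    # which form decoding on the server side reverses.
    return urlencode({"update": update})


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    body = build_reload_body("file:///dumps/example.ttl")
    print(body)
    # Decoding the body gives back the original update string.
    print(parse_qs(body)["update"][0])
```

The body would then be POSTed to the namespace's SPARQL endpoint, as the curl command above does. Note that DROP ALL wipes the namespace, so this is only appropriate for a deliberate reset.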


Big Issue

None of the query building helpers in WDQS work. The interface doesn't know about any properties or objects.

Questions

- What is required to make the query building functions in WDQS work?

- What do munge.sh and loadData.sh do apart from splitting up a potentially large TTL file into smaller chunks? (Since Rhizome's data is quite small at the moment, we wouldn't really need to split up the data.)


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-14 Thread Addshore
Addshore added a comment.

In T197658#4582768, @despens wrote:
> So I guess the whole change history is not available in the Wikibase API after migrating the database from the previous install.

It is RecentChanges that you need to look at.
How long entries are kept in RecentChanges is configured by https://www.mediawiki.org/wiki/Manual:$wgRCMaxAge
Depending on the MW version it could be anywhere between 7 and 90 days. And of course this can be altered by your config.

> Would it be correct then to export the full ttl dump, load it into Blazegraph, and then run the updater again?

If the changes are not in RecentChanges, then yes.


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-14 Thread despens
despens added a comment.
Thank you @Smalyshev!

It seems like the --conceptUri switch is a part of munger. It is not accepted as a parameter for runUpdate.sh.

After checking this, I also modified the Wikibase's LocalSettings.php to explicitly use http (without 's') for concept URLs:

$wgWBRepoSettings['conceptBaseUri'] = 'http://staging.catalog.rhizome.org/entity/';

The API delivers RDF with the http protocol used in all local name spaces, too: http://staging.catalog.rhizome.org/wiki/Special:EntityData/Q1996.ttl

Now, running the updater still only updates 26 edits:

bash-4.4# ./runUpdate.sh -- -w staging.catalog.rhizome.org -s 2001010100 --init
Updating via http://localhost:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid119.log due to No such file or directory

I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
05:57:09.390 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 26 changes, from Q2596@77721@20180130201206|81595 to Q4977@77814@20180215170648|81689
05:57:09.809 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-02-15T17:06:48Z (next: 20180215170711|81690) at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
05:57:09.885 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Skipping change with bogus title:  Main Page
05:57:09.887 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 9 changes, from Q4977@77815@20180215170711|81690 to Q1166@77841@20180708070530|81717
05:57:09.976 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-07-08T07:05:30Z at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
05:57:10.008 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
05:57:10.009 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
05:57:20.036 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes

So I guess the whole change history is not available in the Wikibase API after migrating the database from the previous install.

Would it be correct then to export the full ttl dump, load it into Blazegraph, and then run the updater again?


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-13 Thread Smalyshev
Smalyshev added a comment.
> Any idea what is going on here?

Yes, you're using https://staging.catalog.rhizome.org/ in your data but set the concept uri to http://staging.catalog.rhizome.org/. Use --conceptUri to set the correct URI. See https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation/Standalone


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-13 Thread despens
despens added a comment.
When removing the alias staging.catalog.rhizome.org from the wikibase container in the docker-compose file, the connection is made via the docker network and can be established. However, the contents of the Wikibase are still rejected:

bash-4.4# ./runUpdate.sh -- -w staging.catalog.rhizome.org -s 2001010100 --init
Updating via http://localhost:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid125.log due to No such file or directory

I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
11:04:25.240 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 26 changes, from Q2596@77721@20180130201206|81595 to Q4977@77814@20180215170648|81689
11:04:25.896 [update 4] WARN  org.wikidata.query.rdf.tool.Updater - Contained error syncing.  Giving up on Q4966
org.wikidata.query.rdf.tool.rdf.Munger$BadSubjectException: Unrecognized subjects:  [https://staging.catalog.rhizome.org/entity/statement/Q4966-3e9eee06-4352-81b8-04cc-c1526542629e, https://staging.catalog.rhizome.org/entity/statement/Q4966-2beb1833-409f-0b9f-c075-571cc6b78eb0, https://staging.catalog.rhizome.org/entity/statement/Q4966-69862d13-4a1e-5770-c402-c37cf7441093, https://staging.catalog.rhizome.org/entity/statement/Q4966-9b394b4f-4c87-6aad-a016-042256e4d990, https://staging.catalog.rhizome.org/entity/Q4966, https://staging.catalog.rhizome.org/value/2b3197a343b1554b824b915aa6ffd70f].  Expected only sitelinks and subjects starting with http://staging.catalog.rhizome.org/wiki/Special:EntityData/ and http://staging.catalog.rhizome.org/entity/
	at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.finishCommon(Munger.java:833)
	at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.munge(Munger.java:430)
	at org.wikidata.query.rdf.tool.rdf.Munger.munge(Munger.java:223)
	at org.wikidata.query.rdf.tool.Updater.handleChange(Updater.java:305)
	at org.wikidata.query.rdf.tool.Updater.lambda$handleChanges$0(Updater.java:188)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
11:04:25.897 [update 6] WARN  org.wikidata.query.rdf.tool.Updater - Contained error syncing.  Giving up on Q4967

[...repeated for hundreds of Q-ids...]

Any idea what is going on here? Is the data in the Wikibase itself structured wrongly?


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-13 Thread despens
despens added a comment.
I think there is lots of ambiguity here...

For the record: I'm entering the WDQS container, not the WDQS-updater one: 
root@wikibase-docker:~# docker exec -ti dockercomposefiles_wdqs_1  bash

From inside that docker, I'm executing the command:
bash-4.4# ./runUpdate.sh -- -w staging.catalog.rhizome.org -s 2001010100

The connection failure seems to happen when the updater tries to connect to Rhizome's Wikibase:

bash-4.4# ./runUpdate.sh -- -w staging.catalog.rhizome.org -s 2001010100
Updating via http://localhost:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid616.log due to No such file or directory

I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
10:48:42.193 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during updater run.
java.lang.RuntimeException: org.apache.http.conn.HttpHostConnectException: Connect to staging.catalog.rhizome.org:443 [staging.catalog.rhizome.org/172.18.0.5] failed: Connection refused (Connection refused)

The connection to the Wikibase is refused. It points to an internal docker network IP address and tries to connect via HTTPS, but the docker setup doesn't provide HTTPS naturally.

However, if I'm running the updater without specifying the Wikibase host with the -w switch, it happily gets the triples of Wikidata proper into my modest Blazegraph instance:

bash-4.4# ./runUpdate.sh -- -s 2001010100 --init
Updating via http://localhost:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid644.log due to No such file or directory

I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
10:55:06.656 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 79 changes, from Q43576244@725717221@20180814105503|761278916 to Q55870317@725717323@20180814105519|761279016
10:55:15.531 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-08-14T10:55:19Z (next: 20180814105519|761279017) at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
10:55:15.762 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 77 changes, from Q23647613@725717330@20180814105520|761279026 to Q39895159@725717420@20180814105536|761279115
[...]

So to me it looks like the localhost: doesn't seem too wrong, just the source of the triples cannot be set.


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-13 Thread Addshore
Addshore added a comment.
> Updating via http://localhost:/bigdata/namespace/wdq/sparql

Just a hunch but if you are using the docker images then this is probably wrong.


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-12 Thread Smalyshev
Smalyshev added a comment.
> it doesn't seem possible to connect to Blazegraph because the updater already runs?

The errors do indicate that the connection to Blazegraph fails, most likely either because Blazegraph is not listening on localhost: or because something in your setup prevents Updater from connecting there. It is completely fine to run more than one instance of Updater (though in most cases not recommended) so that would not be a problem.


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-12 Thread despens
despens added a comment.
I think I need more guidance on how to run this script. I did use the --start and --init switches before.

When calling ./runUpdate.sh -- -v -s 2001010112 --init it doesn't seem possible to connect to Blazegraph because the updater already runs?

bash-4.4# ./runUpdate.sh -- -v -s 2001010112 --init
Updating via http://localhost:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid355.log due to No such file or directory

Could not start Jolokia agent: java.net.BindException: Address in use
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
18:36:01.641 [main] INFO  o.w.q.rdf.tool.options.OptionsUtils - Verbose mode activated
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
18:36:02.229 [main] DEBUG o.w.query.rdf.tool.rdf.RdfRepository - Setting last updated time to Mon Jan 01 12:00:00 GMT 2001
18:36:02.242 [main] DEBUG o.w.query.rdf.tool.rdf.RdfRepository - Running SPARQL: DELETE {
?o .
}
WHERE {
?o .
};
INSERT DATA {
"2001-01-01T12:00:00.000Z"^^ .
}

18:36:02.277 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - HTTP request failed: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused, attempt 1, will retry
18:36:04.280 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - HTTP request failed: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused, attempt 2, will retry
18:36:08.283 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - HTTP request failed: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused, attempt 3, will retry
18:36:16.286 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - HTTP request failed: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused, attempt 4, will retry
18:36:26.289 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - HTTP request failed: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused, attempt 5, will retry
18:36:36.291 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - HTTP request failed: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused, attempt 6, will fail
18:36:36.294 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during initialization.
org.wikidata.query.rdf.tool.exception.FatalException: Error updating triple store
	at org.wikidata.query.rdf.tool.rdf.RdfRepository.execute(RdfRepository.java:732)
	at org.wikidata.query.rdf.tool.rdf.RdfRepository.updateLeftOffTime(RdfRepository.java:665)
	at org.wikidata.query.rdf.tool.Update.buildRecentChangePollerChangeSource(Update.java:156)
	at org.wikidata.query.rdf.tool.Update.buildChangeSource(Update.java:141)
	at org.wikidata.query.rdf.tool.Update.main(Update.java:65)
Caused by: com.github.rholder.retry.RetryException: Retrying failed to complete successfully after 6 attempts.
	at com.github.rholder.retry.Retryer.call(Retryer.java:174)
	at org.wikidata.query.rdf.tool.rdf.RdfRepository.execute(RdfRepository.java:721)
	... 4 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused
	at org.eclipse.jetty.client.util.FutureResponseListener.getResult(FutureResponseListener.java:118)
	at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:101)
	at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:639)
	at org.wikidata.query.rdf.tool.rdf.RdfRepository.lambda$execute$0(RdfRepository.java:722)
	at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
	at com.github.rholder.retry.Retryer.call(Retryer.java:160)
	... 5 common frames omitted
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.eclipse.jetty.io.SelectorManager.finishConnect(SelectorManager.java:340)
	at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processConnect(SelectorManager.java:671)
	at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processKey(SelectorManager.java:640)
	at org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:607)
	at org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:545)
	at org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
	at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" org.wikidata.query.rdf.tool.exception.FatalException: Error updating triple store
	at org.wikidata.query.rdf.tool.rdf.RdfRepository.execute(RdfRepository.java:732)
	at 

[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-12 Thread Smalyshev
Smalyshev added a comment.
This error means that the timestamp stored in the database is more than 30 days behind (this can be changed with the wikibaseMaxDaysBack property). In this case, you can:

- Load a dump that is reasonably recent
- Run Updater with -s DATE --init
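
To make the cutoff concrete, here is a small Python sketch of the kind of check described above. It is an illustration, not the Updater's actual Java logic: `too_far_behind` is a hypothetical name, and the only value taken from this thread is the 30-day default.

```python
from datetime import datetime, timedelta, timezone


def too_far_behind(left_off: datetime, now: datetime, max_days_back: int = 30) -> bool:
    """Return True if the stored left-off time is older than the allowed window.

    max_days_back mirrors the 30-day default mentioned in the thread
    (configurable through wikibaseMaxDaysBack); the comparison itself
    is an assumption about how such a check could look.
    """
    return now - left_off > timedelta(days=max_days_back)


if __name__ == "__main__":
    # Left-off time as reported in an earlier updater log in this thread.
    left_off = datetime(2018, 7, 8, 7, 5, 29, tzinfo=timezone.utc)
    # Date of this comment; the time of day is arbitrary.
    now = datetime(2018, 9, 12, tzinfo=timezone.utc)
    print(too_far_behind(left_off, now))
```

When the check is true, the two options listed above apply: load a recent dump, or restart the Updater from an explicit date with -s DATE --init.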


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-09-07 Thread Smalyshev
Smalyshev added a comment.
This:

https://staging.catalog.rhizome.org/entity/statement/Q1166-fcdfc3a7-4c2e-d1a8-d7e0-3a5d523ca48e].  Expected only sitelinks and subjects starting with http://wikibase.svc/wiki/Special:EntityData/ and http://wikibase.svc/entity/

Looks like a sign that the concept URI base is not set up correctly - the service thinks the URI base is wikibase.svc but in fact it is staging.catalog.rhizome.org. Note that the concept URI base and the server URL are two different things - the former says what the RDF looks like and the latter says where to go to fetch it; they can be completely different. See https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation/Standalone for more docs about it.
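
The mismatch can be illustrated with a simplified Python sketch of a prefix check. This is not the Munger's implementation (which also allows sitelinks and applies other rules); `unrecognized_subjects` is a hypothetical helper, and the URIs are taken from the error message quoted above.

```python
def unrecognized_subjects(subjects, concept_base, entity_data_base):
    """Return the subjects that start with neither expected prefix."""
    allowed = (concept_base, entity_data_base)
    # str.startswith accepts a tuple and matches any of its members.
    return [s for s in subjects if not s.startswith(allowed)]


if __name__ == "__main__":
    subjects = [
        "https://staging.catalog.rhizome.org/entity/Q1166",
        "https://staging.catalog.rhizome.org/wiki/Special:EntityData/Q1166",
    ]
    # Prefixes the service expected according to the error message.
    rejected = unrecognized_subjects(
        subjects,
        "http://wikibase.svc/entity/",
        "http://wikibase.svc/wiki/Special:EntityData/",
    )
    print(len(rejected), "of", len(subjects), "subjects rejected")
```

With prefixes that match what the Wikibase actually emits, scheme included, the same call returns an empty list. Fetching from one host while the RDF names another is fine as long as the concept URI base is configured to match the data.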


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-07-31 Thread despens
despens added a comment.
Thanks @Smalyshev! Indeed the double dashes were the issue!

I did run the command as you provided it inside the wdqs-updater container.

It doesn't seem to change the contents of Blazegraph, there is still almost no data available at WDQS, apart from a few items.

I wonder if the change data is expired somehow in my Wikibase? The output says Found start time in the RDF store: 2018-07-08T07:05:29Z and indeed not much has happened since that time, but everything happened before!

Would it make sense to erase Blazegraph and then start ./runUpdate.sh?

Here is the output of runUpdate.sh

wait-for-it.sh: waiting 120 seconds for wikibase.svc:80
wait-for-it.sh: wikibase.svc:80 is available after 0 seconds
wait-for-it.sh: waiting 120 seconds for wdqs.svc:
wait-for-it.sh: wdqs.svc: is available after 0 seconds
Updating via http://wdqs.svc:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid99.log due to No such file or directory

Could not start Jolokia agent: java.net.BindException: Address in use
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
10:02:53.443 [main] INFO  org.wikidata.query.rdf.tool.Update - Checking where we left off
10:02:53.446 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
10:02:53.663 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
10:02:53.664 [main] INFO  org.wikidata.query.rdf.tool.Update - Found start time in the RDF store: 2018-07-08T07:05:29Z
10:02:54.014 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 1 changes, from Q1166@77841@20180708070530|81717 to Q1166@77841@20180708070530|81717
10:02:54.259 [update 0] WARN  org.wikidata.query.rdf.tool.Updater - Contained error syncing.  Giving up on Q1166
org.wikidata.query.rdf.tool.rdf.Munger$BadSubjectException: Unrecognized subjects:  [https://staging.catalog.rhizome.org/wiki/Special:EntityData/Q1166, https://staging.catalog.rhizome.org/entity/Q1166, https://staging.catalog.rhizome.org/entity/statement/Q1166-83bdfae9-469c-aaa2-8241-823fa96365e5, https://staging.catalog.rhizome.org/entity/statement/Q1166-A687AA4E-AC6C-4BCA-8010-35C23EF783C3, https://staging.catalog.rhizome.org/entity/statement/Q1166-511bb92d-497e-f37f-7b8b-2cce2730042a, https://staging.catalog.rhizome.org/entity/statement/Q1166-85603605-496B-4865-8F1C-FBB19E478DFF, https://staging.catalog.rhizome.org/entity/statement/Q1166-21D3661A-56E5-45AF-8DCB-019D30E5E3AC, https://staging.catalog.rhizome.org/entity/statement/Q1166-B2C4587C-9318-4717-B5C9-32A13B1CC2AC, https://staging.catalog.rhizome.org/entity/statement/Q1166-4d8aad6f-429f-270e-ca33-4d032f5f7f5b, https://staging.catalog.rhizome.org/entity/statement/Q1166-fcdfc3a7-4c2e-d1a8-d7e0-3a5d523ca48e].  Expected only sitelinks and subjects starting with http://wikibase.svc/wiki/Special:EntityData/ and http://wikibase.svc/entity/
	at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.finishCommon(Munger.java:833)
	at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.munge(Munger.java:430)
	at org.wikidata.query.rdf.tool.rdf.Munger.munge(Munger.java:223)
	at org.wikidata.query.rdf.tool.Updater.handleChange(Updater.java:305)
	at org.wikidata.query.rdf.tool.Updater.lambda$handleChanges$0(Updater.java:188)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
10:02:54.342 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-07-08T07:05:30Z at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
10:02:54.370 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
10:02:54.370 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
10:03:04.395 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
10:03:04.396 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
10:03:14.419 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
10:03:14.419 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
10:03:24.442 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
10:03:24.443 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
[...and so forth...]
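The BadSubjectException above lists subjects under https://staging.catalog.rhizome.org/ while the updater expected prefixes under http://wikibase.svc/. As a rough illustration of that kind of prefix test (a sketch only, not the code in Munger.java; the function name is made up for this example):

```shell
# Illustrative only: mimics the prefix test described in the exception
# message. The real check lives in Munger.java and may differ.
is_expected_subject() {
    case "$1" in
        http://wikibase.svc/wiki/Special:EntityData/*) return 0 ;;
        http://wikibase.svc/entity/*) return 0 ;;
        *) return 1 ;;
    esac
}

# One subject from the log above and one with the expected prefix.
for subject in \
    "https://staging.catalog.rhizome.org/entity/Q1166" \
    "http://wikibase.svc/entity/Q1166"
do
    if is_expected_subject "$subject"; then
        echo "accepted: $subject"
    else
        echo "rejected: $subject"
    fi
done
```

The first subject is rejected because both the scheme and the host differ from the expected prefix.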

[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-07-30 Thread Smalyshev
Smalyshev added a comment.
It should be like this:

./runUpdate.sh -- -v --start 2001010112 -t 2 --verify -W https://staging.catalog.rhizome.org/ -U http://staging.catalog.rhizome.org/ --init

Note the -- part. If you want to pass arguments to the Updater directly, pass them after --. The ones that go before -- are for the script (which ultimately translates them into Updater arguments too). See https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#runUpdate.sh for more info.

Note that the script and the Updater can use the same option names with different meanings (e.g. -t means different things), so watch where you put the args.
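To illustrate the -- convention described above, here is a minimal sketch of a wrapper that parses its own options with getopts and forwards the rest. It is not the real runUpdate.sh; the function name split_args and the option letters -n and -t (with the meanings shown) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT the real runUpdate.sh: it only shows how
# getopts stops at "--" so the remaining arguments can be forwarded.
split_args() {
    OPTIND=1
    script_opts=""
    # "n:" and "t:" are illustrative script-level options that take a value.
    while getopts "n:t:" opt; do
        case "$opt" in
            n) script_opts="$script_opts namespace=$OPTARG" ;;
            t) script_opts="$script_opts timeout=$OPTARG" ;;
            *) return 1 ;;
        esac
    done
    # getopts consumes the "--" marker itself; drop everything it parsed.
    shift $((OPTIND - 1))
    echo "script:$script_opts"
    echo "updater: $*"
}

split_args -t 30 -- -v -t 2 --init
```

Here the first -t is consumed by the wrapper, and the second one, after --, is passed through untouched together with -v and --init.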


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-07-06 Thread despens
despens added a comment.
I am looking at runUpdate.sh inside the wdqs container, which I believe is documented here.

In order to import all statements, I tried to set a start time far in the past, which produced errors:

bash-4.4# pwd
/wdqs
bash-4.4# ./runUpdate.sh -v --start 2001010112 -t 2 --verify -W https://staging.catalog.rhizome.org/ -U http://staging.catalog.rhizome.org/ --init
./runUpdate.sh: illegal option -- v
./runUpdate.sh: illegal option -- -
Updating via http://localhost:/bigdata/namespace/wdq/sparql
OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid238.log due to No such file or directory

I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
Invalid argument:  com.lexicalscope.jewel.cli.ArgumentValidationException: Option only takes one value; cannot use [2001010112]: --sparqlUrl -u value : URL to post updates and queries.
Unexpected Option: W
Unexpected Option: U
The options available are:
	[--batchSize -b value] : Number of recent changes fetched at a time.
	[--entityNamespaces value] : If specified must be numerical indexes of Item and Property namespaces that defined in Wikibase repository, comma separated.
	[--help] : Show this message
	[--idrange value] : If specified must be <start>-<end>. Ids are iterated instead of recent changes. Start and end are inclusive.
	[--ids value...] : If specified must be <id> or list of <id>, comma or space separated.
	[--init -I] : Initialize last update time to start time
	[--keepTypes] : Preserve all types
	[--labelLanguage value...] : Only import labels, aliases, and descriptions in these languages.
	[--pollDelay -d value] : Poll delay when no updates found
	[--singleLabel value...] : Only import a single label and description using the languages specified as a fallback list. If there isn't a label in any of the specified languages then no label is imported.  Ditto for description.
	[--skipSiteLinks] : Skip site links
	--sparqlUrl -u value : URL to post updates and queries.
	[--start -s value] : Start time in 2015-02-11T17:11:08Z or 20150211170100 format.
	[--tailPoller -T value] : Use secondary poller with given gap (seconds) to catch up missed updates
	[--threadCount -t value] : Thread count
	[--verbose -v] : Verbose mode
	[--verify -V] : Verify updates (may have performance impact)
	[--wikibaseHost -w value] : Wikibase host
	[--wikibaseScheme value] : Wikidata url scheme

Does that mean I am trying to run the wrong runUpdate.sh, or is this a different version from the one documented?


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-07-06 Thread despens
despens added a comment.
This issue is about clearing blazegraph and importing the complete dataset from Wikibase. This is required for migrating an existing Wikibase instance or for recovering from failure.

My approach would be to write a script running on the docker host that:


1. creates a local empty file
2. docker-mounts that file in the Wikibase container
3. triggers the TTL export script
4. unmounts the file in the Wikibase container
5. mounts the file in the query service container
6. triggers the process for importing into Blazegraph
7. unmounts the file from the query service container
8. removes the file
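The steps above can be sketched as a dry-run plan. This sketch swaps the mount/unmount steps for docker cp, which copies a file in or out of a container, so no empty file has to be created first. The function name, the container names, the dump path and the two bracketed in-container commands are placeholders (assumptions), not commands taken from this thread.

```shell
# Dry-run sketch: prints a plan instead of executing it.
# Container names, the dump path and the two bracketed in-container
# commands are placeholders (assumptions); substitute your own.
print_reload_plan() {
    wikibase=$1
    wdqs=$2
    dump=$3
    echo "docker exec $wikibase <TTL export command writing to $dump>"
    echo "docker cp $wikibase:$dump $dump"
    echo "docker cp $dump $wdqs:$dump"
    echo "docker exec $wdqs <Blazegraph import command reading $dump>"
    echo "rm $dump"
}

print_reload_plan wikibase wdqs /tmp/dump.ttl
```

Printing the plan first makes it easy to review the order of operations before replacing the bracketed placeholders with real commands.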


However, I am wondering whether that will be enough to get the full benefits of the WDQS, such as name autocompletion.

Of course, if there is an easier way of doing this, like telling the WDQS update service that it should get all changes from the beginning of time just once, I would totally use that.


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-06-25 Thread Addshore
Addshore added a comment.
The service will start empty, so you shouldn't need to run a command to drop the data.

Also, if you want to reload data, you can just remove the Docker volume that contains the data and start again.

I'm not sure we really need a script to do this.


[Wikidata-bugs] [Maniphest] [Commented On] T197658: Provide easy script to reset Blazegraph

2018-06-19 Thread despens
despens added a comment.
At Rhizome we used the following command to reset Blazegraph:

curl "http://localhost:/blazegraph/namespace/kb/sparql" --data-urlencode "update=DROP ALL; LOAD ;"
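A small sketch of how such a reset request could be wrapped in shell functions. The function names are made up for this example, and the endpoint and data URI are arguments supplied by the caller; nothing here fills in the values missing from the command above.

```shell
# Hypothetical helper: builds the SPARQL 1.1 Update string for a reset.
# The data URI is a placeholder supplied by the caller.
build_reset_update() {
    printf 'DROP ALL; LOAD <%s>;' "$1"
}

# Hypothetical helper: posts the update to a SPARQL endpoint.
# $1 = endpoint URL (assumption: your own Blazegraph namespace URL)
# $2 = URI of the data to load after dropping everything
reset_blazegraph() {
    curl --fail "$1" --data-urlencode "update=$(build_reset_update "$2")"
}
```

Keeping the update string in its own function makes it possible to inspect what will be sent before anything is dropped.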