Addshore closed this task as "Invalid".
Addshore added a comment.
Run an empty Blazegraph container:
docker run -d -p 9999:9999 \
  --env WIKIBASE_SCHEME=https \
  --env WIKIBASE_HOST=intentionally-empty.wiki.opencura.com \
  --env WDQS_HOST=localhost \
  --env WDQS_PORT=9999 \
  --name demo-wdqs wikibase/wdqs:0.3.40 /runBlazegraph.sh
Wait for the service to come up, then make sure it is empty:
curl "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D"
You should see something like this:
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head>
<variable name='a'/>
<variable name='b'/>
<variable name='c'/>
</head>
<results>
</results>
</sparql>
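The query string in the curl command above is the percent-encoded form of SELECT * WHERE {?a ?b ?c}. As a minimal sketch, the encoding can be reproduced in plain POSIX shell. The urlencode helper below is a hypothetical name, not part of the wdqs image, and it only handles ASCII input:

```shell
# Hypothetical helper: percent-encode an ASCII string for use in a URL query.
# Runs in a subshell "( ... )" so its variables do not leak into the caller.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                    # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;            # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Example (not executed here, needs the container from the previous step):
#   curl "localhost:9999/bigdata/sparql?query=$(urlencode 'SELECT * WHERE {?a ?b ?c}')"
```

Alternatively, curl can do the encoding itself: curl -G --data-urlencode 'query=SELECT * WHERE {?a ?b ?c}' localhost:9999/bigdata/sparql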
Run the updater once, pointing it at some Wikibase and at the query service we
just started:
docker exec demo-wdqs /runUpdate.sh
You should see something like this. You can stop it after a few loops
(Ctrl+C):
wait-for-it.sh: waiting 300 seconds for intentionally-empty.wiki.opencura.com:80
wait-for-it.sh: intentionally-empty.wiki.opencura.com:80 is available after 0 seconds
wait-for-it.sh: waiting 300 seconds for localhost:9999
wait-for-it.sh: localhost:9999 is available after 0 seconds
Updating via http://localhost:9999/bigdata/namespace/wdq/sparql
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
18:00:17.284 [main] INFO org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.40 (a115a80eec974454d140389e1f52aad0e54913f9)
18:00:18.959 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
18:00:18.960 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
18:00:19.267 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the dump
18:00:19.333 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Defaulting start time to 30 days ago: 2021-02-15T18:00:19.333Z
18:00:20.452 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got no real changes
18:00:20.780 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2021-02-15T18:00:19.333Z at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
18:00:21.066 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got no real changes
18:00:21.067 [main] INFO org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
18:00:31.661 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got no real changes
18:00:31.662 [main] INFO org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
Run the updater again:
docker exec demo-wdqs /runUpdate.sh
This time you should see the following error:
wait-for-it.sh: waiting 300 seconds for intentionally-empty.wiki.opencura.com:80
wait-for-it.sh: intentionally-empty.wiki.opencura.com:80 is available after 0 seconds
wait-for-it.sh: waiting 300 seconds for localhost:9999
wait-for-it.sh: localhost:9999 is available after 0 seconds
Updating via http://localhost:9999/bigdata/namespace/wdq/sparql
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
18:00:55.545 [main] INFO org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.40 (a115a80eec974454d140389e1f52aad0e54913f9)
18:00:57.495 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
18:00:57.496 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
18:00:57.996 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
18:00:58.000 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during initialization.
java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time. You will have to reload from scratch or you might have missing data.
at org.wikidata.query.rdf.tool.change.ChangeSourceContext.getStartTime(ChangeSourceContext.java:100)
at org.wikidata.query.rdf.tool.Update.initialize(Update.java:145)
at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
Exception in thread "main" java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time. You will have to reload from scratch or you might have missing data.
at org.wikidata.query.rdf.tool.change.ChangeSourceContext.getStartTime(ChangeSourceContext.java:100)
at org.wikidata.query.rdf.tool.Update.initialize(Update.java:145)
at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
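This happens because the first run stored a timestamp recording how far the
updater had got, and that timestamp is no longer "safe".
On an empty store it defaults to 30 days before the first run, so by the
second run it is already older than the minimum safe poll time.
The timestamp is stored as a triple:
curl "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D"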
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head>
<variable name='a'/>
<variable name='b'/>
<variable name='c'/>
</head>
<results>
<result>
<binding name='a'>
<uri>https://intentionally-empty.wiki.opencura.com</uri>
</binding>
<binding name='b'>
<uri>http://schema.org/dateModified</uri>
</binding>
<binding name='c'>
<literal
datatype='http://www.w3.org/2001/XMLSchema#dateTime'>2021-02-15T18:00:18Z</literal>
</binding>
</result>
</results>
</sparql>
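To make the failure concrete, here is a rough sketch of the comparison the updater appears to perform, using the timestamps from the logs above. It assumes GNU date and assumes the minimum safe poll time is exactly 30 days before "now"; the function name and the MAX_AGE_DAYS variable are hypothetical, and the real check lives in ChangeSourceContext.getStartTime:

```shell
# Hypothetical check: succeeds (exit 0) when the stored timestamp is older
# than MAX_AGE_DAYS (assumed 30) relative to the reference time.
# Usage: is_before_safe_poll_time STORED_ISO [NOW_ISO]
# Requires GNU date for "-d".
is_before_safe_poll_time() (
  stored=$(date -u -d "$1" +%s) || exit 2
  now=$(date -u -d "${2:-now}" +%s) || exit 2
  max_age=$(( ${MAX_AGE_DAYS:-30} * 24 * 60 * 60 ))
  [ "$stored" -lt $(( now - max_age )) ]
)

# Example with the values from the second run above:
#   is_before_safe_poll_time 2021-02-15T18:00:18Z 2021-03-17T18:00:58Z && echo "too old"
```

The stored value is only about 40 seconds past the 30-day window, which is why the second run fails right after the first one succeeded.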
If it is safe to update and you are not going to end up missing data, you can
reset this timestamp to a date within the last 30 days.
This overrides what runUpdate.sh normally does:
https://github.com/wmde/wikibase-docker/blob/0c561dd6c17a918323b44c7282b5e5acccfd4e45/wdqs/0.3.40/runUpdate.sh#L9
docker exec demo-wdqs bash -c '/wdqs/runUpdate.sh \
  -h http://${WDQS_HOST}:${WDQS_PORT} -- \
  --wikibaseUrl ${WIKIBASE_SCHEME}://${WIKIBASE_HOST} \
  --conceptUri ${WIKIBASE_SCHEME}://${WIKIBASE_HOST} \
  --entityNamespaces ${WDQS_ENTITY_NAMESPACES} \
  --init --start 20210301010101'
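The --start value used above looks like a YYYYMMDDhhmmss timestamp in UTC (this is inferred from the example, not checked against the updater documentation). As a small sketch assuming GNU date, a value a given number of days in the past can be generated like this; start_stamp is a hypothetical helper name:

```shell
# Hypothetical helper: print a YYYYMMDDhhmmss timestamp (UTC) that lies
# DAYS days before the reference time (default: now).
# Usage: start_stamp DAYS [REFERENCE_ISO]
# Requires GNU date for "-d".
start_stamp() (
  days=$1
  ref=$(date -u -d "${2:-now}" +%s) || exit 2
  date -u -d "@$(( ref - days * 24 * 60 * 60 ))" +%Y%m%d%H%M%S
)

# Example: a start time one week ago, to pass as "--start ...":
#   start_stamp 7
```

Pick a number of days below 30 so the result stays inside the window described above.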
The date is now updated:
curl "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D"
This should show something like:
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head>
<variable name='a'/>
<variable name='b'/>
<variable name='c'/>
</head>
<results>
<result>
<binding name='a'>
<uri>https://intentionally-empty.wiki.opencura.com</uri>
</binding>
<binding name='b'>
<uri>http://schema.org/dateModified</uri>
</binding>
<binding name='c'>
<literal
datatype='http://www.w3.org/2001/XMLSchema#dateTime'>2021-03-01T01:01:00Z</literal>
</binding>
</result>
</results>
</sparql>
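When checking this repeatedly, it can be handy to pull just the timestamp out of the XML. The following is a quick sketch using sed, not a real XML parser; it assumes the response is laid out as in the output above and prints every xsd:dateTime literal it finds, which on a non-empty store would include more than the updater timestamp. The function name is hypothetical:

```shell
# Hypothetical helper: read a SPARQL XML result on stdin and print the value
# of each xsd:dateTime literal, one per line.
extract_datetimes() {
  sed -n "s/.*XMLSchema#dateTime'>\([^<]*\)<\/literal>.*/\1/p"
}

# Example (needs the running container):
#   curl -s "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D" | extract_datetimes
```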
I'm going to close this ticket now, as its scope is rather unclear.
The case described above should not really happen during regular operation
of a Wikibase, but perhaps we need to make the last step here (resetting the
timestamp) more resilient, and perhaps improve the default behaviour when
using an empty Wikibase.
This would need some collaboration between WMDE and the Wikidata Query
Service team.
If people have individual bugs or feature requests, then new tickets are
welcome!
TASK DETAIL
https://phabricator.wikimedia.org/T186161