[Wikidata-bugs] [Maniphest] T263110: Investigate the cause of: ChecksumError: offset=517789868032, nbytes=16, expected=-58390144, actual=535102966 while importing wikidata dumps

2020-10-07 Thread Cmjohnson
Cmjohnson closed subtask T263125: Check for errors on wdqs1009 disks as 
"Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T263110

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse, Cmjohnson
Cc: CBogen, dcausse, Aklapper, Akuckartz, darthmon_wmde, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263110: Investigate the cause of: ChecksumError: offset=517789868032, nbytes=16, expected=-58390144, actual=535102966 while importing wikidata dumps

2020-09-23 Thread dcausse
dcausse moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
dcausse closed this task as "Declined".
dcausse added a comment.


  I did not find anything obvious, but looking at the various classes involved 
in managing the writes I see excessive locking and object reuse, especially in:
  
  - WriteCacheService, which keeps and reuses WriteCache instances.
  - WriteCache, which wraps (protects?) access to a ByteBuffer.
  - DirectBufferPool, which according to its comments seems to have issues 
managing its references: //When DEBUG is true we do not permit a buffer which 
was not correctly release to be reused//, which in other words means //When 
DEBUG is **false** we **do** permit a buffer which was not correctly release 
to be **reused**//.
  
  Given the high number of different locks being used, a race condition might 
allow the buffer to be written from different threads, so that the checksum is 
computed before a subsequent write to the same buffer.
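
  To make the suspected failure mode concrete, here is a minimal sketch that 
replays the interleaving in a fixed order on a single thread: a checksum is 
computed over a buffer, the buffer is then modified again, and the later 
verification fails. It is only an illustration. The class and method names are 
hypothetical, CRC32 is an arbitrary choice for the example, and nothing here is 
taken from the Blazegraph code base or confirmed as the actual cause.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration only; not Blazegraph code.
public class StaleChecksumSketch {

    /** CRC32 over the first nbytes of buf, without moving buf's own position. */
    public static int checksum(ByteBuffer buf, int nbytes) {
        CRC32 crc = new CRC32();
        ByteBuffer view = buf.duplicate();
        view.position(0);
        view.limit(nbytes);
        crc.update(view);
        return (int) crc.getValue();
    }

    /**
     * Replays the suspected interleaving sequentially.
     * Returns true when the checksum recorded by the first writer
     * still matches the buffer contents seen by the reader.
     */
    public static boolean replay(boolean lateWrite) {
        ByteBuffer shared = ByteBuffer.allocate(12);
        shared.putLong(0, 42L);
        shared.putInt(8, 7);

        // Writer A records the checksum of what it believes is the final content.
        int expected = checksum(shared, 12);

        // Writer B gets hold of the same (reused) buffer and writes to it afterwards.
        if (lateWrite) {
            shared.putInt(8, 8);
        }

        // The reader recomputes the checksum over the bytes that were actually stored.
        int actual = checksum(shared, 12);
        return expected == actual;
    }

    public static void main(String[] args) {
        System.out.println("no late write: match=" + replay(false));
        System.out.println("late write:    match=" + replay(true));
    }
}
```

  In a real race the late write would come from another thread and would only 
happen occasionally, which would fit with an error that is hard to reproduce.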
  
  Recovering the journal does not seem possible.
  
  Declining, as I'm not sure it is worth spending more effort on this; finding 
the real issue in this code base seems unlikely.

TASK DETAIL
  https://phabricator.wikimedia.org/T263110

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse


[Wikidata-bugs] [Maniphest] T263110: Investigate the cause of: ChecksumError: offset=517789868032, nbytes=16, expected=-58390144, actual=535102966 while importing wikidata dumps

2020-09-22 Thread dcausse
dcausse claimed this task.
dcausse moved this task from Ready for Development to In Progress on the 
Discovery-Search (Current work) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T263110

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse


[Wikidata-bugs] [Maniphest] T263110: Investigate the cause of: ChecksumError: offset=517789868032, nbytes=16, expected=-58390144, actual=535102966 while importing wikidata dumps

2020-09-21 Thread CBogen
CBogen added a comment.


  Let's time-box this task to a maximum of 1 day to see if we can find the 
cause of the error.

TASK DETAIL
  https://phabricator.wikimedia.org/T263110

To: CBogen


[Wikidata-bugs] [Maniphest] T263110: Investigate the cause of: ChecksumError: offset=517789868032, nbytes=16, expected=-58390144, actual=535102966 while importing wikidata dumps

2020-09-21 Thread Zbyszko
Zbyszko moved this task from All WDQS-related tasks to Current work on the 
Wikidata-Query-Service board.
Zbyszko added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T263110

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

To: Zbyszko


[Wikidata-bugs] [Maniphest] T263110: Investigate the cause of: ChecksumError: offset=517789868032, nbytes=16, expected=-58390144, actual=535102966 while importing wikidata dumps

2020-09-17 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  While importing a dump on wdqs1009, blazegraph started to fail. The journal 
seems corrupted, and even after restarting the service no further updates can 
be made.
  
  As a wdqs maintainer, I would like to understand why the import failed and 
corrupted the journal so that I can work around this problem more easily in 
the future.
  
14:44:31.194 [qtp226170135-60032] ERROR c.b.r.sail.webapp.BigdataRDFServlet 
- cause=java.util.concurrent.ExecutionException: 
java.util.concurrent.ExecutionException: 
org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem 
with entry at -19097155469835866: lastRootBlock=rootBlock{ rootBlock=0, 
challisField=514, version=3, nextOffset=35009034351079734, 
localTime=1600178968059 [Tuesday, September 15, 2020 2:09:28 PM UTC], 
firstCommitTime=1599735642110 [Thursday, September 10, 2020 11:00:42 AM UTC], 
lastCommitTime=1600178958117 [Tuesday, September 15, 2020 2:09:18 PM UTC], 
commitCounter=514, commitRecordAddr={off=NATIVE:-104241626,len=422}, 
commitRecordIndexAddr={off=NATIVE:-84930192,len=220}, blockSequence=32987, 
quorumToken=-1, metaBitsAddr=2078422184494075, metaStartAddr=8184932, 
storeType=RW, uuid=d0e3d4a7-6bd8-40d4-8be1-88f0c4a66385, offsetBits=42, 
checksum=-593744401, createTime=1599735640542 [Thursday, September 10, 2020 
11:00:40 AM UTC], closeTime=0}, query=SPARQL-UPDATE: updateStr=LOAD 
 
req.requestURI=/bigdata/namespace/wdq/sparql, req.xForwardedFor=null, 
req.queryString=null, req.method=POST, req.remoteHost=localhost, 
req.requestURL=http://localhost:/bigdata/namespace/wdq/sparql, 
req.userAgent=curl/7.52.1
com.bigdata.util.ChecksumError: offset=517789868032,nbytes=16,expected=-58390144,actual=535102966
at com.bigdata.io.writecache.WriteCacheService._readFromLocalDiskIntoNewHeapByteBuffer(WriteCacheService.java:3783)
Wrapped by: java.lang.IllegalStateException: Error reading from WriteCache addr: 517789868032 length: 12, writeCacheDebug: No WriteCache debug info
at com.bigdata.rwstore.RWStore.getData(RWStore.java:2321)
Wrapped by: java.lang.RuntimeException: addr=-112096902 : cause=java.lang.IllegalStateException: Error reading from WriteCache addr: 517789868032 length: 12, writeCacheDebug: No WriteCache debug info
at com.bigdata.rwstore.RWStore.getData(RWStore.java:2399)
Wrapped by: java.lang.RuntimeException: Problem with entry at -19097155469835866
at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5425)
Wrapped by: java.lang.RuntimeException: Problem with entry at 
-19097155469835866: lastRootBlock=rootBlock{ rootBlock=0, challisField=514, 
version=3, nextOffset=35009034351079734, localTime=1600178968059 [Tuesday, 
September 15, 2020 2:09:28 PM UTC], firstCommitTime=1599735642110 [Thursday, 
September 10, 2020 11:00:42 AM UTC], lastCommitTime=1600178958117 [Tuesday, 
September 15, 2020 2:09:18 PM UTC], commitCounter=514, 
commitRecordAddr={off=NATIVE:-104241626,len=422}, 
commitRecordIndexAddr={off=NATIVE:-84930192,len=220}, blockSequence=32987, 
quorumToken=-1, metaBitsAddr=2078422184494075, metaStartAddr=8184932, 
storeType=RW, uuid=d0e3d4a7-6bd8-40d4-8be1-88f0c4a66385, offsetBits=42, 
checksum=-593744401, createTime=1599735640542 [Thursday, September 10, 2020 
11:00:40 AM UTC], closeTime=0}
at 
com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3134)
Wrapped by: org.openrdf.query.UpdateExecutionException: 
java.lang.RuntimeException: Problem with entry at -19097155469835866: 
lastRootBlock=rootBlock{ rootBlock=0, challisField=514, version=3, 
nextOffset=35009034351079734, localTime=1600178968059 [Tuesday, September 15, 
2020 2:09:28 PM UTC], firstCommitTime=1599735642110 [Thursday, September 10, 
2020 11:00:42 AM UTC], lastCommitTime=1600178958117 [Tuesday, September 15, 
2020 2:09:18 PM UTC], commitCounter=514, 
commitRecordAddr={off=NATIVE:-104241626,len=422}, 
commitRecordIndexAddr={off=NATIVE:-84930192,len=220}, blockSequence=32987, 
quorumToken=-1, metaBitsAddr=2078422184494075, metaStartAddr=8184932, 
storeType=RW, uuid=d0e3d4a7-6bd8-40d4-8be1-88f0c4a66385, offsetBits=42, 
checksum=-593744401, createTime=1599735640542 [Thursday, September 10, 2020 
11:00:40 AM UTC], closeTime=0}
at 
com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.executeUpdate(ASTEvalHelper.java:1080)
Wrapped by: java.util.concurrent.ExecutionException: 
org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem 
with entry at -19097155469835866: lastRootBlock=rootBlock{ rootBlock=0, 
challisField=514, version=3, nextOffset=35009034351079734, 
localTime=1600178968059 [Tuesday, September 15, 2020 2:09:28 PM UTC], 
firstCommitTime=1599735642110 [Thursday, September 10, 2020 11:00:42 AM UTC], 
lastCommitTime=160017895