Hello Andy,
sorry for the long silence; I was caught up in a few weeks full of
production updates and didn't have any time to spend on Jena.
Below you'll find the test we used to reproduce the problem. I fixed it
by changing AbstractDateTime.equals and AbstractDateTime.hashCode (a
diff is also attached): equals now compares fractionalSeconds instead of
the raw millisecond digits and their scale, and hashCode pads the
milliseconds to three digits before hashing them.
It's up to you whether you like the fix and want to integrate it :-)
Caching the hashCode might also improve performance.
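A minimal sketch of what caching the hashCode could look like, in the style of java.lang.String: the hash is computed on first use and stored in a field. It assumes the value is immutable after construction, which holds for a date/time only if nothing mutates its data array later. CachedHashValue is a hypothetical class, not Jena's AbstractDateTime; the shift-XOR loop just mirrors the one in the diff.

```java
import java.util.Arrays;

/** Hypothetical immutable value class illustrating lazy hashCode caching. */
final class CachedHashValue {
    private final int[] data;
    // 0 means "not computed yet"; a genuine hash of 0 is simply recomputed on each call.
    private int hash;

    CachedHashValue(int[] data) {
        this.data = data.clone(); // defensive copy keeps the instance immutable
    }

    @Override
    public int hashCode() {
        int h = hash; // read the field once into a local
        if (h == 0) {
            for (int v : data) {
                h = (h << 1) ^ v; // same shift-XOR scheme as AbstractDateTime.hashCode
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CachedHashValue
                && Arrays.equals(data, ((CachedHashValue) obj).data);
    }

    public static void main(String[] args) {
        CachedHashValue a = new CachedHashValue(new int[] {1, 2, 3});
        CachedHashValue b = new CachedHashValue(new int[] {1, 2, 3});
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Reading the field into a local first means that, as with String, two threads racing on the first call at worst compute the same value twice.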
Greetings
André
Index: AbstractDateTime.java
===================================================================
--- AbstractDateTime.java (revision 2)
+++ AbstractDateTime.java (revision 4)
@@ -118,9 +118,13 @@
if (obj instanceof AbstractDateTime) {
AbstractDateTime adt = (AbstractDateTime) obj;
for (int i = 0; i < data.length; i++) {
- if (data[i] != adt.data[i]) return false;
+ if(i==msscale || i==ms)
+ continue;
+ else if (data[i] != adt.data[i])
+ return false;
}
- return true;
+
+ return fractionalSeconds==adt.fractionalSeconds;
}
return false;
}
@@ -131,7 +135,18 @@
@Override
public int hashCode() {
int hash = 0;
+ int scale=data[msscale];
+ int scaledMs=data[ms];
+ while(scale<3) {
+ scale++;
+ scaledMs*=10;
+ }
for (int i = 0; i < data.length; i++) {
+ if(i==msscale)
+ hash=(hash<<1)^scale;
+ else if(i==ms)
+ hash=(hash<<1)^scaledMs;
+ else
hash = (hash << 1) ^ data[i];
}
return hash;
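The effect of the padding loop in the patched hashCode() can be checked in isolation. This sketch assumes, as the report quoted below describes, that data[ms] holds the fractional-second digits as an integer and data[msscale] the number of those digits; MsScaleNormalizer and normalize are hypothetical names, and only the while loop is taken from the diff. It shows that ".5" as read back from TDB (digits 5, scale 1) and ".500" as produced by createTypedLiteral(Calendar) (digits 500, scale 3) feed identical values into the hash.

```java
/** Hypothetical helper reproducing the millisecond padding loop from the patch. */
final class MsScaleNormalizer {

    /** Returns {scale, scaledMs} after padding the digits to a scale of 3. */
    static int[] normalize(int ms, int msScale) {
        int scale = msScale;
        int scaledMs = ms;
        while (scale < 3) { // same loop as in the patched hashCode()
            scale++;
            scaledMs *= 10;
        }
        return new int[] {scale, scaledMs};
    }

    public static void main(String[] args) {
        int[] fromTdb = normalize(5, 1);        // ".5", trailing zeros trimmed
        int[] fromCalendar = normalize(500, 3); // ".500"
        System.out.println(fromTdb[0] + " " + fromTdb[1]);           // 3 500
        System.out.println(fromCalendar[0] + " " + fromCalendar[1]); // 3 500
    }
}
```

For inputs with at most three digits the loop always ends at scale 3, so the scale term contributes a constant to the hash and only the padded digits vary.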
package com.hojoki.tdb;
import java.util.Calendar;
import java.util.GregorianCalendar;
import junit.framework.TestCase;
import org.junit.Test;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.listeners.StatementListener;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.ResourceFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.tdb.TDBFactory;
public class Iteratortest extends TestCase {
    public Iteratortest(String testName) {
        super(testName);
    }
    Resource s0 = ResourceFactory.createResource("s://r0");
    Property p0 = ResourceFactory.createProperty("p://r0");
    Resource s1 = ResourceFactory.createResource("s://r1");
    Property p1 = ResourceFactory.createProperty("p://r1");
    Resource o1 = ResourceFactory.createResource("o://r1");
    Resource s2 = ResourceFactory.createResource("s://r2");
    Property p2 = ResourceFactory.createProperty("p://r2");
    Resource o2 = ResourceFactory.createResource("o://r2");
    @Test
    public void testGraphMemIterator() {
        ModelChangedListenerImpl listener = new ModelChangedListenerImpl();
        Dataset set = TDBFactory.createDataset("/tmp/test");
        set.begin(ReadWrite.WRITE);
        Model model = set.getDefaultModel();
        model.register(listener);
        Model model2 = ModelFactory.createDefaultModel();
        // Truncate to tenths of a second so the milliseconds end in zeros;
        // TDB then returns them with fewer than three digits.
        Calendar cal = GregorianCalendar.getInstance();
        cal.setTimeInMillis(System.currentTimeMillis() / 100 * 100);
        Literal literal = model.createTypedLiteral(cal);
        model.add(s1, p1, model.createTypedLiteral(cal));
        Statement statement = model.listStatements(s1, p1, (RDFNode) null).next();
        Literal value = statement.getLiteral();
        // equals() holds; hashCode() differed before the patch
        assertTrue(literal.equals(value));
        assertTrue(literal.hashCode() == value.hashCode());
        model.add(s1, p1, o1);
        model2.add(s1, p1, literal);
        model2.add(s2, p2, o2);
        model2.add(s1, p1, model.createTypedLiteral(cal));
        model.add(model2);
        model.unregister(listener);
        Model added = listener.getInsertModel();
        final StmtIterator objectStmtIter = added.listStatements(s0, p0,
                (RDFNode) null);
        if (objectStmtIter != null) {
            while (objectStmtIter.hasNext()) {
                final Resource objectResource =
                        objectStmtIter.next().getObject().asResource();
                final StmtIterator updatedStmtIter =
                        objectResource.listProperties(p1);
                if (updatedStmtIter != null && updatedStmtIter.hasNext()) {
                    Statement next = updatedStmtIter.next();
                    if (updatedStmtIter.hasNext()) {
                        Statement next2 = updatedStmtIter.next();
                        if (next.toString().equals(next2.toString())) {
                            // JenaUtil.printModel(added);
                            throw new RuntimeException("object has more than one "
                                    + "IDENTICAL atom:updated, uri: '"
                                    + objectResource.getURI() + "' statement " + next);
                        }
                    }
                }
            }
        }
        set.abort();
    }
}
class ModelChangedListenerImpl extends StatementListener {
    private Model insertModel = ModelFactory.createDefaultModel();
    private Model deleteModel = ModelFactory.createDefaultModel();
    public void addedStatement(final Statement statement) {
        insertModel.add(statement);
        deleteModel.remove(statement);
    }
    public void removedStatement(final Statement statement) {
        deleteModel.add(statement);
        insertModel.remove(statement);
    }
    public Model getInsertModel() {
        return this.insertModel;
    }
    public Model getDeleteModel() {
        return this.deleteModel;
    }
}
On 16.04.2013 19:31, Andy Seaborne wrote:
> On 12/04/13 17:15, Andy Seaborne wrote:
>> On 12/04/13 15:06, "Dr. André Lanka" wrote:
>>> Hello to all,
>>
>> Hi there,
>>
>> Could you put this on JIRA, please? Ideally with a complete test case to
>> make sure we agree on the details.
>
> https://issues.apache.org/jira/browse/JENA-437
>
>>
>> Is it TDB-specific?
>
> No, although TDB is more likely to bump into it.
>
>>
>> Thanks,
>> Andy
>>
>>>
>>> we've got duplicated statements within the same model (stored in a
>>> GraphTripleStoreMem). Duplicated means that each of the three components
>>> s, p and o is pairwise equal between the statements.
>>>
>>> The reason is that the literals have different hashCodes, so they
>>> are added to the model twice. This is because the hashCode method for
>>> XSDDateTime doesn't take the scale of the milliseconds (field 8 in
>>> the data array) into account. When you call Model.createTypedLiteral(Calendar),
>>> the scale is either zero or three, whereas TDB formats it (while reading
>>> from the triple store) to 0, 1, 2 or 3 digits depending on the number of
>>> trailing zeros (DateTimeNode.unpack). So you can put an xsd:dateTime
>>> into TDB and get back a literal that equals the given one but has
>>> a different hashCode.
>>>
>>> You can reproduce it with a TDB-backed model by doing:
>>>
>>> Calendar cal=GregorianCalendar.getInstance();
>>> cal.setTimeInMillis(System.currentTimeMillis()/100*100);
>>>
>>> Literal literal = model.createTypedLiteral(cal);
>>> model.add(s1, p1, model.createTypedLiteral(cal));
>>>
>>> Statement statement = model.listStatements(s1, p1, (RDFNode)null
>>> ).next();
>>> Literal value = statement.getLiteral();
>>>
>>> assertTrue(literal.equals(value));
>>> assertTrue(literal.hashCode()==value.hashCode());
>>>
>>>
>>> The last line fails.
>>>
>>> To respect the general contract of equals and hashCode, XSDDateTime should
>>> get its own getHashCode(LiteralLabel) method instead of using the one
>>> from BaseDatatype. For instance, this method could leave out array indices
>>> 7 and 8 and use the fractional seconds instead (XOR with the double value).
>>>
>>> Cheers
>>> André
>>>
>>
>
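The alternative suggested in the original report above (skip array indices 7 and 8, then fold in the fractional seconds as a double) can be sketched as follows. The index constants follow the report's description and are assumptions, not Jena's actual field layout; FractionalSecondsHash and hash are hypothetical names. Fields 0 to 6 in the example are placeholder values.

```java
/** Hypothetical sketch of the report's suggested getHashCode approach. */
final class FractionalSecondsHash {
    static final int MS = 7;       // assumed index of the millisecond digits
    static final int MS_SCALE = 8; // assumed index of their scale

    static int hash(int[] data, double fractionalSeconds) {
        int hash = 0;
        for (int i = 0; i < data.length; i++) {
            if (i == MS || i == MS_SCALE) {
                continue; // representation-dependent; covered by fractionalSeconds
            }
            hash = (hash << 1) ^ data[i];
        }
        // Double.hashCode XORs the high and low 32 bits of the IEEE 754 bit pattern
        return hash ^ Double.hashCode(fractionalSeconds);
    }

    public static void main(String[] args) {
        int[] fromTdb = {2013, 4, 16, 19, 31, 0, 0, 5, 1};        // digits 5, scale 1
        int[] fromCalendar = {2013, 4, 16, 19, 31, 0, 0, 500, 3}; // digits 500, scale 3
        System.out.println(hash(fromTdb, 0.5) == hash(fromCalendar, 0.5)); // true
    }
}
```

Compared with the padding approach in the attached diff, this relies on fractionalSeconds being computed identically for both representations, which is also what the patched equals compares.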
--
Dr. André Lanka * 0178 / 134 44 47 * http://dr-lanka.de