Hello Andy,

Sorry for the long silence. I was tied up for a few weeks with
production updates and didn't have any time to spend on Jena.

Below you'll find the test we used to demonstrate the problem. I solved it
by changing AbstractDateTime.equals and AbstractDateTime.hashCode (a
diff is also attached).

It's up to you whether you like the fix and want to integrate it :-)

Perhaps caching the hashCode could also improve performance.
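
To illustrate the caching idea, here is a minimal sketch. ScaledTime is a
hypothetical stand-in, not Jena's AbstractDateTime. It assumes the fields
are immutable and that the millisecond scale is never larger than 3, and
it compares the normalised integer fraction rather than fractionalSeconds.

```java
/** Hypothetical stand-in for a date-time value; NOT Jena's AbstractDateTime. */
final class ScaledTime {
    private final int seconds;  // whole seconds
    private final int ms;       // fractional digits as an integer, e.g. 5 for ".5"
    private final int msScale;  // number of fractional digits held in ms (assumed 0..3)
    private int cachedHash;     // 0 means "not computed yet"

    ScaledTime(int seconds, int ms, int msScale) {
        this.seconds = seconds;
        this.ms = ms;
        this.msScale = msScale;
    }

    /** Scale the fraction up to exactly three digits: (5, 1) and (500, 3) both give 500. */
    private int normalisedMs() {
        int scaled = ms;
        for (int s = msScale; s < 3; s++) {
            scaled *= 10;
        }
        return scaled;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof ScaledTime)) return false;
        ScaledTime other = (ScaledTime) obj;
        return seconds == other.seconds && normalisedMs() == other.normalisedMs();
    }

    @Override
    public int hashCode() {
        // Single-field caching: safe to reuse because all inputs are final.
        // A hash that really is 0 is simply recomputed on every call.
        int h = cachedHash;
        if (h == 0) {
            h = 31 * seconds + normalisedMs();
            cachedHash = h;
        }
        return h;
    }
}
```

With this, new ScaledTime(42, 5, 1) and new ScaledTime(42, 500, 3) are
equal and share one hash code, so a hash-based store keeps only one of them.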


Greetings
André



Index: AbstractDateTime.java
===================================================================
--- AbstractDateTime.java       (revision 2)
+++ AbstractDateTime.java       (revision 4)
@@ -118,9 +118,13 @@
         if (obj instanceof AbstractDateTime) {
             AbstractDateTime adt = (AbstractDateTime) obj;
             for (int i = 0; i < data.length; i++) {
-                if (data[i] != adt.data[i]) return false;
+              if(i==msscale || i==ms)
+                continue;
+              else if (data[i] != adt.data[i])
+                return false;
             }
-            return true;
+
+            return fractionalSeconds==adt.fractionalSeconds;
         }
         return false;
     }
@@ -131,7 +135,18 @@
     @Override
     public int hashCode() {
         int hash = 0;
+        int scale=data[msscale];
+        int scaledMs=data[ms];
+        while(scale<3) {
+          scale++;
+          scaledMs*=10;
+        }
         for (int i = 0; i < data.length; i++) {
+          if(i==msscale)
+            hash=(hash<<1)^scale;
+          else if(i==ms)
+            hash=(hash<<1)^scaledMs;
+          else
             hash = (hash << 1) ^ data[i];
         }
         return hash;








package com.hojoki.tdb;

import java.util.Calendar;
import java.util.GregorianCalendar;

import junit.framework.TestCase;

import org.junit.Test;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.listeners.StatementListener;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.ResourceFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.tdb.TDBFactory;

public class Iteratortest extends TestCase {

  public Iteratortest(String testName) {
    super(testName);
  }

  Resource s0 = ResourceFactory.createResource("s://r0");
  Property p0 = ResourceFactory.createProperty("p://r0");

  Resource s1 = ResourceFactory.createResource("s://r1");
  Property p1 = ResourceFactory.createProperty("p://r1");
  Resource o1 = ResourceFactory.createResource("o://r1");

  Resource s2 = ResourceFactory.createResource("s://r2");
  Property p2 = ResourceFactory.createProperty("p://r2");
  Resource o2 = ResourceFactory.createResource("o://r2");

  @Test
  public void testGraphMemIterator() {

    ModelChangedListenerImpl listener = new ModelChangedListenerImpl();

    Dataset set = TDBFactory.createDataset("/tmp/test");
    set.begin(ReadWrite.WRITE);

    Model model = set.getDefaultModel();

    model.register(listener);

    Model model2 = ModelFactory.createDefaultModel();

    Calendar cal=GregorianCalendar.getInstance();
    cal.setTimeInMillis(System.currentTimeMillis()/100*100);
    Literal literal = model.createTypedLiteral(cal);
    model.add(s1, p1, model.createTypedLiteral(cal));
    Statement statement = model.listStatements(s1, p1, (RDFNode) null).next();
    Literal value = statement.getLiteral();

    assertTrue(literal.equals(value));
    assertTrue(literal.hashCode()==value.hashCode());

    model.add(s1, p1, o1);

    model2.add(s1,p1,literal);

    model2.add(s2, p2, o2);
    model2.add(s1, p1, model.createTypedLiteral(cal));

    model.add(model2);

    model.unregister(listener);

    Model added=listener.getInsertModel();

    final StmtIterator objectStmtIter = added.listStatements(s0, p0, (RDFNode) null);
    if (objectStmtIter != null) {
      while (objectStmtIter.hasNext()) {

        final Resource objectResource =
            objectStmtIter.next().getObject().asResource();
        final StmtIterator updatedStmtIter =
            objectResource.listProperties(p1);
        if (updatedStmtIter != null && updatedStmtIter.hasNext()) {
          Statement next = updatedStmtIter.next();
          if (updatedStmtIter.hasNext()) {
            Statement next2 = updatedStmtIter.next();
            if (next.toString().equals(next2.toString())) {
              // JenaUtil.printModel(added);
              throw new RuntimeException(
                  "object has more than one IDENTICAL atom:updated, uri: '"
                  + objectResource.getURI() + "' statement " + next);
            }
          }
        }
      }
    }

    set.abort();

  }

}



class ModelChangedListenerImpl extends StatementListener {

  private Model insertModel = ModelFactory.createDefaultModel();
  private Model deleteModel = ModelFactory.createDefaultModel();

  public void addedStatement(final Statement statement) {

    insertModel.add(statement);
    deleteModel.remove(statement);
  }

  public void removedStatement(final Statement statement) {

    deleteModel.add(statement);
    insertModel.remove(statement);
  }

  public Model getInsertModel() {
    return this.insertModel;
  }

  public Model getDeleteModel() {
    return this.deleteModel;
  }
}




On 16.04.2013 19:31, Andy Seaborne wrote:
> On 12/04/13 17:15, Andy Seaborne wrote:
>> On 12/04/13 15:06, "Dr. André Lanka" wrote:
>>> Hello to all,
>>
>> Hi there,
>>
>> Could you put this on JIRA please? Ideally with a complete test case to
>> make sure we agree on the details.
> 
> https://issues.apache.org/jira/browse/JENA-437
> 
>>
>> Is it TDB specific only?
> 
> No, although TDB is more likely to bump into it.
> 
>>
>>      Thanks,
>>      Andy
>>
>>>
>>> we've got duplicated statements within the same model (stored in a
>>> GraphTripleStoreMem). Duplicated means that each of the three components
>>> s,p and o are pairwise equal between the statements.
>>>
>>> The reason is that the literals have differing hashCodes so that they
>>> are added twice to the model. This is because the hashCode method for
>>> XSDDateTime doesn't respect the scale of the milliseconds (field 8 in
>>> the data array). When you call Model.createTypedLiteral(Calendar) the
>>> scale is either zero or three. Whereas TDB formats it (while reading
>>> from the triple store) to 0,1,2 or 3 digits depending on the number of
>>> zeros at the end (DateTimeNode.unpack). So you can put a xsd:dateTime
>>> into TDB and get back a literal that equals the given one but has
>>> another hashCode.
>>>
>>> You can reproduce it with a TDB-backed model by doing:
>>>
>>>      Calendar cal=GregorianCalendar.getInstance();
>>>      cal.setTimeInMillis(System.currentTimeMillis()/100*100);
>>>
>>>      Literal literal = model.createTypedLiteral(cal);
>>>      model.add(s1, p1, model.createTypedLiteral(cal));
>>>
>>>      Statement statement = model.listStatements(s1, p1, (RDFNode) null).next();
>>>      Literal value = statement.getLiteral();
>>>
>>>      assertTrue(literal.equals(value));
>>>      assertTrue(literal.hashCode()==value.hashCode());
>>>
>>>
>>> The last line fails.
>>>
>>> In order to respect the general contract of equals, XSDDateTime should
>>> get a special getHashCode(LiteralLabel) method instead of using the one
>>> from BaseDatatype. For instance, this method could leave out array
>>> indices 7 and 8 and use the fractional seconds (xor with the double
>>> value) instead.
>>>
>>> Cheers
>>> André
>>>
>>
> 

-- 
Dr. André Lanka  *  0178 / 134 44 47  *  http://dr-lanka.de
