On 29/07/12 17:05, Andy Seaborne wrote:
I've put some debugging in so that the term being unpacked is printed out.

It looks like it is the timezone.

     Andy

(A) problem found : timezones with non-zero minutes.

Recorded as JENA-287

        Andy


On 24/07/12 12:13, Michael Brunnbauer wrote:

Hello Andy,

On Thu, Jun 14, 2012 at 01:12:25PM +0100, Andy Seaborne wrote:
I guess it would be a good idea to look at the end of the dump and check the
corresponding named graph for bad datetimes?

Yes - my best guess at the moment is that a dateTime can get in (they
are encoded into 56 bits, not recorded using the lexical form) but there
was a problem on the recreation of the lexical form.  Whether the
encoding or decoding is wrong, I can't tell.
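
To illustrate the kind of round trip described above, where a value is stored as a small integer and its lexical form is rebuilt on the way out, here is a minimal Java sketch. It is not TDB's DateTimeNode code or its real bit layout: the class name, the 15-minute unit, the bias and the 7-bit field width are all assumptions chosen for illustration. It packs a timezone offset, including offsets with non-zero minutes such as +05:30, and rebuilds the text from the packed value.

```java
// Hypothetical sketch only: NOT the layout used by TDB's DateTimeNode.
class TzPackSketch {
    // Assumption: offsets are stored as a count of 15-minute units.
    static final int UNIT_MINUTES = 15;
    // Assumption: bias so that -14:00 maps to 0 and +14:00 maps to 112,
    // which fits in a 7-bit field (0..127).
    static final int BIAS = (14 * 60) / UNIT_MINUTES;

    // Encode an offset in minutes (e.g. 330 for +05:30) as a small non-negative int.
    static int packTz(int offsetMinutes) {
        if (Math.abs(offsetMinutes) > 14 * 60 || offsetMinutes % UNIT_MINUTES != 0) {
            throw new IllegalArgumentException("Offset not encodable: " + offsetMinutes);
        }
        return offsetMinutes / UNIT_MINUTES + BIAS;
    }

    // Decode the packed field back to an offset in minutes.
    static int unpackTz(int packed) {
        return (packed - BIAS) * UNIT_MINUTES;
    }

    // Rebuild the lexical form; hours and minutes are derived separately,
    // so the minutes part must be handled even when it is not zero.
    static String formatTz(int offsetMinutes) {
        char sign = offsetMinutes < 0 ? '-' : '+';
        int abs = Math.abs(offsetMinutes);
        return String.format("%c%02d:%02d", sign, abs / 60, abs % 60);
    }

    public static void main(String[] args) {
        int[] samples = { 0, 60, 330, -210 };
        for (int minutes : samples) {
            int packed = packTz(minutes);
            System.out.println(packed + " -> " + formatTz(unpackTz(packed)));
        }
    }
}
```

A round-trip check of this shape, run over offsets with non-zero minutes, is one way to tell whether an encoder or a decoder is at fault.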

I was not able to find the named graph causing the problem so I recreated the
TDB with tdbloader2 from apache-jena-2.7.2 and tried tdbdump from
apache-jena-2.7.2 immediately after that. The result is that I seem to run
into the same problem:

Exception in thread "main" org.openjena.atlas.AtlasException: formatInt: overflow
    at org.openjena.atlas.lib.NumberUtils.formatUnsignedInt(NumberUtils.java:115)
    at org.openjena.atlas.lib.NumberUtils.formatInt(NumberUtils.java:87)
    at org.openjena.atlas.lib.NumberUtils.formatInt(NumberUtils.java:60)
    at com.hp.hpl.jena.tdb.store.DateTimeNode.unpack(DateTimeNode.java:255)
    at com.hp.hpl.jena.tdb.store.DateTimeNode.unpackDateTime(DateTimeNode.java:180)
    at com.hp.hpl.jena.tdb.store.NodeId.extract(NodeId.java:313)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:64)
    at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:163)
    at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:155)
    at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:45)
    at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:89)
    at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:85)
    at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
    at org.openjena.atlas.iterator.IteratorCons.next(IteratorCons.java:94)
    at org.openjena.atlas.iterator.Iter.sendToSink(Iter.java:560)
    at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:45)
    at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:37)
    at org.openjena.riot.RiotWriter.writeNQuads(RiotWriter.java:41)
    at tdb.tdbdump.exec(tdbdump.java:49)
    at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
    at tdb.tdbdump.main(tdbdump.java:31)

This seems to be a serious issue.

BTW: Here is some output from tdbloader2 for this TDB which shows that
the tdbloader2 data phase runtime gets quite non-linear for very big
datasets. I called tdbloader2 with JVM_ARGS="-Xmx32768M -server" and it
did not seem to run into memory problems.

  12:39:17 -- TDB Bulk Loader Start
  12:39:17 Data phase
...
INFO  Add: 100,000,000 Data (Batch: 68,027 / Avg: 57,649)
...
INFO  Add: 500,000,000 Data (Batch: 55,309 / Avg: 41,446)
...
INFO  Add: 1,000,000,000 Data (Batch: 27,901 / Avg: 24,119)
...
INFO  Add: 1,100,000,000 Data (Batch: 335 / Avg: 6,308)
...
INFO  Add: 1,138,800,000 Data (Batch: 256 / Avg: 5,038)
...
INFO  Total: 1,138,845,529 tuples : 227,654.44 seconds : 5,002.52 tuples/sec [2012/07/22 03:53:36 CEST]
...
  20:24:24 -- TDB Bulk Loader Finish
  20:24:24 -- 373477 seconds
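
The average rate on the Total line can be reproduced from the tuple count and the data-phase seconds in the same line. A small Java check follows; the class and method names are made up for illustration and are not part of tdbloader2.

```java
// Hypothetical helper: recomputes the average load rate from the logged totals.
class LoadRate {
    // Average throughput in tuples per second.
    static double tuplesPerSecond(long tuples, double seconds) {
        return tuples / seconds;
    }

    public static void main(String[] args) {
        // Figures copied from the "Total" line of the loader output above.
        double rate = tuplesPerSecond(1_138_845_529L, 227_654.44);
        System.out.printf("%.2f tuples/sec%n", rate);
    }
}
```

Comparing this overall average with the per-batch figures (68,027 at 100 million tuples, 256 near the end) shows how far the rate fell during the load.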

Regards,

Michael Brunnbauer


