Chris Cannam wrote:
> Attached is a patch for raptor svn r15635 (also works with the 1.4.19
> release) which addresses a scalability problem in Turtle serialization
> by replacing raptor_sequence with raptor_avltree for the subject and
> blanks containers passed to raptor_abbrev_node_lookup. This attempts
> to fix the "this should really be a hash, not a list" FIXME previously
> noted in that function.
>
> The tricky bit is handling the way blank nodes were removed from the
> sequence when written, if they were not going to be needed again.
> Because you can't just replace an item in the tree with NULL (as was
> done in the sequence) without breaking tree ordering, and you can't
> remove an item from the tree while iterating over it, I've instead
> added a "valid" flag to the subject struct itself which is reset on
> writing and subsequently tested to prevent duplicate writes. I'm not
> hugely keen on this, should anyone have any better ideas.
>
> This patch reduces the runtime of my own test case (c. 400K triples
> constructed and serialized) from about 25 minutes to about 14 seconds.
>
> It also reduces the runtime for the Turtle test suite from about 12.1
> to 10.4 seconds on this machine in release trim, and it passes
> valgrind --leak-check=full with no errors or leaks.
>
> The bad news is that it causes a number of unit tests to fail because
> of changes to the ordering in output. One test (ex-38) in the rdfxml
> and turtle test suites fails (I think because rdfdiff is wrongly
> seeing a difference that isn't there), and the entire feeds test suite
> fails (I think because it doesn't use rdfdiff at all). I haven't
> spotted anything that looks like a "real" failure, but I might be
> missing something. Thoughts welcome.
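
The "valid" flag idea in the quoted message can be illustrated with a minimal sketch. This is not the patch itself and does not use raptor's API: `node_entry`, `index_entries`, `lookup_entry` and `write_once` are hypothetical names, and a sorted array searched with `bsearch` stands in for the AVL tree. The point is only that clearing a flag in place leaves the ordered container intact, whereas overwriting an entry with NULL would break the ordering that lookups rely on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a subject/blank-node record; not raptor's struct. */
typedef struct {
    const char *id;  /* node identifier, used as the ordering key */
    int valid;       /* 1 until the node has been written, then 0 */
} node_entry;

/* Order entries by identifier only; the flag plays no part in ordering. */
static int compare_entries(const void *a, const void *b)
{
    return strcmp(((const node_entry *)a)->id, ((const node_entry *)b)->id);
}

/* Sort once so that lookups can use binary search instead of a linear scan. */
static void index_entries(node_entry *entries, size_t n)
{
    assert(entries != NULL);
    qsort(entries, n, sizeof *entries, compare_entries);
}

/* O(log n) lookup by identifier; returns NULL if the id is not present. */
static node_entry *lookup_entry(node_entry *entries, size_t n, const char *id)
{
    node_entry key = { id, 0 };
    return (node_entry *)bsearch(&key, entries, n, sizeof *entries,
                                 compare_entries);
}

/* "Write" a node at most once: returns 1 if written now, 0 if skipped. */
static int write_once(node_entry *entries, size_t n, const char *id)
{
    node_entry *e = lookup_entry(entries, n, id);
    if (e == NULL || !e->valid)
        return 0;
    e->valid = 0;  /* mark as written in place; container order is untouched */
    return 1;
}
```

Because the entry stays in the container after being written, a later lookup still finds it, and the flag check is what prevents a duplicate write.
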

Thanks for the patch, Chris. This was a known performance problem when
serializing large Turtle and rdf/xml-abbrev graphs.

I had a quick look at the rdfdiff logic and it does look dubious. I may
see if I can fix it so the tests still run. If I get it working, I'll
roll a new raptor1 release with this (and another crash fix patch
already in SVN) and commit it to the raptor2 trunk also.

Thanks again.

Dave
