On 28/11/12 21:57, Zhe Wu wrote:
Hi Andy,

Please see my comments inline.

On 11/28/2012 11:07 AM, Andy Seaborne wrote:
We're looking at simplifying the core of Jena.  There are features in
the Graph layer that are not used but do potentially add a complexity
cost to graph implementations (e.g. storage systems).

The RDF APIs will not change but there are some implications and we'd
like feedback.

1/ Graph level bulk update handler.

This is used by components that want to insert in bulk for some
reason.  We decided that is the wrong way to do it (the batching
should be done by any receiving storage layer that cares) and it's not
used.
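
To make "batching done by the receiving storage layer" concrete, here is a minimal sketch. It is not Jena code: BatchingSink, its writer callback and the batch size are all hypothetical names chosen for illustration. The caller adds one item at a time, and the store alone decides when to write a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical storage-side buffer: callers add single items, the store picks the batch size. */
final class BatchingSink<T> {
    private final int batchSize;
    private final Consumer<List<T>> writer;   // stands in for the store's bulk-write path
    private final List<T> buffer = new ArrayList<>();

    BatchingSink(int batchSize, Consumer<List<T>> writer) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
        this.writer = writer;
    }

    /** Called once per item by the layer above; flushes when the buffer is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    /** Writes out whatever is pending, for example at commit or close. */
    void flush() {
        if (buffer.isEmpty()) return;
        writer.accept(new ArrayList<>(buffer)); // copy, so the writer may keep the list
        buffer.clear();
    }
}
```

With this shape the layer above never needs a bulk entry point; it only has to call flush() (or an equivalent commit hook, which is assumed here) when it is done.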


Actually, Oracle is actively using BulkUpdateHandler and we have
extended it. Having the receiving storage layer decide on
batching/incremental loading is possible. However, allowing bulk
loading at the graph level adds flexibility for end users.

Thanks,

Zhe

Zhe - could you say some more about how Oracle are using it and how Oracle have extended it?

1/ Do you use events from bulk loading?

2/ Which Jena version are you currently using?

3/ How do users interact with it? Which model API calls?

Graph loading at the graph level isn't an end-user feature. What's more, I don't see how the current design actually helps you very much: it is either memory-limited, or you need to pass in a pull-iterator, which is a difficult paradigm to work against.

The potential code changes are available in:

https://svn.apache.org/repos/asf/jena/branches/jena-core-simplified/

        Andy



If removed, ModelChangedListener would change, at least in effect,
even if not in signature.

At the moment there are different event callbacks for different ways
to add a batch of statements, depending on whether a list, an array,
a model or an iterator was used.  Instead, all these would be replaced
by multiple calls of addedStatement/removedStatement.

If you use StatementListener, that's what already happens.
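
As a rough illustration of that fan-out, the sketch below shows a listener whose batch callbacks all reduce to per-statement callbacks. It is self-contained and does not use the Jena classes: Stmt, PerStatementListener and CountingListener are hypothetical stand-ins, and only the list and array variants are shown.

```java
import java.util.List;

/** Hypothetical stand-in for a statement; not Jena's Statement class. */
final class Stmt {
    final String subject, predicate, object;

    Stmt(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/**
 * Hypothetical listener base class: every batch callback fans out to the
 * single-statement callbacks, so subclasses never see the unit of update.
 */
abstract class PerStatementListener {
    abstract void addedStatement(Stmt s);
    abstract void removedStatement(Stmt s);

    void addedStatements(List<Stmt> batch)   { for (Stmt s : batch) addedStatement(s); }
    void addedStatements(Stmt[] batch)       { for (Stmt s : batch) addedStatement(s); }
    void removedStatements(List<Stmt> batch) { for (Stmt s : batch) removedStatement(s); }
    void removedStatements(Stmt[] batch)     { for (Stmt s : batch) removedStatement(s); }
}

/** Example subclass that tracks the net number of statements added. */
final class CountingListener extends PerStatementListener {
    int count = 0;

    @Override void addedStatement(Stmt s)   { count++; }
    @Override void removedStatement(Stmt s) { count--; }
}
```

A listener written this way behaves the same whether the batch arrived as a list or an array, which is the property the proposed change would make universal.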

Does any code depend on knowing the unit of update?

----

The Model API operations

Model.add(Model)
Model.add(Statement[])
Model.add(StmtIterator iter)
Model.add(List<Statement> statements)
Model.remove(Statement[])
Model.remove(StmtIterator iter)
Model.remove(List<Statement> statements)

will remain.

2/ Reification

Currently there are three modes: Standard, Convenient and Minimal.

The default is "Standard" and TDB and SDB only support standard.

Standard does not require any additional state management - it can be
provided with pure code that runs when reification operations are
made.  Convenient and Minimal hide partial reifications from view and
require state management.  Only the memory models support
Convenient and Minimal (RDB did - but it's gone.)

The idea is that the graph-level would not deal in reification at all,
and "Standard" reification semantics would be provided in the
model-layer implementation via code similar to that used by TDB and
SDB currently.

The code as used by SDB and TDB is in
com.hp.hpl.jena.sparql.graph.Reifiers2.
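
For anyone unsure what "pure code" means here: in the RDF reification vocabulary, reifying the triple (s, p, o) on a node r amounts to four ordinary triples (r rdf:type rdf:Statement, r rdf:subject s, r rdf:predicate p, r rdf:object o), so no hidden state is needed. The sketch below only illustrates that expansion. It is not the Reifiers2 code; Triple and StandardReification are hypothetical names, and nodes are plain strings for brevity.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal triple with nodes as plain strings; not Jena's Triple class. */
final class Triple {
    final String s, p, o;

    Triple(String s, String p, String o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }

    @Override public String toString() { return s + " " + p + " " + o + " ."; }
}

final class StandardReification {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Expands "node reifies t" into the four triples of the RDF reification vocabulary. */
    static List<Triple> reify(String node, Triple t) {
        return Arrays.asList(
            new Triple(node, RDF + "type",      RDF + "Statement"),
            new Triple(node, RDF + "subject",   t.s),
            new Triple(node, RDF + "predicate", t.p),
            new Triple(node, RDF + "object",    t.o));
    }
}
```

Because the result is just four triples, a model-layer implementation could add or remove them through the normal graph operations, which is the point of moving reification out of the graph level.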

Does anyone use "convenient" or "minimal"?

3/ Graph level query.

This is not SPARQL and is unrelated to SPARQL execution.

There is a separate "query" system inside the graph layer which was
developed for RDQL.  SPARQL is a bit more complicated.  Its main use
now is in reification and if that is removed and replaced, then there
is no need for it in the graph layer.

Does anyone use this code?  If you have never heard of it, then
you're not using it except via reification.

    Andy

PS If you want to know why things are the way they currently are, I
tried to write that down in:

http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%[email protected]%3E


