Thanks Frank, Nick. Some comments on your document. I think it would be good to have some material that puts the change log resource model in the context of the integrations that would benefit from it, in particular, the resource model that the central index would offer to consumers (cross-provider query & reporting over OSLC Defined Resources). I would suggest that the implementation and specification issues be separated. Whilst the implementation insights are valuable, I think we should focus initially on the problem and the proposed specification.
What guidelines should a consumer follow when deciding whether to run queries against the central index or against the provider-specific query service? Would we not want to expose OSLC Query over the central index (as well as/instead of SPARQL)?

There is no discussion of security. In the current distributed model, each client can make queries across OSLC providers, and each provider is responsible for securing resources within its authority. Copying data into a central index means that this simplicity is lost: how is the central index secured?

There is a Talis RDF vocabulary for change sets [1] which includes some parts of your proposal. It includes the idea of a change to a resource, and additionally (and relevant to Olivier's follow-up) information on the nature of the change in terms of triples added/removed. It does not expressly deal with deletion or creation. (The Talis resource model doesn't have the idea of a log.)

Some more detailed comments below.

Timestamps

On reading the draft spec I was wondering about the role that timestamps have. Am I right in thinking that oslc:at is being used as a sequence number and need not be a timestamp? Or is oslc:at required to be the time at which the change occurred? If so, MUST this timestamp be identical to the dc:modified of the corresponding OSLC Defined Resource (if it has such a property)?

The difference between sequence and timestamp could be valuable for some change log providers. Knowing that something has changed is less information than additionally knowing when that change happened. In an implementation that I'm experimenting with, the component providing the change log doesn't know the time of the change, but it does know the sequence in which all changes occurred. It does know the time at which a change notification was received, but this is not the same as the recorded dc:modified of the requirement that underwent change.
Perhaps this is an idiosyncrasy of this particular system that we ought not to let influence the spec? The alternative is that oslc:at MAY be a timestamp; what is REQUIRED is that oslc:at be an xsd:integer which orders the changes as they were made.

Sensitivity of recursive crawling to the OSLC Shape of provided resources

The crawl configuration centralized in the linked data service reflects aspects of what is really a distributed type system. For example, how would the administrator of the crawl configuration react to a change in the OSLC Resource Shape from some change log provider? The addition of new properties is dynamic in most systems and so is unsuited to a centralized configuration. Instead, providers could contribute declarative configurations into each of their indexers, which would GET those configuration resources before starting a crawl. OSLC Core would need to specify these configuration resources.

The recursive crawler needs to know how to deal with cyclic graphs, but this isn't mentioned in the design. A simple policy would be to stop when that graph has already been indexed, irrespective of when it was indexed. Any discrepancies will be picked up by the incremental indexing to achieve eventual consistency.

I think this initial seeding of the central index is a key problem, and I'm pretty sure that crawling over the resources as described in this draft proposal will not suffice: it will take too long and, as a result, leave large inconsistencies and "gaps" in query results. Knowing more of the requirements of the central index would allow us to assess performance characteristics. Policies on "what to crawl first" might help in this regard. One policy that would be attractive is to give priority to "recently accessed" resources.
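To make the cycle policy above concrete, here is a minimal sketch in Python. It is not taken from the draft or from the indexer prototype: the names crawl, fetch_links and LINKS are hypothetical, and the in-memory dictionary merely stands in for whatever GET-and-parse step a real crawler would perform. The only point is that a crawler which skips anything already indexed terminates on cyclic graphs.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          already_indexed: Set[str]) -> List[str]:
    """Breadth-first crawl that indexes each resource at most once.

    fetch_links is a hypothetical stand-in for "GET the resource and
    return the URIs it links to". already_indexed is shared state, so
    a later crawl skips whatever an earlier crawl has already covered.
    """
    order: List[str] = []
    queue = deque(seeds)
    while queue:
        uri = queue.popleft()
        if uri in already_indexed:
            # Policy: stop here, irrespective of when it was indexed.
            continue
        already_indexed.add(uri)
        order.append(uri)
        queue.extend(fetch_links(uri))
    return order


# Hypothetical link graph containing a cycle: A -> B -> C -> A
LINKS = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}

indexed: Set[str] = set()
first_pass = crawl(["A"], lambda u: LINKS.get(u, []), indexed)
second_pass = crawl(["A"], lambda u: LINKS.get(u, []), indexed)
print(first_pass)   # each resource appears once despite the cycle
print(second_pass)  # nothing left to do; incremental indexing takes over
```

A "what to crawl first" policy could presumably be layered on top by replacing the plain queue with a priority queue, but that is beyond what this sketch tries to show.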
[1] http://n2.talis.com/wiki/Changesets

best wishes,
-ian

[email protected] (Ian Green1/UK/IBM@IBMGB)
Chief Software Architect, Requirements Definition and Management
IBM Rational

[email protected] wrote on 20/12/2010 15:45:56:

> From: Frank Budinsky <[email protected]>
> To: [email protected]
> Cc: Martin Nally <[email protected]>
> Date: 20/12/2010 15:47
> Subject: [oslc-core] New OSLC ChangeLog Proposal
> Sent by: [email protected]
>
> Hello,
>
> Nick Crossley and I would like to submit a proposal for adding a
> ChangeLog service to the OSLC core specification. This new service
> will be key to the success of an indexer, and therefore we would
> like to queue it up for discussion as soon as possible in January.
>
> The OSLC proposal, itself, is described in section 1.5 of the
> attached document, while the rest of the document describes an
> indexer prototype, including how it intends to use the change log.
>
> (See attached file: RDF_indexer_overview_1220.doc)
>
> Any comments/feedback on the new proposal or the indexer prototype
> itself would be very welcome.
>
> Thanks,
> Frank.

[attachment "RDF_indexer_overview_1220.doc" deleted by Ian Green1/UK/IBM]
