Hi Frank,

I've been giving the OSLC indexing spec proposal a lot of thought. I've been looking for a good way to layer the specification so that it is general enough to garner wide use, and not so dependent on other OSLC mechanisms that it becomes awkward for applications that have nothing to do with any OSLC domain.
First, a few general comments on the three capabilities in the current spec.

re: Resource Publishing Capability. The need to enumerate is clear. However, it is unclear how an index-building client would use this. Even if there were stable paging, it's unclear how the client would then catch up on all the changes that happened after it started walking the enumeration. I believe the client needs a way to learn exactly which event in the change log it should carry on from. Currently, the only way to do this is for the client to make a separate request to retrieve the current event number from the change log before it requests the enumeration. Also, the resource enumeration is tied to the time of the request. For a server with a very large set of resources, this may be expensive. If the server could answer an enumeration request with the state of its resource set at a point in the past of its choosing, it would have more flexibility as to how and when to enumerate its resource set.

re: Resource Changelog Capability. A stable, paged representation of change entries in reverse chronological order, based on event sequence number, feels right, except for the format of the representation. Using an RDF representation for this seems awkward and unwarranted (RDF does not handle ordering easily). Atom and AtomPub would be more appropriate, and arguably closer to what people would expect of an Internet change log protocol. Also, given that the server is allowed to truncate change entries from the change log, it might make sense to tell the client up front the number of the lowest-numbered event in the entire log. That way the client can at least determine when it has "missed the boat", i.e. missed events that were crucial to incrementally updating its picture.

re: Resource Security Capability. I didn't look at this, but I agree we will need something that addresses security.
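To make the catch-up problem concrete, here is a minimal Python sketch of what an index-building client has to work out once it has walked the enumeration. It assumes the client fetched the current event number (the cutoff) before enumerating, as described above, and that the server reports the lowest-numbered event still in the log. The names ChangeEntry, events_to_replay and MissedTheBoat, and the "created"/"modified"/"deleted" kinds, are hypothetical and not from the current spec.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChangeEntry:
    seq: int   # event sequence number assigned by the server
    uri: str   # URI of the resource that changed
    kind: str  # hypothetical: "created", "modified" or "deleted"


class MissedTheBoat(Exception):
    """Raised when events the client needs were truncated from the log."""


def events_to_replay(cutoff: int, lowest_retained: int,
                     newest_first: List[ChangeEntry]) -> List[ChangeEntry]:
    """Select the change entries a client must replay after enumerating.

    cutoff: last event number known before the enumeration was requested.
    lowest_retained: lowest-numbered event still present in the log.
    newest_first: change entries as served, in reverse chronological order.
    """
    # If events cutoff+1 .. lowest_retained-1 have been truncated, the
    # client cannot bring its picture up to date and must re-enumerate.
    if lowest_retained > cutoff + 1:
        raise MissedTheBoat(
            f"need event {cutoff + 1}, log starts at {lowest_retained}")
    # Keep only events after the cutoff and apply them oldest first.
    pending = [e for e in newest_first if e.seq > cutoff]
    return sorted(pending, key=lambda e: e.seq)
```

Because the enumeration may already reflect some events after the cutoff, replaying them has to be harmless, for example by simply re-fetching the resource named in each entry.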
One observation: the Resource Publishing Capability and Resource Changelog Capability are designed for a different clientele from that of the other capabilities found in an OSLC service provider. The latter capabilities implicitly require an authenticated user, and constrain access based on that user's permissions. The former capabilities likely require an authenticated client application, will need to reify access constraints, and will later apply those access constraints when running queries on behalf of an authenticated user.

More generally, here's how I've come to think about this problem. A server maintains a particular set of resources and wants to make that set available to its clients. These clients, who have no a priori knowledge of which resources are or are not in the set, need a way to enumerate the URIs of the resources in the set. (Hence the Resource Publishing Capability.) The set of resources may be continually changing underfoot, and clients need a way to track how those changes affect the set. (Hence the Resource Changelog Capability.) Our primary envisioned clients do both: they start off enumerating, and afterwards switch to incremental updating. The reason our clients are interested in certain sets of resources is that they want to retrieve them to get RDF triples to put into an RDF triple store. However, 99.99% of the protocol is about dealing with a large, active set of resources, and only 0.01% is about these resources being bearers of RDF triples. Rather than specifying two separate capabilities, it would make sense to specify a single capability with a single endpoint. Conceived of this way, the capability at the heart of things is a protocol for dealing with big sets of resources. Call this the Big Resource Set protocol.
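As a rough illustration of the single-endpoint idea, here is a hypothetical in-memory Python model of a Big Resource Set. Members are tracked only as URI-to-etag pairs. The enumeration is a snapshot the server takes whenever convenient and stamps with the event number it reflects, so the client needs no separate request to find its starting point. The class, method and function names are illustrative assumptions, not proposed spec terms.

```python
from typing import Dict, List, Optional, Tuple

# (event number, resource URI, new etag or None if the resource left the set)
Event = Tuple[int, str, Optional[str]]


class BigResourceSet:
    """Hypothetical model of one endpoint serving enumeration and change log."""

    def __init__(self) -> None:
        self.members: Dict[str, str] = {}  # URI -> current etag
        self.log: List[Event] = []
        self.seq = 0
        self.snapshot: Tuple[int, Dict[str, str]] = (0, {})

    def record(self, uri: str, etag: Optional[str]) -> None:
        """Add or update a member (etag given) or remove it (etag None)."""
        self.seq += 1
        if etag is None:
            self.members.pop(uri, None)
        else:
            self.members[uri] = etag
        self.log.append((self.seq, uri, etag))

    def take_snapshot(self) -> None:
        """Run whenever convenient for the server, not per client request."""
        self.snapshot = (self.seq, dict(self.members))

    def enumeration(self) -> Tuple[int, Dict[str, str]]:
        """Latest snapshot, stamped with the event number it reflects."""
        return self.snapshot

    def changes_since(self, seq: int) -> List[Event]:
        """Change entries after seq, newest first."""
        return [e for e in reversed(self.log) if e[0] > seq]


def sync(server: BigResourceSet) -> Dict[str, str]:
    """Client side: enumerate, then replay later changes oldest first."""
    as_of, members = server.enumeration()
    view = dict(members)
    for _, uri, etag in reversed(server.changes_since(as_of)):
        if etag is None:
            view.pop(uri, None)
        else:
            view[uri] = etag
    return view
```

Stamping the enumeration is what lets the server enumerate a point in the past of its choosing: the client just replays whatever the change log holds after that event number.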
A server would implement the Big Resource Set protocol to expose its set of resources; a client would consume the Big Resource Set protocol to initially enumerate the resource set, and afterwards to keep monitoring for incremental changes affecting resources in the set. The Big Resource Set protocol would be neutral on how the resource set comes into being, what causes it to change, and which resources might end up as members of the set. The protocol would also be neutral on the representation of the resources; all the spec would need to promise is that all resources are identified by URIs, and that HTTP etags are used to identify distinct resource states.

The Big Resource Set protocol would serve as the lowest-level protocol spec. We would build a second, separate layer atop it. For our problem at hand, defining general-purpose sources of indexable RDF content, a server would provide an endpoint implementing the Big Resource Set protocol, with the added proviso that all resources in the resource set are dereferenceable to RDF content, with the etag varying with significant changes to that RDF content. (We would also address security at this level, and spell out the expectation that these endpoints are available only to trusted indexer clients that can pick up ACL information for the resources and correctly apply it when the index data is shown to regular users. We would also need to spell out expectations regarding the logical consistency of the RDF content across resources in the resource set, since it will likely be undesirable for the fact base to contain contradictions. The matter of overlapping resource sets that you raised would also be addressed at this level.) We would add a thin third layer on top of the second to tie things in to an OSLC domain.
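For the RDF index source layer, one possible indexer-side approach (an assumption on my part, not something either proposal specifies) is to keep one graph per resource URI and record the etag it was built from, so that an event carrying an unchanged etag costs no fetch. In this Python sketch, the fetch callable is a hypothetical stand-in for an HTTP GET that returns the resource's current etag and RDF triples. ACL handling is not modeled.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical fetcher: dereferences a URI, returns (etag, triples).
Fetch = Callable[[str], Tuple[str, FrozenSet[Triple]]]


class RdfIndex:
    """Keeps one graph per resource URI, tagged with the etag it came from."""

    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch
        self.graphs: Dict[str, Tuple[str, FrozenSet[Triple]]] = {}

    def apply(self, uri: str, etag: Optional[str]) -> None:
        """Bring the index in line with one enumeration or change entry."""
        if etag is None:  # resource left the set: drop its graph
            self.graphs.pop(uri, None)
            return
        held = self.graphs.get(uri)
        if held is not None and held[0] == etag:
            return  # same etag, so no significant change to the RDF content
        # The fetched etag may be newer than the one in the event; keep it.
        new_etag, triples = self.fetch(uri)
        self.graphs[uri] = (new_etag, triples)  # replace the whole graph

    def triples(self) -> FrozenSet[Triple]:
        """Union of all per-resource graphs."""
        return frozenset().union(*(ts for _, ts in self.graphs.values()))
```

Replacing a resource's whole graph, rather than diffing triples, keeps the indexer simple and relies only on the etag promise made by this layer.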
In the context of an OSLC domain specification, we would further specify that an OSLC service provider should expose one or more RDF index source endpoints and make them discoverable via markup in the OSLC service provider. The resources in the resource sets would be the "published resources" belonging to that OSLC service provider.

Regards,
Jim

From: Frank Budinsky/Toronto/IBM@IBMCA
To: [email protected]
Cc: RELM Development <relm_development%[email protected]>
Date: 04/05/2011 04:23 PM
Subject: [oslc-core] ChangeLog Proposal moving to Convergence Phase
Sent by: [email protected]

I've updated the Change Log proposal to include all the issues we've discussed, and to provide a little more elaboration for things that people didn't seem to easily pick up from earlier drafts. It is available here:
http://open-services.net/pub/Main/IndexingProposals/OSLC_indexing_0404.doc

I'll look at converting it to the proper OSLC TWiki format next.

Thanks,
Frank

_______________________________________________
Oslc-Core mailing list
[email protected]
http://open-services.net/mailman/listinfo/oslc-core_open-services.net
