Hi Ivan,

Thanks for your email! Comments inline below:
> I'm currently working on my PhD project, where I'm building a distributed
> archiving solution.

Strangely familiar :) I was doing the same thing in the context of OODT from
2003-2007; see here for the culmination:

http://sunset.usc.edu/~mattmann/Dissertation.pdf

> Basically the distributed archive will consist of a number of nodes (every
> node belonging to another organization), where every node will be storing its
> data on a local node and replicas on a number of selected remote nodes.

Gotcha.

> There will be a number of predefined processes (e.g., integrity checking,
> creating additional replicas, etc.) that will run either periodically or when
> some event occurs (node lost event, corrupted object event, etc.). The data
> that the system will archive will consist of RDF/XML files (metadata) +
> binary files (e.g., tiff images, jpeg images, etc.; referenced from the RDF).
> The RDF/XML files together with the binary files will be the products (in
> OODT language).

Okey dokey.

> I'm looking into OODT to see if it can be used to create such a system and
> what components I would be using.
>
> Following is a list of components that I have identified that I could use:
> - CAS Workflow (to implement the processes)
> - CAS Push/Pull Component (to send products to remote nodes, to get products
> from remote nodes). What is the push/pull component communicating with on
> the other side?

The Pull side of PushPull is a set of protocols like FTP, SCP, HTTP, etc. The
Push side is its ability to accept emails "pushed" over IMAPS to a mailbox,
and then to take the URLs from those emails and go resolve them using the pull
protocols. So it's really simulated Push at this point, but it works well with
systems (like NOAA, NASA, etc.) that deliver emails to indicate a file is
ready to be pushed.

> From where is the push/pull component getting the data that it will send?
> From the file manager?

PushPull acquires remote content and then hands it off to a staging area that
the crawler component picks up and reads from. The crawler only handles local
data (intentionally -- the complexity of acquiring remote content was large
enough to warrant the creation of its own component). The crawler takes the
now-local content (and any other content dropped in the shared staging area)
and ingests it into the file manager, sending along metadata + references to
it. A rough sketch of that last ingestion step is below.
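To make that hand-off concrete, here is a minimal sketch of the final
ingestion step, talking straight to the file manager over its XML-RPC client.
I'm writing this from memory, so treat the class and method names
(XmlRpcFileManagerClient, getProductTypeByName, ingestProduct) as things to
double-check against your OODT checkout; the paths, metadata keys, and use of
the default "GenericFile" product type are just placeholders.

import java.io.File;
import java.net.URL;
import java.util.Collections;

import org.apache.oodt.cas.filemgr.structs.Product;
import org.apache.oodt.cas.filemgr.structs.Reference;
import org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient;
import org.apache.oodt.cas.metadata.Metadata;

public class IngestSketch {

  public static void main(String[] args) throws Exception {
    // file manager on its conventional port; adjust for your deployment
    XmlRpcFileManagerClient fm =
        new XmlRpcFileManagerClient(new URL("http://localhost:9000"));

    // a product sitting in the local staging area (path is made up)
    File rdfFile = new File("/staging/archive-item-001.rdf");

    Product p = new Product();
    p.setProductName(rdfFile.getName());
    p.setProductStructure(Product.STRUCTURE_FLAT);
    // "GenericFile" is the type in the default policy; your own type would
    // describe the RDF + binary bundle
    p.setProductType(fm.getProductTypeByName("GenericFile"));
    p.setProductReferences(Collections.singletonList(
        new Reference(rdfFile.toURI().toString(), "", rdfFile.length())));

    Metadata met = new Metadata();
    met.addMetadata("ProductName", rdfFile.getName());
    met.addMetadata("Organization", "node-A"); // illustrative key/value

    // true = have the client transfer the file into the archive
    String productId = fm.ingestProduct(p, met, true);
    System.out.println("Ingested product id: " + productId);
  }
}

In practice the crawler does exactly this for you (plus running metadata
extractors), so you'd normally only write something like it for a custom
ingestion path.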
> What I'm missing, but should be there somewhere:
> - Security Component. How do I create Virtual Organizations and manage users
> and groups, so that I can restrict access?

There is an SSO component that is pretty light-weight at this point; it
implements connections to LDAP to do single sign-on. At one point I did a
RESTful implementation of the SSO interface that connected to Java's OpenSSO
-- totally cleanroom, using web services and protocols to connect to an
OpenSSO service. I'll create a JIRA issue for this and attach it in the next
few days.

> Probably also needed:
> - File Manager. In my case I would have the products (rdf + binary files) and
> would need to create the profiles on the fly with some basic information. Do
> I need the file manager for something other than for the end user to access
> products and profiles?

Yep, you sure do. You'll need the file manager, along with the cas-product
webapp that lives in webapp/fmprod.

> Since I'm going to load up the RDF files in a triple store for further use,
> is it possible to extend the file manager so that the profile catalog is
> stored in a triple store?

Sure, you could do a catalog implementation that stores the metadata in a
triple store. Alternatively, you could use the fmprod webapp to deliver RDF
views of the metadata that's stored per product, and configure it using the
rdfconf.xml file that's part of fmprod. A sketch of the first option is below.
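For the first option: the file manager loads its catalog through a factory
configured in filemgr.properties, so a triple-store-backed catalog comes down
to implementing the Catalog interface and mapping each product's Metadata to
triples. The snippet below only shows that mapping step using Jena (imports
are for current Apache Jena; older Jena lives under com.hp.hpl.jena). The
namespace, class name, and resource URI scheme are made up for illustration,
the Metadata accessors are from memory, and a real catalog would also have to
implement the rest of the interface (queries, paging, etc.).

import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.oodt.cas.filemgr.structs.Product;
import org.apache.oodt.cas.metadata.Metadata;

public class TripleStoreMapping {

  // hypothetical namespace for the archive's metadata vocabulary
  private static final String NS = "http://example.org/archive#";

  // Turn one product's metadata into an RDF model you can push to a triple store.
  public static Model toModel(Product product, Metadata met) {
    Model model = ModelFactory.createDefaultModel();
    model.setNsPrefix("arch", NS);

    // one resource per product, keyed by its file manager product id
    Resource subject = model.createResource(NS + product.getProductId());

    for (Object keyObj : met.getAllKeys()) {
      String key = (String) keyObj;
      List<String> values = met.getAllMetadata(key);
      for (String value : values) {
        subject.addProperty(model.createProperty(NS, key), value);
      }
    }
    return model;
  }
}

Your catalog implementation would then write that model into whatever store
you're using (Jena TDB, Sesame, etc.) whenever the file manager hands it a
product's metadata, and translate file manager queries back into SPARQL.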
Thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++