Re: repository@ awareness?

2003-11-12 Thread Stephen McConnell
Noel:
Thanks for the W3C style reference.  One of the subjects it deals with 
is content negotiation http://www.w3.org/Provider/Style/URI.html#remove 
- and this got me thinking about how metadata, as opposed to the resource 
that the metadata describes, can be resolved.  I'm going to try to dig up 
some more on the content negotiation subject as this may be a factor in 
resolving some of the requirements I have.

Stephen.
Noel J. Bergman wrote:
Stephen McConnell asked:
 

File system - a convenient and simple solution - but should a file
system driven approach be the basis for the next generation?
   

The basis is a URI space.  Whether a URI is efficiently served by a static
file, or by some servlet, CGI or Grandma Moses typing very VERY fast really
should not be visible to the user-agent.
 

A solution must be implementation independent
   

See: http://www.w3.org/Provider/Style/URI.html
The URI is a request for content.  It should not change, regardless of the
means by which the content is generated.
 

So why a preoccupation with meta-less file system structures as opposed
to a preoccupation with an extensible repository protocol?
   

The extensible repository protocol is HTTP.  Nothing else needs to be
visible.  The only thing that the infrastructure team needs to deal with is
the implementation of the URI space (allowing that the content addressed by
a URI can vary based upon the user-agent).
--- Noel
 

--
Stephen J. McConnell
mailto:[EMAIL PROTECTED]



RE: repository@ awareness?

2003-11-12 Thread Noel J. Bergman
 You're saying that those interested in enabling a repo with
 metadata and searches based on this metadata could wrap the
 repository with a servlet.

Could?  Yes.  But that is just one way of many.  I maintain that httpd could
serve the content of most repositories, meta-data and all, without dynamic
content generation.

 The URI could be used by the servlet to give a different view
 of the repository based on [criteria embedded in the request]

IMO, the URI should encode the complete request.  There should not be
any other implied context.

 the servlet manages the interaction behind the scenes with
 some sort of metadata database to conduct the query and
 return the results as if they were regular files on the
 server's repo file system.

It depends upon the repository implementation.  It could work as you
describe, or there could just be pre-built metadata stored in files.

Consider that eventually web sites will likely use Subversion with WebDAV as
their authoring mechanism.  Authorized people will post directly to a
Subversion repository.  Although httpd can load directly from Subversion,
that will not be as efficient as serving directly from the file system.  The
reason for that is that sendfile() does not work directly out of a BDB
database (as far as I know).  Therefore, when a file is posted to
Subversion, it could be mirrored by a hook to a directory representing the
current content, which is what would then be served by httpd.  We used a
similar technique at GEIS years ago with SourceSafe, so that when a check-in
occurred, a copy went into a shadow directory, and a build test was
initiated.  Likewise, a tool could be invoked to build meta-data, and store
it in the file system.
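
A minimal sketch of the mirroring idea, assuming a hook hands us the path of a newly committed file. The function name mirror_artifact, the shadow directory argument and the ".meta.json" sidecar name are hypothetical, not part of Subversion or httpd. It copies the file into a shadow tree that a web server could serve statically, and writes pre-built metadata next to it as an ordinary file.

```python
import hashlib
import json
import shutil
from pathlib import Path


def mirror_artifact(source: Path, shadow_root: Path, rel_path: str) -> Path:
    """Copy a committed file into the shadow tree and write a metadata sidecar.

    Hypothetical helper: a real hook would work out `source` and `rel_path`
    from the commit it was invoked for.
    """
    target = shadow_root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)

    # Pre-built metadata, stored as a static file beside the artifact.
    data = target.read_bytes()
    meta = {
        "path": rel_path,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    meta_path = target.parent / (target.name + ".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2, sort_keys=True))
    return meta_path
```

Both the artifact and its metadata end up as plain files, so the server needs no dynamic content generation to serve either.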

So there are ways and ways and more ways.  The goal is the same, as should
be the externally viewable behavior.

--- Noel



Re: repository@ awareness?

2003-11-12 Thread Joerg Pietschmann
Stephen McConnell wrote:
Noel:
Thanks for the W3C style reference.  One of the subjects it deals with 
is content negotiation http://www.w3.org/Provider/Style/URI.html#remove 
- and this got me thinking about how metadata, as opposed to the resource 
that the metadata describes, can be resolved.  I'm going to try to dig up 
some more on the content negotiation subject as this may be a factor in 
resolving some of the requirements I have.
The XML-DEV and various RDF related mailing lists hold
discussions about this topic regularly.
BTW this raises the question whether an RDF derivative or
a completely self-designed XML vocabulary will be used
for the repository metadata.
J.Pietschmann


Re: repository@ awareness?

2003-11-11 Thread Stephen McConnell

Leo Simons wrote:
Justin Erenkrantz wrote:
Do any 'core' infrastructure people need to get involved to help 
guide with what's practical or not?

yep. But I doubt you really need to get 'deeply' involved. A half-page
explanation of what resources are and are not available should be 
enough, don't you think? 

I'm probably in a minority - so don't count anything I say as an 
indicator of public opinion.

First off - the board wants human readable safe downloading. Personally 
I think this objective is of minor relevance/impact to ASF in the medium 
term. Since early 1998, the notion of repository-aware applications has 
been growing. Here in Apache it's in its infancy - but clearly prevalent 
in the Java community. Maven is an early example (hit a repository for 
jar downloading to resolve n build dependencies) - Avalon is another 
example - (hit the repository and get back a class loader hierarchy).

File system - a convenient and simple solution - but should a file 
system driven approach be the basis for the next generation? My 
conclusion - no. A solution must be implementation independent - I 
should be able to map a protocol to an RDBMS, LDAP, simplistic HTTP over 
file layout, even an XMI repo over IIOP if deemed appropriate.

So why a preoccupation with meta-less file system structures as opposed 
to a preoccupation with an extensible repository protocol?

Here is an example of a modern repository aware application.
$ merlin http://dpml.net/avalon-http/block.xml
The above command has executed the following:
(a) bootstrapping of a repository client
(b) resolution of repository adapter implementation
(c) downloading and installation of repository adapter and dependencies 
(meta data)
(d) bootstrapping of the repository adapter into action (meta data)
(e) downloading of block.xml using the repository adapter (i.e. protocol 
independent)
(f) validation of the downloaded artifact (meta data)
(g) construction of information about block dependencies by the local 
app (meta data)
(h) recursively downloaded artefact dependencies (meta data)
(i) local creation of a class loader hierarchy based on class loader 
assignments (meta data)
(j) created a container holding a set of composite components
(k) executed the orderly deployment of supporting components
(l) started a web server, and a set of business components, and a servlet

A first-time user will trigger something on the order of 30-40 
downloads. The local system will cache information and monitor the 
repository for changes.
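
Steps (g) and (h) plus the local cache can be sketched roughly as follows. This is only an illustration under stated assumptions: resolve, fetch_dependencies and the dictionary cache are hypothetical names, not Merlin's actual API. Dependency metadata is looked up once per artefact, cached locally, and walked recursively so that dependencies come before the artefacts that need them.

```python
from typing import Callable, Dict, List


def resolve(
    artifact: str,
    fetch_dependencies: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
) -> List[str]:
    """Return the artifact and its transitive dependencies, dependencies first."""
    ordered: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return  # already handled; also guards against cycles
        seen.add(name)
        if name not in cache:
            # Only hit the (possibly remote) repository on a cache miss.
            cache[name] = fetch_dependencies(name)
        for dep in cache[name]:
            visit(dep)
        ordered.append(name)

    visit(artifact)
    return ordered
```

A second call with the same cache triggers no further fetches, which is the behaviour a first-time user would want after the initial 30-40 downloads.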

Step 2 - user launches a command to manage the running servlet
(a) jmx management libraries are auto-downloaded (meta data)
(b) along with a dozen commons jar files (meta data)
(c) management app invokes request on management agent download
(d) agent is deployed in a target JVM (local deployment)
(e) jnlp client completes downloading of three jar files signed using 
X509 certificates into a third JVM
(f) applet appears in users browser
(g) user updates parameters
(h) updated deployment profile is sent to remote repository (meta data)
(i) local client synchronizes local cache relative to remote repo (meta 
data)

All of the above from one command and a few clicks of a mouse. Ok, I 
confess - we don't have all of the above in place today - but do have the 
majority. This benefits significantly from a rigorous protocol 
supporting artefact location, feature assessment (meta data), 
authentication, replication and validation. An argument that appears 
popular on repository@ is that the basic file system does not need to 
be meta-aware - i.e. no distinction between artefact and 
info-about-an-artefact. IMO it is basically a misadventure to focus so 
closely on subjects such as file system structure (the lowest common 
denominator solution). Instead, should we not be defining a protocol 
that is transport and implementation independent? A protocol that will 
enable the functional requirements of artefact authentication, artefact 
navigation, artefact retrieval and artefact registration.
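
To make the four functional requirements concrete, here is a rough sketch of what such an interface might look like. Everything in it is an assumption made for illustration: the class and method names are hypothetical, and a SHA-256 digest comparison stands in for real artefact authentication (signatures, certificates). The point is only that callers depend on the interface, while the backing store can be swapped.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Repository(ABC):
    """Hypothetical interface: callers never see the backing store."""

    @abstractmethod
    def navigate(self, prefix: str) -> List[str]:
        """List artefact keys under a prefix."""

    @abstractmethod
    def retrieve(self, key: str) -> bytes:
        """Return the artefact content."""

    @abstractmethod
    def metadata(self, key: str) -> Dict[str, str]:
        """Return the name/value pairs describing the artefact."""

    @abstractmethod
    def register(self, key: str, content: bytes, metadata: Dict[str, str]) -> None:
        """Publish an artefact together with its metadata."""

    def authenticate(self, key: str) -> bool:
        """Compare the content with the digest recorded at registration."""
        expected = self.metadata(key).get("sha256")
        return expected == hashlib.sha256(self.retrieve(key)).hexdigest()


class InMemoryRepository(Repository):
    """One possible backend; a file system, RDBMS or LDAP store could replace it."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def navigate(self, prefix: str) -> List[str]:
        return sorted(k for k in self._store if k.startswith(prefix))

    def retrieve(self, key: str) -> bytes:
        return self._store[key][0]

    def metadata(self, key: str) -> Dict[str, str]:
        return dict(self._store[key][1])

    def register(self, key: str, content: bytes, metadata: Dict[str, str]) -> None:
        meta = dict(metadata)
        meta["sha256"] = hashlib.sha256(content).hexdigest()
        self._store[key] = (content, meta)
```

Metadata here is nothing more than named value pairs, so the interface does not depend on agreement about a richer metadata vocabulary.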

Popular arguments are that agreement on meta information associated with 
artefacts is not achievable - and yet the simple notion of named value 
pairs is a widespread abstraction. This simple notion of the artefact 
+ information about an artefact is IMO a fundamental requirement. 
After all - isn't this 2003 - we have the technology! Surely our 
repository spec should enable an implementation based on a file 
system, but equally, should not restrict the potential for transparent 
replacement with alternative more advanced and efficient solutions.

Also of relevance are the economic and social impacts. A repository not 
capable of supporting or evolving towards forward looking 
repository-enabled requirements as outlined in the above scenario is 
destined to be redundant within a matter of a few years. Redundant 
because it will not be relevant to the predominant programmatic scenarios 
and redundant because it will not meet basic functional requirements.

Re: repository@ awareness?

2003-11-11 Thread Michal Maczka
Stephen McConnell wrote:
[..]
All of the above from one command and a few clicks of a mouse. Ok, I 
confess - we don't have all of the above in place today - but do have the 
majority. This benefits significantly from a rigorous protocol 
supporting artefact location, feature assessment (meta data), 
authentication, replication and validation. An argument that appears 
popular on repository@ is that the basic file system does not need to 
be meta-aware - i.e. no distinction between artefact and 
info-about-an-artefact. 
Stephen:
Please understand that an artifact's meta data is simply another 
artifact.  Every file which lives in the repository is an artifact,
and we don't really need any extra level of abstraction.
The notion of artifact + information about an artifact is fully 
covered once we clarify the notion of artifact and define the 
repository layout for artifacts.
You can have as many levels of metadata as you would like (meta data, 
metametametameta data and whatever else anybody will need).

In the Maven world we have
foo/jars/foo-1.0.jar
/poms/foo-1.0.pom
The jar is an artifact.
The pom is also an artifact, which provides some meta information (of course 
not all) about the jar. 
You can add as many other files to the repository as you wish.
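
The layout above can be expressed as a small path-building rule. This is a sketch that generalises from the two paths shown: it assumes the poms directory sits under foo/ like the jars directory, and that every artifact type follows the same "type + s" directory pattern. The function name artifact_path is hypothetical.

```python
def artifact_path(group: str, artifact: str, version: str, kind: str) -> str:
    """Build a repository-relative path such as foo/jars/foo-1.0.jar.

    Assumes the directory name is the artifact type with an "s" appended
    and that the type doubles as the file extension.
    """
    return f"{group}/{kind}s/{artifact}-{version}.{kind}"


jar = artifact_path("foo", "foo", "1.0", "jar")  # "foo/jars/foo-1.0.jar"
pom = artifact_path("foo", "foo", "1.0", "pom")  # "foo/poms/foo-1.0.pom"
```

Under this rule a tool that knows the jar's coordinates can derive the location of its metadata without any extra lookup.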

There is a clear distinction between artifact and info-about-an-artifact, 
as they will be different files (artifacts) in the repository.
Possibly info-about-an-artifact could be spread over a few files and 
accessed selectively by different tools.
Metadata about the repository itself can also be kept in the repository.
Even directory listings in a few different flavours for different tools 
can be in the repository.
Can you explain what exactly is not covered by such an approach?

[..]
Also of relevance are the economic and social impacts. A repository 
not capable of supporting or evolving towards forward looking 
repository-enabled requirements as outlined in the above scenario is 
destined to be redundant within a matter of a few years. Redundant 
because it will not be relevant to the predominant programmatic 
scenarios and redundant because it will not meet basic functional 
requirements.

So you want us to predict what will happen in a few years :)?
Again I don't understand you:
You can build any abstractions you like on top of the repository 
with the features that were discussed.
Aren't you doing it even now when you use the Maven repository for storing 
information about your Avalon services?

Michal



RE: repository@ awareness?

2003-11-11 Thread Noel J. Bergman
Justin,

 Is anyone on infrastructure@ aware of what's going on in [EMAIL PROTECTED]

Not from what I see on the subscriber list, which is why I have suggested on
more than one occasion that such participation is important.

 Apparently, AFAICT, that list is supposed to allow for Java-based
 distribution of software.  Other than that, I'm completely lost as
 to what that list is for.

Eventually, it would be desirable to have a user-friendly tool that is
capable of picking up, for example, httpd source, tomcat, and other parts,
and doing a platform-specific install.  But the tool is someone else's
problem.  The only thing that the repository needs to do is provide a
non-fragile URI space for artifacts, of which files and, eventually,
metadata are both examples.

 Do any 'core' infrastructure people need to get involved to help guide
 with what's practical or not?

Yes.

 With a quick perusal of [EMAIL PROTECTED], I got the sense that they
 might be out in la-la land

Agreed.  The discussion on [EMAIL PROTECTED] was getting into tool areas
that should be relatively orthogonal to the repository.  There are three
areas:

  - URI space
  - metadata
  - tools

The first is the main issue that the repository needs to address.  The
second is an area where after we have decided upon the URI space, the tool
groups could use the repository list as a gathering place to seek common
ground.  And then there are tools, which belong elsewhere, but use the
repository.  Some people are jumping ahead to tools before the URI space is
resolved.

 The people advocating a file layout *only* get my uninformed +1.)

I think that most people recognize that the file layout only approach to
the URI space is necessary.  Meta-data is present in the URI space, and can
be implemented with a static file.  Even if we want to key off the
user-agent for meta-data, that can still be served with static content in
the file space.
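
As a rough illustration of keying off the user-agent while still serving only static files, the sketch below maps a request to one of several pre-built files. The function name static_variant, the variant table and the file names are all hypothetical; the only thing taken as given is that a User-Agent header usually starts with a product token followed by "/" and a version.

```python
from typing import Dict


def static_variant(
    uri_path: str,
    user_agent: str,
    variants: Dict[str, str],
    default: str,
) -> str:
    """Pick the pre-built static file that answers this request.

    `variants` maps a lower-cased User-Agent product token to a file name;
    unknown or missing agents fall back to `default`.
    """
    product = user_agent.split("/", 1)[0].strip().lower()
    file_name = variants.get(product, default)
    return uri_path.rstrip("/") + "/" + file_name


# Hypothetical table: a build tool gets a machine-readable listing,
# everything else gets the human-readable page.
VARIANTS = {"maven": "listing.xml"}
chosen = static_variant("/repo/foo/jars/", "Maven/1.0", VARIANTS, "index.html")
```

The URI the client requests stays the same in both cases; only the static file selected behind it differs.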

--- Noel



RE: repository@ awareness?

2003-11-11 Thread Noel J. Bergman
Stephen McConnell asked:

 File system - a convenient and simple solution - but should a file
 system driven approach be the basis for the next generation?

The basis is a URI space.  Whether a URI is efficiently served by a static
file, or by some servlet, CGI or Grandma Moses typing very VERY fast really
should not be visible to the user-agent.

 A solution must be implementation independent

See: http://www.w3.org/Provider/Style/URI.html

The URI is a request for content.  It should not change, regardless of the
means by which the content is generated.

 So why a preoccupation with meta-less file system structures as opposed
 to a preoccupation with an extensible repository protocol?

The extensible repository protocol is HTTP.  Nothing else needs to be
visible.  The only thing that the infrastructure team needs to deal with is
the implementation of the URI space (allowing that the content addressed by
a URI can vary based upon the user-agent).

--- Noel