Thanks Gregg,
You're spot on.
The memory explosion caused by multiple class loaders, one for each
remote location, is largely eliminated by your solution of preferring
local classes. This works because all marshalled instances are
unmarshalled in the one classloader and utilise the same bytecode,
which also improves compatibility by preventing isolation of otherwise
compatible objects through classloader-tree type differences.
I think utilising the OSGi framework combined with Codebase services,
eliminating the coupling between codebase URLs and classloaders, could
achieve a similar (though not quite as large) reduction in downloaded
code. All classes from a package can use the latest compatible bundle,
sharing the same classloader, bytecode, and any other packages that are
depended upon, significantly reducing RMIClassLoader explosion and
duplicate bytecode. Once a compatible bundle has been downloaded, it
can be utilised for all instances of that class; I believe we should
prefer the latest compatible bundles.
Package version metadata (specified in each bundle) can be stored in
MarshalledObject instance metadata; the OSGi versioning scheme
specifies compatibility across bundle upgrades.
In reality, MarshalledObjects are a compromise to reduce downloading
remote code: they duplicate information held within the binary
marshalled form of an object (such as implemented interfaces). Keeping
duplicated information to a minimum is important; whenever we duplicate
data, we risk duplication errors or increased size. At some point it
becomes more efficient to just unmarshall the object rather than
increase stored metadata. This is best done at the client: internet
servers just won't have the resources to perform queries, while the
security implications of executing foreign code are significant.
I think the full-load issues in Reggie can be fixed, so it doesn't fail
under load, levelling out instead. Although I think you're right:
Reggie isn't suited for the internet. Perhaps some type of global
indexing service could crawl the list of available services through
DNS-SD and store their marshalled proxy instances, performing the
functions that Reggie currently performs. Perhaps an interface inserted
into the hierarchy implementing a subset of Reggie's current methods
for some compatibility with Reggie? Another interface might implement
other methods that assist in filtering the results, or return a
bytestream, where proxies are unmarshalled one at a time at the client
and inspected, then dropped until a suitable match is found; the
remaining bytestream can be discarded. Garbage collection would clean
up unwanted proxies during bytestream inspection, keeping memory usage
to a minimum.
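The one-at-a-time inspection idea might look something like this sketch, where plain Java serialisation stands in for Jini marshalling and the matching predicate is just an instanceof test (both are illustrative assumptions; a real implementation would use MarshalledObject and sandboxed unmarshalling):

```java
import java.io.*;

// Illustrative sketch: a client reads candidate proxies from a bytestream
// one at a time, keeps the first suitable match, and discards the rest.
// Non-matching candidates become garbage immediately, so memory stays low.
public class StreamInspect {

    /** Reads objects from the stream until one matches the wanted type, then stops. */
    static Object firstMatch(InputStream in, Class<?> wanted) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            while (true) {
                Object candidate;
                try {
                    candidate = ois.readObject();
                } catch (EOFException eof) {
                    return null; // stream exhausted, no match found
                }
                if (wanted.isInstance(candidate)) {
                    return candidate; // remaining bytes are simply discarded
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a stream of mixed "proxies".
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(Integer.valueOf(7));
            oos.writeObject("the-service-proxy");
            oos.writeObject(Integer.valueOf(9));
        }
        Object match = firstMatch(new ByteArrayInputStream(buf.toByteArray()), String.class);
        System.out.println(match); // prints "the-service-proxy"
    }
}
```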
Looking at this service type definition, Daniel is using DNS-SD to
locate a Jini service directly:
Multiple identical matches are returned with an index integer appended.
DNS SRV (RFC 2782) Service Types at
http://www.dns-sd.org/ServiceTypes.html contains this service entry:

    jini  Jini Service Discovery
          Daniel Steinberg <daniel at oreilly.com>
          Protocol description: Convention giving a deterministic
          programmatic mapping between Jini service interface names and
          subtypes of the DNS-SD service meta-type "_jini._tcp". For
          example, a client wishing to discover objects that implement
          the "com.oreilly.ExampleService" interface would browse for
          the DNS-SD service subtype
          "ExampleService.oreilly.com._sub._jini._tcp".
          (Note: Using Apple's Bonjour programming API, service subtypes
          like this are expressed as a comma-separated list following
          the main type, e.g. "_jini._tcp,ExampleService.oreilly.com".
          This allows an object that implements several interfaces to
          specify all of those interfaces in a list when it registers
          its service. When browsing for services, at most a single
          subtype is allowed.)
          Defined TXT keys: None
Some observations:
1. Re-discovering the correct identical service would be almost
impossible with DNS-SD
2. Reggie is not designed for a world wide network.
3. Interfaces utilised by services must not change over time; they
can be extended when change is required.
4. Filtering is limited to qualified names.
5. Additional Types can be registered for one service instance.
6. We can crawl the DNS-SD list downloading marshalled proxies,
forming the basis of a new lookup service implementation.
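Observation 3 is the usual Java evolution idiom: rather than altering a published service interface, a new interface extends it, so clients compiled against the old type keep working. A tiny illustration (all names below are hypothetical):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical service interfaces illustrating extension rather than change.
// Clients compiled against PrinterService keep working unmodified.
interface PrinterService extends Remote {
    void print(String document) throws RemoteException;
}

// A later revision adds capability by extending, never altering, the original,
// so a DuplexPrinterService proxy still matches lookups for PrinterService.
interface DuplexPrinterService extends PrinterService {
    void printDuplex(String document) throws RemoteException;
}
```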
Some thoughts:
1. You would be aware of which services you would want to make
global, perhaps a configuration option?
2. We probably want to restrict service proxies to simple reflective
proxies with Secure JERI for now; security is much simpler.
3. Smart proxies, hmm... proxy verification needs more thought.
4. Interfaces for services should be in separate bundles from
implementations, to prevent interface duplication in local JVM
ClassLoaders when implementations change in an incompatible manner
(versions over time), allowing a service to remain an abstraction.
The OSGi framework could locally upgrade an interface bundle when a
new service utilises a new interface, or an interface extending
existing interfaces, allowing old and new service implementations
to be utilised as the same type within a local JVM.
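As a sketch of that bundle split (the bundle and package names here are made up for illustration), an API-only bundle exports just the service interfaces, while the implementation bundle imports them within a version range and exports nothing:

```
api bundle MANIFEST.MF (exports only the service interfaces):

  Bundle-SymbolicName: com.example.printer.api
  Bundle-Version: 1.0.0
  Export-Package: com.example.printer.api;version="1.0.0"

impl bundle MANIFEST.MF (imports the interfaces, keeps internals private):

  Bundle-SymbolicName: com.example.printer.impl
  Bundle-Version: 1.0.0
  Import-Package: com.example.printer.api;version="[1.0,2.0)"
```

An incompatible new implementation then ships as a new impl bundle, while every client and service in the JVM keeps resolving the same api bundle, and hence the same interface type.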
DNS-SD might be used when we don't care about identity or matching
semantics.
What about when we do care about identity?
Similar ground has been covered before in Project Neuromancer with
XuidDirectory; perhaps this could be a crawler instead of utilising
registration? Don't worry about leasing, just repeat the crawl,
discarding the older data on a cyclic basis? Would it be safe to
unmarshall a downloaded proxy instance to query for its Xuid, provided
it was sandboxed with no permissions and utilised integrity checking?
What do you think Jim?
interface UuidDirectory {
    Lease register(Xuid id, Object o, long leaseLen)
        throws UnknownXuidException, RemoteException;
    Lease[] register(Xuid[] ids, Object[] o, long[] leaseLens)
        throws UnknownXuidException, RemoteException;
    Object lookup(Xuid id, XuidDirectory[] visited)
        throws UnknownXuidException, RemoteException;
}
Not only would we need worldwide-available Codebase services, but
caching Codebase services as well.
eg interfaces:
org.apache.river.global.CodebaseService //code I make available which I
sign.
org.apache.river.global.CachingCodebaseService //code I've dynamically
downloaded, not signed by me ( extends CodebaseService)
A caching codebase service is important to ensure bytecode remains
available over time when other service locations go down.
Unsigned jar files would not be allowed in a Codebase Service.
For objects passed between services where identity is important, local
JVM immutability would be desirable. The same object is duplicated
across nodes and doesn't change, so we don't need to coordinate
transactions etc.
All platform services would rely on local code. Platform code could be
upgraded worldwide on a periodic basis using codebase services and the
OSGi platform. I wonder how Project Jigsaw will pan out; perhaps the
JVM will become dynamically upgradeable also?
Your Thoughts?
Peter.
Gregg Wonderly wrote:
One of the primary issues with the current lookup server design and
the ServiceRegistrar interface in particular is the fact that one can
only receive unmarshalled services. My work on providing marshalled
results, visible in the http://reef.dev.java.net project, allows the
opportunity to find stuff without getting a JVM memory explosion.
However, there is a further issue, and that is in order to "see into"
the marshalled object you need to either resolve it or dive into the
stream of bytes. My further work on the PreferredClassLoader
mechanism for establishing "never preferred" classes helps to make it
possible to do resolution of remote objects using locally defined
class instances, so that you can, for example, look at Entry objects.
Also, in my reef work, I investigated adding the names of all classes
that are visible in the type hierarchy of the objects so that you
could ask "instanceof" kinds of questions without unmarshalling.
There are just all kinds of issues related to this that come into
play. Performing a Jini lookup, on the internet today, would be like
asking your web browser to open a tab for every page on the net, and
then waiting for that to finish so that you could click through the
tabs to find what you are looking for.
Clearly, lookup needs to be a completely different concept to exist in
a large world such as is visible "on the internet."
Gregg Wonderly
Peter Firmstone wrote:
Anyone got any opinions about Lookup Service Discovery?
How could lookup service discovery be extended to encompass the
internet? Could we utilise DNS to return locations of Lookup Services?
For world wide lookup services, our current lookup service might
return a massive array with too many service matches. Queries present
the opportunity to reduce the size of returned results, however
security issues from code execution on the lookup service present
problems.
If we did allow queries on a Lookup Service, could we do so with a
restricted set of available Types utilising only trusted signed
bytecodes? If bytecode becomes divorced from the origin of a
Marshalled Object, and instead obtained from a trusted codebase
service, then perhaps we could have a system of vetting source code
submitted for the purpose of becoming trusted authorised query
types? Any query utilising untrusted bytecode might return an
UntrustedByteCodeException?
Perhaps we could make service match results available as a
bytestream, clients that couldn't handle large amounts of data could
inspect the bytestream, continually discarding what isn't required?
Check out this link on DNS service discovery:
http://files.dns-sd.org/draft-cheshire-dnsext-dns-sd.txt
Cheers,
Peter.