Versioning yes, but also vetting and revetting of sources. The further you get from original sources in any communication system, the more noise you incur without adequate checks. Shannon 101. Names alone won't do it.
I put a trivia test on my personal blog as a "Do you trust Google and Wikipedia?" test. The problem is one of not starting from an authenticated or original source. If you start from Wikipedia to answer those questions without the original source, you will get about half of them wrong or nearly wrong. Modern Internet traffic worries about efficiency, but the data is typically short-lived.

If you live where I live, you get to watch a fascinating change: NASA is hiring as many sixty- and even seventy-plus-year-old engineers as it can find if they have actual J-2 series engine experience. The original sources and digital systems failed to keep enough documents alive. They have the designs, but like the Canadians who tried to rebuild the V-2 engines for their contest submission, they don't know how to run them, and it turns out the devil really is in the details.

len

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Amstutz

Summary:

The first 40 or so minutes explain why networks up until now evolved the way they did. Circuit-oriented telephone networks evolved the way they did because of the specific ways the underlying circuit-switching technology worked (going back to human operators working a switchboard!). Packet-switched networks were revolutionary because, unlike the phone system, they were agnostic to the underlying transport medium. TCP/IP was designed for point-to-point communication, based on the assumption that the primary use of data networks would still be point-to-point "conversations". Also, TCP/IP was designed in an environment where each computer had many users, by contrast with today, where you have many computers per user. The second part of the talk describes where we are today and how networks can be adapted to make things better.
Modern Internet usage has evolved such that the vast majority of traffic is better described as broadcast rather than point-to-point: publishing web pages, streaming video, file sharing, even email in the case of mailing lists. This is very inefficient when many users request the same data at once. Another problem posed by current architectures is the challenge of data synchronization between devices, which can also be traced to the fact that devices are often required to synchronize on a peer-to-peer basis rather than having a mechanism to broadcast changes to other devices.

The proposed solution is a bit light on details but big on ideas: to deal with problems of scale in the age of Internet publishing, we step away from our notions of purely fixed-address, point-to-point communication, and consider that in many cases it is highly desirable to be able to automatically replicate and propagate data. In the example given, when you access the New York Times front page, you shouldn't care whether the actual data you get is served from the NYT web server or from some other downstream server that has a copy -- provided you can verify that it originated from the NYT by checking the digital signature.

One significant idea mentioned was that, in the way that TCP/IP abstracts the underlying physical transport layer, such a system ought to be abstracted from the protocol layer -- so that data can be propagated by whatever physical or virtual means are most appropriate or available. He points to Gnutella and BitTorrent as examples of trends in this direction. Each system demonstrates the two key properties of this type of approach: once something is published and replicated a few times, it may stay in the network even if the original source is no longer available; and popular resources are inherently load-balanced, since the more people access a resource, the more intermediate servers will have a copy.
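The NYT example boils down to: trust the content, not the host that served it. Here is a minimal sketch of that idea in Python, using a plain SHA-256 digest as a stand-in for a real publisher signature (a production system would use an asymmetric signature over the content); all names here are illustrative:

```python
import hashlib


def publish(content: bytes) -> tuple[bytes, str]:
    """Publisher computes a digest that travels with the content.
    (Stand-in for a real digital signature from the publisher.)"""
    return content, hashlib.sha256(content).hexdigest()


def fetch_from_any_mirror(mirrors: dict[str, bytes], trusted_digest: str) -> bytes:
    """Accept the content from whichever host has it, as long as
    it matches the publisher's digest -- origin doesn't matter."""
    for host, data in mirrors.items():
        if hashlib.sha256(data).hexdigest() == trusted_digest:
            return data
    raise ValueError("no mirror served an authentic copy")


# The front page as published by the origin server:
front_page, digest = publish(b"All the News That's Fit to Print")

# Downstream copies: one authentic mirror, one tampered one.
mirrors = {
    "mirror-b.example": b"Totally different content",
    "mirror-a.example": b"All the News That's Fit to Print",
}

assert fetch_from_any_mirror(mirrors, digest) == front_page
```

The tampered mirror is simply skipped; any downstream host holding an authentic copy is as good as the origin.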
Unfortunately he didn't seem to mention Freenet (http://freenetproject.org), which to my knowledge is the most complete implementation of many of the ideas he's promoting.

Commentary:

This talk is primarily aimed at spurring people to do more research in this area. For this reason, it poses many questions but provides few concrete answers as to how such a system would be put together in practice. He helpfully separates it out into the "easy stuff" (problems for which reasonable solutions already exist) and the "hard stuff" (everything else).

He doesn't really touch on the highly dynamic nature of current web sites. When every user is served a custom web site, complete with widgets and ads personalized to their zip code, it's much more difficult to replicate in a useful way. Of course, media (sound, images, video, maybe 3D meshes later on) are usually not (yet) dynamically generated and account for quite a lot of bandwidth, so there are still gains to be made there. Resources like HTML pages could also be divided up into finer-grained representations that distinguish static and dynamic elements. He does mention that timestamps and versioning would need to be an inherent part of this system so that published resources can be updated.

It's worth noting that a key difference from caching seems to be that this would be inherently a "push" system -- when you publish something, you go and bang on the doors of nearby hosts and ask them to pretty please replicate your data and pass it on if they know of any other hosts that might be interested. This is interesting, because this kind of "push/flood" system ends up being similar to store-and-forward message routing, as new data is directed through several hops to eventually reach every host that has expressed an interest in it.
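That push/flood behavior can be sketched as a breadth-first flood with de-duplication by message id. This is my own toy illustration (hypothetical names throughout; a real system would also filter by each host's expressed interests rather than storing everywhere):

```python
from collections import deque


class Host:
    """A node in the overlay: knows its peers, stores replicas by msg id."""
    def __init__(self, name: str):
        self.name = name
        self.peers: list["Host"] = []
        self.store: dict[str, bytes] = {}  # doubles as the "already seen" set


def flood(origin: Host, msg_id: str, data: bytes) -> None:
    """Push-style propagation: the publisher bangs on its neighbours'
    doors; each host stores the data once and forwards it onward."""
    origin.store[msg_id] = data
    queue = deque([origin])
    while queue:
        host = queue.popleft()
        for peer in host.peers:
            if msg_id not in peer.store:  # de-duplicate: each hop stores once
                peer.store[msg_id] = data
                queue.append(peer)


# A small topology: a -- b -- c, with d also hanging off b.
a, b, c, d = Host("a"), Host("b"), Host("c"), Host("d")
a.peers, b.peers, c.peers, d.peers = [b], [a, c, d], [b], [b]

flood(a, "post-1", b"hello world")

# Every reachable host now holds a replica, so "post-1" survives even
# if host a goes offline, and popular data is inherently load-balanced.
assert all("post-1" in h.store for h in (a, b, c, d))
```

Note how this is exactly store-and-forward: each intermediate hop keeps a copy and relays it, so the data outlives any single source.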
How this might influence VOS:

Replication, migration and versioning are essential for the long-term scalability of a distributed system like VOS, and VOS is in many ways a great example of the kind of "data dissemination" system he talks about. Something I've also come to realize is that some notion of time in the system is critical, and that "time" and "versioning" are fairly closely related concepts when describing a series of changes to a particular resource. So it is useful to think about how the s5 design will accommodate object replication and migration and their relationship to time and versioning.

Something else we need to consider is the fact that vobjects are both declarative (well-defined data fields, not opaque) and computational objects. Replicating data is relatively straightforward, but what about computation? I can think of at least three cases when making a call on a replicated object:

- No replicated computation: no chance for local processing; always send a message to the master vobject. Example: talk messages.
- Predictive computation: send a message and try to guess the result, but there's a chance we'll be overruled. Example: movement interpolation, physics.
- Deterministic computation: the behavior will have an effectively identical outcome whether run in the local replica or the master vobject. Example: a mouse rollover graphic effect.

To really support replication in the presence of versioning, the vobject "descriptor" needs to incorporate time and versioning to get a fully qualified vobject identifier.
Such an identifier, suitable for replication and caching (including routing and security bits), might include:

* site id
* vobject id
* embedded child id
* last modification time
* version number
* capability key
* hash code

So, Lalo, this is probably a bit more than you expected :-) I think the answer to your question ("could VOS be useful for the things Van Jacobson talks about") is yes, if we incorporate a robust notion of time and version as related to state changes. If anyone thinks this is fanciful, note that this cuts right to the core of how remote vobjects work and how we eventually handle caching -- central issues in the s5 redesign.

_______________________________________________
vos-d mailing list
[email protected]
http://www.interreality.org/cgi-bin/mailman/listinfo/vos-d
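To make the descriptor and the three computation cases concrete, here is one possible sketch in Python. The field names, the `ComputationMode` enum, and the `supersedes` rule are my own illustration of how the pieces might fit together, not a VOS design:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ComputationMode(Enum):
    """The three cases for a call made on a replicated vobject."""
    NONE = auto()           # always forward to the master (e.g. talk messages)
    PREDICTIVE = auto()     # guess locally, master may overrule (e.g. physics)
    DETERMINISTIC = auto()  # safe to run on any replica (e.g. rollover effect)


@dataclass(frozen=True)
class VobjectDescriptor:
    """A fully qualified, version-aware vobject identifier, suitable
    for replication and caching (routing and security bits included)."""
    site_id: str
    vobject_id: str
    embedded_child_id: Optional[str]
    last_modified: float      # timestamp of the last state change
    version: int
    capability_key: bytes     # security bits for access control
    content_hash: str         # hash code over the replicated state

    def supersedes(self, other: "VobjectDescriptor") -> bool:
        """True if this descriptor names a newer version of the same
        vobject -- i.e. a replica holding `other` is now stale."""
        return ((self.site_id, self.vobject_id, self.embedded_child_id)
                == (other.site_id, other.vobject_id, other.embedded_child_id)
                and self.version > other.version)
```

With something like this, a cache can compare descriptors to decide whether its replica is current, and the `ComputationMode` tag tells it whether a call can be serviced locally or must travel to the master.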
