[hackers] Re: Edge-to-Edge Principal / Reed's Law

2003-08-01 Thread Ka-Ping Yee
Hi, Zack.

Thanks for the pointers.  I'm copying the list on this so we can all
have the same context in this discussion.


1. THE END-TO-END ARGUMENT
--------------------------

On Thu, 31 Jul 2003, zachary rosen wrote:
 http://www.wikipedia.org/wiki/End-to-end_argument
[...]
 A lot more can be found on Edge to Edge (also referred to as end to end)
 from google.  It is something that has been talked about for quite some
 time.

Ah -- now i see that you are talking about what is familiar to me as
the end-to-end argument.  I do know the Saltzer, Reed, and Clark
paper [1].  I wondered if this is what you meant by edge-to-edge,
but you seemed to be describing something so different from the
end-to-end argument that i assumed you must have meant something else.

I think you have misunderstood what Saltzer et al. were trying to say.
Let me try to explain.  The end-to-end argument is a design principle
that has to do with deciding whether system functionality should be
placed at high levels or low levels.  The paper argues that functions
placed at low levels may be redundant or too costly to be worth it,
because the same functions often have to get reimplemented by the higher
levels anyway -- because the higher levels (e.g. the application) know
their own needs better.

Putting functionality at a lower level amounts to making an assumption
that every application will want that functionality.  But if your
assumption is wrong, the lower levels might waste a lot of resources
trying to provide a service that the application doesn't even need.
So you should rely on intelligence at the highest level (in the case
of a network, the endpoints of communication) instead of getting too
obsessed with the lower levels.

For example, it might seem reasonable to assume that a network should
always deliver error-free packets.  So adding a checksum-and-retry
feature to a network layer in order to guarantee accurate delivery may
seem like a good idea.  But there are some applications that care more
about speed than accuracy -- such as voice over IP -- and these would
be harmed by the inefficiency of a checksum-and-retry feature.
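
Here is a tiny sketch of that idea in Python -- purely illustrative, not
something from the paper.  The accuracy-sensitive application verifies the
data itself at the endpoints, and the speed-sensitive one simply skips the
check:

    import hashlib

    def send(payload):
        # The sender computes a digest at the highest level and ships it along.
        return payload, hashlib.sha1(payload).hexdigest()

    def receive(payload, digest, verify=True):
        # A file-transfer application verifies end to end; a latency-sensitive
        # application (e.g. voice) can pass verify=False and skip the cost.
        if verify and hashlib.sha1(payload).hexdigest() != digest:
            raise IOError("corrupted in transit -- ask the sender to retry")
        return payload

    data, check = send(b"some media file bytes")
    receive(data, check)                # accuracy-first application: verified
    receive(data, check, verify=False)  # speed-first application: no check

The point is just that the choice belongs at the endpoints, where the
application knows whether the check is worth its cost.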

Now let's return to our question about whether the media database
should be centralized.  Regardless of whether it is centralized or
distributed, we are still obeying the end-to-end argument: we are not
putting any smarts in the transport layer (TCP/IP); we are totally
relying on smarts at the endpoints of communication (that is, the Web
browser and the Web server).  No one is putting in functions at a low
level that are getting reimplemented at a higher level.

So the end-to-end argument has no bearing on our decision at all.
In particular, it is purely an efficiency argument, and it doesn't
say anything about peer-to-peer networks.  (Be warned, by the way,
that lots of companies use the terms end-to-end and peer-to-peer
because they are fashionable, not because they know what they mean.)


2. REED'S LAW
-------------

 http://www.wikipedia.org/wiki/Reed%27s_law

Originally, my response was going to be that Reed's Law has no effect
on our decision either.  Reed's Law says that the utility of a network
is exponentially related to the number of participants.  But it doesn't
matter whether you have 5 users at site A and 5 users at site B, or
just 10 users at site Z -- you still have 10 users, and utility on the
order of 2^10.  The utility is the same regardless of whether the
database is centralized or distributed.

But then i went back and read the original paper [2] and thought about
it a little more.  Now i've realized that Reed's Law actually argues
in *favour* of a centralized database.

Notice that the paper doesn't say all networks have utility that
scales exponentially in the number of participants.  It refers to a
specific *type* of network -- a Group-Forming Network.  In his words:

A GFN has functionality that directly enables and supports
affiliations (such as interest groups, clubs, meetings,
communities) among subsets of its customers.  Group tools and
technologies (also called community tools) such as user-defined
mailing lists, chat rooms, discussion groups, buddy lists, team
rooms, trading rooms, user groups, market makers, and auction
hosts, all have a common theme -- they allow small or large
groups of network users to coalesce and to organize their
communications around a common interest, issue, or goal.

The reason that the utility scales exponentially is that, if N people
are allowed to form and coordinate their own groups of any size, then
there are 2^N possible groups that can be formed.  The whole point of
Reed's paper is to argue that this group-forming capability is
essential and extremely powerful.  As an example, he compares ordinary
e-mail to mailing lists.  Ordinary point-to-point e-mail connects only
two people, so its utility scales by N^2 (Metcalfe's Law).  But a
mailing list can coordinate any number of members, so its utility
scales by 2^N (Reed's Law).
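
To make the difference concrete, a quick back-of-the-envelope comparison
(purely illustrative; constants and degenerate groups are ignored):

    def sarnoff(n):   return n                  # broadcast: value ~ audience size
    def metcalfe(n):  return n * (n - 1) // 2   # possible pairwise connections
    def reed(n):      return 2 ** n - n - 1     # possible groups of two or more

    for n in (10, 20, 30):
        print(n, sarnoff(n), metcalfe(n), reed(n))
    # -> 10 10 45 1013
    # -> 20 20 190 1048555
    # -> 30 30 435 1073741793

Ten users give you about a thousand possible groups; thirty give you about
a billion.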

Re: [hackers] Re: Edge-to-Edge Principal / Reed's Law

2003-08-01 Thread CMR

 Giving people and media items a fixed address at one location vastly
 simplifies the problem of forming these groups and collections.  It's
 much harder to find other users and media items scattered across many
 different sites than at one central site.  (This is why we are building
 VV!)  And it's much harder to coordinate and update a collection
 containing items scattered across many sites than at one central site.


It seems to me this relates to the classic Napster vs. Gnutella architecture
evaluation(?).  The selling point of the distributed, decentralized nature of
Gnutella was, in the main, user privacy.  Performance, though, in my personal
experience and from a system logistics point of view, was in Napster's
corner, and I attribute this to the directory residing on a centralized
server.  Though the files were distributed, queries were routed through a
central hub as opposed to decentralized nodes, and thus the path between
users was shortened.
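
A rough sketch, purely from my own head (neither protocol worked exactly
like this, and all the numbers are made up), of why a central directory
answers a query in one hop while a flooded query can take several:

    import random

    # Fifty make-believe nodes, each hosting three files and knowing four peers.
    nodes = {i: {"files": {"file%d_%d" % (i, k) for k in range(3)},
                 "peers": random.sample(range(50), 4)}
             for i in range(50)}

    # Napster-style: one central directory maps every file name to its host.
    central_index = {f: i for i, n in nodes.items() for f in n["files"]}

    def central_lookup(name):
        return central_index.get(name), 1            # always a single hop

    # Gnutella-style: flood the query outward, hop by hop, with a TTL.
    def flood_lookup(name, start=0, ttl=5):
        frontier, seen = {start}, set()
        for hops in range(1, ttl + 1):
            if any(name in nodes[i]["files"] for i in frontier):
                return hops
            seen |= frontier
            frontier = {p for i in frontier for p in nodes[i]["peers"]} - seen
            if not frontier:
                break
        return None                                   # not found within the TTL

    print(central_lookup("file42_0"))                 # e.g. (42, 1)
    print(flood_lookup("file42_0"))                   # e.g. 3, 4, or None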

 Reed's Law argues that our media network must be a Group-Forming Network.
 To form these groups, we have to link media items and people together
 and to each other.  As i explained in our IRC discussion, this is easy
 to do if the database is centralized and very complicated otherwise.


Simpler from a system-wide perspective in the case of media, perhaps, but
keep in mind that somebody's got to create and maintain the centralized db,
the associated code and/or the host itself; realistically with the kind of
loving care and attention of a commercial venture (on call 24/7), precisely
because of the centralized structure.  If the central server goes down or
lags, the whole thing grinds to a halt until somebody can find the time to
troubleshoot and fix it.

At least in the abstract(!), a decentralized p2p system could conceivably
benefit from the redundancy of the query pathways between content servers,
shades of Arpanet, so that DeanSpace labor and resources might be directed
towards the 5700 other things that need to be maintained, improved, added...

Mind you, I emphasize the term abstract above because I have no idea
precisely how such a system might be designed and implemented; Gnutella-based?

Granted, the feasibility of an in place, functional and reliable distributed
system may well prove the best argument for the centralized option in the end.
CMR

--enter gratuitous quotation that implies my profundity here--



Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised

2003-08-01 Thread CMR
Statement edited for clarity:

Granted, the feasibility of an in place, functional and reliable distributed
system may well prove the best argument for the centralized option in the end.

CMR

--enter gratuitous quotation that implies my profundity here--



Re: Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised2

2003-08-01 Thread CMR
  Granted, the feasibility of an in place, functional and reliable
  distributed system may well prove the best argument for the
  centralized option in the end.

 Hi CMR,

 I'm sorry -- i'm still having trouble figuring out this statement.
 Did you mean that the *infeasibility* of a distributed system is
 an argument for a centralized option?  Or did you mean that the
 feasibility of a centralized system is an argument for a centralized
 option?  Or that the feasibility of a distributed system is an argument
 for a distributed system?

Apologies; by feasibility I meant feasibility, or the lack thereof.

Therefore the daunting prospect of actually designing and implementing a
distributed media system that realizes our goals may, in the end, be the
best argument against it, and thus for the centralized option.  I say may
because there may in fact exist open-source code that we could leverage
effectively for this task; I'm just ignorant of what's available and of the
respective pros/cons.

Cheers
CMR

--enter gratuitous quotation that implies my profundity here--



Re: Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised2

2003-08-01 Thread zachary rosen

On Fri, 1 Aug 2003, Ka-Ping Yee wrote:

 On Fri, 1 Aug 2003, zachary rosen wrote:
  A quick note - the decentralized system that is being proposed is NOT peer
  to peer.  At the top, at the aggregator, it functions just the same as the
  centralized solution: One database, searchable and accessible by all - i.e.
  Napster.
 
  The difference is how the metadata gets to the central DB.  Either it is
  aggregated from nodes, or required to be centrally submitted to bypass the
  technical hurdles of aggregation.

 I believe we are all agreed up to this point.

Woo! :)

  I am under the impression that the feasibility or technical hardness of
  building the metadata collection functionality to be decentralized rather
  than requiring it to be centrally submitted does not nearly outweigh the
  problems with administering, maintaining, and hosting the central solution.

 We have to maintain the central host anyway, because we've already
 agreed that we need a main search site.  The question is whether we
 ought to introduce *additional* administrative tasks for people
 running other DeanSpace sites in the media network.

I completely disagree.  I don't see how having node admins help manage
their local repositories increases the amount of (*additional*)
administrative work.  We want to have all nodes have media hosting
capabilities (I believe).  This means there will have to be admin work
done to set up a media hosting module no matter what.  The only real
difference between centralized and decentralized in terms of admin work
required then becomes the mundane maintenance tasks: pruning and
organization.  If the nodes are empowered to maintain their own local
repository then doing this work is offloaded from the potentially very
large bottleneck of having it all maintained / pruned in a central site by
one set of admins (DMT) to the many capable node admins.  Am I missing
something?


  When i get home i will be rifling through some books for quotes to support
  my claim that Reed's Law and End-to-End principles support the
  decentralized design over the centralized design.

 Since we seem to have different perspectives on what these terms mean,
 i'd rather talk directly about the reasoning, as in "This design is
 better for the following reasons..." rather than "This design is better
 because there's a principle that says so."

Of course ;p

 Now, i'm not against fault-tolerance.  Yes, of course it would be good
 to have a resilient system where some of the hosts can fail and things
 keep working.  We can certainly talk about ways to do that after the
 data has been centrally submitted (e.g. more traditional kinds of
 redundancy, such as mirror sites for the search engine).

With the decentralized solution, fault-tolerance is vastly easier to
manage: it is built into the system - every node that does media stuff
directly offloads hosting tasks from the central server.  Furthermore,
when problems arise the decentralized solution handles them far more
gracefully.  Even if the central aggregator goes down, the network is still
mostly functional - the only thing that breaks is central search.  With the
centralized solution, if the central site goes down, the entire system is
basically sunk.

It seems to me that the technical hurdles of designing redundancy into the
central solution to deal with these very real problems would be harder and
require more engineering effort than building the network decentralized in
the first place.  Furthermore, the decentralized solution handles these
problems far better.
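
As a purely hypothetical sketch of that degraded mode (every name here is
invented, not part of any agreed design): a node keeps serving and searching
its own repository even when the central aggregator is unreachable.

    class LocalNode:
        def __init__(self, items):
            self.items = items
        def search(self, query):
            return [i for i in self.items if query in i]

    class DeadAggregator:
        def search(self, query):
            raise ConnectionError("central aggregator is down")

    def search(query, aggregator, local_node):
        try:
            return aggregator.search(query)     # normal case: network-wide search
        except ConnectionError:
            return local_node.search(query)     # degraded case: local media only

    node = LocalNode(["dean_rally.mov", "intro.txt"])
    print(search("dean", DeadAggregator(), node))   # -> ['dean_rally.mov']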

-Zack



Re: Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised2

2003-08-01 Thread Jay R. Ashworth
On Fri, Aug 01, 2003 at 02:32:53PM -0500, zachary rosen wrote:
  On Fri, 1 Aug 2003, zachary rosen wrote:
   The only real
   difference between centralized and decentralized in terms of admin work
   required then becomes the mundane maintenance tasks: pruning and
   organization.  If the nodes are empowered to maintain their own local
   repository then doing this work is offloaded from the potentially very
   large bottleneck of having it all maintained / pruned in a central site by
   one set of admins (DMT) to the many capable node admins.  Am I missing
   something?
 
  Yeah.  Anybody can prune the database.  For maintenance operations that
  might be a little dangerous, like vetoing media, we can hand out moderator
  accounts on the central site to volunteers we trust.  In fact, it would
  be easier to find volunteers to just moderate than volunteers to run
  an entire DeanSpace site.
 
 This is exactly the reason I am so opposed to this solution.  It is a
 basic question: who do you trust more to vet / prune media on the system
 that comes from nodes? DMT - or the nodes themselves?

It might be productive, at that juncture, to make explicit the assumptions
you're carrying about what *sort* of vetting might be being done, by whom,
and for what reasons.

I suspect we have almost a *classic* case of the centralized/decentralized
debate going here, and the two sides of this one aren't ever *going* to agree
in my experience, so let's just yell "Hitler!" and "Godwin!", and define our
assumptions.  :-)

Cheers,
-- jra
-- 
Jay R. Ashworth                                        [EMAIL PROTECTED]
Member of the Technical Staff     Baylink                      RFC 2100
The Suncoast Freenet              The Things I Think
Tampa Bay, Florida                http://baylink.pitas.com  +1 727 647 1274

   OS X: Because making Unix user-friendly was easier than debugging Windows
-- Simon Slavin, on a.f.c


Re: Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised2

2003-08-01 Thread CMR

 A quick note - the decentralized system that is being proposed is NOT peer
 to peer.  At the top, at the aggregator, it functions just the same as the
 centralized solution: One database, searchable and accessible by all - i.e.
 Napster.


Think I got it; so the aggregator functions like the news feeds(?), except
it's remotely querying databases, periodically (or on the fly -- pushed?).
 The difference is how the metadata gets to the central DB.  Either it is
 aggregated from nodes, or required to be centrally submitted to bypass the
 technical hurdles of aggregation.


The killer-app aspect, then, is the automation of data-directory
centralization, as opposed to relying on participants to visit and update
the central site manually(?).
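
A hypothetical sketch of that aggregation loop, just to pin the idea down;
the URLs, the JSON export format and the polling interval are all invented
here, since the real design is still open:

    import json, time, urllib.request

    NODE_FEEDS = [
        "http://node1.example/media-export",
        "http://node2.example/media-export",
    ]
    central_index = {}        # media id -> metadata record

    def poll_once():
        for url in NODE_FEEDS:
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    for record in json.load(resp):     # assume a JSON list export
                        central_index[record["id"]] = record
            except (OSError, ValueError):
                pass          # a slow, dead, or garbled node just misses this round

    while True:               # "periodically"; a push design would replace this
        poll_once()
        time.sleep(15 * 60)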

 I am under the impression that the feasibility or technical hardness of
 building the metadata collection functionality to be decentralized rather
 than requiring it to be centrally submitted does not nearly outweigh the
 problems with administering, maintaining, and hosting the central solution.


Given the above clarification, it sounds more doable if the node admins can
be educated...

 When i get home i will be rifling through some books for quotes to support
 my claim that Reed's Law and End-to-End principles support the
 decentralized design over the centralized design.


Interestingly, Reed gave a presentation where he illustrated his law (I've
got a semantic bone to pick with that term as applied to theories in the
*strict definitional sense*, but I digress...) using the example of online
auctions:

But the theory is less important than the practice, at least if you're
trying to profit from the Internet, so I'll make some predictions based on
the likely effects of the Group-Forming Law in 2002:
The obvious conclusion is that whoever forms the biggest, most robust
communities will win. But the Group-Forming idea can be used to look well
beyond the obvious and discriminate among strategies that are all billed as
building communities. For instance, Internet auction pioneer Onsale, which
buys closeout products and auctions them on its Web site, will see its value
rise only in proportion to the number of users. On-line classifieds, which
connect buyers to sellers on a peer-to-peer basis, should see a stronger,
Metcalfe effect. Ebay, which began as one person's attempt to establish a
market for Pez candy dispensers, should get an even more powerful
Group-Forming effect because it helps members act in groups as they auction
off and bid for products on-line. (Other economics work in favor of Ebay,
too. Because the Group-Forming effect will give it enormous volumes of
business, it can charge a lower commission on sales. The low fees will
attract more users and produce a virtuous circle. Also, because it's Ebay's
customers who do the selling, Ebay doesn't face any inventory or
product-development issues.)

Notice he touts eBay as an example; a centralized system.  But it doesn't
necessarily follow that a thriving Group-Forming online community can't be
fostered via a distributed network.  Jon Udell apparently thinks just the
opposite in evaluating the future of Radio Userland:

Both approaches are valid, but there is a middle ground -- more coherent
than email, less isolated than Groove -- that needs to be occupied. Radio
doesn't yet know how to occupy that middle ground. But it has the tools
people need to do the experiment: a distributed scripting engine and object
database, Web-services protocols. When Radio's currently-centralized
community engine itself becomes distributable (as is planned), I expect to
see an explosion of group-forming activity. The spaces thus constituted will
express different sets of values, but they'll federate in the way that
Reed's Law predicts.

see:
http://www.xml.com/pub/a/ws/2002/03/01/udell.html



Re: Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised2

2003-08-01 Thread zachary rosen
On Fri, 1 Aug 2003, Ka-Ping Yee wrote:

 On Fri, 1 Aug 2003, zachary rosen wrote:
  This is exactly the reason I am so opposed to this solution.  It is a
  basic question: who do you trust more to vet / prune media on the system
  that comes from nodes? DMT - or the nodes themselves?

 We are all on the same team, Zack.  Site admins can volunteer to be
 moderators if they want; DMT people can volunteer to be moderators
 if they want; my Mom could volunteer to be a moderator, except that
 she has no idea what this is about ("Oh, Howard who?  That's nice.").
 It doesn't matter whether submission is central or distributed.

 But a separate issue: it sounds as though now you're suggesting that
 the media items sit on the nodes waiting for approval before they're
 forwarded up to the search database.  Yet the earlier response to
 questions about update consistency was that things would travel
 automatically, two hops up.  So which is it: friction or no friction?

 What bothers me about trying to discuss the distributed submission
 design is that we don't *have* a distributed submission design yet.
 If the hypothetical design keeps morphing each time there's a new
 question, that makes it difficult to talk about.


 -- ?!ng


At the very least the system we come up with should support the following
vetting functionality (in my opinion).

1] Nodes should be able to vet the media in their repositories
2] The central aggregator should be able to vet the media accessible in
the central repository.

With the central solution [1] becomes hard if not impossible to
accomplish.  Either a) every node is responsible for sending an admin to
sign up and be given permission by the DMT to vet media for their node on
the central site, or b) we build in functionality that allows nodes to veto
media in their local repository that is culled from the central DB... which
only solves half the problem: they still have no way to counter a veto from
the DMT.  Either solution will take many man-hours to accomplish, and this
would not even be an issue to consider with the decentralized solution
(both the aggregators and the nodes can automatically vet to their hearts'
content).
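
As a hypothetical sketch of the two independent vetting bits I am arguing
for (the field names and policy here are mine, not an agreed schema):

    # Hypothetical schema: one approval flag per level, plus a per-node policy.
    item = {
        "id": "media-123",
        "approved_by_node": True,      # set by the originating node's admins
        "approved_by_central": False,  # set by DMT / central moderators
    }

    node_is_frictionless = True        # each node chooses: publish first, vet later?

    def visible_on_node(item):
        # The node's own site consults only its own flag and its own policy,
        # so a central veto cannot reach into the node's local repository.
        return item["approved_by_node"] or node_is_frictionless

    def visible_in_central_search(item):
        # One possible central policy: require both the node and the DMT.
        return item["approved_by_node"] and item["approved_by_central"]

Under something like this a central veto never pulls media off a node's own
site, and each node can flip its friction switch without touching the
aggregator.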

Whether it is frictionless or there is friction does not affect the
engineering effort required to create this system if it is decentralized.
If it is centralized then this DOES become a problem - because ultimately
we would want to give the nodes the power to choose (frictionless or
friction), and this would become extremely tricky if not impossible if we
made it centralized.

-Zack



Re: Fw: [hackers] Re: Edge-to-Edge Principal / Reed's Law; revised2

2003-08-01 Thread zachary rosen
On Fri, 1 Aug 2003, Ka-Ping Yee wrote:

 On Fri, 1 Aug 2003, zachary rosen wrote:
  1] Nodes should be able to vet the media in their repositories
  2] The central aggregator should be able to vet the media accessible in
  the central repository.
 
  With the central solution [1] becomes hard if not impossible to
  accomplish.  Either a) every node is responsible for sending an
  admin to sign up and be given permission by the DMT to vet media
  for their node on the central site.

 What's so hard?  In either case, a moderator opens her browser, logs
 into a site, browses the new media items, and rates them up or down.

 I'm going to stop talking about this for now and get back to coding.


What is hard is human nature.  If we create a system in which communities
do not have a say in how their own sites are run, we bring on a whole
slew of political / social problems.  Nodes should be able to control
their own media repositories, period - not DMT.

-Zack



RE: [hackers] Re: Edge-to-Edge Principal / Reed's Law

2003-08-01 Thread Jon Lebkowsky
It occurred to me that it might be useful to include David Reed in the route
for this msg.

David, hoping you will have time to comment on the notes below. The
hack4dean guys are building a network to support regime change and then
some.

best,
Jon L.

[EMAIL PROTECTED] wrote:
 Subject: [hackers] Re: Edge-to-Edge Principal / Reed's Law

