[Pls also fwd this msg as appropriate as last message. -u]

Addendum:

* Time.  Need to store versions of files.  One of the things a
  distributed information system increases is stability, and where
  there is stability you need referential integrity, and some things
  refer to time-dependent objects.  Caching over time is good.  For
  instance, if USENET is gatewayed into the system and someone posts
  a response to a message, the message being responded to would
  remain available.
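
  The versioning idea above can be sketched as a store that keeps
  every version of an object and resolves a reference to whichever
  version was current at the referenced moment.  A minimal Python
  illustration (the class and method names are my own, not from any
  existing system):

  ```python
  import bisect
  from collections import defaultdict

  class VersionedStore:
      """Keep every version of an object, keyed by timestamp, so
      time-dependent references stay resolvable."""

      def __init__(self):
          # object id -> sorted list of (timestamp, content)
          self._versions = defaultdict(list)

      def put(self, obj_id, timestamp, content):
          bisect.insort(self._versions[obj_id], (timestamp, content))

      def get_as_of(self, obj_id, timestamp):
          """Return the version current at `timestamp`, or None."""
          versions = self._versions[obj_id]
          ts_list = [ts for ts, _ in versions]
          i = bisect.bisect_right(ts_list, timestamp)
          return versions[i - 1][1] if i else None
  ```

  A reply posted at time 3 to a message revised at times 1 and 5
  would then resolve to the time-1 revision, not the later one.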
  
* Topology, and topology adaptability.

  I first assumed that you guys already do network discovery.

  However, the network layer can just be another gateway with the
  quality of being a transit type, or a more elaborate general
  object type with a number of networking transit qualities.
  Distributing information ought to be possible via multiple hops
  that may cross many different network bases, such as non-IP
  networks or separate IP networks (for example, gateways
  automatically created by the multiple computers spanning a public
  IPv6 network, a separate public IPv6 network, another IPv4
  network, and some other non-IP network).  This would require some
  sort of referential integrity, possibly via some sort of
  subnetwork identification.
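
  In miniature, the multi-hop idea amounts to treating each gateway
  as a membership in two or more networks and searching for a chain
  of networks between two endpoints.  A hedged sketch (the network
  names are made up, and a real implementation would weigh link
  qualities rather than just hop count):

  ```python
  from collections import deque

  def find_route(gateways, src_net, dst_net):
      """BFS over networks: each gateway joins two or more networks
      and acts as a transit hop between them.  Returns the list of
      networks traversed, or None if unreachable."""
      # adjacency: network -> neighboring networks via some gateway
      adj = {}
      for nets in gateways:          # e.g. {"ipv6-a", "ipv4"}
          for a in nets:
              for b in nets:
                  if a != b:
                      adj.setdefault(a, set()).add(b)
      queue, seen = deque([[src_net]]), {src_net}
      while queue:
          path = queue.popleft()
          if path[-1] == dst_net:
              return path
          for nxt in adj.get(path[-1], ()):
              if nxt not in seen:
                  seen.add(nxt)
                  queue.append(path + [nxt])
      return None
  ```

  A "subnetwork identification" is exactly what the network names
  stand in for here: without a stable name per network base, the
  chain cannot be described, cached, or reused.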

  The program can figure out how to go through one network to get to
  another, and if it knows how some protocols are layered, it could
  do some types of tunneling.  For instance, if you are behind a
  mobile phone, you can use one of the limited phone browsers and
  tunnel via that to reach a server connected to the rest of the
  distributed network.  Also, tunneling communications inside the
  HTTP protocol would get around firewalls that indiscriminately
  block other ports in the name of stopping illicit activity.
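
  HTTP tunneling of this kind amounts to framing an opaque payload
  as an ordinary web request.  A minimal sketch (the host and path
  are placeholders, and a real tunnel would of course need a
  cooperating server at the other end):

  ```python
  import base64

  def frame_as_http_post(payload: bytes, host: str,
                         path: str = "/tunnel") -> bytes:
      """Wrap an opaque payload in a minimal HTTP/1.1 POST so it
      can pass through gateways that only forward web traffic."""
      body = base64.b64encode(payload)
      header = (
          f"POST {path} HTTP/1.1\r\n"
          f"Host: {host}\r\n"
          f"Content-Type: application/octet-stream\r\n"
          f"Content-Length: {len(body)}\r\n"
          f"\r\n"
      ).encode("ascii")
      return header + body

  def unframe(request: bytes) -> bytes:
      """Inverse: strip the headers and decode the payload."""
      _, _, body = request.partition(b"\r\n\r\n")
      return base64.b64decode(body)
  ```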

* Reference.  Adding a reference from one object to another lets
  someone effect a type of post.  The distributed system would do
  best to maintain bidirectional links, along with good ways to find
  the second half of links which may exist but haven't yet been
  updated on both sides, so that "posting" may be conducted
  reliably.  USENET's distribution method can conceptually be mixed
  with WWW's lookup mechanisms, but done without any extra
  programming, by simply using the innate abilities of a distributed
  index and storage system such as you have, or would improve to
  have.
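
  Maintaining both halves of each link could be as simple as a pair
  of indexes, plus a way to report links whose second half a peer
  has not yet mirrored.  An illustrative sketch (none of these names
  come from any existing system):

  ```python
  class LinkIndex:
      """Maintain both halves of every reference, so a 'post' (a
      link from reply to parent) is discoverable from the parent."""

      def __init__(self):
          self.forward = {}   # src -> set of dsts
          self.backward = {}  # dst -> set of srcs

      def add_link(self, src, dst):
          self.forward.setdefault(src, set()).add(dst)
          self.backward.setdefault(dst, set()).add(src)

      def replies_to(self, obj):
          """Objects that reference `obj` -- i.e. its replies."""
          return self.backward.get(obj, set())

      def missing_backlinks(self, remote_backward):
          """Links known here whose second half a remote index
          lacks -- the 'not yet updated on both sides' case."""
          missing = []
          for dst, srcs in self.backward.items():
              for src in srcs - remote_backward.get(dst, set()):
                  missing.append((src, dst))
          return missing
  ```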

  Then real, lasting, richly organized, tree-structured discussions
  can cross decades, without the impediments found on many other
  systems:

  * Inability to respond to WWW pages.
  * Inability to stop USENET discussions from repeating over and
    over forever due to the two-week expiry tendency -- this problem
    keeps discussions from ever getting more than about two weeks of
    maturity before devolving, and tires out the more experienced
    discussants, who eventually leave and stop contributing the
    fine-tuned or mature aspects of those discussions.

* Addendum to gateways in previous message:

  Gateways often within themselves have their own distribution.  One
  of the big problems with USENET is its insistence on every node
  having every message, causing large problems with bandwidth and
  space.  One of the big problems with WWW is its tendency for every
  message to live on at most one node, causing big problems with
  reliably accessing the message.  Both impact stability.
  However, in both cases, there is a technique to know what the
  reference is -- in WWW, URLs or multiple URLs (in case of mirrors);
  in USENET, the message ID.  Gateways ought to gateway this concept
  straight into the compression system that I discussed earlier.

  You might want to take shortcuts and say "but why go by Ulmo's
  letter and do compression checks when two messages have the same
  message ID?  You could just treat the other as equal."  Well,
  no -- the probability aspect of the protocol ought to put a
  probability on that.  A requested probability that the message is
  equal to the original would sometimes require the program to
  download both copies available on two available USENET servers
  (such as NNTP client servers), so as to compare the two.  Besides
  a more complex storage of interesting differential distribution
  history (from the Path: header in USENET), there is something far
  more important, more relevant, and therefore more necessary and
  less skippable: the actual content (Summary:, Subject:, From:,
  Date:, etc. -- anything that affects the actual content of the
  message, as well as the message body itself).  If there is some
  discrepancy, both versions must be transferred and maintained to
  the end, and the user must be notified by his UA (user agent) of
  the differences; if the UA is incapable of differentiating, then
  there is a smorgasbord of choices for "doing the right thing" to
  hand-hold such unintelligent UAs:


  1.  Download a lot of copies.

      Attempt to find the source of the message, using Path:, From:,
      Sender:, etc.  Use these to help indicate the best
      possibilities.

  2.  Look at other integrity issues.

      Check the Lines: header and other authentication (such as
      PGPMoose and other PGP signatures -- PGP/MIME, straight PGP,
      etc.) as extremely strong indicators of what is correct.

  Finally, choose the best.
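
  The procedure above -- group copies by their content-affecting
  fields, keep every distinct variant, and use weak checks like
  Lines: only as a tiebreak -- might be sketched as follows
  (illustrative only; a real UA would also verify the PGP
  signatures mentioned above):

  ```python
  import hashlib

  def reconcile_copies(copies):
      """copies: list of dicts with 'headers' (dict) and 'body'
      (str).  Group by a digest of the content-affecting fields;
      if more than one variant survives, all must be kept and the
      user notified.  As a weak tiebreak for dumb UAs, prefer
      copies whose Lines: header matches the actual body length."""
      variants = {}
      for c in copies:
          material = "\n".join(
              c["headers"].get(h, "")
              for h in ("Summary", "Subject", "From", "Date")
          ) + "\n" + c["body"]
          digest = hashlib.sha256(material.encode()).hexdigest()
          variants.setdefault(digest, c)
      ranked = sorted(
          variants.values(),
          key=lambda c: c["headers"].get("Lines")
                        == str(len(c["body"].splitlines())),
          reverse=True,
      )
      # ranked[0] is the best guess; len(ranked) > 1 means the
      # copies genuinely disagree and all must be surfaced.
      return ranked
  ```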


* Review.

  This has been discussed many times, but one of the most important
  things will have to be a one-keystroke applicability and quality
  meter.  Before selecting the next message or page, a user can
  give his or her review of a message with something like a single
  0-9 scale of applicability:

  0-2:  Wrongly categorized message.
  0:    Completely inapplicable message.  e.g., Transmission error.
  1:    Intentional or negligent incorrect categorization.  e.g., SPAM.
  2:    Unintentional incorrect categorization.  e.g., stupid user.
  3-6:  Correctly categorized message with low Signal to Noise (S/N) ratio.
  3:    A flamer who just doesn't know anything.
  4:    Doesn't know anything, but not flaming.
  5:    A flamer with some knowledge, but who had his brain turned
        off at the time.
  6:    Someone who wrote a halfway decent message but who had his
        brain turned off and either ought to have known better,
        had better discipline, or should get more knowledge.
  7-9:  Decent messages.
  7:    Tolerably decent message; not great, not even good, but OK.
        Someone with their brain turned on, knowledgeable, and
        relevant could manage a score like this on a bad day when
        they write a bad message (sick?  not enough time?  in a
        hurry?).
  8:    Good message.  Someone with time would do well to read this.
        Has most of the qualities of a great message, but just isn't
        great.
  9:    Great message, very well written, very objective, not at all
        subjective except where that is appropriate and well formed,
        high S/N ratio, etc.  Anybody not reading this message
        ought not read any messages at all unless looking up a
        specific thing.

  Default filters would choose a smorgasbord of reviewers and a
  threshold of averages somewhere at or above 7 or 8.

  This one keystroke is easy enough that it does not take a lot of
  extra time to complete, and everybody can do it.  If it requires
  more than one keystroke or mousestroke, it is just too much.  It can
  be followed with whatever navigation strokes are necessary for the
  next thing.
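
  The default filter described above reduces to averaging each
  message's one-keystroke scores across reviewers and thresholding.
  A minimal sketch (the data shape is my own invention):

  ```python
  from collections import defaultdict

  def filter_by_review(ratings, threshold=7.0):
      """ratings: list of (message_id, reviewer, score 0-9).
      Average each message's scores and keep those at or above
      the threshold -- the 'default filter' described above."""
      scores = defaultdict(list)
      for msg, _reviewer, score in ratings:
          scores[msg].append(score)
      return {m for m, s in scores.items()
              if sum(s) / len(s) >= threshold}
  ```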

All of these concepts apply to all data types.  Text discussion is
definitely a nice area to integrate with the ideas above, but they
are also highly relevant to music (subpieces of a score, ideas,
versions, etc.; ever try to put together a movie soundtrack, or song
tracks?), pictures (objects that belong in the pictures, and pictures
that depict things inside other pictures, are parts of other
pictures, or are updates to or similar to other pictures), web sites
(reviews of a product directly on the web site of the product
discussion or the company's web server, without their consent or
ability to stop you from posting), and all media types in general.
For instance, a data set representing some tape dump of scientific
space observations can have all sorts of "replies" or "references"
attached to it: papers that used or interpreted the data; programs
written to use or interpret that data (both of those examples would
save some reviewers of the data a lot of time); parental links
explaining where the data came from, its format, etc.

When you get into distributed information, you really get into all of
this stuff.  This is the worldwide network we're talking about --
beyond World Wide Webivision.  With the programs you're writing now,
the time to put in these enhancements is not some "fuzzy future" nor
"your kids' responsibility", but now.  You have the code bases upon
which to graft modifications.  Rewrites are issues you can handle.
Waiting for some television corporation to write these programs is a
fallacy.  Take a hint from Mike Godwin -- he's right to have people
take matters into their own hands.  That's yours, if you're a
developer.

I would have written all this myself if I had a serene, clean
mountain to meditate on and subsistence that kept me healthy, but due
to the bastardly difficult requirements of staying alive in the
thin-margin, inefficient 1980s, 1990s, and 2000s, none of this has
been possible.  Sponsors are typically uninterested, since none of
these quality-enhancing necessities really makes any "money".

Brad Allen <[EMAIL PROTECTED]>
