[Pls also fwd this msg as appropriate as last message. -u]
Addendum:
* Time. We need to store versions of files. One of the things a
distributed information system increases is stability, and where
there is stability you need referential integrity, and some things
refer to time-dependent objects. Caching across time is good. For
instance, if USENET is gatewayed into the system and someone posts
a response to a message, the message being responded to would still
be available.
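A minimal sketch of such time-versioned storage, assuming a simple
in-memory store; the class and method names are illustrative, not from
any existing system:

```python
from datetime import datetime, timezone

class VersionedStore:
    """Keeps every dated version of an object so old references stay valid."""

    def __init__(self):
        self._versions = {}  # object id -> list of (timestamp, content)

    def put(self, obj_id, content, when=None):
        when = when or datetime.now(timezone.utc)
        self._versions.setdefault(obj_id, []).append((when, content))

    def get(self, obj_id, as_of=None):
        """Return the newest version at or before `as_of` (default: latest)."""
        versions = sorted(self._versions.get(obj_id, []))
        if as_of is not None:
            versions = [v for v in versions if v[0] <= as_of]
        return versions[-1][1] if versions else None

store = VersionedStore()
store.put("news:123@example", "original article")
store.put("news:123@example", "superseding article")
# A reply posted against the first version can still resolve it by date.
```

The point of `as_of` is exactly the USENET case above: a response can
ask for the message as it existed when the response was written.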
* Topology, and topology adaptability.
I first assumed that you already do network discovery. However,
the network layer can just be another gateway with the qualities of
a transit type, or a more elaborate general object type with a
number of networking transit qualities. Distributing information
ought to be possible via multiple hops that may cross many
different network bases, such as non-IP networks or separate IP
networks (for instance, gateways automatically created by the
machines sitting between a public IPv6 network, a separate public
IPv6 network, another IPv4 network, and some other non-IP network).
This would require some sort of referential integrity, possibly via
some sort of subnetwork identification.
The program can figure out how to go through one network to reach
another, and if it knows how some protocols are layered, it could
do some types of tunneling. For instance, if you are behind a
mobile phone, you can use one of the limited phone browsers and
tunnel through it to reach a server connected to the rest of the
distributed network. Also, tunneling communications inside the
HTTP protocol would work to get around firewalls that
indiscriminately block activity on other ports.
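The multi-hop idea can be sketched as a shortest-path search over a
map of gateways. The network names and gateway list below are
hypothetical, purely for illustration:

```python
from collections import deque

# Hypothetical gateway map: each entry bridges two network "bases"
# (public IPv4, separate IPv6 clouds, an HTTP tunnel, a non-IP link).
GATEWAYS = [
    ("ipv4-public", "ipv6-a"),
    ("ipv6-a", "ipv6-b"),
    ("ipv4-public", "http-tunnel"),
    ("http-tunnel", "non-ip-radio"),
]

def route(src, dst):
    """Breadth-first search for a chain of networks from src to dst."""
    adjacency = {}
    for a, b in GATEWAYS:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adjacency.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of gateways connects the two networks

print(route("ipv6-b", "non-ip-radio"))
# -> ['ipv6-b', 'ipv6-a', 'ipv4-public', 'http-tunnel', 'non-ip-radio']
```

The HTTP-tunnel hop in the example is exactly the firewall-evasion
case described above: it is just one more edge in the gateway graph.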
* Reference. Adding a reference from one object to another lets
someone effect a type of post. The distributed system would do
best to maintain bidirectional links, with good ways to find the
second half of a link that exists but has not yet been updated on
both sides, so that "posting" can be conducted reliably. USENET's
distribution method can conceptually be mixed with the WWW's lookup
mechanisms, but without any extra programming, simply by using the
innate abilities of a distributed index and storage system such as
the one you have, or would improve to have.
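A bidirectional link index of this kind takes only a few lines to
sketch; the class and method names here are illustrative, not from any
existing system:

```python
class LinkIndex:
    """Tracks references both ways, so the 'second half' of a link
    can be found even before both sides have been updated."""

    def __init__(self):
        self.forward = {}   # source -> set of targets it references
        self.backward = {}  # target -> set of sources referring to it

    def add_reference(self, source, target):
        self.forward.setdefault(source, set()).add(target)
        self.backward.setdefault(target, set()).add(source)

    def replies_to(self, obj_id):
        """All known objects referencing obj_id -- i.e. its 'posts'."""
        return self.backward.get(obj_id, set())

index = LinkIndex()
index.add_reference("reply-1", "article-A")
index.add_reference("reply-2", "article-A")
# Looking up article-A's replies needs no extra protocol machinery:
# the index itself already carries the reverse direction.
```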
Then real, lasting, richly organized tree-structured (and better)
discussions can cross and last decades, without the impediments
found on many other systems:
* The inability to respond to WWW pages.
* The inability to stop USENET discussions from repeating over and
over again forever due to the two-week deletion tendency -- this
problem means discussions never get more than about two weeks of
maturity before devolving, and it tires out the more experienced
participants, who eventually leave the discussions and stop
contributing the fine-tuned, mature aspects of those discussions.
* Addendum to gateways in the previous message:
Gateways often have their own distribution within themselves. One
of the big problems with USENET is its insistence on every node
having every message, causing large problems with bandwidth and
space. One of the big problems with the WWW is its tendency for
every message to live on only one node or fewer, causing big
problems with reliably accessing the message. Both hurt stability.
However, in both cases there is a technique for knowing what the
reference is -- on the WWW, URLs or multiple URLs (in the case of
mirrors); on USENET, the message ID. Gateways ought to carry this
concept straight into the compression system I discussed earlier.
You might want to take shortcuts and say, "but why go by Ulmo's
letter and do compression checks when two messages have the same
message ID? You could just treat the other as equal and ignore
it." Well, no -- the probability aspect of the protocol ought to
put a probability on that. A requested probability that a message
is truly equal to the original would sometimes require the program
to download both copies available on two USENET servers (such as
NNTP client servers) so as to compare them. Besides a more complex
storage of interesting differential distribution history (from the
Path: header in USENET), there is something far more important,
more relevant, and therefore more necessary and less skippable: the
actual content (Summary:, Subject:, From:, Date:, etc. -- anything
that would affect the actual content of the message, as well as the
message body itself). If there is some discrepancy, both versions
must be transferred and maintained to the end, and the user must be
notified by his UA (user agent) of the differences; if the UA is
incapable of differentiating, there is a smorgasbord of choices for
"doing the right thing" to hand-hold such unintelligent UAs:
1. Download a lot of copies. Attempt to find the source of the
message, using Path:, From:, Sender:, etc. Use these to help
indicate the best possibilities.
2. Look at other integrity issues. Check Lines: and other
authentication (such as PGPMoose, and other PGP signatures --
PGP/MIME, straight PGP, etc.) as extremely strong indicators of
what is correct. Finally, choose the best.
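The comparison step can be sketched as hashing only the
content-bearing parts of each copy, deliberately ignoring transit
headers like Path:. Copies are assumed to be (headers, body) pairs,
and the header list here is illustrative:

```python
import hashlib

# Headers that affect the actual content of a message; transit
# headers such as Path: are deliberately excluded.
CONTENT_HEADERS = ("Summary", "Subject", "From", "Date")

def content_digest(headers, body):
    """Digest over a message's content-bearing parts, for comparing copies."""
    h = hashlib.sha256()
    for name in CONTENT_HEADERS:
        h.update(name.encode())
        h.update(headers.get(name, "").encode())
        h.update(b"\n")
    h.update(body.encode())
    return h.hexdigest()

def same_content(copy_a, copy_b):
    """True if two (headers, body) copies agree on everything that matters;
    if not, both versions must be kept and the user agent notified."""
    return content_digest(*copy_a) == content_digest(*copy_b)
```

Two copies fetched from different NNTP servers with different Path:
histories then compare equal, while any tampering with the body or a
content header is flagged as a discrepancy.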
* Review.
This has been discussed many times, but one of the most important
things will have to be a one-keystroke applicability and quality
meter. Before selecting the next message or page, a user can
give his or her review of a message with something like a single
0-9 scale of applicability:
0-2: Wrongly categorized message.
0: Completely inapplicable message, e.g. a transmission error.
1: Intentional or negligent miscategorization, e.g. spam.
2: Unintentional miscategorization, e.g. a stupid user.
3-6: Correctly categorized message with a low signal-to-noise (S/N) ratio.
3: A flamer who just doesn't know anything.
4: Someone who just doesn't know anything, not flaming.
5: A flamer with some knowledge, but who had his brain turned
off at the time.
6: Someone who wrote a halfway decent message but who had his
brain turned off and either ought to have known better, had
better discipline, or should get more knowledge.
7-9: Decent messages.
7: Tolerably decent message -- not great, not even good, but OK.
Someone knowledgeable, relevant, and with their brain turned
on could manage a score like this on a bad day when they
write a bad message (sick? not enough time? in a hurry?).
8: Good message. Someone with time would do well to read this.
Has most of the qualities of a great message, but just isn't
great.
9: Great message: very well written, very objective, not at all
subjective except where that is appropriate and well formed,
high S/N ratio, etc. Anybody not reading this message ought
not read any messages at all unless looking up a specific
thing.
Default filters would choose a smorgasbord of reviewers and set a
threshold on the averaged scores somewhere at or above 7 or 8.
This one keystroke is easy enough that it does not take a lot of
extra time to complete, and everybody can do it. If it requires
more than one keystroke or mouse click, it is just too much. It
can be followed by whatever navigation strokes are necessary for
the next thing.
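The default filter described above reduces to a simple
average-and-threshold check; the function name and default threshold
here are illustrative:

```python
def passes_filter(reviews, threshold=7):
    """Average the 0-9 scores from a chosen set of reviewers and keep
    the message only if the average meets the threshold."""
    if not reviews:
        return False  # unreviewed messages are left to other heuristics
    return sum(reviews) / len(reviews) >= threshold

# A reader's default filter might trust three reviewers:
print(passes_filter([8, 9, 7]))   # average 8.0 -> True, message kept
print(passes_filter([6, 5, 9]))   # average ~6.7 -> False, dropped
```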
All of these concepts apply to all data types. Text discussion is
definitely a nice area to integrate with the ideas above, but they
are also highly relevant to music (subpieces of a score, ideas,
versions, etc.; ever try to put together a movie soundtrack, or
song tracks?), pictures (objects that belong in pictures, and
pictures that depict things inside other pictures, are parts of
other pictures, or are updates to or similar to other pictures),
web sites (reviews of a product attached directly to the product
discussion or the company's web server, without their consent or
any ability to stop you from posting), and all media types in
general. For instance, a data
set which represents some tape dump from scientific space observations
can have all sorts of "replies" or "references" attached to it:
papers that used or interpreted the data; programs that were written
that used or interpreted that data (both of those examples would save
some reviewers of the data a lot of time); parental links explaining
where the data came from, its format, etc.
When you get into distributed information, you really get into all of
this stuff. This is the worldwide network we're talking about --
beyond World Wide Webivision. With the programs you're writing now,
the time to put in these enhancements is not some "fuzzy future" nor
"your kids' responsibility", but now. You have the code bases upon
which to graft modifications. Rewrites are issues you can handle.
Waiting for some television corporation to write these programs is a
fallacy. Take a hint from Mike Godwin -- he's right that people
should take matters into their own hands. That task is yours, if
you're a developer.
I would have written all this myself if I had a serene, clean
mountain to meditate on and sustenance that kept me healthy, but due
to the bastardly difficult requirements of staying alive in the
thin, margin-busting, inefficient 1980s, 1990s, and 2000s, none of
this has been possible. Sponsors are typically uninterested, since
none of these quality-enhancing necessities really makes any "money".
Brad Allen <[EMAIL PROTECTED]>
PGP signature