On Tuesday 4 October 2005 13:19, Matthew Toseland wrote:
> On Mon, Oct 03, 2005 at 02:00:52PM +0200, Guido Winkelmann wrote:
> > Hi
> >
> > On Monday 19 September 2005 17:07, Matthew Toseland wrote:
> > > On Sun, Sep 18, 2005 at 04:12:19PM +0200, Guido Winkelmann wrote:
> > > > Hi,
> > [...]
> > > > If 0.7 should be done this way, the mentioned external daemon should,
> > > > of course, come bundled with the standard Freenet package. A stripped
> > > > down fred on its own would be of no use to most, just like a TCP/IP
> > > > stack on its own - or i2p...
> > >
> > > And it would have to run in the same JVM, for performance and user
> > > friendliness reasons (an extra several tens of megs for a second VM is
> > > *BAD*). Why is it beneficial?
> >
> > It doesn't have to run at all, and if it runs, it doesn't have to run on
> > the same computer. Given the extremely high computing resource needs for
> > Freenet, I think this is beneficial.
>
> 99.999% of users will not do this.
Did you count them? Please don't make up statistics to support your point in a
discussion.
> We have to optimize the common case first.
I do think that my proposal _is_ optimizing the common case. Reason:
The common case is (okay, this is guessing again) that the user has only one
computer. The common case is also (or should be) that the node is running
permanently, 24 hours a day, 7 days a week. (Getting even more important in
the darknet case.) The common case is that the user wants to use his computer
for a lot of other things besides Freenet, preferably without having to shut
off the node for that.
Therefore, I think it would be highly beneficial to be able to have the node
use considerably less resources while it is not actually being used. If that
is not possible, users might be tempted to just switch off the whole package
if they urgently need their computing resources for something else.
> > > > Things that should be thrown out that are present in current versions
> > > > (5xxx) include:
[...]
> > > > - The Distribution Servlet (definitely)
> > > > - Everything related to splitfile handling (maybe, more talk on that
> > > > further down)
> > >
> > > Yuck. That would mean that we have to have two separate FCP protocols -
> > > high level FCP and low level FCP.
> >
> > No, not necessarily. If you move splitfile handling out of the node, you
> > cannot supply a high-level FCP at all. If you leave it in, you could
> > support both a low-level- and a high level FCP, but there's no good
> > reason for the low-level one then. So, however you do it, you'll need
> > only one FCP (/maybe/ a second one for streams, but that's an entirely
> > different discussion).
>
> No. We will need a high-level FCP in order for it to be easy to write
> clients. It is very important for it to be easy to write clients,
> especially with smaller blocksizes (=> compressed, binary, split metadata
> etc).
Your point being? I said we need either a high-level or a low-level FCP. You
say we need a high-level FCP. Right. Why not. Personally, I'm very much in
favor of a high-level one, too. (At least high-level from the POV of the
third-party programmer.)
> > > IMHO we must not require every single
> > > client author to implement binary metadata, compressed metadata,
> > > hierarchical metadata, FEC codecs, and everything else required; we
> > > must have a "high level FCP" including reasonably transparent splitfile
> > > support,
[...]
> > > and it should probably include download direct to disk,
> > > download when the client is not connected, and queueing, as well, for
> > > reasons related to starting splitfile downloads from fproxy.
> >
> > I don't think that's a good idea. Better have a properly done download
> > manager outside the node than a half-assed one inside it.
>
> Why would it be half-assed? It wouldn't have much of a GUI, so third
> parties can build one themselves for it. But it would be accessible from
> ALL CLIENTS, which solves all manner of integration problems e.g. the
> presence of large files on fproxy sites.
Hm, so it turns out it's more of a helpful back-end for a real download
manager...
> > > > - The command line client (I don't think there's a good reason to
> > > > include that in the normal freenet.jar)
> > >
> > > It's tiny, and it's a useful debugging tool. There is no compelling
> > > reason to include it in the jar except to save 10kB or whatever it is.
> > > Which is insufficient reason given Fred's memory footprint!
> >
> > Well, this is not an important part of the whole discussion anyway, but I
> > still think it should get its own .jar. Having to call it by its class
> > name from the freenet.jar may be intuitive for Java programmers, but not
> > to anyone else. This isn't how things usually work on the
> > command line, where usually one utility = one binary. The way it is
> > now, most people won't even know there is a supplied cl client.
>
> You can provide one binary if you want to. Just like you can provide one
> binary for Fred.
It might actually be best to simply provide a small shell script (or .bat file
for Windows) as a wrapper for it.
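Something along these lines would probably do (the install path and the
class name below are guesses on my part, not the real ones):

    #!/bin/sh
    # Hypothetical wrapper for the bundled command line client.
    # Both the jar location and the entry point class are placeholders;
    # substitute whatever the real install actually uses.
    FREENET_JAR=/usr/share/freenet/freenet.jar
    exec java -cp "$FREENET_JAR" freenet.client.cli.Main "$@"

Drop that into /usr/local/bin as "freenet-cli" or similar, and nobody has to
remember a Java class name anymore.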
> > > > and maybe other stuff I don't know about.
> > > >
> > > > The advantages to this approach are:
> > > >
> > > > - The node source code becomes smaller and easier to maintain.
> > >
> > > It does? How so? We'd have to maintain two separate source modules!
> >
> > Yes, but it won't have to be the same people maintaining the separate
> > daemon source who currently maintain fred's. I can, of course, not
> > guarantee that there will be people who will want to do this, but this
> > way, your chances of ever being able to delegate a significant portion of
> > the work off to others will increase dramatically.
>
> It's the same repository. It would not solve any access control issues,
> for example, because either somebody has access to the repository or he
> doesn't.
Are you running out of excuses? Come on, how hard can it be to set up a second
repository? You could just open another sourceforge project for that. Hell,
if you want, I'll set up a repository for you. CVS/Subversion, your choice.
(Although I doubt you will like your repository being hosted on my server...)
> > > > - The operation of the node becomes more reliable. (There is less
> > > > stuff in there that might cause it to crash.)
> > >
> > > The node must be cleanly separated from the UI code, including e.g.
> > > splitfile assembly. This happens via interfaces, not via running it in
> > > a separate VM, and not via having it in a separate CVS module.
> >
> > It happens by running it in a different process or even on a different
> > computer.
>
> Which would slow things down significantly and be used by exactly 3
> users out of 100,000.
Why exactly would it slow down significantly? I'm talking about using the same
interface (FCP) that would be used by any other client.
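To make that concrete: from a client's point of view, "same computer" versus
"different computer" is nothing but the host you hand to the socket. A
throwaway sketch (the port number and the handshake fields are placeholders,
since FCPv2 isn't settled yet):

    import java.io.*;
    import java.net.Socket;

    // Sketch only: an FCP client speaks plain TCP, so a remote node is
    // just a different host argument - no extra interface needed.
    public class FcpHello {
        public static void main(String[] args) throws IOException {
            String host = args.length > 0 ? args[0] : "127.0.0.1";
            int port = 9481; // placeholder FCP port
            try (Socket sock = new Socket(host, port)) {
                Writer out =
                    new OutputStreamWriter(sock.getOutputStream(), "UTF-8");
                // Made-up handshake message, just to show the shape.
                out.write("ClientHello\nName=sketch\nEndMessage\n");
                out.flush();
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), "UTF-8"));
                for (String line; (line = in.readLine()) != null; )
                    System.out.println(line);
            }
        }
    }

Whether that socket crosses the loopback device or a LAN makes no difference
to the protocol, only to latency.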
> > > > - Resource consumption of the node will be lower and much more
> > > > predictable.
> > >
> > > Only if it runs without the extra daemon. Which it won't for the
> > > majority.
> >
> > You're making a big assumption here. Who says everyone will really want
> > to use the services you provide by default with Freenet? Even right now
> > there are probably lots of users who only use fproxy for status info
> > about the node and mostly run their nodes only for frost or pm4pigs or
> > something like that. For these users, fproxy is mostly bloat. (I am one
> > of those users.)
>
> Fproxy is indeed mostly bloat in Fred 0.5, however in 0.7 we are moving
> all the useful functionality that is implemented exclusively for fproxy
> and the command line client into FCPv2, so that it is used by everyone.
> Fproxy itself then is a very thin layer on top. I accept that there WAS
> a problem with there being a lot of code in Fred which was solely for
> the benefit of fproxy, but *we are fixing the problem*. And your
> proposed solution is to make the current problems much much worse by
> forcing everyone to implement FEC encoding, splitfiles, binary metadata,
> split metadata, and everything else, in their own mutually incompatible
> ways, at the cost of thousands of lines of duplicated code.
Where did I say that? Nobody is talking about making FCP2 even more complex.
Actually, I think your current plans for FCP2 are too complex already.
(Multiplexing? Why would we need that? We already have perfectly good
multiplexing at the TCP/IP level, which is more efficient and more robust than
yours will ever be.)
> We *MUST* have FCPv2, which means that if we do as you suggest we will have
> two separate versions of FCP.
Er, no? Where did I say that? (Okay, I did say something like that a bit
further down when talking about the external splitfile handling agent, but
that's only partly related to my original proposal.)
> This is quite possible but it needs to be justified.
>
> > Freenet is, in broad terms, a new anonymizing data transport layer. As
> > such, it can be used for a broad range of applications, most of which none of
> > us ever even thought about. "Freesite-browsing" might become the least
> > important application before you know it.
>
> It is immensely important for political reasons IMHO. And we can't
> bundle Frost, while we might be able to bundle a java-based
> email/freemail/IM client.
What I was trying to say was that you simply cannot predict future usage
patterns. There might be some great new tool in the future that dramatically
changes the freenet user experience. Then again, that might not happen. The
best approach right now is to make the software as flexible as possible.
Oh, and I'm all for bundling software for Mail over Freenet, IRC over Freenet
or Message board over Freenet (not necessarily Frost), just not as a part of
the node!
> > > The memory overhead for running two JVMs instead of one will be
> > > significant,
> >
> > As I said:
> > - It doesn't always have to be running
> > - It can run on a different computer
>
> 99% of the time it will run on the same computer. 99% of users would not
> make such extreme tweaks; they would use what they were given and then
> install more stuff on top.
But they might want to shut down parts of the Freenet software while they're
not actually using them, especially if they take up a lot of resources. If
they can't do that, they'll just shut down all of it, which is definitely not
what we want.
> > > unless the JVM automatically coalesces, in which case there
> > > is no point anyway.
> > >
> > > > - The code for enduser related functionality becomes a lot more
> > > > accessible for new programmers. (Patches to fad are pretty much
> > > > guaranteed not to interfere with the node's core operations and,
> > > > thus, are more easily accepted.)
> > >
> > > Nobody (in the OSS world) codes java. And of those who do, they don't
> > > use Fred's existing interfaces. This is a matter of documentation and
> > > communication and stubbornness, not a matter of what is in what jar.
> >
> > There definitely are people in the OSS world who code Java. Look at
> > Azureus for example or at I2P. It's also not simply a stubbornness-thing.
> > Look at other large OS-projects like Gnome, KDE, mldonkey[1] and whatnot.
> > These are thriving. Freenet isn't. I think this is a problem and I think
> > it should be addressed and I think that splitting user-related
> > functionality and core functionality is an important step in that
> > direction. (Documentation is probably even more important.)
>
> No, making the interfaces MORE COMPLEX AND MORE LOW LEVEL will not
> encourage the production of new apps.
Again, who, except for yourself, was talking about making the interfaces more
low-level?
> What we are doing - implementing
> easy to use, high level interfaces that hide the complexity of splitfile
> decoding from the app, implementing new services other than store and
> retrieve, and implementing and bundling basic communication tools such
> as IRC, email and ideally CVS - will help to get new tools. Making
> everyone code a redundant several thousand lines of code in their apps
> just to save on memory usage when the user never uses splitfiles is
> ridiculous.
Right. But no one was suggesting this.
> > If a new programmer decides he wants to help out, the first thing he'll
> > have to do is find out how things work in the already-existing code,
> > where he'll have to tinker with it to achieve certain goals and where his
> > own code might fit into the whole thing. The bigger and more complex the
> > existing codebase is, the harder and more frustrating this process will
> > get. Right now, a new programmer who has some ground-breaking ideas
> > for fproxy, for example, will have to wade through tons of code having
> > to do with things like datastore management, routing table management,
> > key processing and whatnot.
>
> He can ask.
Wrong answer. Yes, of course he can ask, and it's certainly important that he
can, but that can't be the real solution. Having to ask things about the
implementation takes up a lot of effort and time on both sides. (A programmer
will need to wait for you to actually be present in IRC or write a message to
the mailing list, for example, and then this may still end up as a lengthy
discussion.)
A real solution would involve making it a rare occurrence to actually have to
ask. This involves good docs, but also structuring the project's code in a way
that makes it easy to understand and work with, even for newcomers.
> It is more important that it be simple to build 3rd party
> apps than that it be simple to add on to the node, because the latter is
> always going to be harder anyway.
Exactly my point. Which is why I'm arguing for implementing those enduser
features that are supplied by the project itself in exactly the same way that
any other third party app would be implemented.
> But the key problem here is a lack of communication, which mostly emerges
> from the fact that you cannot easily and safely communicate with the devs
> from within the network. The way to fix this is:
> - Implement and bundle an IRC proxy.
> - Implement and bundle an email proxy, and tie it into the email lists.
> - In the meantime, use Frost.
> - Look into distributed version control systems.
okay...
> > If the code were in a separate program, this would be a lot easier. The
> > general outline of the program, "the way of doing things" in there could
> > be simpler, more streamlined and better adapted to the needs of an
> > end-user tool.
>
> The whole idea of "a separate program" is pretty dubious in java. A
> separate code module maybe, but that's a lot of extra complexity for
> zero gain - you would ALWAYS and I mean ALWAYS need to run the second
> module, to provide FCPv2, if you run any clients.
Huh? FCPv2 is certainly core functionality. I was talking about moving _non_
core functionality out of the node.
> Now, it might be a slight gain from a development view, but the same could
> be achieved by having a separate module and having the two built into a
> single jar with internal interfaces between them.
>
> > Then there is also the issue that people working on enduser related code
> > might easily break some of the node's core functionality if it's all one
> > big program.
>
> End-user code such as?
Everything that actually does something for the user with the functionality
provided by FCPv2. I'd classify enduser code (in this case) as "anything that
is or could be implemented using FCP(v2)".
> CVS does not have fine-grained access control. You cannot grant somebody
> write access to one module but not to another. Now, maybe this is fixed
> by Subversion...
Not to my knowledge. I've been using Subversion for some time now. There are
two ways (that I know of) of accessing the repository, an Apache module based
on WebDAV and a dedicated server "svnserve". I've only been using the Apache
module so far, and in that case access control is done by Apache. I'm not an
expert in Apache, is it possible to do access control based on the
supplied URL instead of based on what it maps to on the filesystem? If yes,
then you'd have your fine-grained access control.
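From a quick look at the Apache docs, <Location> blocks do match against the
request URL, so something like this should give per-module control (untested,
and all the module, user and file names are made up):

    # Hypothetical httpd.conf fragment: one Subversion repository per
    # module, access controlled per URL prefix.
    <Location /svn/fred>
        DAV svn
        SVNPath /var/svn/fred
        AuthType Basic
        AuthName "fred (node core)"
        AuthUserFile /etc/svn/htpasswd
        # core committers only
        Require user toad
    </Location>

    <Location /svn/fad>
        DAV svn
        SVNPath /var/svn/fad
        AuthType Basic
        AuthName "fad (user daemon)"
        AuthUserFile /etc/svn/htpasswd
        # any registered developer
        Require valid-user
    </Location>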
But, that aside, fine-grained access isn't even really necessary, IMO.
Look at the KDE project:
They have had CVS with coarse access control (i.e. access to either the
complete repository or none) for years, and I think their new Subversion
repository only has coarse access control too. That means, every programmer
who maintains a program in, for example, kdenonbeta (which is a module
containing experimental, mostly unimportant stuff), could, theoretically,
also do commits to kdelibs or kdebase (which contain the core of KDE and are
highly important) if he wanted to. KDE is a very large project with several
hundred active developers from all over the world. This has been working very
well for at least 5 or 6 years now. (KDE started in 1996, but I don't
know when they started using CVS.)
See, you're not going to give out CVS/SVN access to any doofus who comes
along, but to developers. They should be sensible enough to not commit to an
important module if they don't know what they're doing - if they can get
their work done in a different module. Besides, you can be automatically
notified of new commits, so stupid/destructive/untrusted ones won't go
unnoticed.
> > Someone from the project (I think it was Ian Clarke) said the
> > project had learnt the hard way that lots of times parts of the code that
> > shouldn't affect the performance of the node, did. A consequence of that,
> > IMHO, should be "well then don't put so much code in there that might go
> > wrong.". "Keep It Simple, Stupid", they say.
>
> Einstein once said that the trick with Occam's Razor (~= KISS) is not to
> slit your own throat. Things should be as simple as possible - AND NO
> SIMPLER.
Alright - so we're arguing about where the point of "too simple" is.
Oh, and if I got this correctly, the thing from Einstein was about something
different. It was about educating the public about science's discoveries. If
you're getting too complex in this case, the audience won't understand. If
you're getting too simple, you're losing correctness.
This can't be directly translated to computer systems. In computer systems,
simplicity is about flexibility, robustness and sometimes avoiding
duplication of effort and code. Correctness actually increases with
simplicity, because it is easier to correctly implement a simple system than
a complex one, up to a certain threshold where the system will simply stop doing
what you want it to do.
> In this instance, in order for the clients to have a simple
> interface, it is absolutely vital that we move the complexity of e.g.
> splitfile handling into the node. Because this involves significant
> resource allocation issues, it is necessary to have an actual download
> manager in the node.
If we want to do splitfile handling inside the node, then yes. There still are
a few questions remaining, though, like "should the node, on request, write the
files directly to the disk?" or "should the downloads be aborted when the
client disconnects?"
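Just to illustrate what those choices might look like on the wire (every
field name here is guesswork on my part, nothing is a settled FCPv2 design):

    ClientGet
    URI=CHK@...
    Identifier=my-download-1
    ReturnType=disk
    Filename=/home/user/downloads/file.iso
    Persistence=forever
    EndMessage

A ReturnType=disk field would answer the first question, and something like
Persistence=forever (keep the request going after the client disconnects)
would answer the second.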
> > (Note how many of the internet technologies that enjoy real long-term
> > success, like IP, TCP, UDP, SMTP, NNTP, rfc822-messages, HTTP... did just
> > that.)
>
> Yes, and they are built on how many layers? How many applications use IP
> directly?
Yes, they are built on layers. Which makes it possible for every single layer
of the cake to be simple, flexible and robust. This model of layers has been
hugely successful over the last few decades, so I'm wondering why you are
against using it in Freenet.
> We must have a high level FCPv2 API!
I'd really like to know where you got the idea from that I'm against a
high-level FCPv2....
> > > > - The external daemon can be written in a language other than Java
> > > > (Whether that's an advantage may depend on your POV. IMO, it's a big
> > > > one.)
> > >
> > > Not if it is bundled with Fred it can't. The high level daemon must be
> > > maintained by the project itself, meaning it has to be in java.
> >
> > Why? Have you sworn a holy oath to Java or something?
>
> Convince Ian. Then convince me. Then we can talk about it.
This question was meant seriously (well, except for the "holy oath" part). Why
can't you bundle any programs not written in Java?
> > > And,
> > > without intending to start a language flamewar here, you do know that
> > > java can be compiled, right?
> >
> > I have basic knowledge of Java. It's mandatory at the uni. Still, I don't
> > see what this has to do with anything.
>
> GCJ 4.1 will in all likelihood be able to compile Fred 0.7, that's the
> point I am making.
That still doesn't have much to do with this discussion.
> > > > - Third party implementations of the node itself can be done more
> > > > easily. (The programmers won't have to rewrite all of the most basic
> > > > user interface stuff from scratch) (I think this is a pretty
> > > > important issue in long run.)
> > >
> > > Why does it matter? As long as there are clear internal interfaces it
> > > is quite possible to plug in an alternative node implementation. And
> > > given the SUBSTANTIAL effort involved in cloning either the node or the
> > > high level code, this is not a big deal.
> >
> > Cloning a Unix-like operating system takes substantial effort, too. Yet,
> > people have done it, linux is proof of that. The availability of
> > userspace tools which are more or less independent from the kernel
> > they're running on is an immeasurable help in such a case. Had the GNU
> > tools been tightly integrated into the kernel they were designed for back
> > in 1991, Linux would not have come very far.
>
> Indeed, which is why it is essential that there is a simple interface
> for the third party clients to use. This is called "high level FCPv2".
> It would be quite possible to make a low level FCPv2 interface between
> the node and the high level functionality, but it would take significant
> effort for no immediately apparent gain, and it would complicate the
> code enormously. Your whole case is based on the idea that the low level
> FCP2 will be usable by most clients.
No it isn't.
> It won't. Freenet 0.7 has a 32kB
> block size. This means that the metadata for a large splitfile may not
> fit into a single block. This means we have to split the metadata
> itself. The client stuff is really quite complex, and it is already
> agreed that the fred 0.5 FEC API is a monster. We really *have* to
> implement splitfiles in the node.
Well, maybe in the node, maybe somewhere else, but certainly not in the client
code.
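(For scale, assuming - and this number is a pure guess - about 50 bytes per
block reference in the metadata:

    1 GB / 32 kB             = 32768 blocks
    32768 blocks * 50 B each ~= 1.6 MB of metadata
    1.6 MB / 32 kB           ~= 50 blocks just for the block list

So for a 1 GB file, the metadata alone spans dozens of blocks and has to be
split and reassembled itself. Whoever ends up doing that, it shouldn't be
every client author.)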
> > > By definition all user friendly code must be left out of the node!
> >
> > Not quite. What I am saying is that the user interface-related code in
> > the node should be as little as possible but as much as necessary. Some
> > stuff just needs to be done at that level - resource restrictions
> > (i.e. limits for bandwidth and diskspace usage), peer management in the
> > darknet case, keeping the node up-to-date, stuff like this. This is all
> > just
> > node-administration/maintenance stuff, i.e. none of the "essential"
> > functionality. (Essential functionality being the sort of functionality
> > that can be the reason for people to run a node in the first place, i.e.
> > things like freesite-surfing, using some sort of message-board or mail
> > software via freenet or something like that.)
> >
> > This might be done via a stripped down version of fproxy or, better yet,
> > via some external utility that simply edits the config file(s) and sends
> > the node some sort of signal when it's done, to make it reread its
> > configuration.
>
> Wonderful, so Fred ends up taking up THREE ports for HTTP.
Three? I'm counting two, one if fproxy isn't running. And even if it were
three, ports don't cost money and take up only very little memory.
Besides, I mentioned the "stripped down fproxy" only as one of two
possibilities. There might be many others.
> And two for FCP.
?
> This is all pointless extra complexity. It will have cost in lines
> of code, cost in maintenance, and cost in run-time RAM footprint. That
> cost must be justified.
>
> And if we accept that high level FCP must be implemented, then the
> majority of the overhead of fproxy is in fact the things you talk about
> above - the configuration servlet, the various status pages and so on.
> Actually fetching pages is pretty easy as we have to provide that
> functionality for FCP *anyway*.
There are still a few things that need to stay in fproxy: Date Based
Redirects, reading the manifest of a site and doing redirection based on it,
code for browsing past issues of DBR sites (which, btw, is not handled well at
all in the current code) and the anonymity filter.
> > > > - Fred can finally become a lean and mean unbloated routing machine.
> > >
> > > Which is totally meaningless if it requires a bloated user daemon for
> > > it to do anything useful.
> >
> > Do you require the Mosaic browser to do anything useful at all with HTTP,
> > or even TCP/IP? You might use it, but there are plenty of other useful
> > options.
>
> Who uses IP directly? Almost all protocols are layered, and this model
> works well on the internet. Yes, it uses a bit more resources, but it
> makes life easy for the programmers, and it means that the different
> implementations are more interoperable.
... This is exactly what I'm talking about?
> Both of those are far more important than saving some RAM for the occasional
> people who *only* use IP directly.
Saving RAM is only a secondary objective behind my proposal anyway. One of the
more important ones is correctness. Fproxy and Frost are on the same logical
layer, which is higher than that of the node, so I don't see why fproxy is
integrated in the node and Frost isn't.
Besides, you can use IP directly and this is not unimportant. It might only be
used by very few people, but those are often/sometimes the people who drive
technologies forward.
[...]
[... long discussion about external agent for splitfile handling...]
> > Disadvantages:
> > - This would mean we really need to have a low-level FCP and a high-level
> > FCP.
>
> Splitting it up as you propose absolutely requires low- and high- level
> FCP.
That's what I said.
> This is not IMHO on the critical path for 0.7. It does not make life easier
> for third party devs.
How does this affect 3rd party devs?
> It does mean we have to maintain more code, more complexity, and more
> interfaces.
Not necessarily. It just has to be organized differently.
> And it does not make life much easier for potential developers.
Oh, I think it might. (I suppose you're talking about fred developers here,
not 3rd party developers.)
It might mean that new programmers will have to deal with smaller, largely
independent chunks of code.
[...]
> > Guido
> >
> > [1] mldonkey is interesting for yet another reason: They're using a
> > language that's even less popular than Java, and they're still doing
> > better than Freenet.
>
> "Doing better than" defined how? They have more users? I would suggest
> that is due to performance more than anything else. They have more
> developers? That is down to all the CS students who studied ML finding
> they can do something useful with it (building warez-sharing clients).
Do you think there are more CS students using ML than there are CS students
using Java? Also, while file sharing might be a more popular thing, Freenet
et al. are still pretty hot stuff.
> But it is also due to the communications issues I have discussed, which
> we must improve on in 0.7.
Yes.
> These would not be fixed by adding more complexity, more code and more bloat
> for no clear purpose.
Well, I think what I am proposing is actually reducing complexity, or at least
redistributing it in a way that makes it more manageable.
Guido
PS: Could other people besides me and Matthew Toseland please speak out on the
subject? It would be nice to know some more opinions...