G'day,

New to the list, only just started using polipo, and was wondering
what the future of it is. I think it could fill an important
niche that is begging for a solution.

With dnsmasq there is a very small, simple, powerful, and reliable
combined dns server, dns cache, dhcp server, and tftp server. It's got
everything you need to serve a small home/classroom/school sized
network that can include thin clients. It's very easy to configure
(dns just serves up the /etc/hosts file with auto-dns support for dhcp
clients), and its dhcp server can easily be configured to serve
anything like auto-proxy config or different boot options for
different thin clients. It's small enough to run on an OpenWrt router
or QNAP NAS.

The two other things a small network like this really needs are a
light-ldap server that just serves up /etc/passwd (a totally different
project), and a light combined http server/proxy. They need to be
small and efficient with enough grunt to support at least 100 clients
when running on something like a router or NAS. They need to be
powerful enough to support most commonly used functionality (CGI,
rewrite, etc), but familiar and simple to configure. They absolutely
must be reliable and secure enough for the home/classroom/school
environment. I believe that they should be written in C (widely used
and fast) using an event-loop design (threads suck).

I think right now polipo is very close to filling the proxy part of
this, but it is missing a few important things. It is nice and small,
uses an efficient/scalable async design, and seems to have the most
advanced http 1.1 support of anything out there. The bits it is
missing are:

1) Reliability. I'm running the Debian package version 1.0.4.1-1.1 and
it cannot stay up more than about 5 minutes in use before it segfaults.
Bugs like http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655851
effectively render it useless for real use, and this is not the only
thing that kills it, as it seems to die randomly for all sorts of
requests. It may be that these kinds of problems have been fixed in
the git head, in which case it needs a new release and/or some Debian
maintainer love.

2) Logging. See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655852. The logging
output from polipo is random and virtually useless, with things like
"Couldn't parse ETag." not even mentioning the problematic url. There
is no logging of requests and/or fetches, making it impossible to
analyse proxy traffic/performance.

3) Http serving. The http server part of polipo is very basic, making
it unsuitable for anything more than serving static files. For it to
be useful as the webserver on a network router/NAS it needs to support
at least CGI for driving the configuration interface.

4) ftp fetching/serving? Not that important, but since nearly all
browsers "simple" proxy config

5) Support. The Debian bug tracker
http://bugs.debian.org/cgi-bin/pkgreport.cgi?package=polipo has many
open bugs, some of which have been open for years. The last "release"
of polipo seems to have been years ago. There does seem to have been
some recent commit activity on the master branch, but it looks like it
needs someone to pull together a new release to fix the serious bugs,
and some more Debian maintainer support with more active forwarding of
Debian bugs to polipo devs.
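
For point 2 above, this is roughly the kind of per-request log line I have in mind. It is a sketch only: format_access_line is a hypothetical helper, not an existing polipo function, and the layout is merely modelled on the Common Log Format with an assumed trailing HIT/MISS field for cache analysis.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of polipo: formats one proxied
 * request as a single log line that always includes the URL.
 * Returns the snprintf() result (length the line needs). */
int format_access_line(char *out, size_t outlen, const char *client,
                       time_t when, const char *method, const char *url,
                       int status, long bytes, int cache_hit)
{
    char stamp[32];
    struct tm *utc = gmtime(&when);   /* log in UTC to keep it simple */

    strftime(stamp, sizeof stamp, "%d/%b/%Y:%H:%M:%S +0000", utc);
    return snprintf(out, outlen, "%s - - [%s] \"%s %s\" %d %ld %s",
                    client, stamp, method, url, status, bytes,
                    cache_hit ? "HIT" : "MISS");
}
```

With one line like this per request, hit rates and bandwidth savings can be worked out with the usual log analysis tools.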

I think that polipo would be more widely used and thus attract more
support if the reliability issue was fixed. I suspect many people will
have tried the Debian package as an alternative to squid and then
abandoned it 15 minutes later after it segfaulted a couple of times. The
reputation damage this has done will probably be hard to overcome.

Other things I've looked at and rejected are:

* squid: too old, too big, crappy http 1.1 support, and no caching of
partial fetches. These features are essential in a modern caching
proxy. The caching performance of squid has declined to be almost
useless as the size of downloads and use of ranged fetching/resuming
has increased. You have to really tune squid aggressively in a small
network to prevent it from *increasing* download traffic. Its lack of
http 1.1 pipelining means it adds more latency than it saves. Studies
show that the biggest improvements a proxy can make are efficient
handling of disconnections/resumes/retries to save bandwidth and
pipelining to save latency.

* varnish: Designed as an http accelerator so not as good as polipo as
a caching proxy. I've read
http://www.varnish-cache.org/trac/wiki/ArchitectNotes and agree with
the basic premise. However, the use of VM for storage means it doesn't
do persistent caching (being added, currently experimental), and
threading doesn't scale as well. IMHO it would have been better to use
the filesystem for hierarchical object storage, relying on kernel
buffering to cache it in memory, and use an event-loop design. The
lack of persistent caching makes it useless for the above application
because you don't want a proxy restart to throw away all those cached
big windows/debian/whatever package downloads that all the clients
need.

* thttpd: good small http server with neat codebase, but no decent
caching http proxy functionality and lacking the flexibility and
support of lighttpd.
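
On the partial-fetch point in the squid item above: if a proxy keeps the first part of an interrupted download, resuming only needs an HTTP Range request for the remainder. A minimal sketch, assuming the cache knows how many bytes it already holds; resume_range_header is a made-up name for illustration, not code from any of these projects.

```c
#include <stdio.h>

/* Hypothetical helper: given how many bytes of an object are already
 * cached and its total size (-1 if unknown), write the request header
 * that fetches only the missing tail.
 * Returns 1 if a header was written, 0 if the object is already
 * complete, or -1 if the output buffer is too small. */
int resume_range_header(char *out, size_t outlen,
                        long long cached, long long total)
{
    if (total >= 0 && cached >= total)
        return 0;                     /* nothing left to fetch */

    /* "bytes=N-" asks for everything from offset N to the end. */
    int n = snprintf(out, outlen, "Range: bytes=%lld-", cached);
    return (n > 0 && (size_t)n < outlen) ? 1 : -1;
}
```

A client that disconnects 1 MB into a large package download would then cost the upstream link only the remaining bytes on retry, instead of the whole file again.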

Other options I'm considering:

The best http server solution for what I want is currently lighttpd,
but it doesn't have any sort of decent caching http proxy support.
Maybe the polipo codebase could be cannibalized to write a decent
caching proxy module for it. Alternatively maybe lighttpd could be
cannibalized to extend polipo's http serving.

Write something totally new, using existing well supported libs as
much as possible and cannibalizing code from things like polipo and
lighttpd. Libs like http://libev.schmorp.de/,
http://curl.haxx.se/libcurl/ (too bloated?), http://c-ares.haxx.se/
etc. Alternatively maybe the polipo code could be simplified to
leverage off these libs (and thus get their bug-fixes for free). Maybe
also leverage off varnish's point that OS's do efficient disk/memory
paging and simplify the memory management bits to just serve directly
from disk files using sendfile. Or maybe lighttpd could be simplified
to use these libs and merged with polipo.

I'm seriously thinking of dedicating my Google 20% time (when I go
back after my paternity leave) to this... the lack of decent caching
proxies is encouraging web development to ignore caching, and an
uncacheable web will not scale in the long term. If every router sold
included a mini caching proxy (optionally using a usb drive for
storage) things might change... imagine all the windows update and
steam downloads from households/classrooms with more than one PC that
caching would save.

I'm open to any sort of suggestions, advice, or assistance. Feel free
to call me crazy, or tell me to f*ck off, or whatever :-)

-- 
Donovan Baarda <a...@minkirri.apana.org.au>

_______________________________________________
Polipo-users mailing list
Polipo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/polipo-users
