G'day,

New to the list, only just started using polipo, and was wondering what its future is. I think it could fill an important niche that is begging for a solution.

With dnsmasq there is a very small, simple, powerful, and reliable combined dns server, dns cache, dhcp server, and tftp server. It's got everything you need to serve a small home/classroom/school sized network that can include thin clients. It's very easy to configure (dns just serves up the /etc/hosts file with auto-dns support for dhcp clients), and its dhcp server can easily be configured to serve anything like auto-proxy config or different boot options for different thin clients. It's small enough to run on an OpenWRT router or QNAP NAS.

The two other things a small network like this really needs are a light ldap server that just serves up /etc/passwd (a totally different project), and a light combined http server/proxy. They need to be small and efficient, with enough grunt to support at least 100 clients when running on something like a router or NAS. They need to be powerful enough to support most commonly used functionality (CGI, rewrite, etc), but familiar and simple to configure. They absolutely must be reliable and secure enough for the home/classroom/school environment. I believe that they should be written in C (widely used and fast) using an event-loop design (threads suck).

I think right now polipo is very close to filling the proxy part of this, but it is missing a few important things. It is nice and small, uses an efficient/scalable async design, and seems to have the most advanced http 1.1 support of anything out there. The bits it is missing are:

1) Reliability. I'm running the Debian package version 1.0.4.1-1.1 and it cannot stay up for more than about 5 minutes in use before it segfaults. Bugs like http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655851 effectively render it useless for real use, and this is not the only thing that kills it; it seems to die randomly on all sorts of requests. It may be that these kinds of problems have been fixed in the git head, in which case it needs a new release and/or some Debian maintainer love.

2) Logging.
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655852. The logging output from polipo is random and virtually useless, with things like "Couldn't parse ETag." not even mentioning the problematic url. There is no logging of requests and/or fetches, making it impossible to analyse proxy traffic/performance.

3) Http serving. The http server part of polipo is very basic, making it unsuitable for anything more than serving static files. For it to be useful as the webserver on a network router/NAS it needs to support at least CGI for driving the configuration interface.

4) ftp fetching/serving? Not that important, but since nearly all browsers "simple" proxy config

5) Support. The Debian bug tracker http://bugs.debian.org/cgi-bin/pkgreport.cgi?package=polipo has many open bugs, some of which have been open for years. The last "release" of polipo seems to have been years ago. There does seem to have been some recent commit activity on the master branch, but it looks like it needs someone to pull together a new release to fix the serious bugs, and some more Debian maintainer support with more active forwarding of Debian bugs to the polipo devs.

I think that polipo would be more widely used, and thus attract more support, if the reliability issue were fixed. I suspect many people will have tried the Debian package as an alternative to squid and then abandoned it 15 minutes later after it segfaulted a couple of times. The reputation damage this has done will probably be hard to overcome.

Other things I've looked at and rejected are:

* squid: too old, too big, crappy http 1.1 support, and no caching of partial fetches. Good http 1.1 support and partial-fetch caching are essential in a modern caching proxy. The caching performance of squid has declined to be almost useless as the size of downloads and the use of ranged fetching/resuming have increased. You have to really tune squid aggressively in a small network to prevent it from *increasing* download traffic.
Its lack of http 1.1 pipelining means it adds more latency than it saves. Studies show that the biggest improvements a proxy can make are efficient handling of disconnections/resumes/retries to save bandwidth, and pipelining to save latency.

* varnish: designed as an http accelerator, so not as good as polipo as a caching proxy. I've read http://www.varnish-cache.org/trac/wiki/ArchitectNotes and agree with the basic premise. However, the use of VM for storage means it doesn't do persistent caching (being added, currently experimental), and threading doesn't scale as well. IMHO it would have been better to use the filesystem for hierarchical object storage, relying on kernel buffering to cache it in memory, and use an event-loop design. The lack of persistent caching makes it useless for the above application, because you don't want a proxy restart to throw away all those cached big windows/debian/whatever package downloads that all the clients need.

* thttpd: good small http server with a neat codebase, but no decent caching http proxy functionality, and lacking the flexibility and support of lighttpd.

Other options I'm considering:

The best http server solution for what I want is currently lighttpd, but it doesn't have any sort of decent caching http proxy support. Maybe the polipo codebase could be cannibalized to write a decent caching proxy module for it. Alternatively, maybe lighttpd could be cannibalized to extend polipo's http serving.

Write something totally new, using existing well supported libs as much as possible and cannibalizing code from things like polipo and lighttpd. Libs like http://libev.schmorp.de/, http://curl.haxx.se/libcurl/ (too bloated?), http://c-ares.haxx.se/ etc. Alternatively, maybe the polipo code could be simplified to leverage off these libs (and thus get their bug-fixes for free).
Maybe also leverage off varnish's point that OSes do efficient disk/memory paging, and simplify the memory management bits to just serve directly from disk files using sendfile. Or maybe lighttpd could be simplified to use these libs and merged with polipo.

I'm seriously thinking of dedicating my Google 20% time (when I go back after my paternity leave) to this... the lack of decent caching proxies is encouraging web development to ignore caching, and an uncacheable web will not scale in the long term. If every router sold included a mini caching proxy (optionally using a usb drive for storage) things might change... imagine all the Windows Update and Steam downloads from households/classrooms with more than one PC that caching would save.

I'm open to any sort of suggestions, advice, or assistance. Feel free to call me crazy, or tell me to f*ck off, or whatever :-)

-- 
Donovan Baarda <a...@minkirri.apana.org.au>

_______________________________________________
Polipo-users mailing list
Polipo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/polipo-users