Re: [squid-users] Is it possible to mark tcp_outgoing_mark (server side) with SAME MARK as incoming packet (client side)?

2014-03-27 Thread Ed W

Hi

So the documentation is right, but the placement of the statement is possibly 
wrong; it isn't highlighted prominently up front. i.e. qos_flows applies only 
to packets from server to client (Squid), NOT from client to server.


Is it possible to do the reverse too? Or at least have an ACL where I can 
check the incoming MARK on a packet? Then I could make use of 
tcp_outgoing_mark.
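For reference, tcp_outgoing_mark as it stands selects the mark from ordinary ACLs only (it cannot test the mark already present on the client-side connection, which is exactly what is being asked for here). A minimal sketch of current usage, with made-up values:

```
# Mark upstream packets for requests from a particular client range.
# 0x10 is an arbitrary example value.
acl portal_clients src 192.168.1.0/24
tcp_outgoing_mark 0x10 portal_clients
```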


I just noticed that the same discussion took place on this list previously 
(in 2013); here is the link:


http://www.squid-cache.org/mail-archive/squid-users/201303/0421.html


Yes, I'm still really interested in implementing this.  I got as far as 
doing some investigation a few weeks back.  It seems *most* of the 
groundwork is there: I think there is space to store the incoming client 
connection mark, and there are facilities to set the outgoing upstream mark 
(to an ACL-selected value). What is needed is:

- code to connect the two, i.e. set a default outgoing mark
- some thought on handling connection pipelining and re-use. At present 
Squid maintains a pool of connections, say to an upstream proxy; these 
now need to be selected not just because they are idle, but also because 
they have the correct connection mark set. This looks doable, but 
slightly more tricky.


Ed W


Re: [squid-users] Sponsor etag/vary support for Squid 3.3

2013-04-03 Thread Ed W

On 02/04/2013 23:38, Alex Rousskov wrote:


I hope this will also be useful to others than just me!

Yes, I believe most of the ETag improvements you want will be generally
useful, including in environments where ETags come from origin servers
(rather than being added by Squid in violation of HTTP rules).


Thanks Alex.  Yes, it seems a small number of frameworks are finally 
using ETags correctly, so there is some possibility of real usage in the 
wild!




I still think that the earlier comprehensive design with a new/dedicated
checksum headers would be an overall better solution for your specific
problem, but it would take a lot more time to implement while you should
be able to get quite a bit by simply [ab]using ETags. And since the
changes you need are mostly generally useful, I do not see any big
problems with this simplified approach, at least as the first step.


Sure - I don't disagree, but this will be a great first step to prove 
whether it can work or not.  Also, at least this bit will be standards 
compliant, so we can work on bending the rules outside of that change 
later on.  I think this will be really good groundwork (and will benefit Squid).


Cheers

Ed W


Re: [squid-users] Sponsor etag/vary support for Squid 3.3

2013-04-02 Thread Ed W

Hi Alex

Sorry, I didn't notice your reply!


I'm picking up an old thread from some time back.  I remain interested
in getting support for etag into squid (and related revalidate support).

My main requirement is that I have two proxies on either side of a
bandwidth limited link (with high cost).  I want the situation that when
a client GETs some object,

A client GETs some object currently in the cache and with ETag, but that
cached object is either stale or being forcefully reloaded by the
client, right?


Yes. Or a second client requests the same object so we need to do a 
freshness check, or the client clears its cache, or the upstream doesn't 
correctly implement If-Modified-Since, etc., etc.


I'm not trying to decrease the incidence of Squid asking the upstream 
server whether the object is fresh (which could also trigger non-idempotent 
changes); however, I will try to reduce the amount of bandwidth used 
over the proxy-to-proxy middle link (which crosses an expensive satellite 
connection) by ensuring that ETags are set on important resources (e.g. 
creating one where it doesn't exist, using some hash of the content body).


What I have probably failed to consider properly is a change in headers 
between two otherwise identical responses (i.e. same bodies), but I guess 
that will become clear later.


Also, I think Vary support will either fall out of this work or be required 
by it.  I have a use in mind which would become dependent on browser version 
(e.g. serving WebP graphics to Chrome).





we can convert this to an IF-NONE-MATCH and
trust the etag confirms that the object is unchanged.




Note, I am aware of the limitations of trusting etags. In my setup I
will have control over the proxy on the high speed side of the
connection and we can use various methods on that side to ensure that
the etags are sane. The main goal is to minimise bandwidth across the
intermediate (expensive) link.

Previously we discussed all kinds of complex ideas including
implementing trailers, and custom headers with hash values.  On
reflection I think everything required can be done using only ETag
revalidation (and some tweaking of ETags, but Squid need know nothing
about that...)

Yes, reload-into-If-None-Match and stale-into-If-None-Match features
sound simple. The latter may even be supported already (will check). If
something outside of Squid provides reliable-enough ETags to all
cachable responses, then the complexities discussed earlier go away.

Please confirm whether my understanding of your updated requirements is
correct.


I believe so.

So, the situation is a downstream client talking to two Squid proxies in 
a chain, through to the eventual upstream web server. Between the two 
Squid proxies is an expensive internet link (charged by the byte), so 
we want to minimise bytes across the link.


Essentially an upstream adaptation proxy will be used on the fast (i.e. 
cheap) side of the connection. This will examine all responses before 
they are handed to the fast-side Squid, and in this proxy we will beat the 
ETag into shape, e.g. adding a SHA hash if none exists, etc. Obviously I 
have to accept any breakage which occurs if I change the upstream's ETag 
- however, I think we have this covered.
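As an illustration of the ETag-normalising step described above, here is a toy sketch (not Squid code; the strong-ETag-from-SHA-256 scheme is an assumption consistent with the "some hash of the content body" idea in this thread):

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Hypothetical scheme: strong ETag = quoted SHA-256 hex of the body.
    return '"%s"' % hashlib.sha256(body).hexdigest()

def not_modified(if_none_match: str, current_etag: str) -> bool:
    # True means the client's validator matches the current entity,
    # so a 304 (no body) can cross the expensive link instead of a 200.
    candidates = [t.strip() for t in if_none_match.split(",")]
    return "*" in candidates or current_etag in candidates

body = b"same bytes as before"
etag = make_etag(body)
print(not_modified(etag, etag))       # matching ETag -> 304, body not resent
print(not_modified('"other"', etag))  # different ETag -> 200 with full body
```

The point being that two responses with identical bodies get identical ETags regardless of what the origin server sent, so the slow-side cache can revalidate without re-transferring the body.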


My goal is that if an object has the same response body, and it's 
already in the Squid cache on the slow side of the link, then we freshen 
the resource by going back to the origin server via our pair of Squid 
servers; however, we avoid transferring the body back across the 
expensive link (between the two Squid proxies) if the ETag still matches.


I hope this will also be useful to others than just me!

Thanks

Ed W



Re: [squid-users] Re: squid qos_flows - copying mark from client side to upstream request?

2013-04-02 Thread Ed W

On 02/04/2013 21:14, Andrew Beverley wrote:


(I will always have another proxy as my upstream).  If so then actually
I need to reset the mark for each request?

I *think* you could just set the mark on the upstream connection for
each request.


I think this is the correct answer (so that it handles persistent 
connections).


I confess to not yet having continued to explore the code, but I guess I 
need recommendations on a good place to insert this - basically I need 
the point where a new request hits the network in the direction of the 
upstream.


A leg up would be welcome, I will continue to explore the code in the 
meantime.


Thanks

Ed W


Re: [squid-users] Re: squid qos_flows - copying mark from client side to upstream request?

2013-03-28 Thread Ed W

Hi



So for example I mark clients that have passed a captive portal test
with some mark, I need that mark copying up to requests coming from
squid so that I know they effectively come from a validated client

As Amos says, this is probably the wrong way to do it. If you want to
see an example of how I did it, then check out this page:

http://andybev.com/index.php/PortalShaper

I use iptables to drop (or redirect) all packets that are received from
clients that have not passed the captive portal.


Technically I don't just track pass/fail...

Users have a choice of gateways through which to use the internet (each 
with a cost). Their choice of gateway is marked on packets from their 
machine, and we then route through the appropriate gateway based on the 
connection mark (hence why I need it passed upstream through Squid).
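The marking/routing scheme described above might be sketched like this (the mark value, the routing table name and the `validated` ipset are all hypothetical):

```
# Mark packets from validated clients, and save the mark on the connection
iptables -t mangle -A PREROUTING -m set --match-set validated src \
         -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -j CONNMARK --save-mark

# Route by firewall mark to the chosen gateway
ip rule add fwmark 0x1 table gateway1
```

The missing piece discussed in this thread is getting that mark re-applied to Squid's own upstream connections, which otherwise lose it.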


Also, we mark each connection with a unique per-user mark so that 
iptables can account for the traffic they consume and bill them. 
Technically this could be done inside Squid, but all other traffic is 
accounted in iptables and there are some hairy calculations needed to 
bill differently for different gateways, so I don't want to reproduce 
this in multiple locations.


Hence I think I need to implement the reverse of the current code?


Now, as for implementation, I don't have the code in front of me, but I 
think I noticed there is a single code path to open a new upstream 
connection? At present this applies a packet mark based on 
tcp_outgoing_mark.  Is the client connection information available at 
this point, so that I could mark the connection at this point based on 
the client connection mark?


However, I think squid uses persistent connections to upstream? (I will 
always have another proxy as my upstream).  If so then actually I need 
to reset the mark for each request? Where would be the correct location 
to put the marking code in this case, ie I guess where the packet is 
sent to the upstream socket? (I guess I need to be careful about 
pipelining also?)


Thanks for your thoughts

Ed W




[squid-users] squid qos_flows - copying mark from client side to upstream request?

2013-03-26 Thread Ed W
Hi Andy, Sorry to bug you, but I finally got round to trying the 
qos_flows feature and I think my understanding is completely back to front?


What I need is to copy the packet/connection mark from the client 
request and apply it to the upstream request. So, for example, I mark 
clients that have passed a captive portal test with some mark, and I need 
that mark copied onto requests coming from Squid so that I know they 
effectively come from a validated client.


As near as I can tell, the current qos_flows applies this all backwards, 
i.e. it assumes that the upstream connection has some mark on it, and 
copies this back to the client response connection?
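For reference, current qos_flows usage looks roughly like this (the values are arbitrary examples); the mark is applied to the client-side response traffic, based on where the response came from, not to the upstream request:

```
# Mark traffic returned to clients according to the response source
qos_flows mark local-hit=0x30/0xff parent-hit=0x20/0xff miss=0x10/0xff
```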


How tricky would it be to offer this option in both directions? Does 
anyone else have a use for this kind of feature?


Thanks

Ed W


[squid-users] Sponsor etag/vary support for Squid 3.3

2013-03-20 Thread Ed W

Hi, Alex,

I'm picking up an old thread from some time back.  I remain interested 
in getting support for etag into squid (and related revalidate support).


My main requirement is that I have two proxies on either side of a 
bandwidth limited link (with high cost).  I want the situation that when 
a client GETs some object, we can convert this to an IF-NONE-MATCH and 
trust the etag confirms that the object is unchanged.


Note, I am aware of the limitations of trusting etags. In my setup I 
will have control over the proxy on the high speed side of the 
connection and we can use various methods on that side to ensure that 
the etags are sane. The main goal is to minimise bandwidth across the 
intermediate (expensive) link.


Previously we discussed all kinds of complex ideas including 
implementing trailers, and custom headers with hash values.  On 
reflection I think everything required can be done using only ETag 
revalidation (and some tweaking of ETags, but Squid need know nothing 
about that...)



If anyone else is interested in such support then please shout. Alex, 
would you mind picking this up with me again with a view to sponsoring 
development?


Thanks

Ed W


Re: [squid-users] Certificate server validation

2013-02-27 Thread Ed W

Hi Alex


Can squid handle a slightly simpler case where we want to restrict
CONNECT access to servers which meet/fail to match a certain SSL certificate CN?
e.g. I want to block Facebook access, but without SSLBump, so I allow SSL
proxying but deny connections to servers with an SSL CN of *.facebook.com?


If your blocking decision is based on information coming from the HTTP
CONNECT request alone, then you can block CONNECT requests using regular
http_access rules. For example, you can block CONNECT requests for a
given origin server name or a given IP address. The only caveat I am
aware of is that most browsers will not display Squid's error page in
this case because browsers cannot separate secure server response
context from insecure proxy CONNECT response context (and there have
been attacks based on the lack of context separation before browsers
stopped displaying CONNECT responses).
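Blocking on the CONNECT request alone, as described above, can be done with ordinary http_access rules, e.g. (assuming the standard `acl CONNECT method CONNECT` definition from the default config):

```
acl blocked_sites dstdomain .facebook.com
http_access deny CONNECT blocked_sites
```

This only sees the hostname or IP in the CONNECT line, not the server certificate.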

If your blocking decision is based on information coming from the SSL
server certificate itself, then you have to bump the transaction for
Squid to see that certificate. Without bumping, Squid only sees CONNECT
headers and raw opaque-to-Squid TCP payload bytes. For example, the
SslServerCertValidator feature that Amos recommended (thanks Amos!)
requires bumping the transaction.

SSL server certificate is not available at HTTP CONNECT inspection time.
You can hack a helper script that will connect to the server using the
CONNECT address, receive an SSL certificate, terminate the connection,
and tell Squid what the certificate was, but doing so is bad for
servers, so I would expect that some of them will eventually complain to
your ISP, block your source IPs, or even feed your helper with bogus
info. It will also not work with servers using SNI.


FWIW, we are working on a Peek and Splice feature that allows decisions
based on server SSL certificate without bumping the connection, but we
still have to solve a few difficult problems before I can be certain
that it is doable at all:
 http://www.mail-archive.com/squid-dev@squid-cache.org/msg19574.html


Finally, I think it is technically possible to peek at the certificate
with no intention of bumping the connection (but with the intention of
possibly terminating it). I am not aware of anybody working on this, but
the Peek and Splice feature mentioned above will provide this
functionality as a side effect.

However, please note that without bumping, you still will not be able to
serve an error page to the blocked user because the browser will expect
to communicate with the [secure] origin server, not Squid (the browser
already sent its SSL Hello request).




Apologies for the slow reply - this is REALLY interesting

At the moment I'm playing with nDPI (part of ntop), which does some 
simple parsing of the raw TCP packets to try and dig out the certificate 
CN. I'm fairly sure a determined attacker could get past this, but it's 
probably acceptable for its main requirement.


My situation is that I need to restrict access to certain classes of 
connection. Mostly we have a situation where bandwidth costs are 
expensive and the users themselves are volunteering to be restricted so that 
they don't spend on connections they don't value.  So, for example, they 
will wish to restrict antivirus scanner updates, Windows updates, Skype 
registrations, iOS push, etc.


So users request a specific firewall profile to be applied, but 
increasingly everything looks like HTTP/HTTPS these days... So we want 
to standardise on one method to police, log and restrict HTTP protocols, 
and ideally that would be Squid.


So yes, Peek with restrict (and a rather abrupt disconnect) would be 
superb for our purposes.  I will wait and watch!


Cheers

Ed W


Re: [squid-users] Certificate server validation

2013-02-09 Thread Ed W

On 20/01/2013 01:24, Amos Jeffries wrote:

On 19/01/2013 3:37 a.m., vincent viard wrote:

Hello,

I'm asking about the feasibility of validating the server certificates
used during SSL/TLS session establishment in HTTPS, at the level of the
Squid proxy. The idea is not to break the SSL session with a
man-in-the-middle (e.g. SSLBump), but to authenticate (and authorize) the
target against a white or black list of CAs. In other words, to have Squid
perform the first validation of the SSL handshake that is logically made
by the client browser on the server's certificate.

In advance, thank you and good day.

Vince


Please see http://wiki.squid-cache.org/Features/SslServerCertValidator

This feature is merged and will be in 3.4 series when it is released. 
To use it now you need to build the 3.HEAD Squid sources.




Can squid handle a slightly simpler case where we want to restrict 
CONNECT access to servers which meet/fail to match a certain SSL certificate CN? 
e.g. I want to block Facebook access, but without SSLBump, so I allow SSL 
proxying but deny connections to servers with an SSL CN of *.facebook.com?


Thanks

Ed W


Re: [squid-users] Re: squid with pdns, bandwidth control issue

2012-07-02 Thread Ed W

On 02/07/2012 14:12, Muhammad Yousuf Khan wrote:

after limiting my bandwidth using delay pools, things seem OK.
BTW, I thought that Squid would be doing it automatically.


All the research on TCP up until very recently has been about how to 
*maximise* the amount of data flowing down a given pipe.  Only recently 
has there been much thought about single streams NOT trying to 
maximise their flow to match the pipe size...


No systems right now really do anything other than try to download as 
fast as possible (arguably the latest BitTorrent clients, with their fancy 
new reinvention of a TCP-like protocol, are the only exception I can think 
of).  If you DON'T want to max out a given pipe then you need to 
configure this in some way.
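For a single shared cap, a delay-pool configuration along these lines is the usual starting point (a class 1 pool is one aggregate bucket; the rate and burst figures are just examples):

```
delay_pools 1
delay_class 1 1
# aggregate restore-rate/max-burst in bytes: ~32 KB/s, 256 KB burst
delay_parameters 1 32000/262144
delay_access 1 allow all
```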


Note you may have other problems which you might want to tune as well as 
delay pools...


Ed W



Re: [squid-users] Re: squid with pdns, bandwidth control issue

2012-06-29 Thread Ed W

On 29/06/2012 14:12, Muhammad Yousuf Khan wrote:

I have made some tests; here are the details and results.
I am using two machines:

1. Gateway: IPCop (Linux)
2. Debian Lenny (Squid)

I am using a download manager to download a 50MB file.

IPCop
---
When I download via IPCop, my download rate bursts up to 270 KB/s,
with no ping delay, and others can also browse easily.

Squid on Lenny
---
Via Squid (proxy mode) my download reaches 365 KB/s, which is full
throughput and faster than IPCop, but the ping delay reaches 4000 ms,
which is almost near death.

And no other users can browse; they get timeout messages in their browsers.

I think this shows that the issue is with the Squid box, and I don't know
whether I have to tweak Squid, the TCP buffers, or anything else.
weather i have to tweak the squid or TCP buffer or anything



Run a download using wget from both boxes and observe the download 
speeds and effect on ping.  This might help you figure out if it's an 
operating system configuration setting


The effect is clear though - one of your machines is managing to max out 
the entire inbound connection (which is exactly what TCP is supposed to 
try to do).  The other machine is only partially using the connection 
(I know that feels more desirable, but it's likely an accident, and it's 
not how TCP tries to behave).


So your problem seems to be reduced to figuring out why one machine is 
performing optimally and hence hogging the whole internet connection.


Reduce the problem to the basics and debug from there.  Just remember 
that TCP is supposed to learn how to hog the entire connection; 
allocating traffic more evenly is a tricky problem, and you might want to 
use the various features of Squid delay pools and Linux traffic control 
to manage this..?


Good luck

Ed W


Re: [squid-users] Outlook 2010 crashing on gzip-encoded proxied internet calendars

2012-06-29 Thread Ed W

On 29/06/2012 14:14, Pim Zandbergen wrote:

Could it be squid is feeding Outlook a gzip encoded cached calendar,
which was previously received by Thunderbird? Would that be a squid bug?


Aha, yes, isn't there only partial support for Vary in squid right now?  
You might want to dump the Vary headers for the responses with the 
different Accept-Encoding headers and compare?

Ed W


[squid-users] Conditional cache_peer based on transparent/non transparent connection?

2012-03-25 Thread Ed W
Hi, I have an upstream cache_peer which requires authentication.  My 
local network squid accepts explicit proxy clients, and does transparent 
redirection on everyone else.


Clearly I can't use the cache_peer for the transparently proxied 
clients; however, how could I use it only for explicit proxy clients 
(i.e. those with a proxy set in their browser)?  The cache_peer line uses 
login=PASSTHRU so we never get involved with the authentication on the 
local Squid.


Thoughts on how I could achieve this please (without two squid 
instances)? Squid 3.2.0.16
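One approach (an untested sketch; the port numbers and peer hostname are hypothetical) is to give explicit and intercepted clients different http_port lines and then select the peer with a myportname ACL:

```
http_port 3128                  # explicit proxy clients
http_port 3129 intercept        # transparently redirected clients

acl explicit myportname 3128

cache_peer upstream.example.com parent 8080 0 no-query login=PASSTHRU
cache_peer_access upstream.example.com allow explicit
cache_peer_access upstream.example.com deny all
never_direct allow explicit
```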


Thanks

Ed W


Re: [squid-users] Roadmap Squid 3.2

2012-03-05 Thread Ed W
Is Squid-3.2.0.15 the most stable release to be using for deployment 
on the bleeding edge, or is 3.2.0.12 still the safest bet?  In the past 
you have given some guidance as builds have moved into new functionality 
vs bug squashing phases?


Are you imminently about to release 3.2.0.16?

Does someone have some big picture comments on rock store - benefits, 
any known issues?


Cheers

Ed W


Re: [squid-users] How many proxies to run?

2012-01-30 Thread Ed W

On 13/01/2012 13:29, Eliezer Croitoru wrote:

On 12/01/2012 19:58, Gerson Barreiros wrote:

I have an unique server doing this job. My scenario is most the same
as mentioned above.

I just want to know if i can make this server a Virtual Machine, that
will use shared hard disk / memory / cpu with another VMs.

A web proxy on a VM is not the best choice in a heavily loaded environment.
You can use one on a VM, but in most cases it will mean lower performance.

I have a site with a 40Mbps ATM line that is using 2 Squid servers on a 
VM and it works fine.
Another is an ISP with 4 machines and a total of 800Mbps output 
to the clients.

Statistics are one of the musts before reaching a conclusion.

Eliezer



I quite like Linux-vservers for my virtualisation solution.  It's 
basically a kind of fancy chroot with kernel enhancements to make the 
separation almost as complete as full virtualisation.  Since it IS 
basically just a chroot, you are still running on bare metal and there 
is no virtualisation overhead as such.


For my requirements it works very nicely, and I don't have any needs that 
require a full virtualisation solution (KVM, etc).  The main reasons 
a container solution such as Linux-VServer isn't suitable are when you 
need full separation from the host OS, e.g. the kernel version is important, 
or where hardware virtualisation is useful (although there are ways to 
sort-of virtualise the network card with vservers), or where you need the 
features of an enterprise virtualisation solution, such as live 
migration.  On the flip side, performance is very high with a 
container solution and my machines boot in 1-2 seconds, so it's 
really very easy for me to manage without a full virtualisation solution.


Good luck

Ed W


Re: [squid-users] Facebook page very slow to respond

2011-10-20 Thread Ed W
On 20/10/2011 06:11, Wilson Hernandez wrote:

 To tell you the truth I don't know what's the deal - bandwidthd or squid -
 but it's really getting on my nerves losing users left and right every
 week. I need to come up with a solution before my whole network
 goes down the drain


You need to get a reproducible situation and work from there.  Find a
tame user with the problem and get network timings, etc.  Trace it on
the server, setup direct access vs proxy access for them and compare
performance, etc.

You can use things like Chrome's developer tools, or Firefox's
Firebug/TamperData, to see network traffic and timings - that alone would
probably help you a lot (pages can seem sluggish if certain assets such
as javascript or css are slow to load and block page rendering - often
these assets are advertising or other things which might be
handled differently in your proxying situation).

Good luck

Ed W


Re: [squid-users] How to filter response in squid-3.1.x?

2011-10-20 Thread Ed W
On 20/10/2011 09:11, Amos Jeffries wrote:
 On 20/10/11 20:11, Kaiwang Chen wrote:

 So Squid without the adapter will cache one copy of responses in only
 one encoding.

 Yes.

  Will Vary:Accept-Encoding request header enable
 multiply copies?

 No. It tells Squid there are multiple variants with the same URL, and
 to check the Accept-Encoding header against the one stored already
 when deciding if it is a HIT.

Hi, can I just double-check your response above.  Whilst Squid support
might not be working (correctly) for Vary caching, my understanding is
that Vary:Accept-Encoding is correct here to effectively cache both
gzipped and non-gzipped versions of the content?

That said, if so, then this is probably a good example of where post-cache
response adaptation (or code in squid) is useful?

The problem seems to be that eCAP runs pre-cache: the eCAP adaptation
gzips the data, the gzipped version is cached, and now all cache hits
return the gzip version regardless of Accept-Encoding?

I think it's a non-problem, in that *I think* all modern browsers handle
almost all assets gzip-encoded whether they asked for it or not?  In the
past IE (6 and prior) was the odd one out with some bugs, and I have
seen certain ISP proxies somehow mangle the response encoding header,
causing gzipped content to be treated as plain text by the browser (not
sure what happens, I only have non-technical customer reports).  I *think*
it's probably safe these days to blindly gzip everything in sight?
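To make the Vary discussion concrete, this toy sketch (not Squid's implementation) shows how a cache keyed on the request headers named in Vary stores the gzipped and plain variants of one URL separately:

```python
# Toy variant cache: the key combines the URL with the request's values
# for each header named in the response's Vary list.
cache = {}

def variant_key(url, vary, request_headers):
    return (url,) + tuple(
        request_headers.get(name.strip().lower(), "") for name in vary
    )

def store(url, vary, request_headers, body):
    cache[variant_key(url, vary, request_headers)] = body

def lookup(url, vary, request_headers):
    return cache.get(variant_key(url, vary, request_headers))

vary = ["Accept-Encoding"]
store("/page", vary, {"accept-encoding": "gzip"}, b"gzipped body")
store("/page", vary, {"accept-encoding": ""}, b"identity body")

# Each Accept-Encoding value now hits its own variant:
print(lookup("/page", vary, {"accept-encoding": "gzip"}))
print(lookup("/page", vary, {"accept-encoding": ""}))
```

The pre-cache gzip problem described above corresponds to storing only one variant under a key that ignores Accept-Encoding.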

Also when I last looked at the code, I think the ecap module a) only
gzips a very small number of content types (ideally it should be
expanded), and b) from memory I don't think it respected Accept-Encoding
and compressed everything regardless?  These are probably incorrect
assertions, didn't recheck the code

Thanks

Ed W

P.S.  If someone wanted to investigate adding gzip to Squid directly
then they would be looking for a hook into the code somewhere that
handled all the client bound response bodies, with access to the
original request to check Accept-Encoding headers. Where might someone
look to add such code?


Re: [squid-users] handing off usernames to parent proxies

2011-10-20 Thread Ed W
On 20/10/2011 18:11, E.S. Rosenberg wrote:
 On the whole I just need the backend to know the username, or what
 'browsing plan' the session is using, sometimes plans are also
 determined based on src IP (ie. certain stations aren't allowed to
 browse no matter who's logged in, or are supposed to only have access
 to a whitelist even when staff are using them), so I think a
 'NAT'-like method is most likely what i need.

Just to highlight a feature that not everyone yet knows about: in
the 3.2 series there is support for conntrack marking, both to copy the
original connection mark to the output and also to mark connections
based on various Squid criteria.  Conntrack marks don't affect anything
outside the network stack they are running on (i.e. the next hop knows
nothing), but they can be used to help integrate a firewall to achieve
various clever effects.

I'm not sure that they help you that much, so this was more to add an
idea on the off chance it helps...  At a pinch you can use your firewall
to change IP address or TOS marks to communicate conntrack marks outside
of the box, but it's a bit crude...

The other thing is that I believe you can use the auth helpers to set
the upstream auth username to be somewhat different from the logged-in
user. So I *believe* you can achieve the effect of doing some
database lookup on users in group X to get a group name "X" and passing
that "X" upstream as the auth user. The point is that you don't need to
use IP as your upstream signalling criterion; you can use the auth user,
but pre-grouped into the service class names that you need.  As an
extension to this basic idea, I believe you can use the auth helpers to
derive these usernames from other criteria such as client IP address,
etc. Does this help?
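A sketch of that idea as a Squid basic-auth helper (the plan table is made up, and the use of the `user=` kv-pair after OK to have Squid record a different username is my reading of the 3.2+ helper protocol; treat it as an assumption to verify):

```python
import sys

# Hypothetical mapping of real usernames to upstream "plan" names.
PLANS = {"alice": "plan-basic", "bob": "plan-premium"}

def map_user(line: str) -> str:
    # Basic-auth helper input is "username password" per line;
    # the reply is OK (optionally with kv-pairs) or ERR.
    user, _, _password = line.strip().partition(" ")
    plan = PLANS.get(user)
    # "user=" asks Squid to record this name instead of the login name.
    return "OK user=%s" % plan if plan else "ERR"

if __name__ == "__main__":
    for request in sys.stdin:
        print(map_user(request), flush=True)
```

Real password checking is omitted; the point is only the username-to-plan rewrite being discussed.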

Good luck

Ed W


Re: [squid-users] Add top information to all webpages (like godaddy AD)

2011-10-11 Thread Ed W
So you could be smarter and instead inject some JavaScript which checks
whether it is in a frameset and, if not, creates one.  This of course has
some subtleties with AJAX...

Ed W


On 11/10/2011 12:28, Hasanen AL-Bana wrote:
 I believe yes! But it will cause lots of trouble with pages like
 facebook and gmail.
 You can redirect all requests to a url_rewriter script.
 Squid will pass the requested url to the script, then the script must
 generate a page with 2 iframes: the first iframe will hold the ad, and the
 second iframe goes below the first one and will contain the original
 requested page.
 But think of the problems you will face, because squid will add that to
 each request, which will break the whole page; hence the script must be
 smart enough to process only root pages like index.php, index.html, etc.

 On Tue, Oct 11, 2011 at 2:20 PM, Jorge Bastos mysql.jo...@decimal.pt wrote:
 Howdy,

 I'd like to do something that I don't know is possible somehow.
 I have squid configured as transparent, and I'd like to add to every page 
 that the user visits some information at the top of the page, like an ad.

 Is this possible?
 For example Godaddy has this on the free hosting they provide.

 Thanks in advance,
 Jorge Bastos,



Re: [squid-users] Facebook page very slow to respond

2011-10-11 Thread Ed W
On 08/10/2011 20:25, Wilson Hernandez wrote:
 Thanks for replying.

 Well, our cache.log looks ok. No real problems there but, will be
 monitoring it closely to check if there is something unusual.

 As for the DNS, we have local DNS server inside our LAN that is used
 by 95% of the machines. This server uses our provider's servers as
 well as google's:

  forwarders {
 8.8.8.8;
 196.3.81.5;
 196.3.81.132;
 };

 Our users are just driving me crazy with calls regarding facebook: is
 slow, doesn't work, and a lot other complaints...


Occasionally you will find that the Google DNS servers get poisoned and
take you to a non-local Facebook page.  I suggest running dig against
specific servers and making sure you are ending up on a server which
doesn't have some massive ping to it?  I spent a while debugging a similar
problem where the BBC home page suddenly got slow on me because I was being
redirected to some German Akamai site rather than the UK one...

This is likely to make the difference between snappy and sluggish though,
not dead...

Good luck

Ed W



Re: [squid-users] Tuning for very expensive bandwidth links

2011-04-01 Thread Ed W
Hi


 So the remote (client) side proxy would need an eCAP plugin that would
 modify the initial request to include an ETag.  This would require some
 ability to interrogate what we have in cache and generate/request the
 ETag associated with what we have already - do you have a pointer to any
 API/code that I would need to look at to do this?
 
 I'm unsure sorry. Alex at The Measurement Factory has better info on
 specific details of what the eCAP API can do.

If I wanted to hack on Squid 3.2... Do you have a 60 second overview on
the code points to examine with a view to basically:

a) create an etag and insert the relevant header on any response content
(although, perhaps done only in the case that an etag is not provided by
upstream server)

b) add an etag header to requests (without one) - ie we are looking at
the case that client 2 requests content we have cached, but client 2
doesn't know that, only local squid does.

Just looking for a quick heads up on where to start investigating?


 IIRC we have Dimitry with The Measurement Factory assisting with HTTP
 compliance fixes. I'm sure sponsorship towards a specific fix will be
 welcomed.

How do I get in contact with Dimitry?


 The one public eCAP adapter we have bee notified about happens to be for
 doing gzip. http://code.google.com/p/squid-ecap-gzip/

Hmm... I did already look this over a bit - a very nice and simple API;
a shame a huge bunch of eCAP plugins haven't sprung up?

The limitation seems to be that the API is really about mangling
requests/responses, but there isn't obviously a way to interrogate squid
and ask it questions about what it's caching?  Even if there were, you
also have a race condition: you might tell the upstream that we have
content X in cache, but by the time the response comes back that
content might have been removed..?

Seems that at least parts of this might need to be done internally to squid?

Just to be clear, the point is that few web servers generate useful
ETags, and when bandwidth is the limiting constraint (and there is a
hierarchy of proxies), it might be useful to generate (and later test)
ETags based on some consistent hash algorithm.
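To make that concrete, here is a minimal sketch (in Python, purely
illustrative - real Squid/eCAP code would be C++) of deriving a
deterministic ETag from the response body with a strong hash, so that
two proxies hashing identical bytes always agree, even when the origin
server never sent an ETag at all:

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Any consistent strong hash works; both proxies must use the same one.
    digest = hashlib.sha256(body).hexdigest()
    # Quote it, per HTTP ETag syntax.
    return '"%s"' % digest

# Identical bytes -> identical synthetic ETag, on either side of the link.
assert make_etag(b"hello") == make_etag(b"hello")
```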


Thanks

Ed W


Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-31 Thread Ed W
On 30/03/2011 19:17, Marcus Kool wrote:
 If your users do not mind, you can block ads and user tracking
 sites of which many produce 1x1 gifs.
 Most ads and tracking codes are not cacheable and may consume a lot.
 This all depends on which sites your users visit of course.

Thanks - got that covered to some extent.  So far I was looking at these
two lists for a simple domain-blocking system to catch adverts and
tracking:

http://www.mvps.org/winhelp2002/hosts.htm
http://hosts-file.net/

Any other suggestions/comments?
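For reference, the usual way to wire such a domain list into Squid is a
`dstdomain` ACL; a minimal squid.conf sketch (the file path is just an
assumption for illustration):

```
# /etc/squid/ad-domains.txt: one domain per line, e.g. .doubleclick.net
acl ads dstdomain "/etc/squid/ad-domains.txt"
http_access deny ads
```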

Additionally we will offer the option to do image recompression and
upstream gzip of content (actually probably we will use our own
compressing tunnel across the slow link)

Anyone know of any already-written eCAP/ICAP servers I might want to
investigate?

Cheers

Ed W


Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-31 Thread Ed W
Hi

 My thought was to investigate having the internet side proxy add etag
 headers to all content based on some quality hash function. Then have
 the (expensive) remote side proxy rewrite the request headers to always
 use If-None-Match?  The idea is that the bandwidth is cheap on internet
 connected side, so it can refresh it's cache of the whole page, generate
 a new hash, but still return a not modified response if the end result
 is the same string of bytes.  How much of that can I implement in Squid
 3.x today..?
 
 3.1.10+ will validate If-None-Match and ETag, but will not add them to
 requests itself.

Thanks - can you expand on what it means to validate in this case?

I think you mean that if the content is cached with a given ETag, then
requests for that content will be returned from cache if the request has
an appropriate If-None-Match - is that the case?


 Note, I realise this could lead to some side effects where the action of
 visiting the web page itself causes some other side effect, however, I
 think this is a manageable problem for this requirement?

 Thanks for any pointers to ideas or other products that might help?
 
 ICAP or eCAP would be the way to go here for quick results. Making a
 plugin to do the ETag generation and alterations before sending off.

Understood.

So the remote (client) side proxy would need an eCAP plugin that would
modify the initial request to include an ETag.  This would require some
ability to interrogate what we have in cache and generate/request the
ETag associated with what we have already - do you have a pointer to any
API/code that I would need to look at to do this?

Then on the internet-side proxy we would do whatever we need to retrieve
the content, say fetch the asset, and our eCAP adapter on that side would
generate a consistent ETag using our favourite hash function?

The part I'm unsure how to implement is examining what's in Squid's
cache in order to generate an ETag based on what we have already got
(i.e. on the remote side).
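As a rough sketch of what the remote-side adapter would do once that
cache lookup exists (in Python for clarity; `cached_body` stands in for
the cache-interrogation API that is exactly the open question here):

```python
import hashlib

def adapt_request(headers, cached_body):
    # Remote-side sketch: if we hold a cached copy of the object,
    # advertise its synthetic ETag so the internet-side proxy can
    # answer with a bare 304 instead of resending the full body.
    # `cached_body` is a placeholder for a cache lookup we don't yet
    # know how to perform from inside an eCAP adapter.
    if cached_body is not None and "If-None-Match" not in headers:
        etag = '"%s"' % hashlib.sha256(cached_body).hexdigest()
        headers = dict(headers)  # don't mutate the caller's headers
        headers["If-None-Match"] = etag
    return headers
```

A real adapter would also need to leave requests alone when the origin
already supplied its own ETag, which the `"If-None-Match" not in headers`
guard approximates.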


 You could also look at cutting bodies off 304 replies at the Internet
 side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.

Hmm, yes, that would be very sensible.  Apart from via eCAP, are there
other ways I might do that?


 NP: if you want to go ahead and alter Squid code adding If-None-Match on
 outbound requests is an open bug. As is proper ETag variant caching
 support.

I don't know if I have the time/ability to hack on the Squid code - is
there someone who might be interested in working on this for an
affordable fee?

Thanks for the very helpful feedback. If there are any existing
eCAP/ICAP modules I should look at, please educate me.  (I'm currently
using Ziproxy and looking at moving the interesting bits to a Squid eCAP
module; I have also used the Rabbit proxy, which is somewhat similar.)

Thanks for your comments

Ed W


[squid-users] Tuning for very expensive bandwidth links

2011-03-30 Thread Ed W
Hi. Just investigating some tuning of Squid for use with satellite
links (which are relatively slow, and where bandwidth can be charged at
$10-100/MB).

I'm pondering having a dual proxy configuration with a proxy at both
ends of the satellite link.  A desired goal would be to force serving
from local cache anything which hasn't actually changed (byte for byte)
on the internet side.

My thought was to investigate having the internet-side proxy add ETag
headers to all content based on some quality hash function, then have
the (expensive) remote-side proxy rewrite the request headers to always
use If-None-Match.  The idea is that bandwidth is cheap on the
internet-connected side, so it can refresh its cache of the whole page
and generate a new hash, but still return a Not Modified response if the
end result is the same string of bytes.  How much of that can I
implement in Squid 3.x today?
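The internet-side half of that scheme can be sketched in a few lines (a
Python illustration of the logic only, not Squid code): re-fetch the
object where bandwidth is cheap, hash it, and only send a 304 across the
expensive link when the bytes are unchanged.

```python
import hashlib

def commute_to_304(request_headers, fresh_body):
    # Internet-side sketch: hash the freshly fetched body and compare
    # against the validator the remote-side proxy advertised.
    etag = '"%s"' % hashlib.sha256(fresh_body).hexdigest()
    if request_headers.get("If-None-Match") == etag:
        return 304, b""        # nothing big crosses the satellite link
    return 200, fresh_body     # content really changed; send it
```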

Note, I realise this could lead to some side effects where the action of
visiting the web page itself causes some other side effect, however, I
think this is a manageable problem for this requirement?

Thanks for any pointers to ideas or other products that might help?

Ed W


[squid-users] Support for detecting if-modified using SHA digest or similar?

2010-07-22 Thread Ed W
Hi, I am plotting a hierarchical cache with a proxy at the client end
of a slow, expensive satellite internet connection, and another on the
fast, cheap internet side (the goal is to optimise traffic passing
through the slow link).  I would specifically like to address the issue
that many (smaller, dynamic) sites do not properly support if-modified
type headers and always send the same content each time...


I think the only way this can be solved is if the client-end cache
notices it has a cached version of a resource and adds its own
if-modified-sha header stating which content it's got; the upstream
proxy may then need to fetch the object again, but if it finds the
content actually is the same, it commutes the response to a 304.
(Something like a dynamic, proxy-generated ETag, really.)
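The exchange would look something like this (`If-Modified-SHA` is an
invented header name; HTTP's standard ETag / If-None-Match pair gives
the same shape once the proxy synthesizes the validator itself):

```
GET /page HTTP/1.1
Host: example.com
If-Modified-SHA: <sha-of-the-copy-we-already-hold>

HTTP/1.1 304 Not Modified      (upstream re-fetched, hash matched)
```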


Someone may tell me this is already in an RFC - if so, great.  If not,
could someone advise how difficult this feature might be to add to Squid
3.1?  Bonus marks if it doesn't break streaming resources...


Any other ways to achieve the same effect?

Thanks

Ed W


[squid-users] Modifying content passing through proxy?

2005-06-08 Thread Ed W
For various reasons I am interested in writing a proxy for users on very 
low bandwidth connections. 

What ability do I have to modify the content passing through squid?  I'm 
interested in emulating the ability to resize pictures and generally 
mangle the HTML that you find in a proxy like Rabbit:

http://www.khelekore.org/rabbit/

Grateful for any thoughts

Ed W


Re: [squid-users] Modifying content passing through proxy?

2005-06-08 Thread Ed W

 I've been looking at similar options, primarily to speed up web browsing
 on small screen devices connected via GPRS.

I'm looking at normal-sized clients: IE/Firefox on a laptop/desktop.
Again over satellite or GPRS.

 What ability do I have to modify the content passing through squid?  I'm
 interested in emulating the ability to resize pictures and generally
 mangle the HTML that you find in a proxy like Rabbit:
 http://www.khelekore.org/rabbit/

 Have you considered using an instance of the transcoding proxy as a
 parent proxy for Squid?

Do you mean the IBM software, or do you mean "transcoding proxy" as a
generic term for Rabbit?


Rabbit doesn't quite do what I want; in particular it doesn't easily let
the user switch back and forth to higher-quality versions.  I want more
control over this and am quite prepared to write something.  I also have
other requirements: I will probably need a proxy client on the laptop
end and something else at the server end, because I implement a
compressing tunnel using an advanced compression algorithm.


So I really just wondered how much of this functionality I could push
into Squid and how much needs to live in an external proxy...


If anyone knows of any other good proxy applications (open source or
otherwise) that I could use for this purpose, then please let me know.


Ed W