Ian,
thanks for the feedback.
Before I get to the details, I'd like to clarify that we have (at least)
*two* issues here, only one of which is covered by ISSUE-1.
1) ISSUE-1 is about whether it's ok to use POST for this,
2) The other issue is about whether this feature is needed at all, how
to expose it in UAs and so on.
For now, I'll just reply to what is in ISSUE-1.
On Sat, 27 Oct 2007, Julian Reschke wrote:
We're long past that. It's trivial for a page to trigger a POST
without the user knowing.
I consider that a bug in User Agents.
This is not a widely held opinion.
Well, it's what RFC2616 says. I would argue that if the HTML WG thinks
there is a problem in what RFC2616 has to say about how to use unsafe
methods, it should bring this to the attention of the newly formed HTTP WG.
Please do not add more of this.
While I understand that you believe that silent POSTs are somehow harmful,
I believe that on balance the proposed feature is a net user benefit,
and that this instance of automatic POST is no more dangerous than other
automatic POSTs being proposed (e.g. in the cross-site XMLHttpRequest
specification being developed at the W3C in the WebAPI WG). Indeed, in
this instance I would argue the danger is significantly reduced, since no
POST data is sent with the request.
Agreed. But being less bad than some other use cases doesn't make it
good design, if there's an alternative approach which works without
POST at all.
[Quoting HTTP:]
"9.1.1 Safe Methods
Implementors should be aware that the software represents the user in
their interactions over the Internet, and should be careful to allow the
user to be aware of any actions they might take which may have an
unexpected significance to themselves or others.
I agree that ping="" should be made visible to users. Indeed, the spec
explicitly makes that a SHOULD, going far outside its usual boundary of
not specifying user interface requirements.
Currently, the standard way in HTML UAs to distinguish safe (GET) from
unsafe (POST) is a link vs a button.
So yes, if all "audited" links turn into buttons, that concern would be
dealt with. Somehow however I feel this is not what people have in mind.
In particular, the convention has been established that the GET and HEAD
methods SHOULD NOT have the significance of taking an action other than
retrieval. These methods ought to be considered "safe". This allows user
agents to represent other methods, such as POST, PUT and DELETE, in a
special way, so that the user is made aware of the fact that a possibly
unsafe action is being requested.
Indeed; and in fact part of the goal here is to make the possibly unsafe
action (user tracking and conversion tracking, with the potential effect
on future performance or the potential material financial effect) be one
that can be explicitly brought to the user's attention if he so desires,
something that is not possible in legacy tracking techniques. (For
example, using redirects makes the whole process very opaque.)
Following that, the spec should make any UA that makes an audited link
indistinguishable from a regular link non-conforming.
Naturally, it is not possible to ensure that the server does not
generate side-effects as a result of performing a GET request; in fact,
some dynamic resources consider that a feature. The important
distinction here is that the user did not request the side-effects, so
therefore cannot be held accountable for them."
I think it's clear that this paragraph is trying to convey that having
side-effects with a GET request is a poor state of affairs, which I agree
I disagree. A server can implement any side-effects it wants for GET,
the important part is that it can't complain if people follow these
links (pre-fetching, spidering, etc).
with, and which is one of the other things that the ping="" proposal
attempts to address -- legacy tracking mechanisms typically abuse GET in
an unsafe way, which causes a number of problems for the server (mostly
around unpredictable caching effects like pre-caching, session history
navigation, and transparent cache proxies), which can then affect the user
in undesirable ways (e.g. if tracking is used to determine preference
towards one link or another, and the user's browser precaches one more
often than the other, then the server will act as if the user had
indicated a preference where in fact he had not).
That's an argument in favor of separating the link target from the ping
target, not an argument to use an unsafe method.
As soon as the ping target lives in an attribute that is not
automatically followed at all, that problem would go away.
In conclusion I think HTTP supports the design of the feature as is.
I still disagree.
Could you please clarify why the ping attribute wouldn't work equally
well with a safe method?
No, I'm not suggesting that.
In this scenario, there are three parties involved:
A: the user
B: the visited site
C: the site being linked to
If the link from B to C needs to be audited for the purpose of paying
ads, money will be exchanged between the owners of B and C. A is not
involved in that transaction.
How the contract between B and C is implemented should be outside the
scope of the stuff sent to A.
While that would be nice in practice, it is not the case today, and it is
not clear that it ever could be the case. We have to work within the
limitations we are presented with, and in this case it seems that the
ping="" proposal is the closest one can get to solving the problems seen
by both the users and the authors.
Again, this is not about safe vs unsafe, but about ping vs href.
Again:
"The important distinction here is that the user did not request the
side-effects, so therefore cannot be held accountable for them."
When A follows the link, he is *not* accountable for the cost of the ad,
which is transferred from C to B.
The HTTP specification just says that a user can never be held accountable
for GET side-effects. It says nothing about the user being held
accountable for anything else, including automatic POST requests.
If the UA decides to invoke an unsafe method *without* the user's
consent, that *may* be a problem. With a safe method, it's guaranteed
not to be. Thus, there's a clear advantage in using a safe method.
I personally think that the attribute in itself is a Very Bad Idea,
but if it stays in, by all means do not use POST for it.
We can't use GET... what other method would be appropriate?
You shouldn't do it at all. If you insist on doing it, use a safe
method. Everything else is in conflict with RFC2616. And yes, you can
use GET.
I think it has been explained why using GET is undesirable.
I haven't seen an explanation that was convincing to me.
BTW: I just checked, and the Google Ads on www.google.de work with
GET and a Redirect (302). Only safe methods from the user's point of
view. Are you saying this is a problem?
Yes.
Interesting -- good that I asked. It seems we'll not be able to make
progress on this unless we clarify this issue first.
The problems from the server-side are that it is unreliable (due to
pre-caching, transparent caching, and session history navigation), it
obfuscates the user experience (the actual target URL is hidden), it is
slow (there's at least one extra HTTP round-trip, possibly with an
additional DNS hit as well), and it uses an idempotent method for a
distinctly non-idempotent action.
Again, this is about ping vs href, not about safe vs unsafe.
Please keep these topics separate.
On Sat, 27 Oct 2007, Geoffrey Sneddon wrote:
Having read this entire thread, I don't see why anything is actually
wrong. In this context the difference between GET and POST is negligible
-- both can technically be used to do what is desired, though using GET
would be breaking RFC 2616 (or rather, breaking a SHOULD NOT). If we
disallow it to be used on external servers, people will just continue to
use Javascript to achieve this, which CANNOT be disabled by a UA without
breaking behaviour that sites rely upon.
Indeed.
Incorrect - a misunderstanding about what it means for a method to be "safe".
On Sat, 27 Oct 2007, Julian Reschke wrote:
No, sorry, that's incorrect.
If you want to do something silently (without the user's consent), you
simply have to use a safe method.
We don't want to do it without the user's consent. The whole point of
making ping="" explicit is to allow the user to have the final decision.
Once in the configuration, or on each navigation event? Per site?
And if you consider the desired effect non-safe (which I don't), then
the consequence is that you just can't do it.
We can't stop tracking from occurring. We can, however, make it better for
users. I think we have a responsibility to do so.
No disagreement about the goal here.
On Sun, 28 Oct 2007, Henri Sivonen wrote:
The ping attribute does have the same security risks that cross-domain
XHR POST with empty entity body would have if the access-control
Method-Check weren't there. That is, if a POST handler has been
programmed to trigger stuff on mere POST without a body, a malicious
ping attribute could be used to trigger that action.
(As could an empty scripted <form>.)
And if you consider the desired effect non-safe (which I don't), then
the consequence is that you just can't do it.
It is about idempotent vs. non-idempotent and side effects.
If you are counting ad impressions, clearly you don't want to
a) count Google Web Accelerator (or similar) prefetches
b) leave impressions uncounted due to an intermediate cache satisfying
the request.
Indeed.
Again, this is about ping vs href, not about safe vs unsafe.
On Sun, 28 Oct 2007, Julian Reschke wrote:
So would you ban XHR POST and script-initiated form submissions?
I would want the XHR spec to clarify that it's not OK to initiate unsafe
methods without the user's consent. I would also deprecate
script-initiated form submissions from something like onload().
Please bring this up with the Web API working group.
I did, and I was ignored.
and even if you use "ping", you still could do it with a safe method
(HEAD/Cache-Control:no-cache).
Unfortunately HEAD is typically implemented in servers (e.g. Apache)
without running the relevant CGI scripts, which makes them hard to
implement at all. I also disagree that this would be a correct application
of the HEAD method's semantics.
HEAD and GET have the same semantics - the only difference being that
for HEAD the response body is not transmitted. Servers that implement
HEAD differently technically are not compliant.
For link tracking, my understanding was that there is no response body
expected. Thus, for a server that implements a "link auditing resource",
both GET and HEAD actually will do the same -- invoke some kind of
tracking (minimally dumping the URI into a log file), and just return
with an HTTP 2xx status and no body.
Thus, I would expect that GET and HEAD can be used interchangeably.
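To illustrate, here is a minimal sketch (Python, standard library only) of what I mean by a "link auditing resource". The names AuditHandler, AUDIT_LOG and make_server are made up for this example, the in-memory list stands in for a log file, and the choice of 204 plus "Cache-Control: no-store" is my assumption rather than anything the spec says. GET and HEAD share one code path, so both record the request and both answer with a 2xx status and no body.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

AUDIT_LOG = []  # hypothetical in-memory stand-in for a log file


class AuditHandler(BaseHTTPRequestHandler):
    """Hypothetical link auditing resource: GET and HEAD behave identically."""

    def _record_and_reply(self):
        # Minimal tracking: remember the method and the requested URI.
        AUDIT_LOG.append((self.command, self.path))
        # 204 No Content is a 2xx status without a body, so the GET and
        # HEAD responses are indistinguishable.
        self.send_response(204)
        # Assumption: ask caches not to store the reply, so that later
        # requests still reach this server.
        self.send_header("Cache-Control", "no-store")
        self.end_headers()

    # Both safe methods use the same implementation.
    do_GET = _record_and_reply
    do_HEAD = _record_and_reply

    def log_message(self, format, *args):
        pass  # keep the example quiet on stderr


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), AuditHandler)
```

Calling make_server().serve_forever() would run it; a request such as "HEAD /audit?to=..." then adds one entry to AUDIT_LOG, exactly as the same GET would.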
On Sun, 28 Oct 2007, Henri Sivonen wrote:
That might work and could be a tad safer. It isn't in any way
theoretically pure from the RFC 2616 point of view, though, to make HEAD
and GET have different semantics beyond the response body presence.
Indeed.
Yes, and nobody suggested that.
On Sun, 28 Oct 2007, Julian Reschke wrote:
That might work and could be a tad safer. It isn't in any way
theoretically pure from the RFC 2616 point of view, though, to make
HEAD and GET have different semantics beyond the response body
presence.
I wasn't suggesting that.
You suggested that we should use HEAD for request tracking, which indeed
makes GET and HEAD have different semantics in a way that does not match
(at least my interpretation of) RFC2616.
Again, no I didn't. I suggested HEAD, but that doesn't mean I was trying
to have different semantics for GET -- please assume for a moment that I
have *some* expertise both with HTTP servers and the protocol definition.
Fundamentally, the ping being sent is not a user request of any kind at
all, it is a third-party request for information about what the user is
doing. This is not a transaction between a server and a client in the
sense that HTTP usually offers, it is a one-way message from the client
to a third party. So we are just using HTTP as a transport method of
convenience since it is there. This is probably reasonable in the
circumstances, but I don't yet understand how it matters which method we
decide to turn into a one-way message in the absence of a mechanism for
such.
Hopefully the points put forward earlier in this e-mail cover this in
sufficient detail.
Nope.
On Sun, 28 Oct 2007, Charles McCathieNevile wrote:
You mean POST, right? As far as I am concerned, the HEAD request
suggestion is the least departure from normal HTTP (since there is
already little expectation that HEAD will pass a response to the user),
but I still don't see
(I'm not sure if you meant to stop here or not.) HEAD seems even less
desirable than GET from the point of view of HTTP -- it's only supposed
to get the HTTP headers of the resource, without doing anything at all!
That's incorrect. The semantics of HEAD and GET are *exactly* the same,
except for the response body not being transmitted.
So if GET "does" something, "HEAD" will need to do so as well.
On Mon, 29 Oct 2007, Julian Reschke wrote:
So the scenario is:
1) User A browses web site B.
2) A follows an HTML link to site C.
3) The owner of B wants to be informed of that event in order to charge
the owner of C for an online ad linking to C.
That's one scenario; there are other, possibly more important ones, for
example: tracking results in search, so that more popular entries can have
subsequent rankings boosted, or usability studies tracking which links
users prefer on a site.
Understood. I didn't mention them here, because it seems you were mainly
concerned about the ad issue. In *this* case, there's even less reason
to use an unsafe method.
My position is that although money may be exchanged between B and C due
to the notification (ping), this is a transaction between B and C, and A
MUST NOT be involved. In other words, following a hyperlink MUST stay
"safe" in the RFC2616 sense.
The hyperlink does stay safe; however, the ping is not idempotent, and
should not use an idempotent method.
I still do not understand why it needs to be unsafe. You seem to be
concerned about the ping being executed when the user *didn't* navigate
-- but what does this have to do with safe vs unsafe?
Just state in the spec that the GET/HEAD operation on the ping target
MUST happen at most one time per user-initiated navigation to the href'd
URI.
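As a rough sketch of how a UA could follow that rule (Python, standard library only): the names PingOnce, build_ping_request and navigation_id are hypothetical, and how a UA identifies a navigation or decides that it was user-initiated is left open here. The "Cache-Control: no-cache" request header is the one from my HEAD suggestion earlier in this mail.

```python
import urllib.request


def build_ping_request(ping_url, method="HEAD"):
    """Build (but do not send) a safe-method request for a ping target."""
    if method not in ("GET", "HEAD"):
        raise ValueError("this sketch only uses safe methods")
    # no-cache asks caches along the way not to answer from a stored copy
    # without checking with the origin server.
    return urllib.request.Request(
        ping_url, method=method, headers={"Cache-Control": "no-cache"}
    )


class PingOnce:
    """Send at most one ping per user-initiated navigation (hypothetical)."""

    def __init__(self, send):
        self._send = send    # callable that takes the prepared Request
        self._seen = set()   # navigation ids that were already pinged

    def on_navigation(self, navigation_id, ping_url, user_initiated):
        # Prefetches and other non-user-initiated fetches never ping.
        if not user_initiated:
            return False
        # A repeated event for the same navigation does not ping again.
        if navigation_id in self._seen:
            return False
        self._seen.add(navigation_id)
        self._send(build_ping_request(ping_url))
        return True
```

A real UA would pass something like urllib.request.urlopen as the send callable; keeping it injectable just makes the "at most once" logic easy to check on its own.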
(emphasis on the last paragraph!)
The last paragraph actually doesn't apply -- it gives reasons not to use
GET, or to be careful with GET, and doesn't actually give advice on other
methods.
I do not understand how it "does not" apply, and I also disagree that it
gives reasons not to use GET. On the contrary, it explicitly allows servers
to do something with GET that has side-effects -- as long as the user is
not made accountable for it. Exactly this case, it seems to me.
On Sun, 28 Oct 2007, Kornel Lesinski wrote:
OTOH ping is all about creating side-effects, and only non-safe methods
should cause them.
Indeed.
And again: it depends on who is made accountable for the side effect.
The user following the link shouldn't be.
Or are we suddenly talking about a vehicle for micro payments here?
The root cause why using POST is unsafe is CSRF, and there should be a
separate effort dealing with that (covering all cases, not only ping).
I agree with all of the above.
I don't follow. Just because there are other problems with POST, there's
no reason to invent a whole new feature using POST when we don't have to.
On Sun, 28 Oct 2007, Julian Reschke wrote:
Following a link should not cause side effects the user (A) can be made
accountable for.
Agreed. And nothing says that the user can be made accountable for POSTs
made for ping="" attributes. Just that the user _can't_ ever be made
accountable for side-effects made in response to GETs.
Wait a second.
So you are willing to state that ping-initiated HTTP method invocations
must not cause an action the user can be made accountable for. I agree
with that.
But then, why don't you use a safe method in the first place?
And, fortunately this is not the case here. The only party for which the
side effect is relevant is the site owner (B), and potentially the party
(C) the link points to.
And sometimes the user, e.g. when the tracking is used to improve search
results in future searches, or to personalise a site to the user's habits
by promoting areas of a site that the user uses the most.
All of these are cases where HTTP experts will tell you that GET or HEAD
is just fine.
I hope this clarifies the issues surrounding the ping="" attribute. I
understand that not everybody agrees on this, but when there are requests
that are mutually exclusive, we can't make everyone happy. I hope that the
explanations above address most of the concerns that were raised, but I
understand that they might not. I would ask anyone who still disagrees
with what the spec says to please consider the above explanations
carefully; simply raising the same issue that has already been raised,
with no new information or reasoning, is unlikely to result in a different
reply. I try to base the design of the spec on the balance of all input,
not on the volume of input.
I have to say it didn't help me.
I've seen no evidence why ping has to use an unsafe method at all.
Best regards, Julian