Re: [whatwg] Feedback on the ping="" attribute (ISSUE-1)

Julian Reschke Fri, 02 Nov 2007 15:10:50 -0800

Ian,

thanks for the feedback.

Before I get to the details, I'd like to clarify that we have (at least) *two* issues here, only one of which is covered by ISSUE-1.


1) ISSUE-1 is about whether it's ok to use POST for this,

2) The other issue is about whether this feature is needed at all, how to expose it in UAs and so on.


For now, I'll just reply to what is in ISSUE-1.

On Sat, 27 Oct 2007, Julian Reschke wrote:
We're long past that. It's trivial for a page to trigger a POST without the user knowing.
I consider that a bug in User Agents.
This is not a widely held opinion.

Well, it's what RFC2616 says. I would argue that if the HTML WG thinks there is a problem in what RFC2616 has to say about how to use unsafe methods, it should bring this to the attention of the newly formed HTTP WG.

Please do not add more of this.
While I understand that you believe that silent POSTs are somehow harmful, I believe that on the balance the proposed feature is a net user benefit, and that this instance of automatic POST is no more dangerous than other automatic POSTs being proposed (e.g. in the cross-site XMLHttpRequest specification being developed at the W3C in the WebAPI WG). Indeed, in this instance I would argue the danger is significantly reduced, since no POST data is sent with the request.

Agreed. But just because it's less bad than some other use cases, doesn't make it a good design to use it, if there's an alternative approach which works without POST at all.

[Quoting HTTP:]
"9.1.1 Safe Methods
Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.
I agree that ping="" should be made visible to users. Indeed, the spec explicitly makes that a SHOULD, going far outside its usual boundary of not specifying user interface requirements.

Currently, the standard way in HTML UAs to distinguish safe (GET) from unsafe (POST) is a link vs a button.

So yes, if all "audited" links turn into buttons, that concern would be dealt with. Somehow however I feel this is not what people have in mind.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.
Indeed; and in fact part of the goal here is to make the possibly unsafe action (user tracking and conversion tracking, with the potential effect on future performance or the potential material financial effect) be one that can be explicitly brought to the user's attention if he so desires, something that is not possible in legacy tracking techniques. (For example, using redirects makes the whole process very opaque.)

Following that, the spec should make any UA that makes an audited link indistinguishable from a regular link non-conforming.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them."
I think it's clear that this paragraph is trying to convey that having side-effects with a GET request is a poor state of affairs, which I agree

I disagree. A server can implement any side-effects it wants for GET, the important part is that it can't complain if people follow these links (pre-fetching, spidering, etc).

with, and which is one of the other things that the ping="" proposal attempts to address -- legacy tracking mechanisms typically abuse GET in an unsafe way, which causes a number of problems for the server (mostly around unpredictable caching effects like pre-caching, session history navigation, and transparent cache proxies), which can then affect the user in undesirable ways (e.g. if tracking is used to determine preference towards one link or another, and the user's browser precaches one more often than the other, then the server will act as if the user had indicated a preference where in fact he had not).

That's an argument in favor of separating the link target from the ping target, not an argument to use an unsafe method.

As soon as the ping target lives in an attribute that is not automatically followed at all, that problem would go away.
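To illustrate the separation argued for above, here is a minimal sketch of how a prefetcher or spider might treat an audited link: it collects href targets and never dereferences the ping target. The class name PrefetchCollector and the example.org URLs are hypothetical, and this is only an assumption about how such a tool could behave, not a description of any existing prefetcher.

```python
from html.parser import HTMLParser


class PrefetchCollector(HTMLParser):
    """Collects link targets that a hypothetical prefetcher would fetch."""

    def __init__(self):
        super().__init__()
        self.prefetchable = []  # href values only

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            self.prefetchable.append(href)
        # The ping attribute is deliberately ignored here: it is only
        # meant to be used on an actual user-initiated navigation.


collector = PrefetchCollector()
collector.feed('<a href="http://c.example.org/" ping="http://b.example.org/audit">ad</a>')
```

With the tracking URL kept out of href, prefetching the link target cannot register as a click, whichever HTTP method the ping itself ends up using.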

In conclusion I think HTTP supports the design of the feature as is.


I still disagree.

Could you please clarify why the ping attribute wouldn't work equally well with a safe method?

No, I'm not suggesting that.

In this scenario, there are three parties involved:

A: the user
B: the visited site
C: the site being linked to
If the link from B to C needs to be audited for the purpose of paying ads, money will be exchanged between the owners of B and C. A is not involved in that transaction.
How the contract between B and C is implemented should be outside the scope of the stuff sent to A.
While that would be nice in practice, it is not the case today, and it is not clear that it ever could be the case. We have to work within the limitations we are presented with, and in this case it seems that the ping="" proposal is the closest one can get to solving the problems seen by both the users and the authors.


Again, this is not about safe vs unsafe, but about ping vs href.

Again:

"The important distinction here is that the user did not request the
side-effects, so therefore cannot be held accountable for them."
When A follows the link, he is *not* accountable for the cost of the ad, being transferred from C to B.
The HTTP specification just says that a user can never be held accountable for GET side-effects. It says nothing about the user being held accountable for anything else, including automatic POST requests.

If the UA decides to invoke an unsafe method *without* the user's consent, that *may* be a problem. With a safe method, it's guaranteed not to be. Thus, there's a clear advantage in using a safe method.

I personally think that the attribute in itself is a Very Bad Idea, but if it stays in, by all means do not use POST for it.
We can't use GET... what other method would be appropriate?
You shouldn't do it at all. If you insist on doing it, use a safe method. Everything else is in conflict with RFC2616. And yes, you can use GET.
I think it has been explained why using GET is undesirable.


I haven't seen an explanation that was convincing to me.

BTW: I just checked, and the Google Ads on www.google.de work with GET and a Redirect (302). Only safe methods from the user's point of view. Are you saying this is a problem?
Yes.
Interesting -- good that I asked. It seems we'll not be able to make progress on this unless we clarify this issue first.
The problems from the server-side are that it is unreliable (due to pre-caching, transparent caching, and session history navigation), it obfuscates the user experience (the actual target URL is hidden), it is slow (there's at least one extra HTTP round-trip, possibly with an additional DNS hit as well), and it uses an idempotent method for a distinctly non-idempotent action.


Again, this is about ping vs href, not about safe vs unsafe.

Please keep these topics separate.

On Sat, 27 Oct 2007, Geoffrey Sneddon wrote:
Having read this entire thread, I don't see why anything is actually wrong. In this context the difference between GET and POST is negligible; both can technically be used to do what is desired, though using GET would be breaking RFC 2616 (or rather, breaking a SHOULD NOT). If we disallow it to be used on external servers, people will just continue to use Javascript to achieve this, which CANNOT be disabled by a UA without breaking behaviour that sites rely upon.
Indeed.


Incorrect - a misunderstanding about what it means for a method to be "safe".

On Sat, 27 Oct 2007, Julian Reschke wrote:
No, sorry, that's incorrect.
If you want to do something silently (without the user's consent), you simply have to use a safe method.
We don't want to do it without the user's consent. The whole point of making ping="" explicit is to allow the user to have the final decision.


Once in the configuration, or on each navigation event? Per site?

And if you consider the desired effect non-safe (which I don't), then the consequence is that you just can't do it.
We can't stop tracking from occurring. We can, however, make it better for users. I think we have a responsibility to do so.


No disagreement about the goal here.

On Sun, 28 Oct 2007, Henri Sivonen wrote:
The ping attribute does have the same security risks that cross-domain XHR POST with empty entity body would have if the access-control Method-Check weren't there. That is, if a POST handler has been programmed to trigger stuff on mere POST without a body, a malicious ping attribute could be used to trigger that action.
(As could an empty scripted <form>.)
And if you consider the desired effect non-safe (which I don't), then the consequence is that you just can't do it.
It is about idempotent vs. non-idempotent and side effects.

If you are counting ad impressions, clearly you don't want to
a) count Google Web Accelerator (or similar) prefetches
b) leave impressions uncounted due to an intermediate cache satisfying the request.
Indeed.


Again, this is about ping vs href, not about safe vs unsafe.

On Sun, 28 Oct 2007, Julian Reschke wrote:
So would you ban XHR POST and script-initiated form submissions?
I would want the XHR spec to clarify that it's not OK to initiate unsafe methods without the user's consent. I would also deprecate script-initiated form submissions from something like onload().
Please bring this up with the Web API working group.


I did, and I was ignored.

and even if you use "ping", you still could do it with a safe method (HEAD/Cache-Control:no-cache).
Unfortunately HEAD is typically implemented in servers (e.g. Apache) without running the relevant CGI scripts, which makes them hard to implement at all. I also disagree that this would be a correct application of the HEAD method's semantics.

HEAD and GET have the same semantics - the only difference being that for HEAD the response body is not transmitted. Servers that implement HEAD differently technically are not compliant.

For link tracking, my understanding was that there is no response body expected. Thus, for a server that implements a "link auditing resource", both GET and HEAD actually will do the same -- invoke some kind of tracking (minimally dumping the URI into a log file), and just return with an HTTP 2xx status and no body.


Thus, I would expect that GET and HEAD can be used interchangeably.
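As a rough illustration of the "link auditing resource" described above, the following sketch routes GET and HEAD through the same code path: both record the request URI and answer with a 2xx status and no body. The function name handle_audit, the in-memory log, and the choice of 204 are assumptions made for the example; they are not taken from the spec or from any particular server.

```python
from typing import List, Tuple


def handle_audit(method: str, uri: str, log: List[str]) -> Tuple[int, bytes]:
    """Hypothetical link auditing resource: same behaviour for GET and HEAD."""
    if method not in ("GET", "HEAD"):
        return 405, b""  # only safe methods are accepted in this sketch
    # Minimal tracking: dump the request URI into a log.
    log.append(uri)
    # 2xx status with an empty body, so GET and HEAD responses are identical.
    return 204, b""


audit_log: List[str] = []
get_result = handle_audit("GET", "/audit?link=1", audit_log)
head_result = handle_audit("HEAD", "/audit?link=2", audit_log)
```

Because no response body is expected, the usual GET/HEAD difference disappears here; a server that skips its handler for HEAD would be the non-compliant case mentioned above.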

On Sun, 28 Oct 2007, Henri Sivonen wrote:
That might work and could be a tad safer. It isn't in any way theoretically pure from the RFC 2616 point of view, though, to make HEAD and GET have different semantics beyond the response body presence.
Indeed.


Yes, and nobody suggested that.

On Sun, 28 Oct 2007, Julian Reschke wrote:
That might work and could be a tad safer. It isn't in any way theoretically pure from the RFC 2616 point of view, though, to make HEAD and GET have different semantics beyond the response body presence.
I wasn't suggesting that.
You suggested that we should use HEAD for request tracking, which indeed makes GET and HEAD have different semantics in a way that does not match (at least my interpretation of) RFC2616.

Again, no I didn't. I suggested HEAD, but that doesn't mean I was trying to have different semantics for GET -- please assume for a moment that I have *some* expertise both with HTTP servers and the protocol definition.

Fundamentally, the ping being sent is not a user request of any kind at all, it is a third-party request for information about what the user is doing. This is not a transaction between a server and a client in the sense that HTTP usually offers, it is a one-way message from the client to a third party. So we are just using HTTP as a transport method of convenience since it is there. This is probably reasonable in the circumstances, but I don't yet understand how it matters which method we decide to turn into a one-way message in the absence of a mechanism for such.
Hopefully the points put forward earlier in this e-mail cover this in sufficient detail.


Nope.

On Sun, 28 Oct 2007, Charles McCathieNevile wrote:
You mean POST, right? As far as I am concerned, the HEAD request suggestion is the least departure from normal HTTP (since there is already little expectation that HEAD will pass a response to the user), but I still don't see
(I'm not sure if you meant to stop here or not.) HEAD seems even less desirable than head from the point of view of HTTP -- it's only supposed to get the HTTP headers of the resource, without doing anything at all!

That's incorrect. The semantics of HEAD and GET are *exactly* the same,except for the response body not being transmitted.


So if GET "does" something, "HEAD" will need to do so as well.

On Mon, 29 Oct 2007, Julian Reschke wrote:
So the scenario is:

1) User A browses web site B.

2) A follows an HTML link to site C.
3) The owner of B wants to be informed of that event in order to charge the owner of C for an online ad linking to C.
That's one scenario; there are other, possibly more important ones, for example: tracking results in search, so that more popular entries can have subsequent rankings boosted, or usability studies tracking which links users prefer on a site.

Understood. I didn't mention them here, because it seems you were mainly concerned about the ad issue. In *this* case, there's even less reason to use an unsafe method.

My position is that although money may be exchanged between B and C due to the notification (ping), this is a transaction between B and C, and A MUST NOT be involved. In other words, following a hyperlink MUST stay "safe" in the RFC2616 sense.
The hyperlink does stay safe; however, the ping is not idempotent, and should not use an idempotent method.

I still do not understand why it needs to be unsafe. You seem to be concerned about the ping being executed when the user *didn't* navigate -- but what does this have to do with safe vs unsafe?

Just state in the spec that the GET/HEAD operation on the ping target MUST happen at most one time per user-initiated navigation to the href'd URI.
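A minimal sketch of what such an "at most once per user-initiated navigation" rule could look like on the UA side follows. The PingDispatcher class, the navigation_id parameter, and the send callback are all hypothetical names introduced for illustration; a real UA would have its own notion of a navigation and its own network layer.

```python
from typing import Callable, Set


class PingDispatcher:
    """Hypothetical UA component that sends at most one ping per navigation."""

    def __init__(self, send: Callable[[str, str], None]):
        self._send = send                 # callback taking (method, url)
        self._pinged: Set[str] = set()    # navigation ids already pinged

    def on_navigation(self, navigation_id: str, ping_url: str,
                      user_initiated: bool) -> bool:
        # Prefetches and other automatic fetches never trigger a ping.
        if not user_initiated:
            return False
        # A repeated event for the same navigation is ignored.
        if navigation_id in self._pinged:
            return False
        self._pinged.add(navigation_id)
        self._send("HEAD", ping_url)      # a safe method, as suggested above
        return True


sent = []
dispatcher = PingDispatcher(lambda method, url: sent.append((method, url)))
first = dispatcher.on_navigation("nav-1", "http://b.example.org/audit", True)
repeat = dispatcher.on_navigation("nav-1", "http://b.example.org/audit", True)
prefetch = dispatcher.on_navigation("nav-2", "http://b.example.org/audit", False)
```

The point of the sketch is that the "did the user really navigate" guarantee comes from when the UA sends the request, independent of whether the method is safe or unsafe.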

(emphasis on the last paragraph!)
The last paragraph actually doesn't apply -- it gives reasons not to use GET, or to be careful with GET, and doesn't actually give advice on other methods.

I do not understand how it "does not" apply, and I also disagree that it gives reasons not to use GET. On the contrary, it explicitly allows servers to do something with GET that has side-effects -- as long as the user is not made accountable for it. Exactly this case, it seems to me.

On Sun, 28 Oct 2007, Kornel Lesinski wrote:
OTOH ping is all about creating side-effects, and only non-safe methods should cause them.
Indeed.

And again: it depends on who is made accountable for the side effect. The user following the link shouldn't be.


Or are we suddenly talking about a vehicle for micro payments here?

The root cause why using POST is unsafe is CSRF, and there should be a separate effort dealing with that (covering all cases, not only ping).
I agree with all of the above.

I don't follow. Just because there are other problems with POST, there's no reason to invent a whole new feature using POST when we don't have to.

On Sun, 28 Oct 2007, Julian Reschke wrote:
Following a link should not cause side effects the user (A) can be made accountable for.
Agreed. And nothing says that the user can be made accountable for POSTs made for ping="" attributes. Just that the user _can't_ ever be made accountable for side-effects made in response to GETs.


Wait a second.

So you are willing to state that ping-initiated HTTP method invocations must not cause an action the user can be made accountable for. I agree with that.


But then, why don't you use a safe method in the first place?

And, fortunately this is not the case here. The only party for which the side effect is relevant is the site owner (B), and potentially the party (C) the link points to.
And sometimes the user, e.g. when the tracking is used to improve search results in future searches, or to personalise a site to the user's habits by promoting areas of a site that the user uses the most.

All of these are cases where HTTP experts will tell you that GET or HEAD is just fine.

I hope this clarifies the issues surrounding the ping="" attribute. I understand that not everybody agrees on this, but when there are requests that are mutually exclusive, we can't make everyone happy. I hope that the explanations above address most of the concerns that were raised, but I understand that they might not. I would ask anyone who still disagrees with what the spec says to please consider the above explanations carefully; simply raising the same issue that has already been raised, with no new information or reasoning, is unlikely to result in a different reply. I try to base the design of the spec on the balance of all input, not on the volume of input.


I have to say it didn't help me.

I've seen no evidence why ping has to use an unsafe method at all.

Best regards, Julian
