Added some comments inline.

From: [email protected] [mailto:[email protected]] On Behalf Of ttuttle
Sent: Thursday, July 24, 2014 2:35 PM
To: [email protected]
Subject: Proposing changes to Navigation Error Logging

Hi,

I'd like to propose a few changes to the Navigation Error Logging draft spec 
(https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/NavigationErrorLogging/Overview.html):

1. I'd like to allow more than one error report to be uploaded at once, and 
allow the browser to delay that upload to collect multiple reports. When a page 
is failing to load, users will often try multiple times, and it would reduce 
server load if the error reports could be sent together.

Aaron: When a page is failing long enough to produce multiple failures for the same 
user, they're probably the exact same error. I'd be more inclined to allow your 
"delay to collect more reports", but then dedupe identical errors within that 
window and send a count of how many times each occurred. That said, I really 
feel like one sample of the error is likely enough.

I'm also not an advocate of the automatic telemetry send, though I'm not outright 
against it. I'm more interested in the JS access to queued errors on the next 
page request, the refresh. Then I can do what you suggested, but have total 
control over it.
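Aaron's dedupe-within-a-window idea could be sketched roughly as below. This is purely illustrative: the field names (`url`, `type`, `count`) and the choice of dedup key are assumptions, not anything the draft spec defines.

```python
from collections import Counter

def dedup_reports(reports):
    """Collapse identical error reports into one entry plus an occurrence count.

    `reports` is a list of dicts. Two reports are treated as "identical" when
    their URL and error type match -- that key choice is an assumption for this
    sketch, not spec text.
    """
    counts = Counter((r["url"], r["type"]) for r in reports)
    deduped = []
    seen = set()
    for r in reports:
        key = (r["url"], r["type"])
        if key in seen:
            continue  # already represented by an earlier entry's count
        seen.add(key)
        entry = dict(r)
        entry["count"] = counts[key]  # how many times this error occurred
        deduped.append(entry)
    return deduped

# Example: two identical DNS failures collapse into one entry with count=2.
reports = [
    {"url": "https://example.com/", "type": "dns"},
    {"url": "https://example.com/", "type": "dns"},
    {"url": "https://example.com/a", "type": "tcp"},
]
deduped = dedup_reports(reports)
print(deduped)
```

The upload would then carry two entries instead of three, which gets Aaron's reduced server load while still preserving the occurrence count ttuttle's batching was after.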

2. Format-wise, to support that, instead of sending a single entry as a JSON 
dictionary, I'd like to send a dictionary with a single key, "entries", whose 
value is an array of entries. (I'm suggesting a dictionary so that future 
versions of the spec can add additional fields; the server would be expected to 
ignore unknown keys in the dictionary.)
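Concretely, the proposed wrapper might look like the following. The entry fields here (`url`, `type`, `ts`) are placeholders, not the spec's actual report schema; only the top-level "entries" key is from the proposal.

```python
import json

# Hypothetical batched payload: a top-level dictionary whose only defined key
# today is "entries". Future spec versions can add sibling keys, and servers
# are expected to ignore keys they don't recognize.
payload = {
    "entries": [
        {"url": "https://example.com/", "type": "dns", "ts": 1406227200},
        {"url": "https://example.com/", "type": "dns", "ts": 1406227230},
    ],
}
body = json.dumps(payload)

# A tolerant server reads only the keys it knows about and skips the rest:
received = json.loads(body)
entries = received.get("entries", [])
print(len(entries))
```

Reading via `.get("entries", [])` rather than indexing is what makes the "ignore unknown keys, tolerate future versions" contract cheap to honor on the server side.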


3. I'd like to allow the user-agent to retry the uploads if they fail. If the 
issue is a transient network problem (e.g., a flapping route), it's a waste to 
throw out the error report just because the network was still glitchy the 
first time the upload was attempted.

Aaron: This reads like a denial-of-service attack. We did discuss it 
originally, but how do you control the retries when an origin has a short-lived 
but widespread spike in errors, especially when the origin for the error is 
also the origin/logging endpoint for these navigation error calls? A few 
seconds after it recovers, it gets hit with a global surge in telemetry 
requests, knocking it offline again, producing more errors, and so on. This 
also goes back to #1: any error that is stable enough to repro is going to be 
reported by a large number of users. I expect this system to be lossy 
telemetry-wise, optimized to protect the origin, not the error telemetry. And 
if you wait for the next successful page load, then you can get the errors 
from the queue.
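If retries were allowed at all, a bounded policy with backoff and jitter is the usual way to blunt the post-recovery surge Aaron describes. A minimal sketch, assuming a caller-supplied `send` callable; the attempt cap and delay values are illustrative, not spec numbers:

```python
import random
import time

def upload_with_backoff(send, max_attempts=3, base_delay=30.0):
    """Sketch of a bounded retry policy for error-report uploads.

    `send` is a callable returning True on success. Capping the attempt
    count keeps the system lossy (protecting the origin), and full jitter
    spreads clients out so a recovering origin isn't hit all at once.
    """
    for attempt in range(max_attempts):
        if send():
            return True
        # Exponential backoff with full jitter: wait a random amount in
        # [0, base_delay * 2**attempt) before trying again.
        time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return False  # give up; the report is dropped by design

# Usage: a send that fails twice, then succeeds on the third attempt.
calls = []
def flaky_send():
    calls.append(1)
    return len(calls) >= 3

ok = upload_with_backoff(flaky_send, max_attempts=3, base_delay=0.001)
```

Even so, Aaron's point stands: with enough affected users per error, a hard cap of one attempt (or deferring to the next successful page load) may be the safer default.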

4. I'd like to figure out a way to support logging errors involving requests 
that were not top-level navigations. There are plenty of other things that can 
fail to load that the site owner might not necessarily have control over. (For 
example, Facebook might want to know when parts of their social plugin fail to 
load, even if they are not hosted on a site where Facebook can add an error 
handler.)

Aaron: I completely agree, and this was one of my biggest goals for this spec. 
But it basically became a CORS problem, and we agreed it likely wasn't going to 
be solved in this first round. So as not to delay getting top-level errors, and 
to get a foothold on the problem, we went ahead without CORS-level errors. I 
really hope we can change that in the long run; it's very compelling. I'd love 
to discuss it and sort it out.

Thoughts?

Thanks,

ttuttle
