> Date is the time of the server response and not the last data update.
> Date is definitely the time of the server's response to my request and
> bears no relation to when the live XML data was updated. I know this
> for a fact because right now there is no active race meeting and any
> data still available is static and many hours old. I would not feel
> confident rejecting incoming data as duplicate based only on the
> same-content-length criterion. Am I missing something here?

It looks like the data is dynamically generated on the server, so the web server doesn't know if or when the data changed. You will usually only see a meaningful last-update time for static content (images, HTML files, etc.).

You could go by the Cache-Control line and only fetch data every 30 seconds, but you might miss some updates this way.

Another thing you could try (only if necessary, it's a bit of overkill): download the first part of the XML (a GET request with a Range header) and check the timestamp you mentioned. If it has changed, re-request the whole document. Resuming the download is risky, because the XML might change between your two requests.

David.
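
P.S. Here is a rough sketch of the Range-header idea, in case it helps. It is untested against your server and makes several assumptions: it uses Python 3's urllib.request, it assumes the timestamp appears as a timestamp="..." attribute within the first 512 bytes (the pattern, the byte count and the function names are all made up for illustration), and it assumes the server either honours Range or simply sends the full document.

```python
import re
import urllib.request

# Hypothetical pattern: adjust to however the timestamp really appears
# near the top of the XML feed.
TIMESTAMP_RE = re.compile(rb'timestamp="([^"]*)"')


def build_range_request(url, nbytes=512):
    """Build a GET request asking only for the first nbytes bytes."""
    headers = {"Range": "bytes=0-%d" % (nbytes - 1)}
    return urllib.request.Request(url, headers=headers)


def extract_timestamp(data):
    """Return the timestamp bytes found in data, or None if absent."""
    match = TIMESTAMP_RE.search(data)
    return match.group(1) if match else None


def fetch_if_changed(url, last_seen, nbytes=512,
                     opener=urllib.request.urlopen):
    """Return (timestamp, body); body is None if the timestamp is unchanged."""
    with opener(build_range_request(url, nbytes)) as resp:
        # A server that ignores Range replies with the whole document;
        # reading at most nbytes keeps the probe cheap either way.
        prefix = resp.read(nbytes)
    stamp = extract_timestamp(prefix)
    if stamp is not None and stamp == last_seen:
        return stamp, None
    # Changed, or no timestamp found: fetch the whole document afresh
    # instead of resuming, since it may have changed in the meantime.
    with opener(url) as resp:
        body = resp.read()
    return extract_timestamp(body) or stamp, body
```

You would call fetch_if_changed(url, last_stamp) in your polling loop and keep the returned timestamp for the next round. The opener argument is only there so the logic can be tried out without touching the network.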