Unfortunately, not receiving all changes could be problematic. Remember, the
timely nature of Hub updates is what allows implementing Subscribers to avoid
polling completely. Aggregators could be using data as small as an Atom thread
comment count, and removing that from the changes the Hub distributes
might be undesirable.

Maybe this could be clarified somewhere, perhaps by building up a list of
possible "duplicated" updates from Hubs.

In any case, since polling itself drags in all of these changes, it should be
easy enough to eliminate duplicates or, more accurately, to eliminate updates
where the data you find relevant remains unchanged. I know a few aggregators
that are really content-focused and just make a quick check using something like
an MD5 hash of content+title+description/summary to detect relevant changes that
would warrant an update in the backend database. Outside those three elements,
the rest of the feed isn't considered relevant. If necessary, the original
entry can simply be overwritten in the store.
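As a rough illustration of that check, here is a minimal Python sketch that hashes only the title, summary and content of an Atom entry, so an update where only something like thr:total changed is treated as a duplicate. The function names and the dict standing in for the backend database are hypothetical, and the choice of fields simply follows the three elements mentioned above; MD5 is adequate here because it only detects changes, not deliberate tampering.

```python
import hashlib
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def relevant_hash(entry: ET.Element) -> str:
    """MD5 over title + summary + content only; all other elements are ignored."""
    parts = []
    for tag in ("title", "summary", "content"):
        el = entry.find(ATOM + tag)
        # itertext() also collects text from nested children (e.g. XHTML content)
        parts.append("".join(el.itertext()) if el is not None else "")
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


def apply_update(store: dict, entry_xml: str) -> bool:
    """Overwrite the stored entry only if its relevant fields changed.

    `store` is a hypothetical stand-in for the backend database,
    mapping atom:id -> (hash, raw entry XML).
    """
    entry = ET.fromstring(entry_xml)
    entry_id = entry.findtext(ATOM + "id")
    new_hash = relevant_hash(entry)
    old = store.get(entry_id)
    if old is not None and old[0] == new_hash:
        return False  # e.g. only thr:total changed: treat as a duplicate
    store[entry_id] = (new_hash, entry_xml)  # new or changed: overwrite
    return True


# Hypothetical entry template; only the thr:total value varies
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:thr="http://purl.org/syndication/thread/1.0">
  <id>tag:example.org,2009:1</id>
  <title>Hello</title>
  <summary>Short</summary>
  <thr:total>{n}</thr:total>
</entry>"""

if __name__ == "__main__":
    store = {}
    print(apply_update(store, ENTRY.format(n=5)))  # True: first time seen
    print(apply_update(store, ENTRY.format(n=9)))  # False: only thr:total changed
```

The trade-off is the one raised in this thread: a Subscriber that does care about comment counts would simply add thr:total to the hashed fields.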

Paddy

 Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
OpenID Europe Foundation Irish Representative

________________________________
From: Ravi Pinjala <[email protected]>
To: Brett Slatkin <[email protected]>
Cc: [email protected]
Sent: Thu, October 8, 2009 7:03:16 PM
Subject: [pubsubhubbub] Re: duplicated entries v3


Guess so, yeah. It'd be nice to not get these kinds of small updates,
but I can't think of any even remotely good way for that to work. XD The
hub can't just ignore extensions, obviously, and any kind of filtering
that I can think of would add way more complexity than it's worth.

Also, CCing back to the list since it's not initially obvious why it
happens.

--Ravi

Brett Slatkin wrote:
> There ya go. So these aren't actually duplicates! You're getting
> up-to-date information about the threading extensions.
>
> On Thu, Oct 8, 2009 at 10:53 AM, Ravi Pinjala <[email protected]> wrote:
>  
>> May have found it!
>>
>> Looking at my logs again, I went through a few of the duplicated entries
>> more carefully and found this:
>>
>> <thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">5</thr:total>
>> <thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">9</thr:total>
>>
>> It seems that each of the feeds I get duplicated updates for includes a
>> count of the comments on the entry, which also explains why I don't get
>> duplicates for rarely-updated feeds - they're also rarely commented-on.
>>
>> --Ravi
>>
>>    
