> Date: Thu, 22 May 2008 10:53:34 -0600
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: [Standards] Pub/Sub & RSS
>
> On 05/21/2008 10:09 PM, Kelly S wrote:
> > Thanks for the reply!
> >
> > I took a look at what your saying about NodeID and I understand a bit
> > more clearly now. Wow the Pub/Sub spec is large!
>
> Don't be afraid, it's just comprehensive. :)
>
> > Anyways. I'm thinking of writing a service where users can
> > "subscribe" to feeds off the web. A service will be monitoring all
> > "feeds" and pulling the RSS/Atom/whatever off of the web and
> > populating the Pub/Sub nodes in batch so users get notifications.
> >
> > So really the service will create the new node to represent the news
> > feed if its not already found in the Pub/Sub query and then add it to
> > its "queue" of feeds to poll.
> >
> > I'm not quite sure what XEP to use to allow a user to "request"
> > pulling of a feed. I haven't figured that part out yet.
>
> You can request the items in a feed by using this:
>
> http://www.xmpp.org/extensions/xep-0060.html#subscriber-retrieve
 
Sorry, I don't think I explained that too well. Users will be requesting to add 
feeds to their account, which in turn will subscribe them to the corresponding 
pub/sub node.
 
But if someone requests a feed we are not yet publishing, we need to begin 
publishing that new feed and then subscribe the user to it.
 
From then on that feed will keep publishing new entries as additional people 
subscribe to it.
 
I'm not sure how to handle a user requesting a feed that doesn't exist yet, so 
that we can start pulling and publishing it.
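To make the flow concrete, here's a sketch of what I imagine the service would do when a requested feed has no node yet: create the node on demand per XEP-0060 section 8.1, using the feed URL as the NodeID. (The JIDs, feed URL, and stanza id here are all invented for illustration.)

```xml
<!-- Hypothetical: the polling service creates a node named after the feed URL -->
<iq type='set'
    from='[email protected]/poller'
    to='pubsub.example.org'
    id='create1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <create node='http://digg.com/rss/index.xml'/>
    <configure/>
  </pubsub>
</iq>
```

After the node exists, the service would subscribe the requesting user to it and add the URL to its polling queue.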
> > I'm also not sure if I want to mix Atom nodes with RSS nodes etc. I
> > could create 1 format to use for the "entry" and transform them to
> > all match but for sites which add extra metadata to entries such as
> > Digg with its DiggCount I would like to maintain that. That is the
> > beauty of XML :)
>
> But RSS isn't XML. :P
 
RSS isn't XML? I'm not sure what you mean, lol. Aren't feeds exposed as XML 
documents?
> > That way Jabber clients who understand extensible items can display
> > them.
> >
> > I like the idea of the NodeID being the feeds url because then this
> > "polling" service can use the feeds URL easily as the SET after
> > downloading content.
>
> Sure. :)
>
> > Any suggestions / recommendations would be great to hear!
> >
> > Another item I am unclear about is as I am polling for news data, how
> > can I easily check if an "entry" exists already? I'd rather not have
> > to keep a cache somewhere of all the items I have created already.
> >
> > Although performance is going to suck if I have to check every entry
> > before inserting. Is there a way to batch insert, and disallow
> > duplicate entries based on *something* like entry title or something?
> >
> > It's nice to be back in Jabber land :) I'm back into the old mindset
> > where I have a ZILLION ideas rushing into my head about all the crazy
> > things I can do with these XEPs lol.
> >
> > Any help would be great!
> >
> > Thanks so much!
>
> You might want to join this list for discussion:
>
> http://mail.jabber.org/mailman/listinfo/social
>
> "XMPP and Social Networking, Two Great Tastes That Taste Great Together!"
>
> /psa
A bunch of other concerns also pop into my head about maintaining all this 
"feed" data:
 
1. Is there any way I can just publish the latest entries I pulled down and 
have pubsub discard any duplicates that already exist? Or is the right way to 
handle this to query every single entry individually and check whether it 
exists before publishing?
 
1.a. If I have to query each entry individually, should I make the ItemID the 
URL of the entry so that I can check for duplicates as I pull entries down off 
the web? (For example: 4 old entries, but 1 new entry since the last pull.)
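For what it's worth, my reading of XEP-0060 is that the publisher can assign the ItemID, and republishing with an existing ItemID overwrites that item rather than creating a duplicate, which would sidestep the per-entry existence check entirely. Using the entry URL as the ItemID might look something like this (JIDs, node name, and entry data are all made up):

```xml
<!-- Hypothetical: publish an entry keyed by its URL; a re-publish with
     the same ItemID replaces the item instead of duplicating it -->
<iq type='set'
    from='[email protected]/poller'
    to='pubsub.example.org'
    id='pub1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='http://digg.com/rss/index.xml'>
      <item id='http://digg.com/stories/12345'>
        <entry xmlns='http://www.w3.org/2005/Atom'>
          <title>Example entry</title>
          <link rel='alternate' href='http://digg.com/stories/12345'/>
          <id>http://digg.com/stories/12345</id>
          <updated>2008-05-22T10:00:00Z</updated>
        </entry>
      </item>
    </publish>
  </pubsub>
</iq>
```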
 
2. Entry retrieval. Is there any sort of querying we can run against entries? 
If we are publishing tons of entries, someone may want to browse/read them but 
only request X at a time, or only entries newer than a certain date, etc. 
Pulling a couple of months of entry history in one go isn't going to be very 
efficient.
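If I'm reading XEP-0060 section 6.5.7 right, a subscriber can at least cap how many items it pulls back with max_items; something like this (addresses and node name invented):

```xml
<!-- Hypothetical: a mobile client asks for only the 10 most recent items -->
<iq type='get'
    from='[email protected]/mobile'
    to='pubsub.example.org'
    id='items1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <items node='http://digg.com/rss/index.xml'
           max_items='10'/>
  </pubsub>
</iq>
```

I don't see a date-based filter in plain XEP-0060, though; paging through older items might need Result Set Management (XEP-0059) if the server supports it.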
 
Part of my concern with #2 is that we are going to mobilize this, and data 
plans are expensive for mobile devices. In this country it can be $25 per 
1.5 MB per month.
 
We want our mobile client to query data efficiently, but do these capabilities 
exist in Pub/Sub?
 
Thanks so much for the help :)