Craig Andrews wrote:
So... how can we implement federated, cross OMB search? I have some crazy
ideas, but I'm interested in what others think first.
(Ideally, search would search OMB microblogs, not just StatusNet instances.)
I can think of three ways this could work.
1. *Peer-to-peer search*, like Gnutella. When you do a global search
on a site, it asks all the sites it knows about (through
subscriptions or whatever) for /their/ results for the search
term. They might in turn ask /their/ peers for the search term, out
to some arbitrary depth. At the outer limit, we could require that
/all/ sites respond to the request, either by keeping them in a
master list somewhere, or through some other weird mechanism.
This would perform terribly: a search could take minutes or hours. It
wouldn't get /all/ notices, and it would be hard for the site
where the search originated to present the results in any
reasonable way. As far as I'm concerned, distributed peer-to-peer
search is not a problem we should have to solve.
2. *Centralized search, pull model*. A centralized search engine
provides the UI for search (and maybe an API that OMB sites can
embed into their sites, cf. Collecta's API). The search engine
discovers OMB servers on its own and polls the public feed or
individuals' feeds at regular intervals. It may use SUP to make
this crawling more efficient; it may even use PuSH or RSSCloud to
do push-mode subscriptions. We could also support subscriptions to
the public feed through OMB.
The great part about this is that we (StatusNet developers) don't
have to do anything. The bad part is that it's hard to run a
search engine. You have to have a lot of resources and smarts.
3. *Centralized search, push model*. We could configure StatusNet to,
by default, ping or push public notices to one or more centralized
search engines. For status.net-hosted sites, we already ping
pingomatic, Google Blog Search, and weblogs.com by default. We could
probably add these and others to lib/common.php as default ping
targets for every install. We
could do the same thing with the public XMPP feed (identi.ca feeds
most search engines using XMPP, but very few StatusNet sites are
XMPP-enabled). Pings are a little inefficient; we could use PuSH,
RSSCloud, or a custom notification system.
We could also run a big weblogs.com-style "reflector" for all
sites. So, every StatusNet site (and other OMB or even non-OMB
microblogging sites) could ping "reflector.status.net" or
whatever, and that site could in turn ping lots of other search
services. We could even host a Free Network Service for search
(although I would still want to have the reflector service).
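To make the reflector idea a bit more concrete, here is a rough Python sketch of the fan-out step only. Everything in it is made up for illustration: the Reflector class, the send callback, and the target names are assumptions, not anything that exists in StatusNet. The point is just that one incoming ping gets relayed to every downstream search service, and one dead service doesn't block the rest.

```python
from typing import Callable, Dict, Iterable

# Hypothetical callback type: deliver one ping to one downstream service.
# Arguments are (target, site_name, site_url); returns True on success.
Sender = Callable[[str, str, str], bool]


class Reflector:
    """Relays a single incoming ping to many downstream search services."""

    def __init__(self, targets: Iterable[str], send: Sender) -> None:
        self.targets = list(targets)
        self.send = send

    def ping(self, site_name: str, site_url: str) -> Dict[str, bool]:
        """Fan the ping out and record which targets accepted it."""
        results: Dict[str, bool] = {}
        for target in self.targets:
            try:
                results[target] = bool(self.send(target, site_name, site_url))
            except Exception:
                # A failing target must not stop delivery to the others.
                results[target] = False
        return results


def demo_sender(target: str, site_name: str, site_url: str) -> bool:
    """Stand-in sender for the demo: pretends 'search-b' is down."""
    if target == "search-b":
        raise ConnectionError("search-b is unreachable")
    return True


reflector = Reflector(["search-a", "search-b", "search-c"], demo_sender)
outcome = reflector.ping("Example Site", "http://site.example/")
```

A real reflector would want a queue and retries rather than a synchronous loop, but the shape is the same: sites only need to know one ping address, and the list of search services lives in one place.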
Of the three, I think #3 makes the most sense. I think we should add
some default ping targets in 0.9.0, and try to get a reflector working
for status.net ASAP.
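
For anyone who hasn't looked at how these pings work: the weblogs.com-style ping is a single XML-RPC call, weblogUpdates.ping(name, url), and the reply is a struct with an flerror flag. A minimal sketch of "default ping targets" in Python (not the actual StatusNet code, which is PHP) might look like the following. The target URLs are placeholders, and DEFAULT_PING_TARGETS, ping_target, and ping_all are names I'm inventing here.

```python
import xmlrpc.client

# Placeholder endpoints for illustration only; not real services.
DEFAULT_PING_TARGETS = [
    "http://search-one.example/RPC2",
    "http://search-two.example/RPC2",
]


def ping_target(target, site_name, site_url,
                proxy_factory=xmlrpc.client.ServerProxy):
    """Send one weblogUpdates.ping call; return True if it was accepted."""
    proxy = proxy_factory(target)
    reply = proxy.weblogUpdates.ping(site_name, site_url)
    # The reply struct carries 'flerror'; treat a missing flag as failure.
    return not reply.get("flerror", True)


def ping_all(site_name, site_url, targets=DEFAULT_PING_TARGETS,
             proxy_factory=xmlrpc.client.ServerProxy):
    """Ping every target, tolerating network and XML-RPC errors."""
    results = {}
    for target in targets:
        try:
            results[target] = ping_target(
                target, site_name, site_url, proxy_factory)
        except (OSError, xmlrpc.client.Error):
            results[target] = False
    return results
```

The proxy_factory argument is only there so the code can be exercised without touching the network. With a reflector in place, the default list could shrink to a single entry.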
-Evan
--
Evan Prodromou
CEO, StatusNet, Inc.
[email protected] - http://identi.ca/evan - +1-514-554-3826
_______________________________________________
StatusNet-dev mailing list
[email protected]
http://lists.status.net/mailman/listinfo/statusnet-dev