I see that for the web2py Wiki, Content Search is a high-priority
Action Item, and I imagine it is for T3 as well.
But since most of the content lives in the database rather than in
easy-to-crawl static pages,
I wonder what would be a good technique for exposing the content from
the db to a crawler?

Myself, I suppose one could create a private, app-specific
extractor process over the content db, run as a cron task, which would
write some artificial 'crawler-friendly' static pages into a special
directory and then generate a 'Google XML Sitemap' (see
http://www.gcwweb.com/search-engine-optimisation/google-sitemaps-search-engine-website-submission.htm)
pointing to those crawlable pages.  These pages would have to
contain some kind of index of links to the actual content, perhaps
driven by the tags which, hopefully, the content creators have been
adding to each page.
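Roughly, the cron task I have in mind could look something like the
sketch below.  All names here (the wiki_page table, its slug/title/body
fields, the output directory, the base URL) are made-up placeholders,
not the actual web2py or T3 schema -- just to illustrate the shape of
the extractor + sitemap step:

```python
# Hypothetical sketch of a cron-driven extractor: dump wiki pages from
# the content db into crawler-friendly static HTML files, then emit a
# Google XML Sitemap pointing at them.  Table and field names
# (wiki_page, slug, title, body) are assumptions for illustration.
import os
import sqlite3
from datetime import date
from xml.sax.saxutils import escape

BASE_URL = "http://example.com/static_wiki"  # assumed public URL prefix

def export_pages(db_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT slug, title, body FROM wiki_page").fetchall()
    conn.close()

    urls = []
    for slug, title, body in rows:
        # One plain-HTML page per db record -- easy for a crawler to read.
        with open(os.path.join(out_dir, slug + ".html"), "w") as f:
            f.write("<html><head><title>%s</title></head><body>%s</body></html>"
                    % (escape(title), escape(body)))
        urls.append("%s/%s.html" % (BASE_URL, slug))

    # Minimal Google XML Sitemap listing the generated pages.
    with open(os.path.join(out_dir, "sitemap.xml"), "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write("  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n"
                    % (escape(url), date.today().isoformat()))
        f.write("</urlset>\n")
    return urls
```

The real version would of course also write the tag-driven index pages
linking back to the live content, but the sitemap part would stay much
the same.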

Does anyone out there have more ideas?


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py Web Framework" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---