Hi Nate,
Would the solution you propose serve other search indexers as well,
or just Google?
This is a good question, and hopefully Simon can jump in here. I
believe the hashbang approach will only allow us to be indexed by
Google, so perhaps it's worth exploring whether it makes sense to
check the User-Agent and detect anything that's not a proper browser.
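To make the User-Agent idea concrete, here is a minimal sketch of one
way it could look. It takes the simpler inverse route of matching known
crawler tokens rather than recognising every proper browser. The
function name, the token list, and the handling of a missing header are
all assumptions for illustration, not anything in Simon's
implementation.

```python
# Illustrative, non-exhaustive list of substrings that appear in the
# User-Agent headers of some well-known crawlers (an assumption, not a
# vetted list).
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp", "duckduckbot",
                  "baiduspider", "yandexbot")


def looks_like_crawler(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    if not user_agent:
        # Hypothetical policy: a missing header is not treated as a crawler.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)
```

A request for which this returns True could then be sent through the
same rendering path as the hashbang requests. A list like this would
need maintaining, and User-Agent headers can be spoofed.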
I was also wondering what the effect of the solution you propose
would be on the cached versions of pages Google stores that are
available via Google search, eg:
http://webcache.googleusercontent.com/search?q=cache:ELwAKCibsysJ:www.sakaiproject.org/+&cd=1&hl=en&ct=clnk&gl=us
I think this should be fine. We will return the HTML exactly as it
would be rendered in a browser, so that will be shown as the cached
version as well.
Hope that helps,
Nicolaas
On Thu, Aug 2, 2012 at 11:03 AM, Simon Gaeremynck <[email protected]> wrote:
Hi all,
I've been working on KERN-3084 [1] which tries to add support for
Google's AJAX crawler [2].
When Google notices you're using AJAX/JavaScript to display content
on your page, it sends a request to the server asking for a
completely rendered page. The idea is that we then run the page
through a headless browser and send that response back to Google.
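As I understand the scheme described in [2], the crawler rewrites a
"#!" URL into a query parameter named _escaped_fragment_, and the
server has to map that back to the original URL before rendering it.
A minimal sketch of that mapping, assuming Python 3; the function name
and the example.com URLs are hypothetical and this is not the code in
[3]:

```python
from urllib.parse import urlsplit, urlunsplit, unquote

PARAM = "_escaped_fragment_="


def hashbang_url(crawler_url):
    """Map a crawler URL carrying _escaped_fragment_ back to its #! form.

    Returns None when the parameter is absent, i.e. for a normal request.
    """
    parts = urlsplit(crawler_url)
    pairs = parts.query.split("&") if parts.query else []
    fragment = None
    kept = []
    for pair in pairs:
        if pair.startswith(PARAM):
            # The crawler percent-escapes the fragment; undo that here.
            fragment = unquote(pair[len(PARAM):])
        else:
            # Keep every other query parameter untouched.
            kept.append(pair)
    if fragment is None:
        return None
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       "&".join(kept), "!" + fragment))
```

For example, hashbang_url("http://example.com/content?_escaped_fragment_=p%3Dabc123")
gives "http://example.com/content#!p=abc123", which is the URL the
headless browser would then load.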
I've created an implementation [3] [4] that does this but I'd like
some feedback before I send a PR.
This commit would, much like the preview processor, bring in yet
another dependency. I'm using PhantomJS [5] as it fires up a headless
WebKit browser and exposes a nice little nodejs api that you can
(ab)use.
I tried using the same toolset as the preview processor (wkhtmltopdf),
but that only seems to generate PDFs and doesn't allow access to
the generated DOM.
(PhantomJS supports PDF creation, but it's nowhere near as good as
wkhtmltopdf.)
What's the feeling about this? Does anyone have a recommendation for
a better tool/approach?
Regards,
Simon
[1] https://jira.sakaiproject.org/browse/KERN-3084
[2] https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
[3] https://github.com/simong/nakamura/commit/83212d6fe814ee32be7dd3d9cd771c40dff6f69f
[4] https://confluence.sakaiproject.org/display/KERNDOC/KERN-3084+Making+OAE+indexable+by+Google
[5] http://phantomjs.org/
_______________________________________________
oae-dev mailing list
[email protected]
http://collab.sakaiproject.org/mailman/listinfo/oae-dev