Re: [oae-dev] Making OAE indexable by Google

Simon Gaeremynck Thu, 02 Aug 2012 15:05:19 -0700

Hey,

@Nate
From a quick Google search, it looks like Bing and Yahoo support the 
Google "standard" regarding hashbangs [1].
AFAICT that's anecdotal evidence though, so I'll need to check whether 
that's actually the case.
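
For reference, the scheme maps a "pretty" hashbang URL to an "ugly" one that the crawler actually requests, carrying the page state in an "_escaped_fragment_" query parameter. Below is a minimal Python sketch of that mapping. The function names and the example.com URLs are made up for illustration, and the set of characters left unescaped is an approximation of the rules in Google's docs, not a faithful copy.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

PARAM = "_escaped_fragment_"


def to_crawler_url(url):
    """Map a hashbang ("pretty") URL to the URL an AJAX crawler would request."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # no hashbang, nothing to rewrite
    # Percent-encode the state so '&', '#', '%', '+' and spaces do not survive raw.
    # The safe set here is an assumption, not taken from the spec verbatim.
    state = quote(parts.fragment[1:], safe="=/:,;@!$'()*?")
    separator = "&" if parts.query else ""
    query = "{}{}{}={}".format(parts.query, separator, PARAM, state)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def to_hashbang_url(url):
    """Reverse mapping: recover the hashbang URL, or None if PARAM is absent."""
    parts = urlsplit(url)
    kept, state = [], None
    for pair in parts.query.split("&"):
        if pair.startswith(PARAM + "="):
            state = unquote(pair[len(PARAM) + 1:])
        elif pair:
            kept.append(pair)  # leave unrelated parameters untouched
    if state is None:
        return None
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "!" + state)
    )


if __name__ == "__main__":
    print(to_crawler_url("http://example.com/me#!l=profile/basic"))
    print(to_hashbang_url("http://example.com/me?_escaped_fragment_=l=profile/basic"))
```

Any engine that follows the same convention would request the second form, so the server side does not need to care which crawler is asking.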


As Nico mentioned, Google's Cache should probably be fine as the Filter doesn't 
change the HTML in any way.
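
To illustrate the idea: the filter only steps in for requests that carry the "_escaped_fragment_" parameter, and hands back whatever the headless browser rendered. The real filter is Java inside Nakamura. What follows is only a rough Python/WSGI sketch of the same flow, where SnapshotMiddleware and the render callable (standing in for the PhantomJS call) are hypothetical names.

```python
from urllib.parse import parse_qs

PARAM = "_escaped_fragment_"


class SnapshotMiddleware:
    """WSGI middleware: serve a rendered snapshot to AJAX-crawler requests."""

    def __init__(self, app, render):
        self.app = app
        # Hypothetical callable(url) -> str of HTML, e.g. a wrapper that
        # drives a headless browser. It is injected so it can be stubbed.
        self.render = render

    def __call__(self, environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        if PARAM not in query:
            # Ordinary request: pass through to the wrapped application.
            return self.app(environ, start_response)

        # Rebuild the hashbang URL a browser would have shown for this state.
        url = "{}://{}{}#!{}".format(
            environ.get("wsgi.url_scheme", "http"),
            environ.get("HTTP_HOST", "localhost"),
            environ.get("PATH_INFO", "/"),
            query[PARAM][0],
        )
        # Return the rendered HTML as-is, without modifying it.
        body = self.render(url).encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]
```

Because the snapshot is returned unmodified, what a crawler stores should match what a browser would have rendered for the same URL.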


@Christian
From the tools I've tried, PhantomJS seems to be one of the better ones. They are 
looking into improving the PDF generator, which could possibly make it an option 
to replace the preview processor.
I was under the impression that BSD would be compatible with the Sakai License, 
but it might not actually be. IANAL, so somebody with a deeper understanding of 
licensing should probably check.

Regards,

Simon



[1] 
http://searchengineland.com/bing-now-supports-googles-crawlable-ajax-standard-84149

On 2 Aug 2012, at 21:50, Nicolaas Matthijs <[email protected]> 
wrote:

> Hi Nate,
> 
>> Would the solution you propose serve other search indexers as well, or just 
>> Google?
> 
> This is a good question and hopefully Simon can jump in here. I believe the 
> hashbang approach will only allow us to be indexed by Google, so perhaps it's 
> worth exploring whether it makes sense to check the User-Agent header and 
> detect anything that's not a proper browser.
> 
>> I was also wondering what the effect of the solution you propose would be on 
>> the cached versions of pages Google stores that are available via Google 
>> search, eg:
>> http://webcache.googleusercontent.com/search?q=cache:ELwAKCibsysJ:www.sakaiproject.org/+&cd=1&hl=en&ct=clnk&gl=us
> 
> I think this should be fine. We will return the HTML exactly like it would be 
> rendered in a browser, so that will be shown as the cached version as well.
> 
> Hope that helps,
> Nicolaas
> 
> 
> 
>> On Thu, Aug 2, 2012 at 11:03 AM, Simon Gaeremynck <[email protected]> 
>> wrote:
>> Hi all,
>> 
>> I've been working on KERN-3084 [1] which tries to add support for Google's 
>> AJAX crawler [2].
>> When Google notices you're using AJAX/JavaScript to display content on your 
>> page, it sends a request to the server asking for a completely rendered page. 
>> The idea is that we then run the page through a headless browser and send 
>> that response back to Google.
>> 
>> I've created an implementation [3] [4] that does this but I'd like some 
>> feedback before I send a PR.
>> This commit would, much like the preview processor, bring in yet another 
>> dependency. I'm using PhantomJS [5] as it fires up a headless WebKit browser and 
>> exposes a nice little JavaScript API that you can (ab)use.
>> I tried using the same toolset as the preview processor (wkhtmltopdf), but 
>> that just seems to generate PDFs and doesn't seem to allow access to the 
>> generated DOM. 
>> (PhantomJS supports PDF creation too, but it's nowhere near as good as 
>> wkhtmltopdf.)
>> 
>> 
>> What's the feeling about this? Does anyone have a recommendation for a 
>> better tool/approach?
>> 
>> Regards,
>> 
>> Simon
>> 
>> 
>> 
>> [1] https://jira.sakaiproject.org/browse/KERN-3084
>> [2] 
>> https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
>> [3] 
>> https://github.com/simong/nakamura/commit/83212d6fe814ee32be7dd3d9cd771c40dff6f69f
>> [4] 
>> https://confluence.sakaiproject.org/display/KERNDOC/KERN-3084+Making+OAE+indexable+by+Google
>> [5] http://phantomjs.org/
>> 
>> _______________________________________________
>> oae-dev mailing list
>> [email protected]
>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
>> 
>

