Search engines are (probably) only interested in crawling visible HTML 
content, so anything to be crawled must be in HTML, and that spoils the 
whole point of separating data from presentation. I think the only way 
to have both separation of data and presentation as well as 
crawl-ability is to store the data in JSON files or whatever, and keep a 
cached rendering of *some* of the data in HTML. Maybe you could specify 
some ordering of the items as well as a cut-off limit, and that 
determines which items (presumably the most interesting ones) get 
rendered into HTML. That way you don't duplicate the data 100%.

So your PHP file would look something like this:

    <html>
        <head>
            <link rel="exhibit/data" href="data1.json" type="application/json" />
            <link rel="exhibit/data" href="data2.rdf" type="application/rdf+xml" />
        </head>
        <body>
            ...

            <div ex:role="lens" id="template-1" ...>...</div>

            <noscript>
            <?php
            // Fetch a pre-rendered HTML version of the exhibit from the
            // (hypothetical) exhibit-render service and echo it here, so
            // crawlers and script-less browsers see real content.
            $curl_handle = curl_init();
            curl_setopt($curl_handle, CURLOPT_URL,
                'http://service.simile-widgets.org/exhibit-render?');
            curl_exec($curl_handle);  // outputs the response directly
            curl_close($curl_handle);
            ?>
            </noscript>
        </body>
    </html>

The trouble is how to pass data1.json, data2.rdf, and the lens template 
to the exhibit-render web service. We could potentially write a PHP 
library file that, when included in another PHP file, parses the 
including file, extracts the data links and lens templates, and calls 
the exhibit-render web service automatically:

    <?php
        include("exhibit-rendering-lib.php");
        # arguments: id of lens template to use, sort-by expression,
        # sort ascending, limit
        renderExhibit("template-1", ".age", true, 10);
    ?>

I don't know enough PHP to know whether that's possible or easy.
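
For what it's worth, here's a naive sketch of what that library could 
look like. To be clear, this is all hypothetical: the exhibit-render 
service doesn't exist yet, and the query parameter names (data, 
template, orderBy, ascending, limit) are made up for illustration.

    <?php
    // exhibit-rendering-lib.php: hypothetical sketch only.
    // Assumes a (not yet existing) exhibit-render service that accepts
    // "data", "template", "orderBy", "ascending", and "limit" parameters.

    function renderExhibit($templateId, $orderBy, $ascending, $limit) {
        // Parse the including PHP file and pull out the exhibit data links.
        $source = file_get_contents($_SERVER['SCRIPT_FILENAME']);
        preg_match_all('/<link\s+rel="exhibit\/data"\s+href="([^"]+)"/i',
                       $source, $matches);

        // Build the query string for the hypothetical web service.
        $query = http_build_query(array(
            'template'  => $templateId,
            'orderBy'   => $orderBy,
            'ascending' => $ascending ? 'true' : 'false',
            'limit'     => $limit
        ));
        foreach ($matches[1] as $link) {
            $query .= '&data=' . urlencode($link);
        }

        // NB: a real implementation would also have to extract the lens
        // template markup itself and send it along (probably as a POST
        // body); a regex won't cut it for nested HTML, so that's omitted.

        // Fetch the rendered HTML and echo it straight into the page.
        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL,
            'http://service.simile-widgets.org/exhibit-render?' . $query);
        curl_exec($curl_handle);
        curl_close($curl_handle);
    }
    ?>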

David


John Clarke Mills wrote:
> Vincent,
>
> Although the idea of detecting the user agent is a sound one, this can
> also be construed as cloaking, and if you're caught, Google will
> penalize you.  I often flip a coin in my head on a subject like this,
> because what you are saying makes perfect sense; however, we don't
> always know how Googlebot is going to react.
>
> Just some food for thought.  There's a good chance I will be
> attempting to combat this problem in the near future and I will report
> back.
>
> Cheers.
>
> On May 26, 1:02 am, Vincent Borghi <vincent.borgh...@gmail.com> wrote:
>   
>> Hi,
>>
>>
>>
>> On Sat, May 23, 2009 at 2:36 AM, David Huynh <dfhu...@alum.mit.edu> wrote:
>>
>>     
>>> Hi all,
>>>       
>>> Google recently introduced "rich snippets", which are basically
>>> microformats and RDFa:
>>>       
>>> http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-s...
>>>       
>>> The idea is that if your web page is marked up with certain attributes
>>> then search results from your web page will look better on Google.
>>>       
>>> So far exhibits' contents are not crawl-able at all by search engines,
>>> because they are contained inside JSON files rather than in HTML, and
>>> they are then rendered dynamically in the browser.
>>>       
>>> Since Google is starting to pay attention to structured data within web
>>> pages, I think it might be a really good time to start thinking about
>>> how to make exhibits crawl-able *and* compatible with Google's support
>>> for microformats and RDFa at the same time. Two birds with one stone.
>>>       
>>> One possible solution is that if you use Exhibit within a php file, then
>>> you could make the php file get some service like Babel to take your
>>> JSON file and generate HTML with microformats or RDFa, and inject that
>>> into a <noscript> block.
>>>       
>>> Please let me know if you have any thought on that!
>>>       
>> As far as I understand, in the possible solution you mention, you
>> always end up doubling the volume of the served data: you serve the
>> original JSON plus a specially tagged version in a <noscript>.
>>
>> This works and is surely appropriate in many cases.
>>
>> I'd just remark that, since it costs bandwidth to serve additional data
>> (data specially tagged for Google) that in the general case (a human
>> visitor using a browser) is never used, an alternative solution may be
>> preferable in certain cases, when it is possible:
>>
>> For those of us who can customize the httpd.conf configuration of our
>> Apache server, we may prefer to serve, at the same URL, two different
>> versions:
>>  - one version being the "normal" exhibit, for "normal" human visitors,
>>  - and the other, for (Google)bots, being ad-hoc HTML (either static or
>> dynamically generated by CGI or similar, using Babel or not).
>>
>> This assumes we configure Apache to serve, for the same given URL, one
>> version or the other depending on the visiting user-agent (using an
>> appropriate "RewriteCond %{HTTP_USER_AGENT} ..." / RewriteRule in the
>> Apache httpd.conf).
>>
>> Regards
>>     
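
P.S. For anyone who wants to try the httpd.conf approach Vincent 
describes, I imagine the stanza would look roughly like this (the bot 
pattern and the bot-facing page are placeholders to adapt):

    # Sketch only: serve a static, pre-rendered page to known crawlers.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
    RewriteRule ^/exhibit\.html$ /exhibit-for-bots.html [L]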

