Maybe instead of physical separation we can settle for logical separation.
Suppose we enable <link rel="exhibit/data" href="#local"> to specify that the data can be found in the element with name or id "local" within the HTML document itself? That data can be CDATA-encoded, and it meets the goal of being machine-readable. It does require XML parsing, but that's a relatively small cost.

David Huynh wrote:
> Search engines are (probably) only interested in crawling visible HTML
> content, so anything to be crawled must be in HTML, and that spoils the
> whole point of separating data from presentation. I think the only way
> to have both separation of data and presentation as well as
> crawl-ability is to store the data in JSON files or whatever, and have a
> cached rendering of *some* of the data in HTML. Maybe you can specify
> some ordering of the items as well as a cut-off limit, and that
> determines which items (potentially the most interesting ones) get
> rendered into HTML. That way you won't duplicate the data 100%.
>
> So your PHP file will look something like this:
>
> <html>
>   <head>
>     <link rel="exhibit/data" href="data1.json" type="application/json" />
>     <link rel="exhibit/data" href="data2.rdf" type="application/rdf+xml" />
>   </head>
>   <body>
>     ...
>     <div ex:role="lens" id="template-1" ...>...</div>
>
>     <noscript>
>       <?php
>         $curl_handle = curl_init();
>         curl_setopt($curl_handle, CURLOPT_URL, 'http://service.simile-widgets.org/exhibit-render?');
>         curl_exec($curl_handle);
>         curl_close($curl_handle);
>       ?>
>     </noscript>
>   </body>
> </html>
>
> The trouble is how to pass data1.json, data2.rdf, and the lens template
> to the web service exhibit-render. We could potentially make a PHP
> library file that, when you include it into another PHP file, parses
> the containing PHP file, extracts the data links and lens templates,
> and calls the web service exhibit-render automatically.
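(To make my "#local" suggestion above concrete: here is a minimal sketch of how a consumer could resolve such a fragment link against an element in the same document. The markup, the hidden <div>, and the element id are all hypothetical, and Python is used here only to show the lookup logic; Exhibit itself is JavaScript.)

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical page: the data link points at an element inside the same
# document, whose text content is the JSON data to be loaded.
doc = """<html>
  <head>
    <link rel="exhibit/data" href="#local" />
  </head>
  <body>
    <div id="local" style="display:none">
      {"items": [{"label": "Item 1", "age": 42}]}
    </div>
  </body>
</html>"""

root = ET.fromstring(doc)

# Follow the fragment reference in the data link...
link = root.find(".//link[@rel='exhibit/data']")
target_id = link.get("href").lstrip("#")

# ...to the element carrying that id, and parse its text as JSON.
holder = root.find(".//*[@id='" + target_id + "']")
data = json.loads(holder.text)
```

The data travels in one request with the page, so nothing is fetched twice, and crawlers that parse the page see the same text a script would.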
>
> <?php
>   include("exhibit-rendering-lib.php");
>   renderExhibit("template-1", ".age", true, 10); # id of lens template to use, sort-by expression, sort ascending, limit
> ?>
>
> I don't know enough PHP to know if that's possible / easy.
>
> David
>
>
> John Clarke Mills wrote:
>
>> Vincent,
>>
>> Although the idea of detecting the user agent is a sound one, this can
>> also be construed as cloaking, which, if caught, will get you penalized
>> by Google. I often flip a coin in my head on a subject like this, because
>> what you are saying makes perfect sense; however, we don't always know
>> how Googlebot is going to react.
>>
>> Just some food for thought. There's a good chance I will be
>> attempting to combat this problem in the near future, and I will report
>> back.
>>
>> Cheers.
>>
>> On May 26, 1:02 am, Vincent Borghi <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> On Sat, May 23, 2009 at 2:36 AM, David Huynh <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Google recently introduced "rich snippets", which are basically
>>>> microformats and RDFa:
>>>>
>>>> http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-s...
>>>>
>>>> The idea is that if your web page is marked up with certain attributes,
>>>> then search results from your web page will look better on Google.
>>>>
>>>> So far exhibits' contents are not crawl-able at all by search engines,
>>>> because they are contained inside JSON files rather than in HTML, and
>>>> they are then rendered dynamically in the browser.
>>>>
>>>> Since Google is starting to pay attention to structured data within web
>>>> pages, I think it might be a really good time to start thinking about
>>>> how to make exhibits crawl-able *and* compatible with Google's support
>>>> for microformats and RDFa at the same time. Two birds with one stone.
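(Coming back to the exhibit-rendering-lib idea David sketches above: the core of such a library would be scanning the page source for data links and forwarding them, plus the lens parameters, to the render service. A rough sketch of that extraction step; the query parameter names (data, lens, orders, ascending, limit) are invented for illustration, and only the service host appears in this thread. Python rather than PHP, just to show the logic.)

```python
import re
from urllib.parse import urlencode

# Page source as the library would see it when parsing the containing file.
page = """<head>
  <link rel="exhibit/data" href="data1.json" type="application/json" />
  <link rel="exhibit/data" href="data2.rdf" type="application/rdf+xml" />
</head>"""

# Extract the hrefs of all exhibit/data links.
hrefs = re.findall(r'<link\s+rel="exhibit/data"\s+href="([^"]+)"', page)

# Build the (hypothetical) query for the render service.
query = urlencode({
    "data": ",".join(hrefs),  # data1.json,data2.rdf
    "lens": "template-1",     # id of lens template to use
    "orders": ".age",         # sort-by expression
    "ascending": "true",
    "limit": "10",
})
url = "http://service.simile-widgets.org/exhibit-render?" + query
```

In PHP, getting at "the containing file" could be as simple as reading `__FILE__` of the includer, so the same regex scan seems feasible there too.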
>>>>
>>>> One possible solution is that if you use Exhibit within a PHP file, then
>>>> you could make the PHP file get some service like Babel to take your
>>>> JSON file and generate HTML with microformats or RDFa, and inject that
>>>> into a <noscript> block.
>>>>
>>>> Please let me know if you have any thoughts on that!
>>>>
>>> As far as I understand, in the possible solution you mention, you always
>>> end up doubling the volume of the served data: you serve the original JSON
>>> plus a specially tagged version in a <noscript>.
>>>
>>> This works and is surely appropriate in many cases.
>>>
>>> I just add as a remark that, since it may cost bandwidth to serve
>>> additional data (data specially tagged for Google) that in the general case
>>> (a human visitor using a browser) is not used, an alternative solution
>>> may be preferable in certain cases, when it is possible:
>>>
>>> For those of us who can customize the httpd.conf configuration
>>> of our Apache server, we may prefer to implement the solution
>>> of serving, at the same URL, two different versions:
>>> - one version being the "normal" exhibit, for "normal" human visitors,
>>> - and the other, for (Google)bots, being an ad-hoc HTML page (either
>>>   static or dynamically generated by CGI or similar, using Babel or not).
>>>
>>> This assumes we configure Apache to serve, for the same given URL,
>>> one or the other version, depending on the user agent that visits
>>> that URL (using an appropriate "RewriteCond %{HTTP_USER_AGENT} ..." /
>>> "RewriteRule ..." in the Apache httpd.conf).
>>>
>>> Regards
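(For reference, Vincent's mod_rewrite approach might look something like the fragment below. This is only a sketch: the bot pattern and file paths are made up, and, as John notes above, serving different content by user agent risks being treated as cloaking.)

```apache
RewriteEngine On
# Send known crawlers to a pre-rendered static page at the same URL;
# everyone else gets the normal JavaScript-rendered exhibit.
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^exhibit\.html$ /static/exhibit-crawler.html [L]
```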
--
You received this message because you are subscribed to the Google Groups "SIMILE Widgets" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/simile-widgets?hl=en
