Giovanni Tummarello wrote:
Jay,
actually, as Kingsley was suggesting already, the truly best way to
expose this data would be by embedding RDFa in the actual web pages
that bestbuy has.
One would get :
a) exactly the same benefits as publishing the files alone (after all,
the RDF is just a transformation away)
b) the certainty that the metadata is the same as what the user sees
c) getting away from ambiguity of identifiers: the page would be used
as the identifier for the item, period. Much easier for other people to
find identifiers and link to them
d) totally ready for structured snippets, Yahoo SearchMonkey etc. and
future semantic search engine optimizations.
e) a true enabler for client-side applications, e.g. a Firefox plugin
which acts as a shopping "assistant" on the side, e.g. allowing rich
searching, faceted comparison in the browser history or all sorts of
user-centric advanced use of structured data (e.g. a la "piggybank",
for the semantic web historians)
Amen! But s/piggybank/ode/g :-)
All this is just some RDFa away :-). Is it conceivable that this could
happen? After all, it's totally invisible to the user.
My guess is that it will happen. Note that <http://stores.bestbuy.com>
already has some RDFa in place :-)
Of course the dumps would still be very useful (to avoid recrawling),
and so would the sitemap/semantic sitemap.
For entities that BestBuy does not intend to expose as pages (e.g. a
URI about a company), the pure RDF/XML would still be useful.
Hmm, but the company description (About) is already exposed, so even
that's just a case of marking up the existing HTML-based "About" page
with RDFa :-)
Hoping that others also agree on these benefits.
URIBurner home page is updated, and I am hoping the virtues of HTML
representation of Metadata become clearer, especially as you can deliver
these benefits via proxy/wrapper style HTTP URIs.
Kingsley
thanks again for your efforts
Giovanni
On Tue, Sep 1, 2009 at 3:43 PM, Myers, Jay<[email protected]> wrote:
All,
Thanks for the insight. As far as the sitemap is concerned, I used the
current sitemap protocol (http://www.sitemaps.org/schemas/sitemap/0.9).
Since we are publishing around 452K documents, it seemed like the correct
route to use sitemap index files, as one file would certainly contain over
50,000 URIs and be over 10MB. I'm not aware of another method for
publishing this amount of data in a sitemap :-)
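A minimal sketch of the arithmetic behind this choice, assuming only the 50,000-URI limit and the document count quoted above. The function name is hypothetical, and the result is a lower bound: the 10MB size limit can force more files than the URL count alone suggests.

```python
import math

# Per-file URL limit quoted in the message above.
MAX_URLS_PER_SITEMAP = 50_000


def min_sitemap_files(total_urls, per_file=MAX_URLS_PER_SITEMAP):
    """Lower bound on the number of sitemap files, from the URL-count limit only."""
    return math.ceil(total_urls / per_file)


# Roughly 452K documents, as stated above.
files_needed = min_sitemap_files(452_000)
print(files_needed)
```

Any count above one means a sitemap index file is needed to tie the individual sitemaps together.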
At this point, we have no SPARQL endpoint; we are simply publishing product
data as RDF. I am hoping that this effort will be noticed by senior
leadership, convincing them to sponsor a greater, more complete effort that
could serve as a model for big business. Any suggestions on this would be
welcome.
Thanks,
Jay
Jay Myers
Lead Web Development Engineer
Online Solutions, BestBuy.com
[email protected]
(w) 612-291-4007
(c) 612-296-5836
(twitter) @jaymyers
(skype) jaymmyers
________________________________
From: Martin Hepp (UniBW) [mailto:[email protected]]
Sent: Tuesday, September 01, 2009 8:14 AM
To: [email protected]
Cc: [email protected]
Subject: Re: ANN: BestBuy.com starts publishing full catalog as RDF/XML
using GoodRelations - 27 million triples
Hi Giovanni:
Giovanni Tummarello wrote:
Hi Martin, all,
the sitemap exposed is not a Semantic Sitemap
Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
but simply gives the location of the dumps.
As far as I see, the sitemap at
http://products.semweb.bestbuy.com/sitemap.xml
gives the locations of the compressed semantic sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://products.semweb.bestbuy.com/sitemap1.xml.gz</loc>
<lastmod>2009-07-31T18:23:17+00:00</lastmod>
</sitemap>
Each one of those seems to be a proper semantic sitemap
E.g.
http://products.semweb.bestbuy.com/sitemap1.xml.gz
-->
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
<sc:dataset>
<sc:datasetLabel>Sitemap data for Best Buy Co., Inc., products. Data
based on http://purl.org/goodrelations/</sc:datasetLabel>
<sc:datasetURI>http://products.semweb.bestbuy.com/</sc:datasetURI>
<sc:linkedDataPrefix
slicing="subject-object">http://products.semweb.bestbuy.com/</sc:linkedDataPrefix>
<sc:sampleURI>http://products.semweb.bestbuy.com/products/9380001/semanticweb.rdf</sc:sampleURI>
<sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/43900/semanticweb.rdf</sc:dataDumpLocation>
<sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48521/semanticweb.rdf</sc:dataDumpLocation>
<sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48530/semanticweb.rdf</sc:dataDumpLocation>
<sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/54256/semanticweb.rdf</sc:dataDumpLocation>
in theory if this information is exposed as linked data then one would
like to have a semantic sitemap exposed,
As said - I understand BestBuy is using the main sitemap to bundle the
individual semantic sitemaps. Note that they are dealing with 450,000
documents. A single sitemap file would be pretty large.
which includes other details
e.g. a sparql endpoint some information on the datasets etc. [1]
There is, to my knowledge, no SPARQL endpoint offered by BestBuy.com, but
you can soon simply use the Linked Open Commerce dataspace at
http://loc.openlinksw.com/sparql
This will contain a current copy of the bestbuy graphs.
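As a rough illustration of how a client might address that endpoint, here is a sketch that builds a SPARQL Protocol GET URL using the standard `query` parameter. The query itself is an assumption: it presumes the BestBuy graphs contain `gr:Offering` instances, which this thread does not confirm, and the endpoint's availability is not verified here.

```python
from urllib.parse import urlencode

# Endpoint URL as given in the message above; availability is not verified.
ENDPOINT = "http://loc.openlinksw.com/sparql"

# Assumed example query: list a few GoodRelations offerings.
QUERY = """
PREFIX gr: <http://purl.org/goodrelations/v1#>
SELECT ?offer WHERE { ?offer a gr:Offering } LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode the query as the `query` parameter of a SPARQL Protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response format depends on the endpoint and the Accept header sent.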
has this been considered and decided against?
As far as I know, the combination of a sitemap and 23 semantic sitemaps was
a pragmatic decision. If it causes major problems, Jay Myers from BestBuy
will surely be open to suggestions for improvement.
should we just live with
it and adapt Sindice to do some guesswork and process those instead? (I
am not necessarily against this last solution, really.)
You simply have to fetch and un-gzip the 23 semantic sitemaps at
http://products.semweb.bestbuy.com/sitemap<n>.xml.gz
with <n> being a number from 1 to 23.
Note that
http://products.semweb.bestbuy.com/sitemap5.xml.gz
seems to have a syntactical problem (fix is already requested).
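The fetch-and-un-gzip step could be sketched roughly as follows, using only the Python standard library. The function names are hypothetical; the namespace URI and URL pattern are copied from the excerpts in this message. Files that fail to download or parse (as sitemap5 reportedly might) are skipped rather than aborting the run.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# Namespace URI copied from the sitemap excerpt quoted earlier in this message.
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"
URL_TEMPLATE = "http://products.semweb.bestbuy.com/sitemap{n}.xml.gz"


def sitemap_urls(count=23):
    """Return the URLs sitemap1.xml.gz through sitemap<count>.xml.gz."""
    return [URL_TEMPLATE.format(n=n) for n in range(1, count + 1)]


def dump_locations(gz_bytes):
    """Un-gzip one semantic sitemap and list its sc:dataDumpLocation values."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    tag = "{%s}dataDumpLocation" % SC_NS
    return [el.text.strip() for el in root.iter(tag) if el.text]


def crawl(count=23):
    """Hypothetical driver: fetch every sitemap, skipping ones that fail."""
    found = []
    for url in sitemap_urls(count):
        try:
            with urllib.request.urlopen(url) as resp:
                found.extend(dump_locations(resp.read()))
        except (OSError, EOFError, ET.ParseError) as err:
            # Covers network errors, bad gzip data, and malformed XML.
            print("skipping", url, err)
    return found
```

Calling `crawl()` performs the network requests; `dump_locations` can be tried on its own with any gzip-compressed sitemap bytes.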
In other words are you suggesting the use of semantic sitemaps
We usually recommend using semantic sitemaps. But actually I think that a
consolidated dataspace like the LOC will become more important in the
future, because it creates too much overhead for each agent and application
to crawl and consolidate the whole Web of Linked Data on its own.
or
should we just come to terms with this? The disadvantage is that a linked
data browser that wants to use an index to find information will be
able to do so less reliably (hoping that our guesswork works)
As said - I understand (without a thorough analysis, though) that BestBuy's
usage of a single sitemap and multiple semantic sitemaps is okay.
Giovanni
[1] http://sw.deri.org/2007/07/sitemapextension/
On Mon, Aug 31, 2009 at 8:08 PM, Martin Hepp
(UniBW)<[email protected]> wrote:
Dear all:
BestBuy.com has just started to serve a complete RDF/XML dump of their
products and price information to the Web of Linked Data, using the
GoodRelations vocabulary for e-commerce. The data dump is updated on a
daily basis and contains detailed descriptions for roughly 450,000
individual items. With about 60 triples per item, this totals to about
27 million RDF triples.
Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
Examples:
a) Software:
http://products.semweb.bestbuy.com/products/8182593/semanticweb.rdf
b) "Hardgoods":
http://products.semweb.bestbuy.com/products/8794691/semanticweb.rdf
c) Movies:
http://products.semweb.bestbuy.com/products/7590289/semanticweb.rdf
d) Games:
http://products.semweb.bestbuy.com/products/9223752/semanticweb.rdf
Unlike many existing large RDF transcripts, the data is very dynamic,
holding the daily prices for all items.
According to Wikipedia, BestBuy.com is the largest specialty retailer of
consumer electronics in the United States accounting for 19% of the market.
It is likely the first Fortune 500 company to start publishing offer
details on the Web of Linked Data.
Congratulations to Jay Myers from BestBuy.com for this excellent
contribution, and a big thanks to Andreas Radinger and Alex Stolz for
their support,
Best wishes
Martin Hepp
--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen
e-mail: [email protected]
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
http://www.heppnetz.de/ (personal)
skype: mfhepp
twitter: mfhepp
Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
Webcast:
http://www.heppnetz.de/projects/goodrelations/webcast/
Recipe for Yahoo SearchMonkey:
http://tr.im/rAbN
Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://tinyurl.com/semtech-hepp
Overview article on Semantic Universe:
http://tinyurl.com/goodrelations-universe
Project page:
http://purl.org/goodrelations/
Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations
Tutorial materials:
CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
http://tr.im/grcec09
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com