Steve Judkins wrote:
I found Medline to have a pretty nice model for this. Every so often they ship a full DB dump in XML as chunked zip files (no more than 1 GB each, if I remember correctly). Subscribers just synchronize the FTP directories between the Medline server and their local server; after that you can process daily diff dumps. The downloads were just XML: a stream of record URIs, each with an Add/Modify/Delete attribute and the data fields that changed. It would be nice to have a well-known graph where you can look for changes to the LOD data sources you care about and get SIOC markup describing the items, dates, and agents/people making the modifications. This is a great use case for FOAF+SSL and OAuth, because you may only want to automatically process updates from agents you trust (e.g. Wikipedia might only take changes from DBpedia).
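For illustration, here is a minimal Python sketch of consuming such a diff feed and only applying changes from trusted agents. The XML shape, the sample data, and the TRUSTED_AGENTS set are all invented for the example; a real FOAF+SSL check would verify a WebID via a TLS client certificate rather than consult a hard-coded list:

import xml.etree.ElementTree as ET

# Hypothetical set of agents whose updates we accept (stand-in for a real
# FOAF+SSL / WebID verification step).
TRUSTED_AGENTS = {"http://dbpedia.org/resource/DBpedia"}

# Invented diff format: record URIs carrying an Add/Modify/Delete action,
# the agent responsible, and the fields that changed.
SAMPLE_DIFF = """\
<updates date="2009-03-23">
  <record uri="http://dbpedia.org/resource/Berlin" action="Modify"
          agent="http://dbpedia.org/resource/DBpedia">
    <field name="population">3431700</field>
  </record>
  <record uri="http://dbpedia.org/resource/Old_Page" action="Delete"
          agent="http://example.org/untrusted-bot"/>
</updates>
"""

def apply_updates(xml_text, store):
    """Apply Add/Modify/Delete records, skipping untrusted agents."""
    for rec in ET.fromstring(xml_text).iter("record"):
        if rec.get("agent") not in TRUSTED_AGENTS:
            continue  # only process updates from agents we trust
        uri, action = rec.get("uri"), rec.get("action")
        if action == "Delete":
            store.pop(uri, None)
        else:  # Add or Modify
            store.setdefault(uri, {}).update(
                {f.get("name"): f.text for f in rec.iter("field")})
    return store

if __name__ == "__main__":
    print(apply_updates(SAMPLE_DIFF, {}))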
Steve,
You're very much on the ball here; this is very much the kind of thing
foaf+ssl [1] is about :-) I was going to unveil similar capabilities re. the
DBpedia endpoint down the line, i.e. SPARQL endpoint behavior aligned to
trusted identities, etc.
Links:
1. http://esw.w3.org/topic/foaf+ssl - FOAF+SSL
Kingsley
-Steve
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kingsley Idehen
Sent: Monday, March 23, 2009 3:34 PM
To: Steve Judkins
Cc: 'Hugh Glaser'; [email protected]
Subject: Re: Potential Home for LOD Data Sets
Steve Judkins wrote:
It seems like this has the potential to become a nice collaborative
production pipeline. It would be nice to have a feed for data updates, so we
can fire up our EC2 instance once the data has been processed and packaged
by the providers we are interested in. For example, if OpenLink wants to
fire up their AMI to process the raw dumps from
http://wiki.dbpedia.org/Downloads32 into this cloud storage, we can wait
until a Virtuoso-ready package has been produced before we update. As more
agents get involved in processing the data, this will allow for more
automated notifications of updated dumps or SPARQL endpoints.
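As a rough sketch of that workflow in Python: poll the dump page for changes, then start a loader instance. The dump URL is the one above; the AMI id, region, instance type, and the use of boto3 are illustrative assumptions, not an actual OpenLink AMI:

import urllib.request

import boto3

DUMP_URL = "http://wiki.dbpedia.org/Downloads32"
AMI_ID = "ami-00000000"   # hypothetical Virtuoso loader AMI
LAST_SEEN = "Mon, 23 Mar 2009 00:00:00 GMT"

def dump_was_updated(url, last_seen):
    """Cheap change check via the Last-Modified header."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Last-Modified", "") != last_seen

def launch_loader_instance():
    """Start one EC2 instance from the (hypothetical) loader AMI."""
    ec2 = boto3.client("ec2", region_name="us-east-1")
    resp = ec2.run_instances(ImageId=AMI_ID, InstanceType="m5.large",
                             MinCount=1, MaxCount=1)
    return resp["Instances"][0]["InstanceId"]

if __name__ == "__main__":
    if dump_was_updated(DUMP_URL, LAST_SEEN):
        print("New dump detected; starting loader:", launch_loader_instance())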
Yes, certainly.
Kingsley
-Steve
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kingsley Idehen
Sent: Thursday, December 04, 2008 9:20 PM
To: Hugh Glaser
Cc: [email protected]
Subject: Re: Potential Home for LOD Data Sets
Hugh Glaser wrote:
Thanks for the swift response!
I'm still puzzled - sorry to be slow.
http://aws.amazon.com/publicdatasets/#2
Says:
Amazon EC2 customers can access this data by creating their own personal
Amazon EBS volumes, using the public data set snapshots as a starting
point.
They can then access, modify and perform computation on these volumes
directly using their Amazon EC2 instances and just pay for the compute and
storage resources that they use.
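For concreteness, the steps that page describes come down to something like the following Python/boto3 sketch; the snapshot id, instance id, and device name are placeholders, not the real LOD data set ids:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Create a personal EBS volume from the public data set snapshot.
vol = ec2.create_volume(SnapshotId="snap-00000000",        # placeholder id
                        AvailabilityZone="us-east-1a")

# 2. Wait until the volume is ready, then attach it to your running instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",        # your EC2 instance
                  Device="/dev/sdf")

# Mount /dev/sdf inside the instance; you then pay only for the compute
# time and the EBS storage you use, as the quoted text says.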
Does this not mean it costs me money on my EC2 account? Or is there some
other way of accessing the data? Or am I looking at the wrong bit?
Okay, I see what I overlooked: the cost of the AMI/instance that mounts
these EBS volumes, even though Amazon is charging $0.00 for hosting these
huge amounts of data where it would usually charge.
So to conclude, using the loaded data sets isn't free, but I think we
have to be somewhat appreciative of the value here, right? Amazon is
providing a service that is ultimately pegged to usage (utility model),
and that usage comes down to the value associated with that scarce resource
called time.
I.e. can you give me a clue how to get at the data without using my credit
card, please? :-)
You can't; you will need someone to build an EC2 service for you and eat
the costs on your behalf. Of course such a service isn't impossible in a
"Numerati" [1] economy, but we aren't quite there yet; we need the Linked
Data Web in place first :-)
Links:
1. http://tinyurl.com/64gsan
Kingsley
Best
Hugh
On 05/12/2008 02:28, "Kingsley Idehen" <[email protected]> wrote:
Hugh Glaser wrote:
Exciting stuff, Kingsley.
I'm not quite sure I have worked out how I might use it though.
The page says that hosting data is clearly free, but I can't see how to
get at it without paying for it as an EC2 customer.
Is this right?
Cheers
Hugh,
No, shouldn't cost anything if the LOD data sets are hosted in this
particular location :-)
Kingsley
Hugh
On 01/12/2008 15:30, "Kingsley Idehen" <[email protected]> wrote:
All,
Please see: <http://aws.amazon.com/publicdatasets/> ; potentially the
final destination of all published RDF archives from the LOD cloud.
I've already made a request on behalf of LOD, but additional requests
from the community will accelerate the general comprehension and
awareness at Amazon.
Once the data sets are available from Amazon, database construction
costs will be significantly alleviated.
We have DBpedia reconstruction down to 1.5 hrs (or less) based on
Virtuoso's in-built integration with Amazon S3 for backup and
restoration, etc. We could get the reconstruction of the entire LOD
cloud down to some interesting numbers once all the data is situated in
an Amazon data center.
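(Purely as an illustration of the S3 side of such a restore, not Virtuoso's actual backup/restore machinery: pull the backup segments from a bucket in the same region before loading them. The bucket name and key prefix below are hypothetical.)

import os

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
BUCKET, PREFIX, DEST = "lod-dbpedia-backups", "dbpedia-3.2/", "/data/restore"

os.makedirs(DEST, exist_ok=True)
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET,
                                                         Prefix=PREFIX):
    for obj in page.get("Contents", []):
        target = os.path.join(DEST, os.path.basename(obj["Key"]))
        s3.download_file(BUCKET, obj["Key"], target)   # same-region copy
        print("fetched", obj["Key"])

# Within the same region this transfer is fast and free of bandwidth
# charges, which is what makes short reconstruction times plausible.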
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com