Re: linked data hosted somewhere

2008-11-26 Thread रविंदर ठाकुर (ravinder thakur)
can we use this data on EC2 from environments outside EC2?


I thought we already have the LOD hosted somewhere with nice SPARQL etc.
endpoints available :) As a developer trying to make some useful apps on
the semantic web, I would like to concentrate on the app logic rather than
hosting and maintaining the data. But it seems that we have a hosting
problem here!


Anybody have suggestions/solutions for hosting the LOD data publicly?


Re: linked data hosted somewhere

2008-11-26 Thread Hugh Glaser

Thanks, Kingsley and Aldo.
I have to say you raise quite a lot of concerns, or at least matters of
interest.
I really don't think it is a big deal that I asked someone to consider
resources when accessing my web site, and I am a bit uncomfortable that I
then get messages effectively telling me that my software is poor and I
should be using (buying?) something else.

On 26/11/2008 02:12, Kingsley Idehen [EMAIL PROTECTED] wrote:



 Hugh Glaser wrote:
 I thought that might be the answer.
 So what is the ontology of the error, so that my SW application can deal with
 it appropriately?
 If it ain't RDF it ain't sensible in the Semantic Web.
 ;-|
 And the "entitlement" to spend lots of money by accident; a bit worrying,
 although I assume there are services that allow me to find out at least
 estimates of the cost.

 If you are querying via iSQL or the Virtuoso Conductor you won't be
 moving lots of data between your desktop and EC2. If you do large
 constructs over the SPARQL protocol or anything else that produces large
 HTTP workloads between EC2 and your location, then you will incur the
 costs (btw - Amazon are quite aggressive re. the costs, so you really
 have to be serving many clients, i.e., offering a service, for costs to be
 a major concern).
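To make the traffic-cost point concrete, here is a back-of-envelope sketch in Python. The per-GB rates below are made-up placeholders for illustration, not Amazon's actual 2008 (or current) prices:

```python
# Hypothetical sketch: estimating the data-transfer portion of an EC2 bill
# for a hosted SPARQL endpoint. The per-GB rates are illustrative
# placeholders, NOT Amazon's real prices.

RATE_IN_PER_GB = 0.10    # placeholder ingress rate ($/GB)
RATE_OUT_PER_GB = 0.17   # placeholder egress rate ($/GB)

def transfer_cost(gb_in: float, gb_out: float) -> float:
    """Return the illustrative transfer cost in dollars."""
    return round(gb_in * RATE_IN_PER_GB + gb_out * RATE_OUT_PER_GB, 2)

# A client pulling large CONSTRUCT result sets is egress-dominated:
print(transfer_cost(gb_in=1.0, gb_out=50.0))
```

The point of the sketch is only that egress from large CONSTRUCT results, not query evaluation itself, is what drives the bill.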
Er, yes, that was the question we were discussing.
Large constructs over the SPARQL protocol.
With respect to costs, I never mentioned Amazon, so I am not sure why that
is the benchmark for comparison.
But I don't want to have a go at the Openlink software (I often recommend it
to people); I was just asking about limitations.
All software has limitations.

 Anyway, Virtuoso lets you control lots of things, including shutting
 down the sparql endpoint. In addition, you will soon be able to offer
 OAuth access to sparql endpoint etc..
Yes, and I didn't really want to have the overhead of interacting with
Ravinder to explain why I had shut down his access to the SPARQL endpoint.
 I suspect that your comment about a bill is a bit of a joke, in that normal
 queries do not require money?
 But it does raise an interesting LOD question.
 Ravinder asked for LOD sets; if I have to pay for the query service, is it
 LOD?

 You pay for traffic that goes in and out of your data space.

 (effective November 26, 2008)
 Fixed Costs ($)
snip amazon costs/
 Here is a purchase link that also exposes the items above.
 https://aws-portal.amazon.com/gp/aws/user/subscription/index.html?ie=UTF8&offeringCode=6CB89F71

 Of course, you can always use the Open Source Edition as is and
 reconstruct DBpedia from scratch, the cost-benefit analysis factors come
 down to:

 1. Construction and Commissioning time (1 - 1.5 hrs vs 16 - 22 hrs)
 2. On / Off edition variant of live DBpedia instance that's fully tuned
 and in sync with the master
 Getting back to dealing with awkward queries.
 Detecting what are effectively DoS attacks is not always the easiest thing to
 do.
 Has Bezos really solved it for a SPARQL endpoint while providing a useful
 service to users with a wide variety of requirements?

 I believe so based on what we can do with Virtuoso on EC2.  One major
 example is the backup feature where we can sync from a Virtuoso instance
 into S3 buckets. Then perform a restore from those buckets (what we do
 re. DBpedia). In our case we offer HTTP/WebDAV or the S3 protocol for
 bucket access.
I don't think this contributes to helping to service complex SPARQL queries,
or have I missed something?
 In fact, people don't usually offer open SQL access to Open Databases for
 exactly this reason.
 I like to think the day will come when the Semantic Web is so widely used
 that we will have the same problem with SPARQL endpoints.

 The Linked Data Web is going to take us way beyond anything SQL could
 even fantasize about (imho). And one such fantasy is going to be
 accessible sparql endpoints without bringing the house down :-)
Now there I agree.
The power of LD/SW or whatever you call it will indeed take us a long way
further.
And I agree on the fantasy, which is actually what I was saying all along.
It is a fantasy to suggest that you can do all the wrong you want.

But I think it is sensible to take the question to a new thread...

Best
Hugh

 Kingsley




Re: Some FOAF services

2008-11-26 Thread Libby Miller


Mischa,

On 26 Nov 2008, at 15:25, [EMAIL PROTECTED] wrote:


Hello,

Am mailing round to announce some FOAF related services Garlik are  
hosting at foaf.qdos.com.




Very cool stuff.

1. FOAF Validator[1]: We have put together a page which can be used  
to validate foaf documents. We put this together based on common  
errors we found in FOAF documents online. Any suggestions for  
further tests are welcomed.


2. FOAF Reverse Search[2]: This service outputs foaf:knows  
relationships from our KB stating who claims to know the  
foaf:Person in question. You can present the service with either a  
foaf:Person URI like so:


http://foaf.qdos.com/reverse/?path=http://www.w3.org/People/Berners-Lee/card%23i


or you can search by an Inverse Functional Property (IFP) if the  
foaf:Person URI is not known. For example, you can find who claims  
to know the foaf:Person with the following homepage: http://plugin.org.uk/

like so (notice the inclusion of the GET argument ifp):

http://foaf.qdos.com/reverse/?path=http://plugin.org.uk/&ifp=

Given the decentralised nature of foaf data, this allows data to be  
presented regarding who claims to know a foaf:Person.
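As a quick illustration, here is how a client might construct request URLs for the reverse-lookup service described above. Only the URL construction is shown; whether the live service answers, and how it encodes the `ifp` argument in detail, is not assumed here:

```python
# Sketch of building request URLs for the foaf.qdos.com reverse lookup
# described above. URL construction only; no network call is made.
from urllib.parse import quote

BASE = "http://foaf.qdos.com/reverse/"

def reverse_by_uri(person_uri: str) -> str:
    """URL asking who claims to know the foaf:Person with this URI."""
    return BASE + "?path=" + quote(person_uri, safe="")

def reverse_by_ifp(homepage: str) -> str:
    """URL asking by an inverse-functional property (here: foaf:homepage)."""
    return BASE + "?path=" + quote(homepage, safe="") + "&ifp="

print(reverse_by_uri("http://www.w3.org/People/Berners-Lee/card#i"))
```

Note the `#` in the person URI must be percent-encoded (`%23`), exactly as in the example URL above, or it would be lost as a fragment.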


3. FOAF Social Verification [3]: This allows you to make use of the  
foaf social network to act as a whitelist for blog, email, and  
other online activity


4. FOAF Viewer: You can also use our GUI [5] to visualise your foaf  
network. For example, see Steve's foaf file here:

http://foaf.qdos.com/find/?q=http%3A%2F%2Fplugin.org.uk



Are there any plans to offer this service as RDF? (i.e. a non-reverse  
lookup, similar to what Google Social Graph offers?) I've been  
looking at some options for using foaf to pre-populate social sites  
which could use something like this
(http://planb.nicecupoftea.org/2008/11/18/foaf-slurper/).


5. FOAF pinger: And finally, if we don't have your foaf file in our  
KB, you can use our ping [4] service to upload your foaf:Document  
to our KB so that you can make use of our services.


Any thoughts/suggestions welcomed,

Mischa

[1] http://foaf.qdos.com/validator/

[2] http://foaf.qdos.com/reverse

[3] http://foaf.qdos.com/verify-demo and http://foaf.qdos.com/verify-about


[4] http://foaf.qdos.com/ping

[5] http://foaf.qdos.com/find/
___
Mischa Tuffield
Email: [EMAIL PROTECTED]
Homepage - http://mmt.me.uk/
FOAF - http://mmt.me.uk/foaf.rdf



Libby




Re: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

2008-11-26 Thread John Graybeal


On Nov 19, 2008, at 5:34 PM, Richard Cyganiak wrote:

Interestingly, this somewhat echoes an old argument often heard in  
the days of the "URI crisis" a few years ago: "We must avoid a  
proliferation of URIs. We must avoid having lots of URIs for the  
same thing. Re-use other people's identifiers wherever you can.  
Don't invent your own unless you absolutely have to."


I think that the emergence of linked data has shattered that  
argument. One of the key practices of linked data is: "Mint your  
own URIs when you publish new data. *Then* interlink it with other  
data by setting sameAs links to existing identifiers."


So this sounds like you are saying there is a near-consensus in the  
semantic web community. Except that the previous thread on URIs and  
Unique IDs emphasized the view of a number of people that multiple  
URIs for the same concept were bad (technical term), especially if  
they are generated en masse.


Do you think the argument is mostly settled, or would you agree that  
duplicating a massive set of URIs for 'local technical simplification'  
is a bad practice? (In which case, is the question just a matter of  
scale?)


John

--
John Graybeal   mailto:[EMAIL PROTECTED]  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org




Re: linked data hosted somewhere

2008-11-26 Thread Kingsley Idehen


Hugh Glaser wrote:

Thanks, Kingsley and Aldo.
I have to say you raise quite a lot of concerns, or at least matters of
interest.
I really don't think it is a big deal that I asked someone to consider
resources when accessing my web site, and I am a bit uncomfortable that I
then get messages effectively telling me that my software is poor and I
should be using (buying?) something else.
  

Hugh,

You're losing me a little, I don't think Aldo or I were making any 
comments about your software per se. or making suggestions about 
alternatives.


Anyway more comments inline below.

On 26/11/2008 02:12, Kingsley Idehen [EMAIL PROTECTED] wrote:

  

Hugh Glaser wrote:


I thought that might be the answer.
So what is the ontology of the error, so that my SW application can deal with
it appropriately?
If it ain't RDF it ain't sensible in the Semantic Web.
;-|
And the "entitlement" to spend lots of money by accident; a bit worrying,
although I assume there are services that allow me to find out at least
estimates of the cost.

  

If you are querying via iSQL or the Virtuoso Conductor you won't be
moving lots of data between your desktop and EC2. If you do large
constructs over the SPARQL protocol or anything else that produces large
HTTP workloads between EC2 and your location, then you will incur the
costs (btw - Amazon are quite aggressive re. the costs, so you really
have to be serving many clients, i.e., offering a service, for costs to be
a major concern).


Er, yes, that was the question we were discussing.
Large constructs over the SPARQL protocol.
With respect to costs, I never mentioned Amazon, so I am not sure why that
is the benchmark for comparison.
But I don't want to have a go at the Openlink software (I often recommend it
to people); I was just asking about limitations.
All software has limitations.
  

Anyway, Virtuoso lets you control lots of things, including shutting
down the sparql endpoint. In addition, you will soon be able to offer
OAuth access to sparql endpoint etc..


Yes, and I didn't really want to have the overhead of interacting with
Ravinder to explain why I had shut down his access to the SPARQL endpoint.
  

I suspect that your comment about a bill is a bit of a joke, in that normal
queries do not require money?
But it does raise an interesting LOD question.
Ravinder asked for LOD sets; if I have to pay for the query service, is it
LOD?

  

You pay for traffic that goes in and out of your data space.

(effective November 26, 2008)
Fixed Costs ($)


snip amazon costs/
  

Here is a purchase link that also exposes the items above.
https://aws-portal.amazon.com/gp/aws/user/subscription/index.html?ie=UTF8&offeringCode=6CB89F71

Of course, you can always use the Open Source Edition as is and
reconstruct DBpedia from scratch, the cost-benefit analysis factors come
down to:

1. Construction and Commissioning time (1 - 1.5 hrs vs 16 - 22 hrs)
2. On / Off edition variant of live DBpedia instance that's fully tuned
and in sync with the master


Getting back to dealing with awkward queries.
Detecting what are effectively DoS attacks is not always the easiest thing to
do.
Has Bezos really solved it for a SPARQL endpoint while providing a useful
service to users with a wide variety of requirements?

  

I believe so based on what we can do with Virtuoso on EC2.  One major
example is the backup feature where we can sync from a Virtuoso instance
into S3 buckets. Then perform a restore from those buckets (what we do
re. DBpedia). In our case we offer HTTP/WebDAV or the S3 protocol for
bucket access.


I don't think this contributes to helping to service complex SPARQL queries,
or have I missed something?
  


Hugh:  I certainly had my response above a little tangled :-(

To clarify, re. the Bezos and DOS bit:


1.  EC2 instances can be instantiated and destroyed at will
2.  Virtuoso (and I assume other SPARQL engines) has DOS-busting 
features such as SPARQL query cost analysis and rate limits for HTTP 
requests.
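As an aside, the HTTP rate-limiting idea in point 2 is commonly implemented as a token bucket. The sketch below illustrates that general technique only; it is not Virtuoso's actual implementation, and the rate/burst numbers are arbitrary:

```python
# Minimal token-bucket sketch of per-client HTTP rate limiting.
# Illustrates the general technique, not any specific engine's code.
import time

class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if one request may proceed, consuming a token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=2.0, burst=5)   # 2 req/s sustained, bursts of 5
results = [bucket.allow() for _ in range(7)]
print(results)  # first 5 requests pass; the rest are throttled
```

A server would keep one bucket per client IP (or API key) and answer throttled requests with HTTP 429 or 503.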




In fact, people don't usually offer open SQL access to Open Databases for
exactly this reason.
I like to think the day will come when the Semantic Web is so widely used
that we will have the same problem with SPARQL endpoints.

  

The Linked Data Web is going to take us way beyond anything SQL could
even fantasize about (imho). And one such fantasy is going to be
accessible sparql endpoints without bringing the house down :-)


Now there I agree.
The power of LD/SW or whatever you call it will indeed take us a long way
further.
And I agree on the fantasy, which is actually what I was saying all along.
It is a fantasy to suggest that you can do all the wrong you want.
  

Exactly!

But I think it is sensible to take the question to a new thread...
  


No problem :-)


Kingsley

Best
Hugh
  

Kingsley




  



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: 

A VoCamp Galway 2008 success story

2008-11-26 Thread Michael Hausenblas


Just to let you know. One of the outcomes of the recent VoCamps [1] was 
that we have agreed on a final layout for voiD (Vocabulary of 
Interlinked Datasets). It is now available at [2] - please note that the 
actual (final) namespace will be 'http://rdfs.org/ns/void#' ... can't 
fix everything within two days, right :)


A more detailed user guide to follow soon!

Cheers,
Michael

PS: A big thanks to the Neologism (http://neologism.deri.ie/) people for 
creating such an awesome tool and John Breslin for the great support re 
rdfs.org!


[1] http://vocamp.org/wiki/VoCampGalway2008#Outcomes
[2] http://rdfs.org/ns/neologism/void

--
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland
--



Re: linked data hosted somewhere

2008-11-26 Thread Aldo Bucchi

Hi Hugh,

I don't intend to fight at all.
And I don't speak for Kingsley, btw. My views are my own.

On Wed, Nov 26, 2008 at 5:52 PM, Hugh Glaser [EMAIL PROTECTED] wrote:

 Thanks, Kingsley and Aldo.
 I have to say you raise quite a lot of concerns, or at least matters of
 interest.
 I really don't think it is a big deal that I asked someone to consider
 resources when accessing my web site, and I am a bit uncomfortable that I
 then get messages effectively telling me that my software is poor and I
 should be using (buying?) something else.

Too much reading between the lines...
no commercial interests or comparisons intended, honestly.

I think your comment suggesting that he should consider resources is
totally in place, but it was enough of a cue to jump in and point out that
there is a simple and accessible way to get a use-as-you-pay endpoint
(that can do some nice tricks too). I was just going over the AMIs
at the moment.

Just look at the tone of the question. Mr Ravinder appeared very
motivated to me; I can imagine him coding a couple of nested loops

Other than that, I think the comment was totally in place (there is
also the free version, BTW).


 On 26/11/2008 02:12, Kingsley Idehen [EMAIL PROTECTED] wrote:



 Hugh Glaser wrote:
 I thought that might be the answer.
 So what is the ontology of the error, so that my SW application can deal
 with it appropriately?
 it appropriately?
 If it ain't RDF it ain't sensible in the Semantic Web.
 ;-|
 And the "entitlement" to spend lots of money by accident; a bit worrying,
 although I assume there are services that allow me to find out at least
 estimates of the cost.

 If you are querying via iSQL or the Virtuoso Conductor you won't be
 moving lots of data between your desktop and EC2. If you do large
 constructs over the SPARQL protocol or anything else that produces large
 HTTP workloads between EC2 and your location, then you will incur the
 costs (btw - Amazon are quite aggressive re. the costs, so you really
 have to be serving many clients, i.e., offering a service, for costs to be
 a major concern).
 Er, yes, that was the question we were discussing.
 Large constructs over the SPARQL protocol.
 With respect to costs, I never mentioned Amazon, so I am not sure why that
 is the benchmark for comparison.
 But I don't want to have a go at the Openlink software (I often recommend it
 to people); I was just asking about limitations.
 All software has limitations.

 Anyway, Virtuoso lets you control lots of things, including shutting
 down the sparql endpoint. In addition, you will soon be able to offer
 OAuth access to sparql endpoint etc..
 Yes, and I didn't really want to have the overhead of interacting with
 Ravinder to explain why I had shut down his access to the SPARQL endpoint.
 I suspect that your comment about a bill is a bit of a joke, in that normal
 queries do not require money?
 But it does raise an interesting LOD question.
 Ravinder asked for LOD sets; if I have to pay for the query service, is it
 LOD?

 You pay for traffic that goes in and out of your data space.

 (effective November 26, 2008)
 Fixed Costs ($)
 snip amazon costs/
 Here is a purchase link that also exposes the items above.
 https://aws-portal.amazon.com/gp/aws/user/subscription/index.html?ie=UTF8&offeringCode=6CB89F71

 Of course, you can always use the Open Source Edition as is and
 reconstruct DBpedia from scratch, the cost-benefit analysis factors come
 down to:

 1. Construction and Commissioning time (1 - 1.5 hrs vs 16 - 22 hrs)
 2. On / Off edition variant of live DBpedia instance that's fully tuned
 and in sync with the master
 Getting back to dealing with awkward queries.
 Detecting what are effectively DoS attacks is not always the easiest thing
 to do.
 Has Bezos really solved it for a SPARQL endpoint while providing a useful
 service to users with a wide variety of requirements?

 I believe so based on what we can do with Virtuoso on EC2.  One major
 example is the backup feature where we can sync from a Virtuoso instance
 into S3 buckets. Then perform a restore from those buckets (what we do
 re. DBpedia). In our case we offer HTTP/WebDAV or the S3 protocol for
 bucket access.
 I don't think this contributes to helping to service complex SPARQL queries,
 or have I missed something?
 In fact, people don't usually offer open SQL access to Open Databases for
 exactly this reason.
 I like to think the day will come when the Semantic Web is so widely used
 that we will have the same problem with SPARQL endpoints.

 The Linked Data Web is going to take us way beyond anything SQL could
 even fantasize about (imho). And one such fantasy is going to be
 accessible sparql endpoints without bringing the house down :-)
 Now there I agree.
 The power of LD/SW or whatever you call it will indeed take us a long way
 further.
 And I agree on the fantasy, which is actually what I was saying all along.
 It is a fantasy to suggest that you can do all the wrong you want.

Now, just to be 

Re: A VoCamp Galway 2008 success story

2008-11-26 Thread Juan Sequeda
Neologism is crucial!! That is an awesome tool!

Really looking forward to doing a VoCamp in Austin after we do a Linked Data
tutorial!

Juan Sequeda, Ph.D Student

Research Assistant
Dept. of Computer Sciences
The University of Texas at Austin
http://www.cs.utexas.edu/~jsequeda
[EMAIL PROTECTED]

http://www.juansequeda.com/

Semantic Web in Austin: http://juansequeda.blogspot.com/


On Wed, Nov 26, 2008 at 4:38 PM, Michael Hausenblas 
[EMAIL PROTECTED] wrote:


 Just to let you know. One of the outcomes of the recent VoCamps [1] was
 that we have agreed on a final layout for voiD (Vocabulary of Interlinked
 Datasets). It is now available at [2] - please note that the actual (final)
 namespace will be 'http://rdfs.org/ns/void#' ... can't fix everything
 within two days, right :)

 A more detailed user guide to follow soon!

 Cheers,
Michael

 PS: A big thanks to the Neologism (http://neologism.deri.ie/) people for
 creating such an awesome tool and John Breslin for the great support re
 rdfs.org!

 [1] http://vocamp.org/wiki/VoCampGalway2008#Outcomes
 [2] http://rdfs.org/ns/neologism/void

 --
 Dr. Michael Hausenblas
 DERI - Digital Enterprise Research Institute
 National University of Ireland, Lower Dangan,
 Galway, Ireland
 --




Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)

2008-11-26 Thread Juan Sequeda
Hugh,

Nice point brought up.

What I do see is that even though we can't make a SQL query to the Facebook
DB, we can use the APIs to obtain data. The same goes for many different
applications that offer APIs.

Now imagine LOD as the data that a developer can obtain from an API. In this
case, instead of learning the APIs of several different applications, he
learns the vocabulary. The same way developers nowadays get the data from
different APIs to make mashups, LOD is another form of making mashups, but
much better.

I agree that having a SPARQL endpoint for everything may not be safe. That
is why we started thinking about SQUIN [1]. If the data is out there, it is
linked, it is dereferenceable, and it's open, well, make the query on SQUIN,
and let SQUIN get the data for you.
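The idea behind SQUIN is link-traversal query execution: start from seed URIs, dereference them, add the retrieved triples to a local graph, and follow newly discovered URIs. The toy sketch below illustrates only that traversal loop; the in-memory "web" and its data are hypothetical stand-ins for real dereferenceable linked data:

```python
# Toy sketch of link-traversal data gathering in the spirit of SQUIN.
# WEB stands in for dereferenceable linked data; all URIs and triples
# here are hypothetical.
WEB = {  # URI -> triples served when that URI is dereferenced
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob":   [("ex:bob", "foaf:knows", "ex:carol")],
    "ex:carol": [],
}

def traverse(seeds, max_hops=3):
    """Collect triples by following object URIs from the seeds."""
    graph, frontier, seen = [], list(seeds), set()
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            if uri in seen or uri not in WEB:
                continue
            seen.add(uri)
            for triple in WEB[uri]:
                graph.append(triple)
                next_frontier.append(triple[2])  # follow the object URI
        frontier = next_frontier
    return graph

print(traverse(["ex:alice"]))
```

A query would then be evaluated over the accumulated local graph rather than against any single public SPARQL endpoint, which is exactly how the load on endpoints is avoided.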

My two cents and my vision on the LOD

[1] http://squin.sourceforge.net/
Juan Sequeda, Ph.D Student

Research Assistant
Dept. of Computer Sciences
The University of Texas at Austin
http://www.cs.utexas.edu/~jsequeda
[EMAIL PROTECTED]

http://www.juansequeda.com/

Semantic Web in Austin: http://juansequeda.blogspot.com/


On Wed, Nov 26, 2008 at 6:18 PM, Hugh Glaser [EMAIL PROTECTED] wrote:


 Prompted by the thread on linked data hosted somewhere I would like to
 ask
 the above question that has been bothering me for a while.

 The only reason anyone can afford to offer a SPARQL endpoint is because it
 doesn't get used too much?

 As abstract components for studying interaction, performance, etc.:
 DB=KB, SQL=SPARQL.
 In fact, I often consider the components themselves interchangeable; that
 is, the first step of the migration to SW technologies for an application
 is
 to take an SQL-based back end and simply replace it with a SPARQL/RDF back
 end and then carry on.

 However.
 No serious DB publisher gives direct SQL access to their DB (I think).
 There are often commercial reasons, of course.
 But even when there are not (the Open in LOD), there are only search
 options
 and possibly download facilities.
 Even government organisations that have a remit to publish their data don't
 offer SQL access.

 Will we not have to do the same?
 Or perhaps there is a subset of SPARQL that I could offer that would allow
 me to offer a safer service that conforms to others' safer services (so it
 is well understood)?
 Is this defined, or is anyone working on it?

 And I am not referring to any particular software - it seems to me that
 this
 is something that LODers need to worry about.
 We aim to take over the world; and if SPARQL endpoints are part of that
 (maybe they aren't - just resolvable URIs?), then we should make damn sure
 that we think they can be delivered.

 My answer to my subject question?
 No, not as it stands. And we need to have a story to replace it.

 Best
 Hugh

 ===
 Sorry if this is a second copy, but the first, sent as a new post, seemed
 to
 only elicit a message from [EMAIL PROTECTED] and I can't work out
 or
 find out whether it means the message was rejected or something else, such
 as awaiting moderation.
 So I've done this as a reply.
 ===
 And now a response to the message from Aldo, done here to reduce traffic:

 Very generous of you to write in this way.
 And yes, humour is good.
 And sorry to all for the traffic.

 On 27/11/2008 00:02, Aldo Bucchi [EMAIL PROTECTED] wrote:

  OK Hugh,
 
  I see what you mean and I understand you being upset. Just re-read the
  conversation word by word because I felt something was not right.
  I did say wacky... is that it?
 
  In that case, and if this caused the confusion, I am really sorry.
 
  I was not talking about your software, this was just a joke. Talking in
  general.
  You replied to my joke with an absurd reply.
 
  My point was simply that, if you want to push things over the edge,
  why not get your own box. We all take care of our infrastructure and
  know its limitations.
 
  So, I formally apologize.
  I am by no means endorsing one piece of software over another (save
  for mine, but it doesn't exist yet ;).
  My preferences for virtuoso come from experiential bias.
 
  I hope this clears things up.
  I apologize for the traffic.
 
  However, I do make a formal request for some sense of humor.
  This list tends to get into this kind of discussion, and we will
  start getting more and more visits from outsiders who are not used to
  this sort of sharpness.
 
  Best,
  A
 





Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)

2008-11-26 Thread Peter Ansell
2008/11/27 Hugh Glaser [EMAIL PROTECTED]


 Prompted by the thread on linked data hosted somewhere I would like to
 ask
 the above question that has been bothering me for a while.

 The only reason anyone can afford to offer a SPARQL endpoint is because it
 doesn't get used too much?

 As abstract components for studying interaction, performance, etc.:
 DB=KB, SQL=SPARQL.
 In fact, I often consider the components themselves interchangeable; that
 is, the first step of the migration to SW technologies for an application
 is
 to take an SQL-based back end and simply replace it with a SPARQL/RDF back
 end and then carry on.

 However.
 No serious DB publisher gives direct SQL access to their DB (I think).
 There are often commercial reasons, of course.
 But even when there are not (the Open in LOD), there are only search
 options
 and possibly download facilities.
 Even government organisations that have a remit to publish their data don't
 offer SQL access.

 Will we not have to do the same?
 Or perhaps there is a subset of SPARQL that I could offer that would allow
 me to offer a safer service that conforms to others' safer services (so it
 is well understood)?
 Is this defined, or is anyone working on it?

 And I am not referring to any particular software - it seems to me that
 this
 is something that LODers need to worry about.
 We aim to take over the world; and if SPARQL endpoints are part of that
 (maybe they aren't - just resolvable URIs?), then we should make damn sure
 that we think they can be delivered.

 My answer to my subject question?
 No, not as it stands. And we need to have a story to replace it.

 Best
 Hugh


I don't think we can afford to offer actual public-grade infrastructure
for free unless there is corporate backing for particular endpoints.
However, we can still tentatively roll out SPARQL endpoints and resolvers in
mirror configurations, together with software that can round-robin across
the endpoints to get information without overloading any particular one.
That gives us at least some redundancy, and lets us figure out what needs to
be done to fine-tune the methods for distributed queries. Once you have the
ability to round-robin across SPARQL endpoints, and can still choose them
intelligently based on knowledge of what is inside each one, you can
distribute the source RDF to anyone and have them give back the information
about how to access the endpoint. And if people are found to be overloading
an endpoint, you can send them a polite message to either round-robin across
the available endpoints or get their own local SPARQL installation, which
can be configured to respond the same way as the public endpoint.
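The round-robin-with-failover idea sketched above can be illustrated in a few lines. The endpoint URLs are placeholders, and the actual SPARQL query/transport layer is omitted; only the selection logic is shown:

```python
# Sketch of round-robin selection over mirrored SPARQL endpoints,
# skipping endpoints marked as down/overloaded. URLs are placeholders.
from itertools import cycle

class EndpointPool:
    def __init__(self, endpoints):
        self._cycle = cycle(endpoints)
        self._down = set()
        self._size = len(endpoints)

    def mark_down(self, url):
        """Record that an endpoint is overloaded or unreachable."""
        self._down.add(url)

    def next(self):
        """Return the next healthy endpoint, or None if all are down."""
        for _ in range(self._size):
            url = next(self._cycle)
            if url not in self._down:
                return url
        return None

pool = EndpointPool(["http://mirror-a.example/sparql",
                     "http://mirror-b.example/sparql"])
pool.mark_down("http://mirror-a.example/sparql")
print(pool.next())  # only the healthy mirror is ever returned
```

Choosing endpoints "intelligently based on knowledge of what is inside each one" would extend this with per-endpoint dataset metadata (e.g. voiD descriptions), but the rotation skeleton stays the same.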

An example implementation of this functionality is the distribution of
queries across endpoints for Bio2RDF [1]. Together with the distribution of
Virtuoso DB files [2] and source NTriples files [3], this makes it
relatively simple for people to download the software [4] and the resolver
package, and redirect the configuration file to their own local versions for
large-scale private use of semantics, using exactly the same URIs that
resolve via the publicly available resolvers, which may or may not be
contacting public SPARQL endpoints. An example of a public resolver
contacting a combination of public and private SPARQL endpoints is [5].
(Please don't go and overload it though, because as Hugh says, the threat of
overloading is quite real for any particular endpoint :) ).

I do agree that arbitrary SPARQL queries should be localised to private
installations, but before you do that you have to provide easy ways for
people to get private installations which resolve URIs in the same way that
they are resolved on the public web.

Cheers,

Peter

[1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml
[2] http://quebec.bio2rdf.org/download/virtuoso/indexed/
[3] http://quebec.bio2rdf.org/download/n3/
[4] http://sourceforge.net/project/platformdownload.php?group_id=142631
[5] http://bio2rdf.mquter.qut.edu.au/


Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)

2008-11-26 Thread Peter Ansell
2008/11/27 Juan Sequeda [EMAIL PROTECTED]

 Hugh,

 Nice point brought up.

 What I do see is that even though we can't make a SQL query to the Facebook
 DB, we can use the APIs to obtain data. The same goes for many different
 applications that offer APIs.

 Now imagine LOD as the data that a developer can obtain from an API. In
 this case, instead of learning the APIs of several different applications, he
 learns the vocabulary. The same way developers nowadays get the data from
 different APIs to make mashups, LOD is another form of making mashups, but
 much better.

 I agree that having a SPARQL endpoint for everything may not be safe. That
 is why we started thinking about SQUIN [1]. If the data is out there, it is
 linked, it is dereferenceable, and it's open, well, make the query on SQUIN,
 and let SQUIN get the data for you.


There is still the issue of people wanting to do more advanced things with
the data in an efficient manner. Resolvable URIs are great, but if you have
to resolve every single URI to finish queries which should be simple with
SPARQL, like reverse links (e.g.
http://bio2rdf.mquter.qut.edu.au/links/geneid:12345), then you either have
to make up URIs that stand in for the queries, as Bio2RDF has done (see
[1]), or you provide SPARQL access.

A system that just resolves data URIs to a local cache won't be able to
perform global queries as efficiently as one where the queries are
converted to URIs and access to the actual SPARQL endpoint is effectively
prevented for long-running, performance-hampering queries, because it is
behind the resolver.

Cheers,

Peter

[1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml


Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)

2008-11-26 Thread Kingsley Idehen


Hugh Glaser wrote:

Prompted by the thread on linked data hosted somewhere I would like to ask
the above question that has been bothering me for a while.

The only reason anyone can afford to offer a SPARQL endpoint is because it
doesn't get used too much?
  

No.

For instance DBpedia has offered a SPARQL endpoint in public view from 
day one to demonstrate what a public sparql endpoint can deliver.


The SPARQL engine has to be able to work out the cost of a query and
have intelligence re. resultset (solution) sizes and final delivery
of the resultsets. In short, it has to construct a query-fulfillment
matrix that is server-side configurable and enforceable.


In the SQL realm of ODBC/JDBC/etc., we had to do the same thing, with our
Drivers knowing the high probability of deliberate or inadvertent DOS
via Cartesian products. Naturally, this approach is intrinsic to Virtuoso.


Any public facing query interface needs to have the capabilities above. 
Even Google uses similar techniques when delivering its document 
database realm search engine services.
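A server-side guard of the kind described here can be sketched as follows. This is not Virtuoso's implementation, just an illustration of enforcing a row cap and a time budget while streaming a result set:

```python
import time

# Illustrative sketch of server-side result-set limits (not Virtuoso's code):
# cap the number of rows delivered and the wall-clock time spent streaming.

def limited_results(rows, max_rows=1000, max_seconds=60.0):
    """Yield at most max_rows rows, stopping early if the time budget runs out."""
    start = time.monotonic()
    for i, row in enumerate(rows):
        if i >= max_rows or time.monotonic() - start > max_seconds:
            break
        yield row

# A Cartesian-product-sized result gets truncated instead of DOSing the server.
capped = list(limited_results(range(10_000), max_rows=5))
```

The key design point is that the limits are enforced by the server as it streams, not trusted to the client's query.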

As abstract components for studying interaction, performance, etc.:
DB=KB, SQL=SPARQL.
In fact, I often consider the components themselves interchangeable; that
is, the first step of the migration to SW technologies for an application is
to take an SQL-based back end and simply replace it with a SPARQL/RDF back
end and then carry on.

However.
No serious DB publisher gives direct SQL access to their DB (I think).
  
It really depends on the task at hand, and the factors allotted to
change sensitivity. If the change-sensitivity factor has a high
weighting, then some form of cursoring against the main server offers a
viable solution, but most don't go there because only a handful of DBMS
drivers actually support all the cursor models (Keyset, Dynamic, Mixed,
and Static). Even worse, most of the drivers (bar ours) aren't equipped
with the fulfillment-matrix capabilities I described above. If
scrollable cursors aren't workable, you also have highly granular
transactional replication as a change-sensitivity handler re.
indirect access, but these aren't common across all DBMS engines.



There are often commercial reasons, of course.
But even when there are not (the Open in LOD), there are only search options
and possibly download facilities.
Even government organisations that have a remit to publish their data don't
offer SQL access.

  


From my vantage point exposing SQL wouldn't have really solved the 
issue at hand (putting the DOS issues aside) anyhow. The data source 
name granularity offered  in the RDBMS realm simply isn't there. This is 
fundamentally why HTTP based Data Source Naming (using URIs) and HTTP 
based Data Access by Reference (Linked Data) is ultimately so powerful. 
It addresses what open SQL RDBMS access would never have been able to 
deliver re. open data access and connectivity.

Will we not have to do the same?
Or perhaps there is a subset of SPARQL that I could offer that will allow me
to offer a safer service that conforms to others' safer services (so it is
well-understood)?
Is this defined, or is anyone working on it?
  
I really think this is going to come down to the RDF DBMS engine (as
per my initial comments).

And I am not referring to any particular software - it seems to me that this
is something that LODers need to worry about.
  

LODers are not necessarily DBMS people, I think it's important to note :-)

It's one thing to know how to query a DBMS and a totally different
kettle of fish to build one.

What LOD needs to do is take engagement of the broader DBMS community 
very seriously.



We aim to take over the world; and if SPARQL endpoints are part of that
(maybe they aren't - just resolvable URIs?), then we should make damn sure
that we think they can be delivered.
  


I would say we aim to open up data access to the world via the World Wide
Web :-)


Kingsley

My answer to my subject question?
No, not as it stands. And we need to have a story to replace it.

Best
Hugh

===
Sorry if this is a second copy, but the first, sent as a new post, seemed to
only elicit a message from [EMAIL PROTECTED] and I can't work out or
find out whether it means the message was rejected or something else, such
as awaiting moderation.
So I've done this as a reply.
===
And now a response to the message from Aldo, done here to reduce traffic:

Very generous of you to write in this way.
And yes, humour is good.
And sorry to all for the traffic.

On 27/11/2008 00:02, Aldo Bucchi [EMAIL PROTECTED] wrote:

  

OK Hugh,

I see what you mean and I understand you being upset. Just re-read the
conversation word by word because I felt something was not right.
I did say wacky... is that it?

In that case, and if this caused the confusion, I am really sorry.

I was not talking about your software, this was just a joke. Talking in
general.
You replied to my joke with an absurd reply.

Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)

2008-11-26 Thread Aldo Bucchi

Hugh,

Let's just look forward. This is not the same world, not the same game
and definitely not the same problem.
The comparison stops the minute you realize we now have billions of
computers connected, and a globally distributed DSN.

The trick is understanding that we are not exposing SQL endpoints,
standing on the shore and throwing stones at the ocean hoping to fill
it up. We are throwing powder jelly that will create solid land over
which we will be able to walk very soon. What we are doing is
assembling ONE big database, because we have ONE namespace that meshes
everything (thanks to the URIs) and one transport mechanism.

And the force that will drive us to open data is economic.

Your database contains facts that complement my records and we both
benefit from the mutual exchange, and this happens more efficiently in
an unplanned manner.

Serendipity and unplanned knowledge generation.

So, if you consider the URI and the WWW, the comparison between SPARQL
and SQL and Databases is not enough. However, I admit it is fair and
sometimes necessary at a micro-level.

The tech details will be solved in a snap. As you can see from
Kingsley's response, this is not a new problem, but rather a new
opportunity.

I guess the question is: why would anyone have opened up their data
before, when the integration had to be done manually?

Best,
A

On Wed, Nov 26, 2008 at 9:18 PM, Hugh Glaser [EMAIL PROTECTED] wrote:
 Prompted by the thread on linked data hosted somewhere I would like to ask
 the above question that has been bothering me for a while.

 The only reason anyone can afford to offer a SPARQL endpoint is because it
 doesn't get used too much?

 As abstract components for studying interaction, performance, etc.:
 DB=KB, SQL=SPARQL.
 In fact, I often consider the components themselves interchangeable; that
 is, the first step of the migration to SW technologies for an application is
 to take an SQL-based back end and simply replace it with a SPARQL/RDF back
 end and then carry on.

 However.
 No serious DB publisher gives direct SQL access to their DB (I think).
 There are often commercial reasons, of course.
 But even when there are not (the Open in LOD), there are only search options
 and possibly download facilities.
 Even government organisations that have a remit to publish their data don't
 offer SQL access.

 Will we not have to do the same?
 Or perhaps there is a subset of SPARQL that I could offer that will allow me
 to offer a safer service that conforms to others' safer services (so it is
 well-understood)?
 Is this defined, or is anyone working on it?

 And I am not referring to any particular software - it seems to me that this
 is something that LODers need to worry about.
 We aim to take over the world; and if SPARQL endpoints are part of that
 (maybe they aren't - just resolvable URIs?), then we should make damn sure
 that we think they can be delivered.

 My answer to my subject question?
 No, not as it stands. And we need to have a story to replace it.

 Best
 Hugh

 ===
 Sorry if this is a second copy, but the first, sent as a new post, seemed to
 only elicit a message from [EMAIL PROTECTED] and I can't work out or
 find out whether it means the message was rejected or something else, such
 as awaiting moderation.
 So I've done this as a reply.
 ===
 And now a response to the message from Aldo, done here to reduce traffic:

 Very generous of you to write in this way.
 And yes, humour is good.
 And sorry to all for the traffic.

 On 27/11/2008 00:02, Aldo Bucchi [EMAIL PROTECTED] wrote:

 OK Hugh,

 I see what you mean and I understand you being upset. Just re-read the
 conversation word by word because I felt something was not right.
 I did say wacky... is that it?

 In that case, and if this caused the confusion, I am really sorry.

 I was not talking about your software, this was just a joke. Talking in
 general.
 You replied to my joke with an absurd reply.

 My point was simply that, if you want to push things over the edge,
 why not get your own box. We all take care of our infrastructure and
 know its limitations.

 So, I formally apologize.
 I am by no means endorsing one piece of software over another ( save
 for mine, but it doesn't exist yet ;).
 My preferences for virtuoso come from experiential bias.

 I hope this clears things up.
 I apologize for the traffic.

 However, I do make a formal request for some sense of humor.
 This list tends to get into this kind of discussion, and we will
 start getting more and more visits from outsiders who are not used to
 this sort of sharpness.

 Best,
 A






-- 
Aldo Bucchi
U N I V R Z
Office: +56 2 795 4532
Mobile:+56 9 7623 8653
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com


Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)

2008-11-26 Thread Kingsley Idehen


Peter Ansell wrote:

2008/11/27 Hugh Glaser [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]


Prompted by the thread on linked data hosted somewhere I would
like to ask
the above question that has been bothering me for a while.

The only reason anyone can afford to offer a SPARQL endpoint is
because it
doesn't get used too much?

As abstract components for studying interaction, performance, etc.:
DB=KB, SQL=SPARQL.
In fact, I often consider the components themselves
interchangeable; that
is, the first step of the migration to SW technologies for an
application is
to take an SQL-based back end and simply replace it with a
SPARQL/RDF back
end and then carry on.

However.
No serious DB publisher gives direct SQL access to their DB (I think).
There are often commercial reasons, of course.
But even when there are not (the Open in LOD), there are only
search options
and possibly download facilities.
Even government organisations that have a remit to publish their
data don't
offer SQL access.

Will we not have to do the same?
Or perhaps there is a subset of SPARQL that I could offer that
will allow me
to offer a safer service that conforms to others' safer services
(so it is
well-understood)?
Is this defined, or is anyone working on it?

And I am not referring to any particular software - it seems to me
that this
is something that LODers need to worry about.
We aim to take over the world; and if SPARQL endpoints are part of
that
(maybe they aren't - just resolvable URIs?), then we should make
damn sure
that we think they can be delivered.

My answer to my subject question?
No, not as it stands. And we need to have a story to replace it.

Best
Hugh


I don't think we can afford to offer the actual public-grade
infrastructure for free unless there is corporate backing for
particular endpoints. However, we can still tentatively roll out
SPARQL endpoints and resolvers in mirror configurations, together with
software that can round-robin across the endpoints to get information
without overloading any particular one. That would at least give some
redundancy, and help us figure out what needs to be done to fine-tune
the methods for distributed queries. Once you have the ability to round-robin
across SPARQL endpoints and still choose them intelligently
based on knowledge of what is inside each one, you can distribute the
source RDF to anyone and have them give back the information about how
to access the endpoint. And if people are found to be overloading an
endpoint, you can send them a polite message to either round-robin across the
available endpoints or get their own local SPARQL installation,
configured to respond the same way as the public endpoint.
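The round-robin scheme Peter describes could look like this in outline. The class name and the dataset-to-endpoint map are hypothetical; the endpoint URLs are made up:

```python
from itertools import cycle

# Hypothetical sketch of round-robin endpoint selection with knowledge of
# which mirrors hold which dataset (names and URLs are illustrative).

class EndpointPool:
    def __init__(self, endpoints_by_dataset):
        # dataset name -> cycling iterator over its mirror endpoints
        self._cycles = {ds: cycle(eps)
                        for ds, eps in endpoints_by_dataset.items()}

    def pick(self, dataset):
        """Return the next mirror for the dataset, spreading the load."""
        return next(self._cycles[dataset])

pool = EndpointPool({
    "geneid": ["http://mirror1.example/sparql",
               "http://mirror2.example/sparql"],
})
first, second, third = (pool.pick("geneid") for _ in range(3))
```

A resolver built on something like this never concentrates traffic on one mirror, which is the redundancy property the paragraph above is after.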


An example implementation of this functionality is the distribution of 
queries across endpoints for Bio2RDF [1]. Together with the 
distribution of a combination of Virtuoso DB files [2] and source 
NTriples files [3], this makes it relatively simple for people to download 
the software [4] and the resolver package, and redirect the 
configuration file to their own local versions for large-scale private 
use of semantics, using exactly the same URIs that resolve via a 
combination of the publicly available resolvers, which may or may not 
be contacting public SPARQL endpoints. An example of a public resolver 
contacting a combination of public and private SPARQL endpoints is 
[5]. (Please don't go and overload it though, because as Hugh says, the 
threat of overloading is quite real for any particular endpoint :) ).

Peter,

If you configure the Virtuoso INI file appropriately the deliberate or 
inadvertent DOS vulnerability is alleviated.


You can append this to your Virtuoso INI (if not there already):

[SPARQL]
ResultSetMaxRows   = 1000
DefaultGraph   = http://bio2rdf.org
MaxQueryExecutionTime  = 60  ; seconds
MaxQueryCostEstimationTime = 400 ; seconds
DefaultQuery   = select distinct ?Concept where {[] a ?Concept}
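With a cap like ResultSetMaxRows = 1000 in place, a well-behaved client should page through large results rather than expect them in a single response. A sketch of the client-side pattern, where `run_query` is an assumed stand-in for whatever actually sends the SPARQL query and returns rows:

```python
# Client-side paging sketch for an endpoint that caps result sets via
# ResultSetMaxRows. `run_query` is an assumed stand-in for the function
# that actually executes the SPARQL query and returns a list of rows.

def fetch_all(run_query, base_query, page_size=1000):
    """Page with LIMIT/OFFSET until a short page signals the end."""
    rows, offset = [], 0
    while True:
        page = run_query(f"{base_query} LIMIT {page_size} OFFSET {offset}")
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size

# Fake executor over 2500 "rows" to show the pattern without a network.
data = list(range(2500))

def fake_run_query(q):
    limit = int(q.split("LIMIT")[1].split()[0])
    offset = int(q.split("OFFSET")[1].split()[0])
    return data[offset:offset + limit]

result = fetch_all(fake_run_query, "SELECT ?s WHERE { ?s ?p ?o }")
```

Paging keeps each request under the server's fulfillment limits, so the endpoint never has to choose between serving you and staying up.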




I do agree that arbitrary SPARQL queries should be localised to 
private installations, but before you do that you have to provide easy 
ways for people to get private installations which resolve URIs in 
the same way as they are resolved on the public web.
We have also made this part of the DBpedia on EC2 solution, thus, the 
URIs are localized while retaining original data source links by 
attribution etc.
So http://ec2-cname/resource/Berlin will be resolved locally, using 
an attribution link (dc:source) to 
http://dbpedia.org/resource/Berlin. The attribution triple doesn't 
exist in the quad store (so it doesn't result in one for each resource 
thereby increasing size unnecessarily), we simply produce it on the 
fly via a re-write rule.
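The on-the-fly attribution Kingsley describes can be sketched as a simple rewrite: map the local URI back to its origin authority and emit a dc:source triple at response time, rather than storing one per resource. The hostnames are illustrative, and the dc:source predicate is assumed to be the Dublin Core elements namespace:

```python
# Sketch of generating a dc:source attribution triple on the fly during
# URI rewriting, instead of storing one per resource in the quad store.
# Hostnames are illustrative; dc:source is assumed to be DC elements.

DC_SOURCE = "http://purl.org/dc/elements/1.1/source"

def attribution_triple(local_uri,
                       local_host="http://ec2-cname",
                       origin_host="http://dbpedia.org"):
    """Rewrite a locally served URI to its origin and emit one N-Triples line."""
    origin_uri = local_uri.replace(local_host, origin_host, 1)
    return f"<{local_uri}> <{DC_SOURCE}> <{origin_uri}> ."

line = attribution_triple("http://ec2-cname/resource/Berlin")
```

Because the triple is synthesised per response, the quad store stays exactly as large as the source data.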


Kingsley





Cheers,

Peter

[1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml
[2]