Re: linked data hosted somewhere
Can we use this data on EC2 from environments outside EC2? I thought we already have the LOD hosted somewhere with nice SPARQL etc. endpoints available :) As a developer trying to make some useful apps on the semantic web, I would like to concentrate on the app logic rather than hosting the data and maintaining it. But it seems that we have a hosting problem here! Anybody have suggestions/solutions for hosting the LOD data publicly?
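At the protocol level, using such a hosted endpoint from an app is just an HTTP request carrying a SPARQL query. A minimal sketch in Python (the DBpedia endpoint URL is the usual public one mentioned later in this thread; the `format` parameter is a widely supported convention rather than part of the SPARQL protocol itself):

```python
from urllib.parse import urlencode

def sparql_query_url(endpoint, query, fmt="application/sparql-results+json"):
    """Build a GET request URL for a SPARQL protocol endpoint."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})

# Ask DBpedia's public endpoint for a few triples about a resource.
query = "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Semantic_Web> ?p ?o } LIMIT 10"
url = sparql_query_url("http://dbpedia.org/sparql", query)
# The URL can then be fetched with urllib.request.urlopen(url) and the
# results parsed with the json module.
```

So an app developer really can treat the hosted endpoint as a read-only web API, which is exactly the appeal; the rest of this thread is about what that costs the host.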
Re: linked data hosted somewhere
Thanks, Kingsley and Aldo. I have to say you raise quite a lot of concerns, or at least matters of interest. I really don't think it is a big deal that I asked someone to consider resources when accessing my web site, and I am a bit uncomfortable that I then get messages effectively telling me that my software is poor and I should be using (buying?) something else.

On 26/11/2008 02:12, Kingsley Idehen [EMAIL PROTECTED] wrote: Hugh Glaser wrote: I thought that might be the answer. So what is the ontology of the error, so that my SW application can deal with it appropriately? If it ain't RDF it ain't sensible in the Semantic Web. ;-| And the "entitlement" to spend lots of money by accident; a bit worrying, although I assume there are services that allow me to find out at least estimates of the cost.

If you are querying via iSQL or the Virtuoso Conductor you won't be moving lots of data between your desktop and EC2. If you do large constructs over the SPARQL protocol or anything else that produces large HTTP workloads between EC2 and your location, then you will incur the costs (btw, Amazon are quite aggressive re. the costs, so you really have to be serving many clients, i.e. offering a service, for costs to be a major concern).

Er, yes, that was the question we were discussing: large constructs over the SPARQL protocol. With respect to costs, I never mentioned Amazon, so I am not sure why that is the benchmark for comparison. But I don't want to have a go at the OpenLink software (I often recommend it to people); I was just asking about limitations. All software has limitations.

Anyway, Virtuoso lets you control lots of things, including shutting down the SPARQL endpoint. In addition, you will soon be able to offer OAuth access to the SPARQL endpoint etc.

Yes, and I didn't really want to have the overhead of interacting with Ravinder to explain why I had shut down his access to the SPARQL endpoint.
I suspect that your comment about a bill is a bit of a joke, in that normal queries do not require money? But it does raise an interesting LOD question. Ravinder asked for LOD sets; if I have to pay for the query service, is it LOD?

You pay for traffic that goes in and out of your data space. (Effective November 26, 2008) Fixed Costs ($) [snip Amazon costs] Here is a purchase link that also exposes the items above: https://aws-portal.amazon.com/gp/aws/user/subscription/index.html?ie=UTF8&offeringCode=6CB89F71 Of course, you can always use the Open Source Edition as is and reconstruct DBpedia from scratch; the cost-benefit analysis factors come down to: 1. Construction and commissioning time (1 - 1.5 hrs vs 16 - 22 hrs) 2. On/off edition variant of a live DBpedia instance that's fully tuned and in sync with the master

Getting back to dealing with awkward queries. Detecting what are effectively DoS attacks is not always the easiest thing to do. Has Bezos really solved it for a SPARQL endpoint while providing a useful service to users with a wide variety of requirements?

I believe so, based on what we can do with Virtuoso on EC2. One major example is the backup feature where we can sync from a Virtuoso instance into S3 buckets, then perform a restore from those buckets (what we do re. DBpedia). In our case we offer HTTP/WebDAV or the S3 protocol for bucket access.

I don't think this contributes to helping to service complex SPARQL queries, or have I missed something? In fact, people don't usually offer open SQL access to open databases for exactly this reason. I like to think the day will come when the Semantic Web is so widely used that we will have the same problem with SPARQL endpoints.

The Linked Data Web is going to take us way beyond anything SQL could even fantasize about (imho). And one such fantasy is going to be accessible SPARQL endpoints without bringing the house down :-)

Now there I agree.
The power of LD/SW or whatever you call it will indeed take us a long way further. And I agree on the fantasy, which is actually what I was saying all along. It is a fantasy to suggest that you can do all the wrong you want. But I think it is sensible to take the question to a new thread... Best Hugh Kingsley
Re: Some FOAF services
Mischa, On 26 Nov 2008, at 15:25, [EMAIL PROTECTED] wrote: Hello, Am mailing round to announce some FOAF related services Garlik are hosting at foaf.qdos.com.

Very cool stuff.

1. FOAF Validator [1]: We have put together a page which can be used to validate FOAF documents. We put this together based on common errors we found in FOAF documents online. Any suggestions for further tests are welcomed.

2. FOAF Reverse Search [2]: This service outputs foaf:knows relationships from our KB stating who claims to know the foaf:Person in question. You can present the service with either a foaf:Person URI like so: http://foaf.qdos.com/reverse/?path=http://www.w3.org/People/Berners-Lee/card%23i or you can search for an Inverse Functional Property (IFP) if the foaf:Person URI is not known. For example, you can find who claims to know the foaf:Person with the following homepage, http://plugin.org.uk/, like so (notice the inclusion of the GET argument ifp): http://foaf.qdos.com/reverse/?path=http://plugin.org.uk/&ifp= Given the decentralised nature of FOAF data, this allows data to be presented regarding who claims to know a foaf:Person.

3. FOAF Social Verification [3]: This allows you to make use of the FOAF social network to act as a whitelist for blog, email, and other online activity.

4. FOAF Viewer: You can also use our GUI [5] to visualise your FOAF network. For example, see Steve's FOAF file here: http://foaf.qdos.com/find/?q=http%3A%2F%2Fplugin.org.uk

Are there any plans to offer this service as RDF? (i.e. a non-reverse lookup, similar to what Google Social Graph offers?). I've been looking at some options for using FOAF to pre-populate social sites which could use something like this (http://planb.nicecupoftea.org/2008/11/18/foaf-slurper/).

5. FOAF pinger: And finally, if we don't have your FOAF file in our KB, you can use our ping [4] service to upload your foaf:Document to our KB so that you can make use of our services.
Any thoughts/suggestions welcomed, Mischa

[1] http://foaf.qdos.com/validator/ [2] http://foaf.qdos.com/reverse [3] http://foaf.qdos.com/verify-demo http://foaf.qdos.com/verify-about [4] http://foaf.qdos.com/ping [5] http://foaf.qdos.com/find/

___ Mischa Tuffield Email: [EMAIL PROTECTED] Homepage - http://mmt.me.uk/ FOAF - http://mmt.me.uk/foaf.rdf

Libby
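For anyone wanting to call the reverse search from code, the URLs in Mischa's mail can be assembled mechanically. A small sketch, with the caveat that the exact parameter handling is inferred from the examples given rather than from any published API documentation:

```python
from urllib.parse import quote

BASE = "http://foaf.qdos.com/reverse/?path="

def reverse_lookup_url(person_uri=None, ifp_value=None):
    """Build a foaf.qdos.com reverse-search URL. The fragment (#) must be
    percent-encoded so the whole URI survives as the 'path' argument."""
    if ifp_value is not None:
        # IFP search: pass the homepage (or other IFP value) plus the ifp flag
        return BASE + quote(ifp_value, safe=":/") + "&ifp="
    return BASE + quote(person_uri, safe=":/")

by_uri = reverse_lookup_url("http://www.w3.org/People/Berners-Lee/card#i")
by_ifp = reverse_lookup_url(ifp_value="http://plugin.org.uk/")
```

Fetching either URL would then return the foaf:knows claims the service holds about that person.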
Re: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)
On Nov 19, 2008, at 5:34 PM, Richard Cyganiak wrote: Interestingly, this somewhat echoes an old argument often heard in the days of the URI crisis a few years ago: “We must avoid a proliferation of URIs. We must avoid having lots of URIs for the same thing. Re-use other people's identifiers wherever you can. Don't invent your own unless you absolutely have to.” I think that the emergence of linked data has shattered that argument. One of the key practices of linked data is: “Mint your own URIs when you publish new data. *Then* interlink it with other data by setting sameAs links to existing identifiers.”

So this sounds like you are saying there is a near-consensus of the semantic web community. Except, the previous thread on URIs and Unique IDs emphasized the view of a number of people that multiple URIs for the same concept were bad (technical term), especially if they are generated en masse. Do you think the argument is mostly settled, or would you agree that duplicating a massive set of URIs for 'local technical simplification' is a bad practice? (In which case, is the question just a matter of scale?) John

-- John Graybeal mailto:[EMAIL PROTECTED] -- 831-775-1956 Monterey Bay Aquarium Research Institute Marine Metadata Interoperability Project: http://marinemetadata.org
Re: linked data hosted somewhere
Hugh Glaser wrote: Thanks, Kingsley and Aldo. I have to say you raise quite a lot of concerns, or at least matters of interest. I really don't think it is a big deal that I asked someone to consider resources when accessing my web site, and I am a bit uncomfortable that I then get messages effectively telling me that my software is poor and I should be using (buying?) something else.

Hugh, you're losing me a little; I don't think Aldo or I were making any comments about your software per se, or making suggestions about alternatives. Anyway, more comments inline below.

On 26/11/2008 02:12, Kingsley Idehen [EMAIL PROTECTED] wrote: Hugh Glaser wrote: I thought that might be the answer. So what is the ontology of the error, so that my SW application can deal with it appropriately? If it ain't RDF it ain't sensible in the Semantic Web. ;-| And the "entitlement" to spend lots of money by accident; a bit worrying, although I assume there are services that allow me to find out at least estimates of the cost.

If you are querying via iSQL or the Virtuoso Conductor you won't be moving lots of data between your desktop and EC2. If you do large constructs over the SPARQL protocol or anything else that produces large HTTP workloads between EC2 and your location, then you will incur the costs (btw, Amazon are quite aggressive re. the costs, so you really have to be serving many clients, i.e. offering a service, for costs to be a major concern).

Er, yes, that was the question we were discussing: large constructs over the SPARQL protocol. With respect to costs, I never mentioned Amazon, so I am not sure why that is the benchmark for comparison. But I don't want to have a go at the OpenLink software (I often recommend it to people); I was just asking about limitations. All software has limitations.

Anyway, Virtuoso lets you control lots of things, including shutting down the SPARQL endpoint. In addition, you will soon be able to offer OAuth access to the SPARQL endpoint etc.
Yes, and I didn't really want to have the overhead of interacting with Ravinder to explain why I had shut down his access to the SPARQL endpoint. I suspect that your comment about a bill is a bit of a joke, in that normal queries do not require money? But it does raise an interesting LOD question. Ravinder asked for LOD sets; if I have to pay for the query service, is it LOD?

You pay for traffic that goes in and out of your data space. (Effective November 26, 2008) Fixed Costs ($) [snip Amazon costs] Here is a purchase link that also exposes the items above: https://aws-portal.amazon.com/gp/aws/user/subscription/index.html?ie=UTF8&offeringCode=6CB89F71 Of course, you can always use the Open Source Edition as is and reconstruct DBpedia from scratch; the cost-benefit analysis factors come down to: 1. Construction and commissioning time (1 - 1.5 hrs vs 16 - 22 hrs) 2. On/off edition variant of a live DBpedia instance that's fully tuned and in sync with the master

Getting back to dealing with awkward queries. Detecting what are effectively DoS attacks is not always the easiest thing to do. Has Bezos really solved it for a SPARQL endpoint while providing a useful service to users with a wide variety of requirements?

I believe so, based on what we can do with Virtuoso on EC2. One major example is the backup feature where we can sync from a Virtuoso instance into S3 buckets, then perform a restore from those buckets (what we do re. DBpedia). In our case we offer HTTP/WebDAV or the S3 protocol for bucket access.

I don't think this contributes to helping to service complex SPARQL queries, or have I missed something?

Hugh: I certainly had my response above a little tangled :-( To clarify, re. the Bezos and DoS bit: 1. EC2 instances can be instantiated and destroyed at will. 2. Virtuoso (and I assume other SPARQL engines) has DoS-busting features such as SPARQL query cost analysis and rate limits for HTTP requests.
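The two DoS-busting measures mentioned here, query cost analysis and HTTP rate limits, can be illustrated with a toy gatekeeper. This is only a sketch of the general idea; the heuristics and thresholds are invented and bear no relation to Virtuoso's actual implementation:

```python
import time

class EndpointGuard:
    """Toy gatekeeper combining a per-client HTTP rate limit with a crude
    query-cost ceiling. Both heuristics are invented for illustration."""

    def __init__(self, max_requests=10, window_s=60.0, max_cost=100):
        self.max_requests = max_requests
        self.window_s = window_s
        self.max_cost = max_cost
        self._history = {}  # client id -> timestamps of recent requests

    def estimate_cost(self, query):
        q = query.upper()
        cost = 5 * q.count("?")   # variable count as a rough join-size proxy
        if "LIMIT" not in q:
            cost *= 10            # unbounded result sets are far more expensive
        return cost

    def allow(self, client, query, now=None):
        now = time.monotonic() if now is None else now
        recent = [t for t in self._history.get(client, []) if now - t < self.window_s]
        self._history[client] = recent
        if len(recent) >= self.max_requests:
            return False, "rate limit exceeded"
        if self.estimate_cost(query) > self.max_cost:
            return False, "query too expensive"
        recent.append(now)
        return True, "ok"
```

A real engine would estimate cost from statistics and query plans rather than string counting, of course; the point is only that both checks happen before any work is done.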
In fact, people don't usually offer open SQL access to open databases for exactly this reason. I like to think the day will come when the Semantic Web is so widely used that we will have the same problem with SPARQL endpoints.

The Linked Data Web is going to take us way beyond anything SQL could even fantasize about (imho). And one such fantasy is going to be accessible SPARQL endpoints without bringing the house down :-)

Now there I agree. The power of LD/SW or whatever you call it will indeed take us a long way further. And I agree on the fantasy, which is actually what I was saying all along. It is a fantasy to suggest that you can do all the wrong you want.

Exactly!

But I think it is sensible to take the question to a new thread... No problem :-) Kingsley

Best Hugh

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President & CEO, OpenLink Software Web:
A VoCamp Galway 2008 success story
Just to let you know. One of the outcomes of the recent VoCamps [1] was that we have agreed on a final layout for voiD (Vocabulary of Interlinked Datasets). It is now available at [2] - please note that the actual (final) namespace will be 'http://rdfs.org/ns/void#' ... can't fix everything within two days, right :) A more detailed user guide to follow soon! Cheers, Michael PS: A big thanks to the Neologism (http://neologism.deri.ie/) people for creating such an awesome tool and John Breslin for the great support re rdfs.org! [1] http://vocamp.org/wiki/VoCampGalway2008#Outcomes [2] http://rdfs.org/ns/neologism/void -- Dr. Michael Hausenblas DERI - Digital Enterprise Research Institute National University of Ireland, Lower Dangan, Galway, Ireland --
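For readers wondering what a voiD description looks like in practice, here is a minimal sketch that emits a Turtle description of a dataset using the announced namespace. The dataset and example URIs are placeholders, and the exact term list should be checked against the published vocabulary:

```python
# Terms used: void:Dataset, void:sparqlEndpoint, void:exampleResource.
VOID_NS = "http://rdfs.org/ns/void#"

def void_description(dataset_uri, endpoint_uri, example_uri):
    """Emit a minimal Turtle description of a dataset in the voiD vocabulary."""
    return (
        f"@prefix void: <{VOID_NS}> .\n\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f"    void:sparqlEndpoint <{endpoint_uri}> ;\n"
        f"    void:exampleResource <{example_uri}> .\n"
    )

desc = void_description(
    "http://example.org/void/dbpedia",   # placeholder dataset URI
    "http://dbpedia.org/sparql",
    "http://dbpedia.org/resource/Galway",
)
print(desc)
```

The idea is that a publisher ships a small description like this alongside the data, so crawlers and clients can discover the endpoint and dumps without guessing.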
Re: linked data hosted somewhere
Hi Hugh, I don't intend to fight at all. And I don't speak for Kingsley, btw; my views are my own.

On Wed, Nov 26, 2008 at 5:52 PM, Hugh Glaser [EMAIL PROTECTED] wrote: Thanks, Kingsley and Aldo. I have to say you raise quite a lot of concerns, or at least matters of interest. I really don't think it is a big deal that I asked someone to consider resources when accessing my web site, and I am a bit uncomfortable that I then get messages effectively telling me that my software is poor and I should be using (buying?) something else.

Too much reading between the lines... no commercial interests or comparisons intended, honestly. I think your comment suggesting that he should consider resources was totally in place, but it was enough of a cue to jump in and point out that there is a simple and accessible way to get a use-as-you-pay endpoint (that can do some nice tricks too). I was just going over the AMIs at the moment. Just look at the tone of the question. Mr Ravinder appeared very motivated to me; I can imagine him coding a couple of nested loops. Other than that, I think the comment was totally in place (there is also the free version too, BTW).

On 26/11/2008 02:12, Kingsley Idehen [EMAIL PROTECTED] wrote: Hugh Glaser wrote: I thought that might be the answer. So what is the ontology of the error, so that my SW application can deal with it appropriately? If it ain't RDF it ain't sensible in the Semantic Web. ;-| And the "entitlement" to spend lots of money by accident; a bit worrying, although I assume there are services that allow me to find out at least estimates of the cost.

If you are querying via iSQL or the Virtuoso Conductor you won't be moving lots of data between your desktop and EC2. If you do large constructs over the SPARQL protocol or anything else that produces large HTTP workloads between EC2 and your location, then you will incur the costs (btw, Amazon are quite aggressive re.
the costs, so you really have to be serving many clients, i.e. offering a service, for costs to be a major concern).

Er, yes, that was the question we were discussing: large constructs over the SPARQL protocol. With respect to costs, I never mentioned Amazon, so I am not sure why that is the benchmark for comparison. But I don't want to have a go at the OpenLink software (I often recommend it to people); I was just asking about limitations. All software has limitations.

Anyway, Virtuoso lets you control lots of things, including shutting down the SPARQL endpoint. In addition, you will soon be able to offer OAuth access to the SPARQL endpoint etc.

Yes, and I didn't really want to have the overhead of interacting with Ravinder to explain why I had shut down his access to the SPARQL endpoint. I suspect that your comment about a bill is a bit of a joke, in that normal queries do not require money? But it does raise an interesting LOD question. Ravinder asked for LOD sets; if I have to pay for the query service, is it LOD?

You pay for traffic that goes in and out of your data space. (Effective November 26, 2008) Fixed Costs ($) [snip Amazon costs] Here is a purchase link that also exposes the items above: https://aws-portal.amazon.com/gp/aws/user/subscription/index.html?ie=UTF8&offeringCode=6CB89F71 Of course, you can always use the Open Source Edition as is and reconstruct DBpedia from scratch; the cost-benefit analysis factors come down to: 1. Construction and commissioning time (1 - 1.5 hrs vs 16 - 22 hrs) 2. On/off edition variant of a live DBpedia instance that's fully tuned and in sync with the master

Getting back to dealing with awkward queries. Detecting what are effectively DoS attacks is not always the easiest thing to do. Has Bezos really solved it for a SPARQL endpoint while providing a useful service to users with a wide variety of requirements?

I believe so, based on what we can do with Virtuoso on EC2.
One major example is the backup feature where we can sync from a Virtuoso instance into S3 buckets, then perform a restore from those buckets (what we do re. DBpedia). In our case we offer HTTP/WebDAV or the S3 protocol for bucket access.

I don't think this contributes to helping to service complex SPARQL queries, or have I missed something? In fact, people don't usually offer open SQL access to open databases for exactly this reason. I like to think the day will come when the Semantic Web is so widely used that we will have the same problem with SPARQL endpoints.

The Linked Data Web is going to take us way beyond anything SQL could even fantasize about (imho). And one such fantasy is going to be accessible SPARQL endpoints without bringing the house down :-)

Now there I agree. The power of LD/SW or whatever you call it will indeed take us a long way further. And I agree on the fantasy, which is actually what I was saying all along. It is a fantasy to suggest that you can do all the wrong you want. Now, just to be
Re: A VoCamp Galway 2008 success story
Neologism is crucial!! That is an awesome tool! Really looking forward to doing VoCamp in Austin after we do a Linked Data tutorial! Juan Sequeda, Ph.D Student Research Assistant Dept. of Computer Sciences The University of Texas at Austin http://www.cs.utexas.edu/~jsequeda [EMAIL PROTECTED] http://www.juansequeda.com/ Semantic Web in Austin: http://juansequeda.blogspot.com/ On Wed, Nov 26, 2008 at 4:38 PM, Michael Hausenblas [EMAIL PROTECTED] wrote: Just to let you know. One of the outcomes of the recent VoCamps [1] was that we have agreed on a final layout for voiD (Vocabulary of Interlinked Datasets). It is now available at [2] - please note that the actual (final) namespace will be 'http://rdfs.org/ns/void#' ... can't fix everything within two days, right :) A more detailed user guide to follow soon! Cheers, Michael PS: A big thanks to the Neologism (http://neologism.deri.ie/) people for creating such an awesome tool and John Breslin for the great support re rdfs.org! [1] http://vocamp.org/wiki/VoCampGalway2008#Outcomes [2] http://rdfs.org/ns/neologism/void -- Dr. Michael Hausenblas DERI - Digital Enterprise Research Institute National University of Ireland, Lower Dangan, Galway, Ireland --
Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)
Hugh, Nice point brought up. What I do see is that even though we can't make a SQL query to the Facebook DB, we can use the APIs to obtain data, the same as with the many different applications that offer APIs. Now imagine LOD as the data that a developer can obtain from an API. In this case, instead of learning the API of several different applications, he learns the vocabulary. The same way nowadays developers get the data from different APIs to make mashups, LOD is another form of making mashups, but much better.

I agree that having a SPARQL endpoint for everything may not be safe. That is why we started thinking about SQUIN [1]. If the data is out there, it is linked, it is dereferenceable, and it's open, well, make the query on SQUIN, and let SQUIN get the data for you. My two cents and my vision on LOD.

[1] http://squin.sourceforge.net/

Juan Sequeda, Ph.D Student Research Assistant Dept. of Computer Sciences The University of Texas at Austin http://www.cs.utexas.edu/~jsequeda [EMAIL PROTECTED] http://www.juansequeda.com/ Semantic Web in Austin: http://juansequeda.blogspot.com/

On Wed, Nov 26, 2008 at 6:18 PM, Hugh Glaser [EMAIL PROTECTED] wrote: Prompted by the thread on "linked data hosted somewhere" I would like to ask the above question that has been bothering me for a while. The only reason anyone can afford to offer a SPARQL endpoint is because it doesn't get used too much? As abstract components for studying interaction, performance, etc.: DB=KB, SQL=SPARQL. In fact, I often consider the components themselves interchangeable; that is, the first step of the migration to SW technologies for an application is to take an SQL-based back end and simply replace it with a SPARQL/RDF back end and then carry on. However, no serious DB publisher gives direct SQL access to their DB (I think). There are often commercial reasons, of course. But even when there are not (the Open in LOD), there are only search options and possibly download facilities.
Even government organisations that have a remit to publish their data don't offer SQL access. Will we not have to do the same? Or perhaps there is a subset of SPARQL that I could offer that will allow me to offer a safer service that conforms to others' safer services (so it is well understood)? Is this defined, or is anyone working on it? And I am not referring to any particular software; it seems to me that this is something that LODers need to worry about. We aim to take over the world; and if SPARQL endpoints are part of that (maybe they aren't; just resolvable URIs?), then we should make damn sure that we think they can be delivered. My answer to my subject question? No, not as it stands. And we need to have a story to replace it. Best Hugh

=== Sorry if this is a second copy, but the first, sent as a new post, seemed to only elicit a message from [EMAIL PROTECTED] and I can't work out or find out whether it means the message was rejected or something else, such as awaiting moderation. So I've done this as a reply. ===

And now a response to the message from Aldo, done here to reduce traffic: Very generous of you to write in this way. And yes, humour is good. And sorry to all for the traffic.

On 27/11/2008 00:02, Aldo Bucchi [EMAIL PROTECTED] wrote: OK Hugh, I see what you mean and I understand you being upset. I just re-read the conversation word by word because I felt something was not right. I did say "wacky"... is that it? In that case, and if this caused the confusion, I am really sorry. I was not talking about your software; this was just a joke, talking in general. You replied to my joke with an absurd reply. My point was simply that, if you want to push things over the edge, why not get your own box. We all take care of our infrastructure and know its limitations. So, I formally apologize. I am by no means endorsing one piece of software over another (save for mine, but it doesn't exist yet ;). My preferences for Virtuoso come from experiential bias.
I hope this clears things up. I apologize for the traffic. However, I do make a formal request for some sense of humor. This list tends to get into these kinds of discussions, and we will start getting more and more visits from outsiders who are not used to this sort of sharpness. Best, A
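Juan's SQUIN suggestion earlier in the thread amounts to link-traversal query execution: start from seed URIs, dereference them, and follow links until the pattern can be answered, instead of hitting a central SPARQL endpoint. A toy illustration, where an in-memory dict stands in for HTTP dereferencing (this is the general idea only, not SQUIN's actual code):

```python
# uri -> triples served when that URI is "dereferenced" (stand-in for HTTP GET)
web = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob":   [("ex:bob", "foaf:knows", "ex:carol")],
    "ex:carol": [("ex:carol", "foaf:name", "Carol")],
}

def traverse(seeds, max_hops=3):
    """Collect triples by following object links outward from the seed URIs."""
    graph, frontier, seen = [], list(seeds), set()
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            if uri in seen or uri not in web:
                continue
            seen.add(uri)
            for s, p, o in web[uri]:
                graph.append((s, p, o))
                next_frontier.append(o)  # follow the link to the object
        frontier = next_frontier
    return graph

g = traverse(["ex:alice"])
names = [o for s, p, o in g if p == "foaf:name"]  # -> ["Carol"]
```

The load-shifting point is visible even in the toy: each publisher only ever serves small per-resource documents, and the query work happens on the client (or on an intermediary like SQUIN) rather than on anyone's endpoint.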
Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)
2008/11/27 Hugh Glaser [EMAIL PROTECTED] Prompted by the thread on "linked data hosted somewhere" I would like to ask the above question that has been bothering me for a while. The only reason anyone can afford to offer a SPARQL endpoint is because it doesn't get used too much?

As abstract components for studying interaction, performance, etc.: DB=KB, SQL=SPARQL. In fact, I often consider the components themselves interchangeable; that is, the first step of the migration to SW technologies for an application is to take an SQL-based back end and simply replace it with a SPARQL/RDF back end and then carry on. However, no serious DB publisher gives direct SQL access to their DB (I think). There are often commercial reasons, of course. But even when there are not (the Open in LOD), there are only search options and possibly download facilities. Even government organisations that have a remit to publish their data don't offer SQL access. Will we not have to do the same? Or perhaps there is a subset of SPARQL that I could offer that will allow me to offer a safer service that conforms to others' safer services (so it is well understood)? Is this defined, or is anyone working on it? And I am not referring to any particular software; it seems to me that this is something that LODers need to worry about. We aim to take over the world; and if SPARQL endpoints are part of that (maybe they aren't; just resolvable URIs?), then we should make damn sure that we think they can be delivered. My answer to my subject question? No, not as it stands. And we need to have a story to replace it. Best Hugh

I don't think we can afford to offer the actual public-grade infrastructure for free unless there is corporate backing for particular endpoints.
However, we can still tentatively roll out SPARQL endpoints and resolvers in mirror configurations, together with software which can round-robin across the endpoints to get information without overloading a particular endpoint, to at least get some redundancy and figure out what needs to be done to fine-tune the methods for distributed queries. Once you have the ability to round-robin across SPARQL endpoints, and still choose them intelligently based on a knowledge of what is inside each one, you can distribute the source RDF to anyone and have them give back the information about how to access the endpoint; and if people are found to be overloading an endpoint, send them a polite message to either round-robin across the available endpoints or get their own local SPARQL installation, which can be configured to work the same as the public endpoint.

An example implementation of this functionality is the distribution of queries across endpoints for Bio2RDF [1], which, together with the distribution of a combination of Virtuoso DB files [2] and source NTriples files [3], makes it relatively simple for people to download the software [4] and the resolver package and redirect the configuration file to their own local versions, for large-scale private use of semantics using exactly the same URIs that resolve using a combination of the publicly available resolvers, which may or may not be contacting public SPARQL endpoints. An example of a public resolver contacting a combination of public and private SPARQL endpoints is [5]. (Please don't go and overload it though, because as Hugh says, the threat of overloading is quite real for any particular endpoint :) )

I do agree that arbitrary SPARQL queries should be localised to private installations, but before you do that you have to provide easy ways for people to get private installations which resolve URIs in the same way that they are resolved on the public web.
Cheers, Peter [1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml [2] http://quebec.bio2rdf.org/download/virtuoso/indexed/ [3] http://quebec.bio2rdf.org/download/n3/ [4] http://sourceforge.net/project/platformdownload.php?group_id=142631 [5] http://bio2rdf.mquter.qut.edu.au/
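The round-robin idea Peter describes can be sketched as a small client that rotates across mirrors while taking into account which datasets each mirror advertises. The mirror URLs and dataset labels below are made up for illustration; a real client would learn them from a configuration document like Bio2RDF's [1]:

```python
from itertools import cycle

class RoundRobinClient:
    """Rotate across mirrored SPARQL endpoints, restricted to the mirrors
    that advertise the dataset being queried."""

    def __init__(self, mirrors):
        self.mirrors = mirrors  # endpoint URL -> set of dataset names it holds
        self._cycles = {}

    def choose(self, dataset):
        if dataset not in self._cycles:
            holders = [url for url, held in self.mirrors.items() if dataset in held]
            if not holders:
                raise LookupError(f"no mirror holds {dataset}")
            self._cycles[dataset] = cycle(holders)
        return next(self._cycles[dataset])

client = RoundRobinClient({
    "http://mirror1.example.org/sparql": {"geneid", "pubmed"},
    "http://mirror2.example.org/sparql": {"geneid"},
})
picked = [client.choose("geneid") for _ in range(4)]  # alternates between mirrors
```

Spreading requests this way buys redundancy for free; the harder part, as the mail notes, is keeping the per-mirror dataset knowledge accurate.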
Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)
2008/11/27 Juan Sequeda [EMAIL PROTECTED] Hugh, Nice point brought up. What I do see is that even though we can't make a SQL query to the Facebook DB, we can use the APIs to obtain data, the same as with the many different applications that offer APIs. Now imagine LOD as the data that a developer can obtain from an API. In this case, instead of learning the API of several different applications, he learns the vocabulary. The same way nowadays developers get the data from different APIs to make mashups, LOD is another form of making mashups, but much better. I agree that having a SPARQL endpoint for everything may not be safe. That is why we started thinking about SQUIN [1]. If the data is out there, it is linked, it is dereferenceable, and it's open, well, make the query on SQUIN, and let SQUIN get the data for you.

There is still the issue of people wanting to do more advanced things with the data in an efficient manner. Resolvable URIs are great, but if you have to resolve every single URI to finish queries which should be simple with SPARQL, like reverse links (e.g. http://bio2rdf.mquter.qut.edu.au/links/geneid:12345), then you either have to make up URIs that stand in for the queries, as Bio2RDF have done (see [1]), or you provide SPARQL access. A system that just resolves data URIs to a local cache won't be able to perform global queries as efficiently as one where the queries are converted to URIs and access to the actual SPARQL endpoint is effectively prevented for long-running, performance-hampering queries, because it is behind the resolver. Cheers, Peter [1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml
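Peter's "URIs that stand in for the queries" pattern can be sketched as a resolver that maps a canned /links/ URI onto the reverse-link SPARQL it represents, so the endpoint itself never has to accept arbitrary queries. The target URI scheme and the LIMIT below are assumptions for illustration, not Bio2RDF's actual mapping:

```python
def links_uri_to_query(uri):
    """Translate a /links/<id> URI into the reverse-link SPARQL it stands for."""
    ident = uri.rsplit("/links/", 1)[1]
    target = f"http://bio2rdf.org/{ident}"  # assumed URI scheme for the target
    return f"SELECT ?s ?p WHERE {{ ?s ?p <{target}> }} LIMIT 1000"

q = links_uri_to_query("http://bio2rdf.mquter.qut.edu.au/links/geneid:12345")
```

Because only this fixed query shape ever reaches the SPARQL engine, the resolver can cache results aggressively and the expensive general-purpose endpoint can stay private.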
Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)
Hugh Glaser wrote: Prompted by the thread on "linked data hosted somewhere" I would like to ask the above question that has been bothering me for a while. The only reason anyone can afford to offer a SPARQL endpoint is because it doesn't get used too much?

No. For instance, DBpedia has offered a SPARQL endpoint in public view from day one to demonstrate what a public SPARQL endpoint can deliver. The SPARQL engine has to be able to work out the cost of a query and have intelligence re. result-set (solution) sizes and final delivery of the result sets. In short, it has to construct a query fulfillment matrix that is server-side configurable and enforceable. In the SQL realm of ODBC/JDBC/etc. we had to do the same thing with our drivers, knowing the high probability of deliberate or inadvertent DoS via Cartesian products. Naturally, this approach is intrinsic to Virtuoso. Any public-facing query interface needs to have the capabilities above. Even Google uses similar techniques when delivering its document-database-realm search engine services.

As abstract components for studying interaction, performance, etc.: DB=KB, SQL=SPARQL. In fact, I often consider the components themselves interchangeable; that is, the first step of the migration to SW technologies for an application is to take an SQL-based back end and simply replace it with a SPARQL/RDF back end and then carry on. However, no serious DB publisher gives direct SQL access to their DB (I think).

It really depends on the task at hand, and the factors allotted to change sensitivity. If the change-sensitivity factor has a high weighting then some form of cursoring against the main server offers a viable solution, but most don't go there because only a handful of DBMS drivers actually support all the cursor models (keyset, dynamic, mixed, and static). Even worse, most of the drivers (bar ours) aren't equipped with the fulfillment-matrix capabilities I described above.
If scrollable cursors aren't workable, you also have highly granular transactional replication as a change-sensitivity issue handler re. indirect access, but these aren't common across all DBMS engines.

There are often commercial reasons, of course. But even when there are not (the Open in LOD), there are only search options and possibly download facilities. Even government organisations that have a remit to publish their data don't offer SQL access.

From my vantage point, exposing SQL wouldn't have really solved the issue at hand (putting the DOS issues aside) anyhow. The data source name granularity offered in the RDBMS realm simply isn't there. This is fundamentally why HTTP-based Data Source Naming (using URIs) and HTTP-based Data Access by Reference (Linked Data) is ultimately so powerful. It addresses what open SQL RDBMS access would never have been able to deliver re. open data access and connectivity.

Will we not have to do the same? Or perhaps there is a subset of SPARQL that I could offer that will allow me to offer a safer service that conforms to others' safer services (so it is well understood)? Is this defined, or is anyone working on it?

I really think this is going to come down to the RDF DBMS engine (as per my initial comments).

And I am not referring to any particular software - it seems to me that this is something that LODers need to worry about.

LODers are not necessarily DBMS people, I think it's important to note :-) It's one thing to know how to query a DBMS and a totally different kettle of fish re. building one. What LOD needs to do is take engagement of the broader DBMS community very seriously.

We aim to take over the world; and if SPARQL endpoints are part of that (maybe they aren't - just resolvable URIs?), then we should make damn sure that we think they can be delivered.

I would say we aim to open up data access to the world via the World Wide Web :-)

Kingsley

My answer to my subject question? No, not as it stands.
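For readers following along: the HTTP-based access being contrasted with raw SQL here is just the SPARQL Protocol, where a query travels as an ordinary GET parameter and content negotiation picks the result format. A hedged sketch using only the Python standard library (the endpoint URL and query are examples, not an endorsement of hitting any particular public endpoint):

```python
from urllib.parse import urlencode
from urllib.request import Request

def sparql_get_request(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET request (query sent as a URL parameter)."""
    params = {"query": query}
    if default_graph:
        params["default-graph-uri"] = default_graph
    url = endpoint + "?" + urlencode(params)
    # Content negotiation: ask for SPARQL XML results.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})

# Example only - sending this request is left to the caller
# (e.g. urllib.request.urlopen), and is subject to whatever limits
# the endpoint operator enforces.
req = sparql_get_request(
    "http://dbpedia.org/sparql",
    "SELECT DISTINCT ?Concept WHERE { [] a ?Concept } LIMIT 10",
)
```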
And we need to have a story to replace it. Best Hugh

=== Sorry if this is a second copy, but the first, sent as a new post, seemed only to elicit a message from [EMAIL PROTECTED], and I can't work out or find out whether it means the message was rejected or something else, such as awaiting moderation. So I've done this as a reply. ===

And now a response to the message from Aldo, done here to reduce traffic: Very generous of you to write in this way. And yes, humour is good. And sorry to all for the traffic.

On 27/11/2008 00:02, Aldo Bucchi [EMAIL PROTECTED] wrote: OK Hugh, I see what you mean and I understand you being upset. Just re-read the conversation word by word because I felt something was not right. I did say wacky... is that it? In that case, and if this caused the confusion, I am really sorry. I was not talking about your software, this was just a joke. Talking in general. You replied to my joke with an absurd reply.
Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)
Hugh,

Let's just look forward. This is not the same world, not the same game, and definitely not the same problem. The comparison stops the minute you realize we now have billions of computers connected, and a globally distributed DSN. The trick is understanding that we are not exposing SQL endpoints, standing on the shore and throwing stones at the ocean hoping to fill it up. We are throwing powder jelly that will create solid land over which we will be able to walk very soon.

What we are doing is assembling ONE big database, because we have ONE namespace that meshes everything (thanks to the URIs) and one transport mechanism. And the force that will drive us to open data is economic. Your database contains facts that complement my records, and we both benefit from the mutual exchange, and this happens more efficiently in an unplanned manner. Serendipity and unplanned knowledge generation.

So, if you consider the URI and the WWW, the comparison between SPARQL and SQL and databases is not enough. However, I admit it is fair and sometimes necessary at a micro level. The tech details will be solved in a snap. As you can see from Kingsley's response, this is not a new problem, but rather a new opportunity. I guess the question is: why would anyone open up their data before, when the integration had to be done manually?

Best,
A

On Wed, Nov 26, 2008 at 9:18 PM, Hugh Glaser [EMAIL PROTECTED] wrote: Prompted by the thread on linked data hosted somewhere I would like to ask the above question that has been bothering me for a while. The only reason anyone can afford to offer a SPARQL endpoint is because it doesn't get used too much? [...]

On 27/11/2008 00:02, Aldo Bucchi [EMAIL PROTECTED] wrote: OK Hugh, I see what you mean and I understand you being upset. [...] My point was simply that, if you want to push things over the edge, why not get your own box.
We all take care of our infrastructure and know its limitations. So, I formally apologize. I am by no means endorsing one piece of software over another (save for mine, but it doesn't exist yet ;). My preferences for Virtuoso come from experiential bias. I hope this clears things up. I apologize for the traffic. However, I do make a formal request for some sense of humor. This list tends to get into this kind of discussion, and we will start getting more and more visits from outsiders who are not used to this sort of sharpness.

Best,
A

--
Aldo Bucchi
U N I V R Z
Office: +56 2 795 4532
Mobile: +56 9 7623 8653
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com
Re: Can we afford to offer SPARQL endpoints when we are successful? (Was linked data hosted somewhere)
Peter Ansell wrote: 2008/11/27 Hugh Glaser [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]: Prompted by the thread on linked data hosted somewhere I would like to ask the above question that has been bothering me for a while. The only reason anyone can afford to offer a SPARQL endpoint is because it doesn't get used too much? [...] My answer to my subject question? No, not as it stands. And we need to have a story to replace it. Best Hugh

I don't think we can afford to offer the actual public-grade infrastructure for free unless there is corporate backing for particular endpoints.
However, we can still tentatively roll out SPARQL endpoints and resolvers in mirror configurations, together with software which can round-robin across the endpoints to get information without overloading a particular endpoint. That way we at least get some redundancy, and we can figure out what needs to be done to fine-tune the methods for distributed queries.

Once you have the ability to round-robin across SPARQL endpoints and still choose them intelligently, based on knowledge of what is inside each one, you can distribute the source RDF to anyone and have them give back the information about how to access the endpoint. And if people are found to be overloading an endpoint, you can send them a polite message to either round-robin across the available endpoints or get their own local SPARQL installation, which can be configured to respond the same way as the public endpoint.

An example implementation of this functionality is the distribution of queries across endpoints for Bio2RDF [1]. Together with the distribution of a combination of Virtuoso DB files [2] and source NTriples files [3], this makes it relatively simple for people to download the software [4] and the resolver package and redirect the configuration file to their own local versions, for large-scale private use of semantics using exactly the same URIs that resolve using a combination of the publicly available resolvers, which may or may not be contacting public SPARQL endpoints. An example of a public resolver contacting a combination of public and private SPARQL endpoints is [5]. (Please don't go and overload it though, because as Hugh says, the threat of overloading is quite real for any particular endpoint :) )

Peter,

If you configure the Virtuoso INI file appropriately, the deliberate or inadvertent DOS vulnerability is alleviated.
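The round-robin-with-failover idea Peter describes could be sketched as follows. This is illustrative only, not Bio2RDF's actual code; `send` stands in for whatever HTTP transport the client uses and is assumed to raise on timeout, HTTP 5xx, or overload:

```python
import itertools

class EndpointPool:
    """Rotate a SPARQL query across mirrored endpoints, failing over
    to the next mirror when one is down or overloaded (sketch)."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)
        self._size = len(endpoints)

    def query(self, sparql, send):
        """Try each mirror at most once, starting from the next in rotation."""
        last_error = None
        for _ in range(self._size):
            endpoint = next(self._cycle)
            try:
                return send(endpoint, sparql)
            except Exception as exc:
                last_error = exc  # mirror unavailable; try the next one
        raise RuntimeError("all mirrors failed") from last_error
```

Because the cycle position persists between calls, successive queries naturally spread across the mirrors instead of hammering the first one in the list.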
You can append this to your Virtuoso INI (if not there already):

[SPARQL]
ResultSetMaxRows           = 1000
DefaultGraph               = http://bio2rdf.org
MaxQueryExecutionTime      = 60   ; seconds
MaxQueryCostEstimationTime = 400  ; seconds
DefaultQuery               = select distinct ?Concept where {[] a ?Concept}

I do agree that arbitrary SPARQL queries should be localised to private installations, but before you do that you have to provide easy ways for people to get private installations which resolve URIs in the same way that they are on the public web.

We have also made this part of the DBpedia on EC2 solution; thus, the URIs are localized while retaining original data source links by attribution etc. So http://ec2-cname/resource/Berlin will be resolved locally while using an attribution link (dc:source) to http://dbpedia.org/resource/Berlin . The attribution triple doesn't exist in the quad store (so it doesn't result in one for each resource, thereby increasing size unnecessarily); we simply produce it on the fly via a re-write rule.

Kingsley

Cheers, Peter

[1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml
[2]
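The on-the-fly attribution rewrite Kingsley describes might look like the following sketch. The host names come from the Berlin example above; the function name and the exact rewrite mechanics are hypothetical (Virtuoso does this with URL re-write rules, not application code):

```python
# Illustrative bases taken from the example in this thread.
LOCAL_BASE = "http://ec2-cname/resource/"
ORIGIN_BASE = "http://dbpedia.org/resource/"
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"

def attribution_triple(local_uri):
    """Produce the dc:source attribution statement (N-Triples) on the fly,
    instead of materializing one triple per resource in the quad store."""
    if not local_uri.startswith(LOCAL_BASE):
        raise ValueError("not a locally served resource URI")
    name = local_uri[len(LOCAL_BASE):]
    return "<%s> <%s> <%s> ." % (local_uri, DC_SOURCE, ORIGIN_BASE + name)
```

Generating the triple at response time keeps the store small while every localized description still points back to its original data source.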