Re: Where are the Linked Data Driven Smart Agents (Bots) ?
Ah, ’twas always thus, in every field I know. To put it bluntly, that is because much research is about getting papers published, not about moving the field on.

Functional programming (I used to be functional): David Turner (he with a brain the size of a planet) said that his application of combinators to functional language implementation had “cost 10 years of wasted PhD students”. This was because they had all striven to improve in tiny ways on the original implementation. Of course most failed, as it had sprung perfectly formed from his brain and, more importantly, been perfectly engineered by him into the system†. But even when they succeeded, the increment didn't amount to a hill of beans. And again, watching paper after paper purporting to improve some theoretical upper bound on execution of some variant of the λ-calculus, which in fact could never be reached without immense execution overheads, was pretty depressing in terms of wasted research time.

Of course, small increments are not always useless. In Operational Research they are bread and butter. But that is because it is a mature field with clear applications, and if you can improve a search/optimisation technique by a fraction of a percent, you might save millions on the billions cost of something. In the first decades of a field, it is highly unlikely that incremental change will be significant in the long run, not least because entirely new methods and techniques will be discovered, making the base ones redundant and rendering the increment moot.

†One of my favourite comments was in the garbage collector of David's C implementation of his SK-reduction machine: "Now follow everything on the C stack that looks like a pointer".
:-) > On 8 Jul 2016, at 05:01, Ruben Verborgh wrote: > > Hi Krzysztof, > >> this is all about finding the right balance > > Definitely—but I have the feeling the balance > is currently tipped very much to one side > (and perhaps not the side that delivers > the most urgent components for the SemWeb). > >> as we also do not want to have tons of 'ideas' >> papers without any substantial content or proof of concept > > Mere ideas would indeed not be sufficient; > but even papers with substantial content > and/or a proof of concept will have a difficult time > getting accepted if there is no evaluation > that satisfies the reviewers. > (And, lacking a framework to evaluate evaluations, > I see people typically choosing things they know, > hence incremental research gets accepted easily.) > > Best, > > Ruben
CFPs and the lists
Hmmm. So I am enjoying the new regime without CfPs on the LOD list (many thanks, Phil!). However, I now find myself thinking I will unsubscribe from the SemWeb list, since it is almost all CfPs, few, if any, of which I want. I think this may be an unintended consequence (although probably predictable) - losing people from SemWeb. There isn’t really a question here - I just thought I would report it. If there is a question: dare I suggest that now things have settled down, and we can see how things are working, that we might want to revisit the idea of having a separate list for CfPs, and reclaim the SemWeb list for discussion? (Sorry Phil?) Or is the answer that I simply set mail filters and carry on? Best Hugh
[Job Advert] Developer / Data Scientist
Seme4 is looking for people to do exciting things: http://www.seme4.com/wp-content/uploads/2016/05/Seme4-developer-vacancy-May2016.pdf Feel free to email me if you like. Best Hugh -- Hugh Glaser Chief Architect Seme4 Limited International House Southampton International Business Park Southampton Hampshire SO18 2RZ Mobile: +44 7595 334155 Main: +44 20 7060 1590 hugh.gla...@seme4.com www.seme4.com
Re: Deprecating owl:sameAs
And would we also have owl:differentDifferentButSame? The built-in OWL property owl:differentDifferentButSame links things to things. Such an owl:differentDifferentButSame statement indicates that two URI references actually refer to different things but may be the same under some circumstances. > On 1 Apr 2016, at 14:01, Sarven Capadisli wrote: > > There is overwhelming research [1, 2, 3] and I think it is evident at this > point that owl:sameAs is used inarticulately in the LOD cloud. > > The research that I've done makes me conclude that we need to do a massive > sweep of the LOD cloud and adopt owl:sameSameButDifferent. > > I think the terminology is human-friendly enough that there will be minimal > confusion down the line, but for the pedants among us, we can define it > along the lines of: > > > The built-in OWL property owl:sameSameButDifferent links things to things. > Such an owl:sameSameButDifferent statement indicates that two URI references > actually refer to the same thing but may be different under some > circumstances. > > > Thoughts? > > [1] https://www.w3.org/2009/12/rdf-ws/papers/ws21 > [2] http://www.bbc.co.uk/ontologies/coreconcepts#terms_sameAs > [3] http://schema.org/sameAs > > -Sarven > http://csarven.ca/#i >
Re: Survey: Use of this list for Calls for Papers
Hi Phil, Good question. I’m afraid none of the username/password pairs I have for w3.org seem to work. Can you give me a hint at which pair I should be using, or tell me how to retrieve/reset, please? While I’m here… :-) a) I think the idea of allowing CFPs, as long as they clearly have [CFP] or whatever in the subject line, is great. b) We could pick one of the two lists; then we would see less duplication. I would suggest semweb, as that embraces LD. (Maybe I would get to vote that way, but I don’t know what the 4 questions are :-) ) c) I don’t want to have CFPs shortened - I often read my email when I am offline (in fact I keep such emails to read offline), and it is a pain when the information is all “just a click away” but I can’t get it. Best Hugh > On 30 Mar 2016, at 12:21, Phil Archer wrote: > > Dear all, > > A perennial topic at W3C is whether we should allow calls for papers to be > posted to our mailing lists. Many argue, passionately, that we should not > allow any CfPs on any lists. It is now likely that this will be the policy, > with any message detected as being a CfP marked as spam (and therefore > blocked). > > Historically, the semantic-web and public-lod lists have been used for CfPs > and we are happy for this to continue *iff* you want it. > > Last time we asked, the consensus was that CfPs were seen as useful, but it's > time to ask you again. > > Please take a minute to answer the 4-question survey (no free text needed) > at https://www.w3.org/2002/09/wbs/1/1/ > > Thanks > > Phil. > > -- > > > Phil Archer > W3C Data Activity Lead > http://www.w3.org/2013/data/ > > http://philarcher.org > +44 (0)7887 767755 > @philarcher1 >
Re: SEMANTiCS 2016, Leipzig, Sep 12-15, Call for Research & Innovation Papers
Hi. I’m sort of puzzled by this. We used to have: SEMANTiCS 2015 : 11th International Conference on Semantic Systems SEMANTiCS 2014 : 10th International Conference on Semantic Systems etc. > On 18 Jan 2016, at 09:56, Sebastian Hellmann wrote: > > Call for Research & Innovation Papers But now we have: > SEMANTiCS 2016 - The Linked Data Conference It doesn’t look like the call has changed much. Linked (Open) Data is now additionally in a bit of one of the topics, and also in a couple of other places as usual. Has anything significant changed? Obviously I am asking because I have never thought of SEMANTiCS as the go-to place for Linked Data research publication. And when I look at the Verticals, for example, it isn’t immediately obvious that it is the list of things to choose for "The Linked Data Conference”. On the other hand, a conference with all those topics, where there was a *requirement* that authors used Linked Data technologies and practices, would be pretty exciting. For example, would a submission on "Smart Connectivity, Networking & Interlinking” that didn’t use or apply to Linked Data be rejected as out of scope? Is that what’s happening? Best regards Hugh > > Transfer // Engineering // Community > > > 12th International Conference on Semantic Systems > > Leipzig, Germany > > September 12-15, 2016 > > http://2016.semantics.cc > > > Important Dates (Research & Innovation) > > • Abstract Submission Deadline: April 14, 2016 (11:59 pm, Hawaii time) > • Paper Submission Deadline: April 21, 2016 (11:59 pm, Hawaii time) > • Notification of Acceptance: May 26, 2016 (11:59 pm, Hawaii time) > • Camera-Ready Paper: June 16, 2016 (11:59 pm, Hawaii time) > Submissions via EasyChair: > https://easychair.org/conferences/?conf=semantics2016research > > As in previous years, SEMANTiCS’16 proceedings are expected to be published by ACM ICPS.
> > > > The annual SEMANTiCS conference is the meeting place for professionals who > make semantic computing work, who understand its benefits and encounter its > limitations. Every year, SEMANTiCS attracts information managers, > IT-architects, software engineers and researchers from organisations ranging > from NPOs, through public administrations to the largest companies in the > world. Attendees learn from industry experts and top researchers about > emerging trends and topics in the fields of semantic software, enterprise > data, linked data & open data strategies, methodologies in knowledge > modelling and text & data analytics. The SEMANTiCS community is highly > diverse; attendees have responsibilities in interlinking areas like > knowledge management, technical documentation, e-commerce, big data > analytics, enterprise search, document management, business intelligence and > enterprise vocabulary management. > > The success of last year’s conference in Vienna with more than 280 attendees > from 22 countries proves that SEMANTiCS 2016 will continue a long tradition > of bringing together colleagues from around the world. There will be > presentations on industry implementations, use case prototypes, best > practices, panels, papers and posters to discuss semantic systems in > birds-of-a-feather sessions as well as informal settings. SEMANTICS addresses > problems common among information managers, software engineers, IT-architects > and various specialist departments working to develop, implement and/or > evaluate semantic software systems. > > The SEMANTiCS program is a rich mix of technical talks, panel discussions of > important topics and presentations by people who make things work - just like > you. In addition, attendees can network with experts in a variety of fields. > These relationships provide great value to organisations as they encounter > subtle technical issues in any stage of implementation. 
The expertise gained > by SEMANTiCS attendees has a long-term impact on their careers and > organisations. These factors make SEMANTiCS for our community the major > industry related event across Europe. > > > SEMANTiCS 2016 will especially welcome submissions for the following hot > topics: > > • Data Quality Management > • Data Science (Data Mining, Machine Learning, Network Analytics) > • Semantics on the Web, Linked (Open) Data & schema.org > • Corporate Knowledge Graphs > • Knowledge Integration and Language Technologies > • Economics of Data, Data Services and Data Ecosystems > > Following the success of previous years, the ‘horizontals’ (research) and > ‘verticals’ (industries) below are of interest for the conference: > > Horizontals > > • Enterprise Linked Data & Data Integration > • Knowledge Discovery & Intelligent Search > • Business Models, Governance & Data Strategies >
Re: CfP: WWW2016 workshop on Linked Data on the Web (LDOW2016)
Many thanks Chris, very helpful information, and very quickly. And good news too! > On 3 Nov 2015, at 15:45, Christian Bizer wrote: > > Hi Hugh, > >> Hi Chris et al, >> Great stuff. >> Can you tell me please if it will be possible to register for the workshop >> on its own, or will a registration for the full WWW be required to register >> for the workshop? > > The WWW2016 workshop track chairs just confirmed that it will be possible to > register again for the workshop days (not a specific workshop) similar to the > arrangement last year. > > The concrete prices seem not to be set yet, but last year the fees were 410 > Euro just for the workshop days compared to 850 Euro for the full pass. > > See http://www.www2015.it/registrations/ > > Cheers and hope to see you in Montreal, > > Chris > >> >>> On 2 Nov 2015, at 09:06, Christian Bizer wrote: >>> >>> Hi all, >>> >>> Sören Auer, Tim Berners-Lee, Tom Heath, and I are organizing the 9th edition >>> of the Linked Data on the Web workshop at WWW2016 in Montreal, Canada. The >>> paper submission deadline for the workshop is 24 January, 2016. Please find >>> the call for papers below. >>> >>> We are looking forward to having another exciting workshop and to seeing >>> many of you in Montreal. >>> >>> Cheers, >>> >>> Chris, Tim, Sören, and Tom >>> >>> >>> >>> >>> Call for Papers: 9th Workshop on Linked Data on the Web (LDOW2016) >>> >>> >>> Co-located with 25th International World Wide Web Conference >>> April 11 to 15, 2016 in Montreal, Canada >>> >>> >>> http://events.linkeddata.org/ldow2016/ >>> >>> >>> >>> The Web is developing from a medium for publishing textual documents into a >>> medium for sharing structured data. This trend is fueled on the one hand by >>> the adoption of the Linked Data principles by a growing number of data >>> providers. 
On the other hand, large numbers of websites have started to >>> semantically mark up the content of their HTML pages and thus also >>> contribute to the wealth of structured data available on the Web. >>> >>> The 9th Workshop on Linked Data on the Web (LDOW2016) aims to stimulate >>> discussion and further research into the challenges of publishing, >>> consuming, and integrating structured data from the Web as well as mining >>> knowledge from the global Web of Data. The special focus of this year’s LDOW >>> workshop will be Web Data Quality Assessment and Web Data Cleansing. >>> >>> >>> *Important Dates* >>> >>> * Submission deadline: 24 January, 2016 (23:59 Pacific Time) >>> * Notification of acceptance: 10 February, 2016 >>> * Camera-ready versions of accepted papers: 1 March, 2016 >>> * Workshop date: 11-13 April, 2016 >>> >>> >>> *Topics of Interest* >>> >>> Topics of interest for the workshop include, but are not limited to, the >>> following: >>> >>> Web Data Quality Assessment >>> * methods for evaluating the quality and trustworthiness of web data >>> * tracking the provenance of web data >>> * profiling and change tracking of web data sources >>> * cost and benefits of web data quality assessment >>> * web data quality assessment benchmarks >>> >>> Web Data Cleansing >>> * methods for cleansing web data >>> * data fusion and truth discovery >>> * conflict resolution using semantic knowledge >>> * human-in-the-loop and crowdsourcing for data cleansing >>> * cost and benefits of web data cleansing >>> * web data quality cleansing benchmarks >>> >>> Integrating Web Data from Large Numbers of Data Sources >>> * linking algorithms and heuristics, identity resolution >>> * schema matching and clustering >>> * evaluation of linking and schema matching methods >>> >>> Mining the Web of Data >>> * large-scale derivation of implicit knowledge from the Web of Data >>> * using the Web of Data as background knowledge in data mining >>> * techniques and methodologies 
for Linked Data mining and analytics >>> >>> Linked Data Applications >>> * application showcases including Web data browsers and search engines >>> * marketplaces, aggregators and indexes for Web Data >>> * security, access control, and licensing issues of Linked Data >>> * role of Linked Data within enterprise applications (e.g. ERP, SCM, CRM) >>> * Linked Data applications for life-sciences, digital humanities, social >>> sciences etc. >>> >>> >>> *Submissions* >>> >>> We seek two kinds of submissions: >>> >>> 1. Full scientific papers: up to 10 pages in ACM format >>> 2. Short scientific and position papers: up to 5 pages in ACM format >>> >>> Submissions must be formatted using the ACM SIG template available at >>> http://www.acm.org/sigs/publications/proceedings-templates. Accepted papers >>> will be presented at the workshop and included in the CEUR workshop >>> proceedings. At least one author of each paper has to register for the >>> workshop and to present the paper. >>> >>> >>> *Organizing Committee* >>> >>> Christian Bizer,
Re: CfP: WWW2016 workshop on Linked Data on the Web (LDOW2016)
Hi Chris et al, Great stuff. Can you tell me please if it will be possible to register for the workshop on its own, or will a registration for the full WWW be required to register for the workshop? Thanks. Best Hugh > On 2 Nov 2015, at 09:06, Christian Bizer wrote: > > Hi all, > > Sören Auer, Tim Berners-Lee, Tom Heath, and I are organizing the 9th edition > of the Linked Data on the Web workshop at WWW2016 in Montreal, Canada. The > paper submission deadline for the workshop is 24 January, 2016. Please find > the call for papers below. > > We are looking forward to having another exciting workshop and to seeing > many of you in Montreal. > > Cheers, > > Chris, Tim, Sören, and Tom > > > > > Call for Papers: 9th Workshop on Linked Data on the Web (LDOW2016) > > > Co-located with 25th International World Wide Web Conference > April 11 to 15, 2016 in Montreal, Canada > > > http://events.linkeddata.org/ldow2016/ > > > > The Web is developing from a medium for publishing textual documents into a > medium for sharing structured data. This trend is fueled on the one hand by > the adoption of the Linked Data principles by a growing number of data > providers. On the other hand, large numbers of websites have started to > semantically mark up the content of their HTML pages and thus also > contribute to the wealth of structured data available on the Web. > > The 9th Workshop on Linked Data on the Web (LDOW2016) aims to stimulate > discussion and further research into the challenges of publishing, > consuming, and integrating structured data from the Web as well as mining > knowledge from the global Web of Data. The special focus of this year’s LDOW > workshop will be Web Data Quality Assessment and Web Data Cleansing. 
> > > *Important Dates* > > * Submission deadline: 24 January, 2016 (23:59 Pacific Time) > * Notification of acceptance: 10 February, 2016 > * Camera-ready versions of accepted papers: 1 March, 2016 > * Workshop date: 11-13 April, 2016 > > > *Topics of Interest* > > Topics of interest for the workshop include, but are not limited to, the > following: > > Web Data Quality Assessment > * methods for evaluating the quality and trustworthiness of web data > * tracking the provenance of web data > * profiling and change tracking of web data sources > * cost and benefits of web data quality assessment > * web data quality assessment benchmarks > > Web Data Cleansing > * methods for cleansing web data > * data fusion and truth discovery > * conflict resolution using semantic knowledge > * human-in-the-loop and crowdsourcing for data cleansing > * cost and benefits of web data cleansing > * web data quality cleansing benchmarks > > Integrating Web Data from Large Numbers of Data Sources > * linking algorithms and heuristics, identity resolution > * schema matching and clustering > * evaluation of linking and schema matching methods > > Mining the Web of Data > * large-scale derivation of implicit knowledge from the Web of Data > * using the Web of Data as background knowledge in data mining > * techniques and methodologies for Linked Data mining and analytics > > Linked Data Applications > * application showcases including Web data browsers and search engines > * marketplaces, aggregators and indexes for Web Data > * security, access control, and licensing issues of Linked Data > * role of Linked Data within enterprise applications (e.g. ERP, SCM, CRM) > * Linked Data applications for life-sciences, digital humanities, social > sciences etc. > > > *Submissions* > > We seek two kinds of submissions: > > 1. Full scientific papers: up to 10 pages in ACM format > 2. 
Short scientific and position papers: up to 5 pages in ACM format > > Submissions must be formatted using the ACM SIG template available at > http://www.acm.org/sigs/publications/proceedings-templates. Accepted papers > will be presented at the workshop and included in the CEUR workshop > proceedings. At least one author of each paper has to register for the > workshop and to present the paper. > > > *Organizing Committee* > > Christian Bizer, University of Mannheim, Germany > Tom Heath, Open Data Institute, UK > Sören Auer, University of Bonn and Fraunhofer IAIS, Germany > Tim Berners-Lee, W3C/MIT, USA > > > *Contact Information* > > For further information about the workshop, please contact the workshops > chairs at: ldow2...@events.linkeddata.org > > > -- > Prof. Dr. Christian Bizer > Data and Web Science Group > University of Mannheim, Germany > ch...@informatik.uni-mannheim.de > http://dws.informatik.uni-mannheim.de/bizer > > > > >
Re: Discovering a query endpoint associated with a given Linked Data resource
information about the query endpoints. -- dbpedia:Sri_Lanka void:inDataset _:DBpedia . _:DBpedia a void:Dataset ; void:sparqlEndpoint <http://dbpedia.org/sparql> ; void:uriLookupEndpoint <http://fragments.dbpedia.org/2014/en?subject=> . -- or Link: <http://dbpedia.org/void/Dataset>; rel="http://rdfs.org/ns/void#inDataset" Best Regards, Nandana [1] http://www.w3.org/TR/void/#discovery-links On Wed, Aug 26, 2015 at 11:05 AM, Miel Vander Sande miel.vandersa...@ugent.be wrote: Hi Nandana, I guess VoID would be the best fit. In case of LDF you could use ... void:uriLookupEndpoint <http://fragments.dbpedia.org/2014/en?subject=> But whether these exist in practice? Probably not. I'd leave it up to the dereference publisher to provide this triple in the response, rather than doing the .well-known thing. Best, Miel On 26 Aug 2015, at 10:57, Víctor Rodríguez Doncel vrodrig...@fi.upm.es wrote: Well, you might try to look in this folder location: .well-known/void And possibly find a void:sparqlEndpoint. But this would be too good to be true. Regards, Víctor On 26/08/2015 10:45, Nandana Mihindukulasooriya wrote: Hi, Is there a standard or widely used way of discovering a query endpoint (SPARQL/LDF) associated with a given Linked Data resource? I know that a client can use follow-your-nose and related link-traversal approaches such as [1], but I wonder if it is possible to have a hybrid approach in which dereferenceable Linked Data resources optionally advertise query endpoint(s) in a standard way, so that clients can perform queries on related data. To clarify the use case a bit: when a client dereferences a resource URI it gets a set of triples (an RDF graph) [2]. In some cases, it might be that the returned graph is a subgraph of a named graph / default graph of an RDF dataset. The client wants to discover a query endpoint that exposes the relevant dataset, if one is available. 
For example, something like the following using the search link relation [3]. -- HEAD /resource/Sri_Lanka Host: dbpedia.org -- 200 OK Link: <http://dbpedia.org/sparql>; rel="search"; type="sparql", <http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf" ... other headers ... -- Best Regards, Nandana [1] http://swsa.semanticweb.org/sites/g/files/g524521/f/201507/DissertationOlafHartig_0.pdf [2] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-rdf-graph [3] http://www.iana.org/assignments/link-relations/link-relations.xhtml -- Víctor Rodríguez-Doncel D3205 - Ontology Engineering Group (OEG) Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Campus de Montegancedo s/n Boadilla del Monte-28660 Madrid, Spain Tel. (+34) 91336 3672 Skype: vroddon3 -- Prof. Dr. Heiko Paulheim Data and Web Science Group University of Mannheim Phone: +49 621 181 2646 B6, 26, Room C1.08 D-68159 Mannheim Mail: he...@informatik.uni-mannheim.de Web: www.heikopaulheim.com -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
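The discovery idioms in this thread need very little client machinery: a client that receives the Link header proposed above only has to pull out the rel="search" targets. A minimal, dependency-free sketch (the rel="search" and type parameters follow the example in the thread, not a registered convention, and the naive comma-split assumes no commas inside the URIs):

```python
# Sketch: extract candidate query endpoints from an HTTP Link header.
def parse_link_header(value):
    """Parse a Link header into (target, params) pairs."""
    links = []
    for part in value.split(","):          # assumes no commas inside URIs
        pieces = part.strip().split(";")
        target = pieces[0].strip().lstrip("<").rstrip(">")
        params = {}
        for p in pieces[1:]:
            if "=" in p:
                k, v = p.split("=", 1)
                params[k.strip()] = v.strip().strip('"')
        links.append((target, params))
    return links

header = ('<http://dbpedia.org/sparql>; rel="search"; type="sparql", '
          '<http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf"')
endpoints = [t for t, p in parse_link_header(header) if p.get("rel") == "search"]
print(endpoints)
```

A real client would of course take the header from the HEAD response rather than a literal string, and could then dispatch on the type parameter (sparql vs. ldf) to pick a query mechanism.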
Re: Discovering a query endpoint associated with a given Linked Data resource
Thanks. Yeah, pretty close, although I doubt that the RDF returned would have anything of type dcat:Distribution in it to let me form the triple. Sorry, looking at my email I realise I was having a brain fart on seeAlso - of course endpoints are Resources (unless I am also having a senior moment :-) ). Also, sorry, by the way, Laurens, I think backlink services such as LOD Laundromat are great, and an important part of the LD world. It is just that I think it should be simpler for consumers in this case (and more efficient). Perhaps I should have suggested: <http://sws.geonames.org/3405870/> void:inDataset :DBpedia . :DBpedia void:sparqlEndpoint <http://dbpedia.org/sparql> . to be delivered with the RDF. I think I could cope with 2 triples! Unfortunately, void:inDataset has foaf:Document as the range. So that would mean I would need to do a bit more stuff, and it is getting more complicated again. And also: http://www.w3.org/TR/void/ 6.3 says “Providing metadata about the entire dataset in such a scenario should not be done by including VoID details in every document. Rather, a single VoID description of the entire dataset should be published, and individual documents should point to this description via backlinks.” (My brain really hurts from trying to remember all this stuff from many years ago now.) However, I take that to mean that I shouldn’t put *all* the VoID stuff in each document. If I have licence stuff etc., then it should be in a common void.ttl or whatever. I don’t think there is any harm putting some selected bits of VoID in the document - in fact, if the VoID data is in the store itself (as I assume it should be), then some of it will arrive as a natural part of the SCBD. Is there some simple idiom we could all use to carry the info? I dunno, something like: :DBpedia void:uriRegexPattern "^http://sws\\.geonames\\.org/3405870/" . :DBpedia void:sparqlEndpoint <http://dbpedia.org/sparql> . But that does require some processing. Any better ideas? 
I suspect I am just being thick. Would that do what I want? (Would it also do what Nandana wants? :-) ) Cheers On 26 Aug 2015, at 13:44, Ghislain Atemezing auguste.atemez...@eurecom.fr wrote: Hi Hugh, On 26 Aug 2015, at 14:23, Hugh Glaser h...@glasers.org wrote: Another major reason is that the publisher may not have the rights to publish .well-known and its ilk. And if it comes with the RDF we can be really confident of the provenance and trust of who has recommended it. Also, it is a damn sight easier to maintain than to rebuild the VoID document every time something changes. At the data level, DCAT [1] already defines a property http://www.w3.org/ns/dcat#accessURL which points to “a landing page, feed, SPARQL endpoint or any other type of resource that gives access to the distribution of the dataset.” Of course, the question remains whether anyone uses this property, at least at the dataset/data-catalog level. Best, Ghislain [1] http://www.w3.org/TR/vocab-dcat/ -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
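The void:uriRegexPattern idiom floated above does require some processing on the consumer side, but not much. A minimal sketch of what a client might do, with the VoID statements flattened into plain Python dicts rather than parsed from Turtle; the patterns and endpoints are illustrative, and the geonames pattern is generalised from the thread's single-resource example to a whole namespace:

```python
import re

# Hypothetical VoID descriptions, flattened to dicts for the sketch:
# each entry pairs a void:uriRegexPattern with a void:sparqlEndpoint.
DATASETS = [
    {"pattern": r"^http://sws\.geonames\.org/",
     "endpoint": "http://dbpedia.org/sparql"},
    {"pattern": r"^http://data\.example\.org/",
     "endpoint": "http://data.example.org/sparql"},
]

def endpoint_for(uri):
    """Return the SPARQL endpoint whose uriRegexPattern matches the URI."""
    for ds in DATASETS:
        if re.match(ds["pattern"], uri):
            return ds["endpoint"]
    return None

print(endpoint_for("http://sws.geonames.org/3405870/"))  # http://dbpedia.org/sparql
```

In practice the dict would be populated from the couple of VoID triples delivered with the resource, which is exactly the "2 triples I could cope with" trade-off discussed above.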
Re: [ANN] nature.com/ontologies - July 2015 Release
Hi Tony, On 8 Aug 2015, at 10:19, Hammond, Tony tony.hamm...@macmillan.com wrote: Hi Hugh: Many thanks for the comments. This is exactly the kind of thing we need to hear. Thank you for your responses, and the manner in which you received my comments. So, I think you may have raised four separate points which I'll try to answer in turn: == 1. Examples You are right. We've been sloppy. These were intended more for reading than parsing. So we took liberties with omitting common namespaces, abbreviating strings, etc. Punctuation, we were just careless. Sure thing. But I agree there is real value in making these examples complete. We will address this in our next release. 2. Dereference We really have no defence here. **We do not support dereference at this time.** The datasets are outputs from our production systems and HTTP URIs are used for namespacing only. We need to figure out a strategy for supporting dereference. So, if that means we are Bad Guys for violating Principle 2, then so be it. We are not trying to claim that this is true Linked Data - it's only common-or-garden linked data (of the RDF kind). That's not to say that we are not interested in adding in dereference. Only that these things take time to implement and we are proceeding with our data publishing in an incremental manner. Again, sure thing. What you have done is lovely Semantic Web stuff - it is only the Linked Data bit that seems to be missing. However, looking at the site, I think you may be being hard on yourself. Were you to have used the term Semantic Web instead of Linked Data on http://www.nature.com/ontologies/ and the other explanation pages, I think it would have been less misleading. I certainly would not have then gone to the data with the expectation of URIs resolving. So, for now, sorry! 3. URNs, etc Note that the RDF you obtained from dereferencing the DOI is from CrossRef - not from ourselves. So we cannot properly answer for something retrieved from a third party. 
That said, CrossRef are also in the early stages of data publishing, and may not themselves have reached the Linked Data standard. Again, seems like it's only RDF at this time. OK - got that. Not your RDF. (I would rename your DOCS dir to be VoID, by the way, as it makes it more attractive to people like me :-) ) 4. Mappings Am a little perplexed as to the distinction between mappings and links, although maybe I can see where you're coming from. Note that we're anyway planning to decouple our ontology mappings and put those in separate files and list them under Mappings. Sounds good. A personal view: A mapping is something that says two things are pretty much the same (in some sense we won’t go into); a link is the use of a URI that comes from a different source, such as your use of the dbpedia URIs. So I would say that something like <http://dx.doi.org/10.1038/003022a0> <http://xmlns.com/foaf/0.1/topic> <http://dbpedia.org/resource/Thomas_Henry_Huxley> is a link from your dataset to dbpedia. If you also had a URI for http://dbpedia.org/resource/Thomas_Henry_Huxley that you wanted to say was the same, then you would say so using a skos or owl predicate, and that would be a mapping. Our core and domain ontologies generally have SKOS mappings, i.e. we use skos:closeMatch, skos:broadMatch, skos:exactMatch, skos:relatedMatch, etc. This feels appropriate for the ontology and the taxonomies. I guess we are cautiously feeling our way forward and want to be a little careful about using owl:sameAs. I’m happy to get those as well :-) == So, I hope we've clarified some things here. There are a couple of obvious things we can do/are doing (examples, mappings). Some other things are out of our hands (DOI dereference). And some will need more time for us to implement (dereference generally). Anyway, many thanks again for all your comments. It's really good to hear back from real users. Otherwise it can feel like we are whistling in the wind. Pleasure. 
Hugh Tony On 07/08/2015 12:48, Hugh Glaser h...@glasers.org wrote: Hi Tony, Great stuff! So I start exploring, looking for more fodder for sameAs.org … :-) It may be that my questions are too specific for the list - feel free to go off-list in response, and then we can summarise. And there is rather a lot here, I'm afraid. Some possible problemettes I hit: http://www.nature.com/ontologies/datasets/articles/#data_example might be confusing for people (and awkward when I tried to rapper it), since quite a few prefixes are not declared, most notably one of yours: npg, but also the usual suspects (xsd, dc, bibo, foaf and also prism). There is also a missing foaf:homepage that causes a syntax error, and some semi-colons missing off the last few lines. A slightly more challenging problem is that the URI for that example doesn't resolve. It unqualifies to http://ns.nature.com/articles/nrg3870 (I
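The link-versus-mapping distinction discussed in this thread is easy to make mechanical. A small hypothetical sketch: treat triples whose predicate is an equivalence property (owl:sameAs or the skos:*Match family mentioned above) as mappings, and triples whose object URI lives outside the publisher's namespace as links; the home_namespace parameter and the "internal" category are assumptions added for illustration:

```python
# Predicates that assert (near-)equivalence; anything else pointing at a
# foreign URI is treated as a plain link.
MAPPING_PREDICATES = {
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://www.w3.org/2004/02/skos/core#closeMatch",
    "http://www.w3.org/2004/02/skos/core#broadMatch",
    "http://www.w3.org/2004/02/skos/core#relatedMatch",
}

def classify(triple, home_namespace):
    """Label a triple as a 'mapping', a 'link', or 'internal' to the dataset."""
    s, p, o = triple
    if p in MAPPING_PREDICATES:
        return "mapping"
    if o.startswith("http") and not o.startswith(home_namespace):
        return "link"
    return "internal"

# The foaf:topic example from the thread: a link from the dataset to DBpedia.
t1 = ("http://dx.doi.org/10.1038/003022a0",
      "http://xmlns.com/foaf/0.1/topic",
      "http://dbpedia.org/resource/Thomas_Henry_Huxley")
print(classify(t1, "http://dx.doi.org/"))  # link
```

The same triple with skos:exactMatch as its predicate would come back as a mapping, which matches the personal-view definition given in the thread.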
Re: [ANN] nature.com/ontologies - July 2015 Release
-- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
UK Open Data data vignette
The ODI (http://theodi.org) has published a research report, Research: Open data means business (http://theodi.org/open-data-means-business), along with a Google sheet of Open Data Companies (https://docs.google.com/spreadsheets/d/1xwxNIaxXSEMLktb-oo1UoYTd5WMPFkRLytRVVBr5OMA). It seemed a nice idea to map the companies and make the data accessible as 5 * Open Data at http://opendatacompanies.data.seme4.com So we have loaded the company data into a Linked Data store with a SPARQL endpoint (http://opendatacompanies.data.seme4.com/sparql/), and made the URIs resolve, such as http://opendatacompanies.data.seme4.com/id/company/od001 We have also used the UK postcodes (and OS services) to plot the companies on a map: http://opendatacompanies.data.seme4.com/services/map/ There you go. This is all a little rough-and-ready, just to see what it looks like, and see if anyone wants to use it. If you do, and want any changes, please ask. Actually, if anyone wanted to produce a similar GSheet for other places, we could suck that in too. Best Hugh
Re: UK Open Data data vignette
Yeah, the Seme4 data ain't too good either :-) Try Tom (Tom Heath tom.he...@theodi.org) at the ODI. On 04/08/2015 17:00, Kingsley Idehen wrote: On 8/4/15 11:35 AM, Hugh Glaser wrote: The ODI (http://theodi.org) has published a research report, Research: Open data means business (http://theodi.org/open-data-means-business), along with a Google sheet of Open Data Companies (https://docs.google.com/spreadsheets/d/1xwxNIaxXSEMLktb-oo1UoYTd5WMPFkRLytRVVBr5OMA). It seemed a nice idea to map the companies and make the data accessible as 5 * Open Data at http://opendatacompanies.data.seme4.com So we have loaded the company data into a Linked Data store with a SPARQL endpoint (http://opendatacompanies.data.seme4.com/sparql/), and made the URIs resolve, such as http://opendatacompanies.data.seme4.com/id/company/od001 We have also used the UK postcodes (and OS services) to plot the companies on a map: http://opendatacompanies.data.seme4.com/services/map/ There you go. This is all a little rough-and-ready, just to see what it looks like, and see if anyone wants to use it. If you do, and want any changes, please ask. Actually, if anyone wanted to produce a similar GSheet for other places, we could suck that in too. Best Hugh Nice work! BTW -- who actually handles editing of the original spreadsheet? The information on OpenLink is really messed up. The ultimate demonstration of identity and identifiers gone very wrong. Half of the description is based on OpenLink Financials and the other half OpenLink Software :(
Open Position: Developer / Data Scientist at Seme4
Seme4 Ltd, a leading Linked Data company founded by ECS Professors Sir Nigel Shadbolt and Dame Wendy Hall, is looking to recruit experienced developers to join the technical team. For the right candidate this poses an exciting opportunity to work with cutting edge technology alongside experts in the field, in an interesting, varied and rewarding role. http://www.seme4.com/jobs/ Please see attached for full details. Best Hugh Seme4-job-vacancy.pdf Description: Adobe PDF document
Re: DBpedia-based RDF dumps for Wikidata
Thanks Dimitris - well done to the whole team. In case it helps anyone, I have brought up a sameAs store for the sameAs relations in this dataset alone: http://sameas.org/store/wikidata_dbpedia/ In passing, it is interesting to note that the example URI, http://wikidata.dbpedia.org/resource/Q586 , has 110 sameAs URIs in this dataset alone. What price now the old view that everybody would use the same URIs for Things?! Best Hugh On 15 May 2015, at 11:28, Dimitris Kontokostas kontokos...@informatik.uni-leipzig.de wrote: Dear all, Following up on the early prototype we announced earlier [1] we are happy to announce a consolidated Wikidata RDF dump based on DBpedia. (Disclaimer: this work is not related or affiliated with the official Wikidata RDF dumps) We provide: * sample data for preview http://wikidata.dbpedia.org/downloads/sample/ * a complete dump with over 1 Billion triples: http://wikidata.dbpedia.org/downloads/20150330/ * a SPARQL endpoint: http://wikidata.dbpedia.org/sparql * a Linked Data interface: http://wikidata.dbpedia.org/resource/Q586 Using the wikidata dump from March we were able to retrieve more than 1B triples, 8.5M typed things according to the DBpedia ontology along with 48M transitive types, 6.4M coordinates and 1.5M depictions. A complete report for this effort can be found here: http://svn.aksw.org/papers/2015/ISWC_Wikidata2DBpedia/public.pdf The extraction code is now fully integrated in the DBpedia Information Extraction Framework. 
We are eagerly waiting for your feedback and your help in improving the DBpedia to Wikidata mapping coverage http://mappings.dbpedia.org/server/ontology/wikidata/missing/ Best, Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian Hellmann [1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg06936.html -- Dimitris Kontokostas Department of Computer Science, University of Leipzig DBpedia Association Projects: http://dbpedia.org, http://aligned-project.eu Homepage: http://aksw.org/DimitrisKontokostas Research Group: http://aksw.org -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
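The 110-URIs-per-bundle observation above is easy to reproduce in miniature: a sameAs store essentially computes connected components over owl:sameAs pairs. A minimal union-find sketch; the pairs below are illustrative, not drawn from the actual dump:

```python
parent = {}

def find(x):
    # Find the representative of x's component, with path halving
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(x, y):
    # Merge the components containing x and y
    parent[find(x)] = find(y)

# Each pair is one owl:sameAs assertion
pairs = [
    ("http://wikidata.dbpedia.org/resource/Q586", "http://www.wikidata.org/entity/Q586"),
    ("http://www.wikidata.org/entity/Q586", "http://dbpedia.org/resource/Bonn"),
    ("http://de.dbpedia.org/resource/Bonn", "http://dbpedia.org/resource/Bonn"),
]
for a, b in pairs:
    union(a, b)

# The bundle for the example URI: all URIs in its component
bundle = {u for u in parent
          if find(u) == find("http://wikidata.dbpedia.org/resource/Q586")}
print(len(bundle))  # 4 URIs in this toy bundle
```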
Re: Ontology to link food and diseases
One of the datasets at http://data.totl.net is Causes of Cancer The Daily Mail has put a huge amount of effort in instructing the UK public about what causes and prevents cancer. Here we have marked up the causes and preventions where possible linking to a valid dbpedia URI. Also includes references to the relevant news stories. Browse: http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fdata.totl.net%2Fcancer_causes.rdf It has a fair bit of data, but not a very rich ontology (hardly any), so possibly not a huge help. But certainly good reading. Hugh On 3 May 2015, at 22:20, Marco Brandizi brand...@ebi.ac.uk wrote: Hi all, I'm looking for an ontology/controlled vocabulary/alike that links food ingredients/substances/dishes to human diseases/conditions, like intolerances, allergies, diabetes etc. Examples of information I'd like to find coded (please assume they're true, I'm no expert): - gluten must be avoided by people affected by coeliac disease - omega-3 is good for people with high cholesterol - sugar should be avoided by people with diabetes risk I also would like linked data about commercial food products, but even an ontology without 'instances' would be useful. So far, I've found an amount of literature (eg, [1-3]) and vocabularies like AGROVOC[4], but nothing like the above. Thanks in advance for any help! 
Marco [1] http://fruct.org/publications/abstract14/files/Kol_21.pdf [2] http://www.researchgate.net/publication/224331263_FOODS_A_Food-Oriented_Ontology-Driven_System [3] http://www.hindawi.com/journals/tswj/aip/475410/ [4] http://tinyurl.com/ndtdhwn -- === Marco Brandizi, PhD brand...@ebi.ac.uk, http://www.marcobrandizi.info Functional Genomics Group - Sr Software Engineer http://www.ebi.ac.uk/microarray European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom Office V2-26, Phone: +44 (0)1223 492 613, Fax: +44 (0)1223 492 620 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Survey on Faceted Browsers for RDF data ?
Nice. On 28 Apr 2015, at 08:33, Michael Brunnbauer bru...@netestate.de wrote: Hello Hugh, On Mon, Apr 27, 2015 at 03:24:57PM +0100, Hugh Glaser wrote: one probably needs to materialize data in some other more facets-friendly system (e.g. solr, elastic search) to get good performance (I might be wrong but this is what my - limited - experience told me). Woah there! I would say that is exactly where a faceted browser stops. And where a faceted search starts - which usually uses a faceted browser and may even be called faceted browser by some people. Not sure where to draw lines here. Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Enterprise information system
Thank you all. Sigh… On 26 Feb 2015, at 22:44, Jean-Marc Vanel jeanmarc.va...@gmail.com wrote: Hi Hugh 2015-02-25 23:06 GMT+01:00 Hugh Glaser h...@glasers.org: But what if you start from scratch? So, the company wants to base all its stuff around Linked Data technologies, starting with information about employees, what they did and are doing, projects, etc., ... Is there a solution out of the box for all the data capture from individuals, and reports, queries, etc.? There is no out of the box solution; if one existed you would know about it :) , it would have the features, plus the LD advantages. BUT there are people like me working toward the Semantic Enterprise information system. Even if there is no need to share and publish your data, let me remind you of the advantages over traditional development (traditional, i.e. SQL or MongoDB): • Data models and data sources are available (Linked Open Data) • 40 implementations of graph databases supporting the W3C standard SPARQL • RDF data models are more flexible than SQL in terms of cardinality • simple inferences out of the box (inheritance) • easy to have interconnected yet independent applications by sharing URIs of common objects • no need of Object-RDF mapping, there are DSLs to express business logic in terms of RDF • Easily customized open source generic applications Point 7 is where progress is being made. There is not yet the equivalent of Ruby on Rails, Symfony, or Django for the Semantic Web and SPARQL databases, but work is being done. The strategic item is the input form management. Some frameworks exist that facilitate the creation of applications with form specifications in RDF, leveraging RDF vocabularies, and storing in RDF. 
I have written a review of semantic-based frameworks: http://svn.code.sf.net/p/eulergui/code/trunk/eulergui/html/semantic_based_apps_review.html of which the most promising seem to be: • semantic_forms : https://github.com/jmvanel/semantic_forms • Vitro https://github.com/vivo-project/Vitro -- Jean-Marc Vanel Déductions SARL - Consulting, services, training, Rule-based programming, Semantic Web http://deductions-software.com/ +33 (0)6 89 16 29 52 Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Enterprise information system
So, here’s a thing. Usually you talk to a company about introducing Linked Data technologies to their existing IT infrastructure, emphasising that you can add stuff to work with existing systems (low risk, low cost etc.) to improve all sorts of stuff (silo breakdown, comprehensive dashboards, etc..) But what if you start from scratch? So, the company wants to base all its stuff around Linked Data technologies, starting with information about employees, what they did and are doing, projects, etc., and moving on to embrace the whole gamut. (Sort of like a typical personnel management core, plus a load of other related DBs.) Let’s say for an organisation of a few thousand, roughly none of whom are technical, of course. It’s a pretty standard thing to need, and gives great value. Is there a solution out of the box for all the data capture from individuals, and reports, queries, etc.? Or would they end up with a team of developers having to build bespoke things? Or, heaven forfend!, would they end up using conventional methods for all the interface management, and then have the usual LD extra system? Any thoughts? -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Quick Poll - Results
Hi, Thanks to all the responders (although there were not an awful lot!) I think there was a slight leaning towards putting the “home” URI in the subject, but only slight. The reason I asked this, by the way, is for sameAs import (of course!). sameAs services recommend a “canon” to be used. The canon can be set explicitly, but if it is not, then a decision needs to be made, since it has to have something. So we currently use the subject of the last sameAs triple we got for the bundle. I was trying to work out if there was a better way, in terms of subject or object. (We can of course do other stuff, such as the shortest, or alpha order, or from a priority list of domains, etc. but we are talking default behaviour.) Best Hugh On 23 Jan 2015, at 11:39, Hugh Glaser h...@glasers.org wrote: I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your base/domain. And you have reconciled some of your URIs against some other stuff, such as dbpedia, freebase, geonames... Now, visualise the owl:sameAs (or skos:whatever) triples you have made to represent that. Q1: Where are your URIs? a) subject, b) object, c) both Q2: Do all the triples have one of your URIs in them? a) yes, b) no It’s just for a choice I have about the input format for sameAs services, so I thought I would ask :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
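The candidate default canon strategies mentioned above (last-seen subject, shortest URI, priority list of domains) can each be sketched in a few lines. Function names, the bundle, and the geonames identifier are all hypothetical, not the sameAs.org API:

```python
from urllib.parse import urlparse

def canon_last_subject(triples):
    # (a) current default: subject of the last sameAs triple received
    return triples[-1][0]

def canon_shortest(bundle):
    # (b) shortest URI, alphabetical order as tie-break
    return min(bundle, key=lambda u: (len(u), u))

def canon_by_domain(bundle, priority):
    # (c) first URI whose host appears earliest in a priority list
    def rank(u):
        host = urlparse(u).hostname or ""
        return priority.index(host) if host in priority else len(priority)
    return min(bundle, key=rank)

bundle = [
    "http://dbpedia.org/resource/Eastleigh",
    "http://sws.geonames.org/2649808/",
    "http://example.org/id/place/eastleigh",
]
print(canon_shortest(bundle))
print(canon_by_domain(bundle, ["dbpedia.org", "sws.geonames.org"]))
```

Note the strategies can disagree: here the shortest URI is the geonames one, while a dbpedia-first priority list picks the dbpedia URI, which is exactly why a default has to be chosen deliberately.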
Re: Quick Poll
Thanks Stian and Alasdair, Just going back to the original question for a moment, I’ll try another way of putting it for people who don’t have their own URIs. When you want to create a set of links (of the sorts of properties you are talking about, but only the symmetric ones), you have often started with candidate URIs, and then found other URIs that have that relationship. When you create the triples to record this valuable information, does the original candidate appear as a) subject, b) object, or c) just whatever, or d) maybe you assert 2 triples both ways? It’s a very qualitative and woolly question, I realise :-) Thanks. On 25 Jan 2015, at 10:45, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk wrote: For properties you would need to use owl:equivalentProperty or rdfs:subPropertyOf in either direction. SKOS is very useful here as an alternative when the logic gets dirty due to loose term definition. As an example see this SKOS mapping from PAV onto Dublin Core Terms (which are notoriously underspecified and vague): http://www.jbiomedsem.com/content/4/1/37/table/T5 http://www.jbiomedsem.com/content/4/1/37 (Results) Actual SKOS: http://purl.org/pav/mapping/dcterms Here we found SKOS as a nice way to do the mapping independently (and justified) as the inferences from OWL make DC Term incompatible with any causal provenance ontology like PROV and PAV. On 23 Jan 2015 17:59, Hugh Glaser h...@glasers.org wrote: Thanks, and thanks for all the answers so far. On 23 Jan 2015, at 16:23, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk wrote: Not sure where you are going, but you are probably interested in linksets - as a way to package equivalence relations - typically in a graph of its own. Thanks - I have a lot of linksets :-) http://www.w3.org/TR/void/#describing-linksets To answer the questions: Q1: d) in subject, property, object, or multiple of those. 
I don’t understand where property comes in for using owl:sameAs (or whatever) in stating equivalence between URIs, so I’ll read that as c) Q2: No. We already reuse existing vocabularies and external identifiers, and there could be a nested structure which is only indirectly connected to our URIs. I realise that this second question wasn’t as clear as it might have been. What I meant was concerned with the sameAs triples only (as was explicit for Q1). So, to elaborate, if you have decided that: http://mysite.com/foo, http://dbpedia.org/resource/foo, http://rdf.freebase.com/ns/m.05195d8 are aligned (the same), then what do the triples describing that look like? In particular, do you have any that look like http://dbpedia.org/resource/foo owl:sameAs http://rdf.freebase.com/ns/m.05195d8 . (or vice versa), or do you equate everything to a “mysite” URI? But I guess for OpenPHACTS this doesn’t apply, since I understand from what you say below that you never mint a URI of your own where you know there is an external one. Although it does beg the question, perhaps, of what you do when you later find equivalences. Best Hugh http://example.com/our/own pav:authoredBy http://orcid.org/-0001-9842-9718 . http://orcid.org/-0001-9842-9718 foaf:name Stian Soiland-Reyes . It's true you would also get the second triple from ORCID (remember content negotiation!), but it's very useful for presentation and query purposes to include these directly, e.g. in a VOID file. In most cases we do, however, not have any of our own URIs except for provenance statements. But perhaps Open PHACTS is special in that regard as we are integrating other people's datasets and shouldn't be making up any data of our own. :) Perhaps also of interest: In the Open PHACTS project http://www.openphacts.org/ we use this extensively - we let the end-user choose which linksets of weak and strong equivalences they want to apply when a query is made. 
Such a collection of linksets and their application we call a lense - so you apply lenses to merge/unmerge your data. See http://www.slideshare.net/alasdair_gray/gray-compcoref In our identity mapping service http://www.openphacts.org/about-open-phacts/how-does-open-phacts-work/identities-within-open-phacts we pass in several parameters - the minimal is the URI to map. See http://openphacts.cs.man.ac.uk:9092/QueryExpander/mapURI and use http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_2443 as the URI. We also have a piece of magic that can rewrite a SPARQL query to use the mapped URIs for a given variable (adding FILTER statements) try - http://openphacts.cs.man.ac.uk:9092/QueryExpander/ On 23 January 2015 at 11:39, Hugh Glaser h...@glasers.org wrote: I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your
Re: Quick Poll
Thanks, and thanks for all the answers so far. On 23 Jan 2015, at 16:23, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk wrote: Not sure where you are going, but you are probably interested in linksets - as a way to package equivalence relations - typically in a graph of its own. Thanks - I have a lot of linksets :-) http://www.w3.org/TR/void/#describing-linksets To answer the questions: Q1: d) in subject, property, object, or multiple of those. I don’t understand where property comes in for using owl:sameAs (or whatever) in stating equivalence between URIs, so I’ll read that as c) Q2: No. We already reuse existing vocabularies and external identifiers, and there could be a nested structure which is only indirectly connected to our URIs. I realise that this second question wasn’t as clear as it might have been. What I meant was concerned with the sameAs triples only (as was explicit for Q1). So, to elaborate, if you have decided that: http://mysite.com/foo, http://dbpedia.org/resource/foo, http://rdf.freebase.com/ns/m.05195d8 are aligned (the same), then what do the triples describing that look like? In particular, do you have any that look like http://dbpedia.org/resource/foo owl:sameAs http://rdf.freebase.com/ns/m.05195d8 . (or vice versa), or do you equate everything to a “mysite” URI? But I guess for OpenPHACTS this doesn’t apply, since I understand from what you say below that you never mint a URI of your own where you know there is an external one. Although it does beg the question, perhaps, of what you do when you later find equivalences. Best Hugh http://example.com/our/own pav:authoredBy http://orcid.org/-0001-9842-9718 . http://orcid.org/-0001-9842-9718 foaf:name Stian Soiland-Reyes . It's true you would also get the second triple from ORCID (remember content negotiation!), but it's very useful for presentation and query purposes to include these directly, e.g. in a VOID file. 
In most cases we do, however, not have any of our own URIs except for provenance statements. But perhaps Open PHACTS is special in that regard as we are integrating other people's datasets and shouldn't be making up any data of our own. :) Perhaps also of interest: In the Open PHACTS project http://www.openphacts.org/ we use this extensively - we let the end-user choose which linksets of weak and strong equivalences they want to apply when a query is made. Such a collection of linksets and their application we call a lens - so you apply lenses to merge/unmerge your data. See http://www.slideshare.net/alasdair_gray/gray-compcoref In our identity mapping service http://www.openphacts.org/about-open-phacts/how-does-open-phacts-work/identities-within-open-phacts we pass in several parameters - the minimal is the URI to map. See http://openphacts.cs.man.ac.uk:9092/QueryExpander/mapURI and use http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_2443 as the URI. We also have a piece of magic that can rewrite a SPARQL query to use the mapped URIs for a given variable (adding FILTER statements) try - http://openphacts.cs.man.ac.uk:9092/QueryExpander/ On 23 January 2015 at 11:39, Hugh Glaser h...@glasers.org wrote: I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your base/domain. And you have reconciled some of your URIs against some other stuff, such as dbpedia, freebase, geonames... Now, visualise the owl:sameAs (or skos:whatever) triples you have made to represent that. Q1: Where are your URIs? a) subject, b) object, c) both Q2: Do all the triples have one of your URIs in them? 
a) yes, b) no It’s just for a choice I have about the input format for sameAs services, so I thought I would ask :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Stian Soiland-Reyes, eScience Lab School of Computer Science The University of Manchester http://soiland-reyes.com/stian/work/ http://orcid.org/-0001-9842-9718 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
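The "star versus pairwise" choice discussed in this thread can be made concrete: with n equivalent URIs, a star around one home URI asserts n-1 sameAs triples, while the full pairwise form asserts n*(n-1)/2 (taking one direction per pair). A small sketch using the example URIs from the thread; `owl:sameAs` is written as a plain string for brevity:

```python
uris = [
    "http://mysite.com/foo",  # the "home" URI
    "http://dbpedia.org/resource/foo",
    "http://rdf.freebase.com/ns/m.05195d8",
]

def star(hub, bundle):
    # Everything equated to the home URI: n-1 triples
    return [(hub, "owl:sameAs", o) for o in bundle if o != hub]

def pairwise(bundle):
    # Every pair asserted directly: n*(n-1)/2 triples
    return [(a, "owl:sameAs", b)
            for i, a in enumerate(bundle) for b in bundle[i + 1:]]

print(len(star(uris[0], uris)))  # 2
print(len(pairwise(uris)))       # 3
```

The difference matters for the poll: under the star form every triple contains one of your URIs, while the pairwise form produces triples like dbpedia sameAs freebase that never mention your domain at all.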
Re: Survey on Faceted Browsers for RDF data ?
On 23 Jan 2015, at 11:42, Christian Morbidoni christian.morbid...@gmail.com wrote: Hi all, I'm doing some research to get a comprehensive (as much as possible) view on what faceted browsers are out there today for RDF data and what features they offer. I collected a lot of links to papers, web sites and demos... but I found very few comparison/survey papers about this specific topic. [1] contains a section on faceted browsers, but is not exhaustive; [2] mentions some interesting systems but is a bit outdated. So, my questions are: 1) Does someone know a better paper/resource I can look at for a survey? 2) Is someone currently working on a survey like this? 3) Does someone have notable additions to my list? (pasted at the end of the mail) I think http://www.dotac.info/explorer/ might fit your definition, as might the older http://www.rkbexplorer.com Best Hugh At this stage I'm interested in both: automatic and configuration based browsers, free and commercial products, hierarchical and flat facets, simple and pivoting. 
thank you in advance best, Christian [1] Survey of linked data based exploration systems (2014) http://ceur-ws.org/Vol-1279/iesd14_8.pdf [2] From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web http://hcil2.cs.umd.edu/trs/2008-06/2008-06.pdf My current, randomly ordered list: tFacets - http://www.visualdataweb.org/tfacet.php Exhibit (3) + Babel Virtuoso built-in search + faceted browser RDF-faceted-browser - Blog post: https://shr.wordpress.com/2012/02/08/a-faceted-browser-over-sparql-endpoints/ Facete - http://aksw.org/Projects/Facete.html PivotBrowser - http://www.sindicetech.com/pivotbrowser.html Rhizomik - http://rhizomik.net/html/ /facets Paper: http://homepages.cwi.nl/~media/publications/iswc06.pdf gFacets - Paper: http://www.sfb716.uni-stuttgart.de/uploads/tx_vispublications/eswc10-heimErtlZiegler.pdf Flamenco Nested Facets Browser - Demo: http://people.csail.mit.edu/dfhuynh/projects/nfb/ Humboldt mSpace -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Quick Poll
I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your base/domain. And you have reconciled some of your URIs against some other stuff, such as dbpedia, freebase, geonames... Now, visualise the owl:sameAs (or skos:whatever) triples you have made to represent that. Q1: Where are your URIs? a) subject, b) object, c) both Q2: Do all the triples have one of your URIs in them? a) yes, b) no It’s just for a choice I have about the input format for sameAs services, so I thought I would ask :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Semantic Web Dogfood
Does anyone have the data? I (or someone else) could at least stuff it in a browsable store if someone can get it to me? It is all rather an embarrassment now, I would say - maybe it should be switched off, if we can’t access or update it? Best Hugh On 21 Dec 2014, at 00:54, Andreas Harth andr...@harth.org wrote: Hi, I have a similar problem when accessing RDF files (e.g., [1]). ad...@data.semanticweb.org still bounces. It would be great to get access to these files again. Cheers, Andreas. [1] http://data.semanticweb.org/workshop/cold/2010/PC/rdf *** I'm getting error messages when accessing RDF files of workshops and conferences. $ wget http://data.semanticweb.org/workshop/cold/2010/PC/rdf; $ more rdf Fatal error: Call to a member function writeRdfToString() on a non-object in /var/www/drupal-6.22/sites/all/modules/dogfood/dogfood.module on line 174 $ On 2012-03-28 07:56, Hugh Glaser wrote: Sorry, I have been here before, and can't remember who to email (ad...@data.semanticweb.org bounces). And I know some brave people were trying to sort it out. Anyway: Hi there, Sorry to report, but it seems things are a bit broken. 
Eg Resource URI on the dog food server: http://data.semanticweb.org/person/dan-brickley Email Hash: 748934f32135cfcf6f8c06e253c53442721e15e7 Eg transcript: hg@cohen [2012-03-28T15:43:32] acm.rkbexplorer.com/acquisition rdfget http://data.semanticweb.org/person/libby-miller HTTP/1.1 303 See Other Date: Wed, 28 Mar 2012 16:15:23 GMT Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c X-Powered-By: PHP/5.2.0-8+etch16 Set-Cookie: SESS002fbfc63133341c13dbc400422ca44a=40e15aa64d8febbf4530d9d3bd778487; expires=Fri, 20 Apr 2012 19:48:43 GMT; path=/; domain=.data.semanticweb.org Expires: Sun, 19 Nov 1978 05:00:00 GMT Last-Modified: Wed, 28 Mar 2012 16:15:23 GMT Cache-Control: store, no-cache, must-revalidate Cache-Control: post-check=0, pre-check=0 Location: http://data.semanticweb.org/person/libby-miller/rdf Access-Control-Allow-Origin: * Transfer-Encoding: chunked Content-Type: text/html; charset=utf-8 HTTP/1.1 200 OK Date: Wed, 28 Mar 2012 16:15:23 GMT Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c X-Powered-By: PHP/5.2.0-8+etch16 Set-Cookie: SESS002fbfc63133341c13dbc400422ca44a=a6cd8a43718d688ec6192079abe7a400; expires=Fri, 20 Apr 2012 19:48:43 GMT; path=/; domain=.data.semanticweb.org Expires: Sun, 19 Nov 1978 05:00:00 GMT Last-Modified: Wed, 28 Mar 2012 16:15:23 GMT Cache-Control: store, no-cache, must-revalidate Cache-Control: post-check=0, pre-check=0 Access-Control-Allow-Origin: * Content-Length: 186 Content-Type: application/rdf+xml; charset=utf-8 Fatal error: Call to a member function writeRdfToString() on a non-object in /var/www/drupal-6.22/sites/all/modules/dogfood/dogfood.module on line 171 It only gives the 200 response after a very looong time. Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Microsoft OLE
Thanks Paul, On 15 Dec 2014, at 19:07, Paul Houle ontolo...@gmail.com wrote: Most Windows programmers would instantiate OLE objects in the applications and query them to get results; Ah, the first problem - I’m not a Windows programmer :-) In fact, I want to access OLE-published stuff without any need to have knowledge of Windows at all. http resolution to IIS or Apache running on the Windows machine seemed like a good choice. commonly people write XML or JSON APIs, but writing RDF wouldn't be too different. The next step up is to have a theory that converts OLE data structures to and from RDF either in general or in a specific case with help from a schema. Microsoft invested a lot in making SOAP work well with OLE, so you might do best with a SOAP to RDF mapping. So yes - a service that did some mapping from the retrieved OLE data structure to RDF; and a general one was what I was thinking of. The incoming URI would be interpretable as an OLE data object (I guess with some server config), which then got fetched and converted to RDF. In fact, it seems an obvious way of exposing Word docs, Excel spreadsheets and even Access DBs live, but there is probably some stuff I don’t understand that means it is crazy. I suspect the silence (except you and Barry) means that this isn’t something anyone has done, at least yet. Best Hugh This caught my eye though, because I've been looking at the relationships between RDF and OMG, a distant outpost of standardization. You can find competitive products on the market, one based on UML and another based on RDF, OWL, SKOS and so forth. The products do more or less the same thing, but described in such different language and vocabulary that it's hard to believe that they compete for any sales. There is lots of interesting stuff there, but the big theme is ISO Common logic, which adds higher-arity predicates and a foundation for inference that people will actually want to use. 
It's not hard to convince the enterprise that first-order logic is ready for the big time because banks and larger corporations all use FOL-based systems on production rules to automate decisions. On Sat, Dec 13, 2014 at 7:30 AM, Hugh Glaser h...@glasers.org wrote: Anyone know of any work around exposing OLE linked objects as RDF? I could envisage a proxy that gave me URIs and metadata for embedded objects. Is that even a sensible question? :-) -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254, paul.houle on Skype ontolo...@gmail.com http://legalentityidentifier.info/lei/lookup -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
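The proxy idea sketched in this thread resolves an incoming URI to an OLE data object and converts it to RDF. A purely hypothetical sketch of the conversion half: the OLE/COM fetch itself (e.g. via pywin32 on the Windows side) is not shown, and the vocabulary URI is made up for illustration.

```python
def cells_to_ntriples(doc_uri, cells):
    """Map a spreadsheet-like structure {(row, col): value} -- as might be
    pulled out of an Excel workbook over OLE -- to N-Triples."""
    lines = []
    for (row, col), value in sorted(cells.items()):
        subject = f"<{doc_uri}#cell-{row}-{col}>"
        # escape backslashes first, then quotes, per N-Triples literal rules
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <http://example.org/vocab#value> "{escaped}" .')
    return "\n".join(lines)
```

A real service would sit behind IIS or Apache, interpret the request URI as an OLE object reference, call the conversion, and return the triples with an RDF content type.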
Microsoft OLE
Anyone know of any work around exposing OLE linked objects as RDF? I could envisage a proxy that gave me URIs and metadata for embedded objects. Is that even a sensible question? :-) -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: How to model valid time of resource properties?
Hi, On 15 Oct 2014, at 23:02, John Walker john.wal...@semaku.com wrote: Hi On October 15, 2014 at 2:59 PM Kingsley Idehen kide...@openlinksw.com wrote: On 10/15/14 8:36 AM, Frans Knibbe | Geodan wrote: ... Personally I would not use this approach for foaf:age and foaf:based_near as these capture a certain snapshot/state of (the information about) a resource. Having some representation where the foaf:age triple could be entailed could lead to having multiple conflicting statements with no easy way to find the truth. Having a clear understanding of the questions you want to ask of your knowledge base should help steer modelling choices. This is undoubtedly true, and very important - is the modelling fit for purpose? Proper engineering. In the cases known to me that require the recording of history of resources, all resource properties (except for the identifier) are things that can change in time. If this pattern were applied, it would have to be applied to all properties, leading to vocabularies exploding and becoming unwieldy, as described in the Discussion paragraph. I think that the desire to annotate statements with things like valid time is very common. Wouldn't it be funny if the best solution to such a common and relatively straightforward requirement is to create large custom vocabularies? If you want to be able to capture historical states of a resource, using named graphs to provide that context would be my first thought. However, there is a downside to this. If all that is happening is that Frans is gathering his own data into a store, and then using that data for some understood application of his, then this will be fine. Then he knows exactly the structure to impose on his RDF using named Graphs. But this is Linked Open Data, right? So what happens about use by other people? Or if Frans wants to build other queries over the same data?
If he hasn’t foreseen the other structure, and therefore ensured that the required Named Graphs exist, then it won't be possible to make the statements required about the RDF. The problem is that in choosing the Named Graph structure, the data publisher makes very deep assumptions and even decisions about how the dataset will be used. This is not really good practice in an Open world - in fact, one of the claimed advantages of Semantic Web technologies is that such assumptions (such as the choice of tables in a typical database) are no longer required! I’m not saying that Named Graphs aren’t useful and often appropriate, but choosing to use Named Graphs can really make the data hard to consume. And if they are used, the choice of how really needs to be considered very much with the modelling. (This is particularly important in the absence of any ability to nest Named Graphs.) Cheers If that resource consists of just one triple, then RDF reification of that statement would also work as Kingsley mentions. Regards, Frans Frans, How about reified RDF statements? I think discounting RDF reification vocabulary is yet another act of premature optimization, in regards to the Semantic Web meme :) Some examples: [1] http://bit.ly/utterances-since-sept-11-2014 -- List of statements made from a point in time. [2] http://linkeddata.uriburner.com/c/8EPG33 -- About Connotation -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
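As an illustration of the named-graph approach discussed in this thread (with all of Hugh's caveats about baking in assumptions): the snapshot triples go in their own graph, and the valid-time annotations are made about that graph. This TriG fragment is invented for the example - the ex: vocabulary and the age value are assumptions, and in practice one might prefer an existing vocabulary such as PROV-O:

```trig
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The state of the resource as recorded during 2013, in its own graph
ex:snapshot2013 {
    ex:frans foaf:age 42 .
}

# Statements about that graph: the interval over which it was valid
ex:snapshot2013 ex:validFrom  "2013-01-01"^^xsd:date ;
                ex:validUntil "2014-01-01"^^xsd:date .
```

The downside raised above applies directly: a consumer has to know this graph-per-snapshot structure to query the data, and since named graphs cannot nest, no further layer of context can be added the same way.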
Re: scientific publishing process (was Re: Cost and access)
On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
Hi Alexander, On 5 Oct 2014, at 15:57, Alexander Garcia Castro alexgarc...@gmail.com wrote: metadata, sure. it is a must. BUT good and thought for the web of data. not designed for paper based collections. From my experience it is not so much about representing everything from the paper as triplets. there will be statements that won't be representable, also, such approach may not be efficient. why don't we just go a little bit further up from the lowest hanging fruit and start talking about self describing documents? well annotated documents with well structured metadata that are interoperable. this is easy, achievable, requires little tooling, does not put any burden on the author, delivers interoperability beyond just simple hyperlinks, it is much more elegant than adhering to HTML, etc. You lost me here. Who or what does the “well annotated documents and well structured metadata”, if it isn’t any burden for the authors? Easy and little tooling - I wonder what methods and tools you have in mind? These have proved to be hard problems - otherwise we wouldn’t be having this painful discussion. Best Hugh On Sun, Oct 5, 2014 at 3:19 AM, Hugh Glaser h...@glasers.org wrote: On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Alexander Garcia http://www.alexandergarcia.name/ http://www.usefilm.com/photographer/75943.html http://www.linkedin.com/in/alexgarciac -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
is the math part. And this is the saddest story of all: MathML has been around for a long time, and it is, actually, part of ePUB as well, but authoring proper mathematics is the toughest with the tools out there. Sigh... P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that space... [1] https://atlas.oreilly.com [2] http://metrodigi.com [3] https://www.inkling.com On 04 Oct 2014, at 04:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote: As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before. Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions. Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since. And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated ones (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually semantic in some sense is still, in my view, a research topic, not a routine reality. Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing. I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences. 
Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)). Just my personal 2c Daniel On Oct 3, 2014, at 12:50 - 03/10/14, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display. In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF. If a document is available in both HTML and PDF I almost always choose to view it in PDF. This is the case even though I have particular preferences in how I view documents. If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me. If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me. I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers. So go ahead, create one. But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose. So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML. If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available? peter On 10/03/2014 08:02 AM, Phillip Lord wrote: [...] As it stands, the only statement that the semantic web community are making is that web formats are too poor for scientific usage. [...] Phil Daniel Schwabe Dept. 
de Informatica, PUC-Rio Tel:+55-21-3527 1500 r. 4356R. M. de S. Vicente, 225 Fax: +55-21-3527 1530 Rio de Janeiro, RJ 22453-900, Brasil http://www.inf.puc-rio.br/~dschwabe Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
sense is still, in my view, a research topic, not a routine reality. Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing. I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences. Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)). Just my personal 2c Daniel On Oct 3, 2014, at 12:50 - 03/10/14, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display. In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF. If a document is available in both HTML and PDF I almost always choose to view it in PDF. This is the case even though I have particular preferences in how I view documents. If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me. If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me. I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers. So go ahead, create one. But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose. So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML. 
If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available? peter On 10/03/2014 08:02 AM, Phillip Lord wrote: [...] As it stands, the only statement that the semantic web community are making is that web formats are too poor for scientific usage. [...] Phil Daniel Schwabe Dept. de Informatica, PUC-Rio Tel:+55-21-3527 1500 r. 4356R. M. de S. Vicente, 225 Fax: +55-21-3527 1530 Rio de Janeiro, RJ 22453-900, Brasil http://www.inf.puc-rio.br/~dschwabe -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Searching for references to a certain URI
Me? Well, obviously I use http://sameas.org/ :-) Eg http://sameas.org/?uri=http%3A%2F%2Fd-nb.info%2Fgnd%2F120273152 Best Hugh On 25 Sep 2014, at 09:59, Neubert Joachim j.neub...@zbw.eu wrote: What strategies do you use to find all references to a certain URI, e.g. http://d-nb.info/gnd/120273152, on the (semantic) web? I used Sindice for this, but sadly the service is discontinued, and the data becomes more and more outdated. Google link:/info: queries (e.g. https://en.wikipedia.org/wiki/Horst_Siebert) are excluded by rel=nofollow links, and pure RDF links (e.g. from dbpedia) don’t show up at all. Cheers, Joachim -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
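The only fiddly part of the sameas.org lookup above is that the target URI must be percent-encoded to sit inside the uri= query parameter. A small sketch, assuming Python (the helper name is invented here):

```python
from urllib.parse import quote

def sameas_lookup_url(uri):
    """Build a sameas.org bundle-lookup URL for a resource URI.
    safe="" forces ':' and '/' to be percent-encoded as well."""
    return "http://sameas.org/?uri=" + quote(uri, safe="")
```

This reproduces the URL in the message above: passing http://d-nb.info/gnd/120273152 yields http://sameas.org/?uri=http%3A%2F%2Fd-nb.info%2Fgnd%2F120273152.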
Position Advertised - Web Developer
We are keen to have someone who can do Linked Data technologies, although it would be hard to make it a requirement, as it would restrict the market too much. http://www.timewisejobs.co.uk/job/6726/webmaster-developer-part-time-/ Best Hugh -- Hugh Glaser Partner Ethos Valuable Outcomes +44 23 8061 5652 | hugh.gla...@ethosvo.org | Skype: hugh_glaser | www.ethosvo.org Solving complex problems through collaboration, trust and moderation
Re: testable properties of repositories that could be used to rate them
Thanks Stuart, On 14 Sep 2014, at 22:49, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: On 15/09/14 09:25, Hugh Glaser wrote: I've greyed out the 'everything' requirement, since I'm not sure that 'everything' is script-testable. Yes, I was puzzling over that (how it could be made scriptable). Certainly quite a lot of the other things in the list make assumptions about repository identifiers being available - otherwise how can you get started, or ask if dc:title is used, for example? So how do you find the repository identifiers in a scriptable manner? Let’s assume that there is no OAI-PMH support, for example. In the community in which I am working (and which this list grew out of) 'repository' is effectively defined as a document-full website with a working OAI-PMH feed and the backing of a long-lived institution or organisation. Without an OAI-PMH feed, the answer is 'get an OAI-PMH feed.’ Seems sensible to me! So for this, maybe I could move it to after number 3 (where we know there is RDF) and then I could list the predicates that must have URIs (rather than strings)? I've grey'ed out the content negotiation requirements since I'm not aware that any repositories or prototypes that try and do this (I'm happy to be corrected). The standard ePrints 3 software supports content negotiation - e.g. http://oro.open.ac.uk/id/eprint/40795 I've un-greyed this item. (I confess that most of the input into the document so-far has come from the dspace world) Great. I've recast most of this in the document. I've not gone for exact reflection of what the design doc says, but script-testable easily-understandable items that encourage useful steps towards best practice. I’ll make some more suggestions to try to capture a crucial thing - that authors are identified by URI. Best Hugh cheers stuart -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: testable properties of repositories that could be used to rate them
Hi. On 14 Sep 2014, at 22:06, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: The initial aim of this was to counter an apparently arbitrary repository ranking algorithm (which I won't deign link to) with a set of web standards that we (repository developers and maintainers) can collectively work towards, with an emphasis on breadth of different standards that could be applied. Sure. -- I've greyed out the 'everything' requirement, since I'm not sure that 'everything' is script-testable. Yes, I was puzzling over that (how it could be made scriptable). Certainly quite a lot of the other things in the list make assumptions about repository identifiers being available - otherwise how can you get started, or ask if dc:title is used, for example? So how do you find the repository identifiers in a scriptable manner? Let’s assume that there is no OAI-PMH support, for example. So for this, maybe I could move it to after number 3 (where we know there is RDF) and then I could list the predicates that must have URIs (rather than strings)? I've grey'ed out the content negotiation requirements since I'm not aware that any repositories or prototypes that try and do this (I'm happy to be corrected). That actually seems rather a strange statement - if you had said that there was no interest in it, then that would be fine. But surely your rating should list anything useful that a repository might offer? Is there nothing else in your list that is not currently supported? Is RDFa supported anywhere? But fear not, there are many examples in the wild! The standard ePrints 3 software supports content negotiation - e.g. http://oro.open.ac.uk/id/eprint/40795 I see it does rdf+xml and text/n3 - I haven’t tried any others. I've found a better URL for the RDFa requirement. Nice. cheers Cheers Hugh stuart On 13/09/14 22:58, Hugh Glaser wrote: The messages below should make sense. Stuart is trying to make a doc for rating repositories. 
I’ve added some stuff about Linked Data: From http://www.w3.org/DesignIssues/LinkedData.html (Linked Data Principles) Everything has a URI - publications, documents, people, organisations, categories, ... These URIs are HTTP or HTTPS When RDF is requested, the URIs return RDF metadata RDF/XML supported N3 supported Turtle supported JSON-LD supported There are URIs that are not from this repository There are URIs from other repositories There is a SPARQL endpoint RDFa is embedded in the HTML Is there somewhere I could have taken this from that would be suitable? Anyone care to contribute? It seems like it is a really useful thing to have (modulo a bit of specialisation for any particular domain). (I didn’t want to go over the top on formats, by the way.) Cheers Begin forwarded message: From: Stuart Yeates stuart.yea...@vuw.ac.nz Subject: RE: testable properties of repositories that could be used to rate them Date: 13 September 2014 10:31:36 BST To: Hugh Glaser h...@ecs.soton.ac.uk Cc: jisc-repositor...@jiscmail.ac.uk jisc-repositor...@jiscmail.ac.uk I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? If there's something that's recommended by some standard / recommendation and is script-testable, you're welcome to add it. So for example does it provide RDF at all? It has a question based on http://validator.w3.org/feed/ which validates RSS, which in turn is either RDF (v1.0) or can trivially be converted to it (v2.0/atom). I've added a note that this is RSS. cheers stuart Begin forwarded message: From: Hugh Glaser h...@ecs.soton.ac.uk Subject: Re: testable properties of repositories that could be used to rate them Date: 12 September 2014 14:05:34 BST To: jisc-repositor...@jiscmail.ac.uk Reply-To: Hugh Glaser h...@ecs.soton.ac.uk Very interesting (and impressive!) I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? 
Well, actually there is Semantic Web:- right up at the start there is a Cool URI reference, which is the W3C “Cool URIs for the Semantic Web” note! Perhaps there should be a section on this - maybe starting with whether it is 5* Linked Data. http://en.wikipedia.org/wiki/Linked_data http://www.w3.org/DesignIssues/LinkedData.html But it is probably useful to unpick some of this in a less structured way. So for example does it provide RDF at all? Formats? RDF, N3, JSON-LD… Best Hugh On 12 Sep 2014, at 03:29, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: A couple of us have drawn up a bit of a list of script-testable properties of repositories that could be used to rate them. We’ve tried to avoid both arbitrary judgements and the implication that every repository should meet every item: https://docs.google.com/document/d
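Several of the checklist items in this thread are mechanically testable. For instance, "These URIs are HTTP or HTTPS" reduces to a one-liner; a hedged sketch (the function name is invented here, and gathering the identifiers in the first place is the harder part discussed above):

```python
def non_http_uris(uris):
    """Checklist item 'These URIs are HTTP or HTTPS': return the
    identifiers that fail, so an empty list means the item passes."""
    return [u for u in uris
            if not (u.startswith("http://") or u.startswith("https://"))]
```

A rating script could run a battery of such checks and report which items each repository passes.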
Fwd: testable properties of repositories that could be used to rate them
The messages below should make sense. Stuart is trying to make a doc for rating repositories. I’ve added some stuff about Linked Data: From http://www.w3.org/DesignIssues/LinkedData.html (Linked Data Principles) Everything has a URI - publications, documents, people, organisations, categories, ... These URIs are HTTP or HTTPS When RDF is requested, the URIs return RDF metadata RDF/XML supported N3 supported Turtle supported JSON-LD supported There are URIs that are not from this repository There are URIs from other repositories There is a SPARQL endpoint RDFa is embedded in the HTML Is there somewhere I could have taken this from that would be suitable? Anyone care to contribute? It seems like it is a really useful thing to have (modulo a bit of specialisation for any particular domain). (I didn’t want to go over the top on formats, by the way.) Cheers Begin forwarded message: From: Stuart Yeates stuart.yea...@vuw.ac.nz Subject: RE: testable properties of repositories that could be used to rate them Date: 13 September 2014 10:31:36 BST To: Hugh Glaser h...@ecs.soton.ac.uk Cc: jisc-repositor...@jiscmail.ac.uk jisc-repositor...@jiscmail.ac.uk I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? If there's something that's recommended by some standard / recommendation and is script-testable, you're welcome to add it. So for example does it provide RDF at all? It has a question based on http://validator.w3.org/feed/ which validates RSS, which in turn is either RDF (v1.0) or can trivially be converted to it (v2.0/atom). I've added a note that this is RSS. cheers stuart Begin forwarded message: From: Hugh Glaser h...@ecs.soton.ac.uk Subject: Re: testable properties of repositories that could be used to rate them Date: 12 September 2014 14:05:34 BST To: jisc-repositor...@jiscmail.ac.uk Reply-To: Hugh Glaser h...@ecs.soton.ac.uk Very interesting (and impressive!) 
I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? Well, actually there is Semantic Web:- right up at the start there is a Cool URI reference, which is the W3C “Cool URIs for the Semantic Web” note! Perhaps there should be a section on this - maybe starting with whether it is 5* Linked Data. http://en.wikipedia.org/wiki/Linked_data http://www.w3.org/DesignIssues/LinkedData.html But it is probably useful to unpick some of this in a less structured way. So for example does it provide RDF at all? Formats? RDF, N3, JSON-LD… Best Hugh On 12 Sep 2014, at 03:29, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: A couple of us have drawn up a bit of a list of script-testable properties of repositories that could be used to rate them. We’ve tried to avoid both arbitrary judgements and the implication that every repository should meet every item: https://docs.google.com/document/d/1sEDqPS2bfAcbunpjNzHwB56f5CY1SxJunSBLFtom3IM/edit cheers stuart
Re: URIs within URIs
Nice. That enumerates the choices, I think. In a world where the services are themselves being used as LD URIs (because everything is a LD URI, of course!) there is the orthogonal question of whether the URI needs to be URLEncoded. And in fact I think all the prefixing patterns fail that test? If you are still updating patterns, you might like to add a note? Cheers On 28 Aug 2014, at 15:12, Leigh Dodds le...@ldodds.com wrote: Hi, I documented all the variations of this form of URI construction I was aware of in the Rebased URI pattern: http://patterns.dataincubator.org/book/rebased-uri.html This covers generating one URI from another. What that new URI returns is a separate concern. Cheers, L. On Fri, Aug 22, 2014 at 4:56 PM, Bill Roberts b...@swirrl.com wrote: Hi Luca We certainly find a need for that kind of feature (as do many other linked data publishers) and our choice in our PublishMyData platform has been the URL pattern {domain}/resource?uri={url-encoded external URI} to expose info in our databases about URIs in other domains. If there was a standard URL route for this scenario, we'd be glad to implement it Best regards Bill On 22 Aug 2014, at 16:44, Luca Matteis lmatt...@gmail.com wrote: Dear LOD community, I'm wondering whether there has been any research regarding the idea of having URIs contain an actual URI, that would then resolve information about what the linked dataset states about the input URI. Example: http://foo.com/alice - returns data about what foo.com has regarding alice http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't just resolve the alice URI above, but returns what bar.com wants to say about the alice URI For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could return: http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice a void:Dataset . http://foo.com/alice #some #data . 
I know SPARQL endpoints already have this functionality, but was wondering whether any formal research was done towards this direction rather than a full-blown SPARQL endpoint. The reason I'm looking for this sort of thing is because I simply need to ask certain third-party datasets whether they have data about a URI (inbound links). Best, Luca -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
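On the URL-encoding question raised above: in the query-parameter flavour of the Rebased URI pattern, the inner URI has to be percent-encoded, otherwise its own '?' and '&' would be parsed as part of the outer URL. A small demonstration, assuming Python, using the made-up bar.com/foo.com services from Luca's example:

```python
from urllib.parse import quote, unquote

def rebase(service, uri):
    """Rebased URI, query-parameter flavour: percent-encode the whole
    target URI so it survives as a single opaque query value."""
    return service + "?uri=" + quote(uri, safe="")

outer = rebase("http://bar.com/endpoint", "http://foo.com/alice?name=a&b=c")
inner = unquote(outer.split("uri=", 1)[1])  # round-trips to the original
```

This is also why the prefixing (path-concatenation) flavours are fragile: appending the raw inner URI to a path means its query string and fragment get swallowed by the outer URL's parsing unless they too are encoded.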
Re: Education
Hi Leif, I’m not sure you meant to do Reply-all. :-) But a Reply-all from me would have said that you have exactly the point. It is entirely appropriate that more than half the course, or even more, would be on scripting itself. And that the students would start from essentially no knowledge - that is the target audience. The course is about the students learning scripting and data stuff - it just happens that the examples used are Linked data related, giving useful added value. By the way, someone else is going to do the course that prompted this, and will use Python with data processing and stats stuff as the subject, so not so far off. Best Hugh On 23 Aug 2014, at 20:06, Leif Isaksen leif...@googlemail.com wrote: Hi Hugh sorry for a slow reply. I was away and have been digging through my email backlog ever since. I think this is a really interesting question, although I suspect you are setting the bar a bit high for humanists (and probably social scientists too). The majority of them have no experience in scripting at all (although some do, and many are willing to try). I think you'd probably need to spend at least half the course (or more) dealing with the basic principles of scripting before you could start touching on these topics. Having said that, if you can get them inspired by the possibilities, I've found they are often willing to invest a lot of their own time learning the skills. Of course, you can't really write that into the syllabus... As it happens, my colleague with whom I co-teach our Masters module on 'Web technologies in the Humanities' has just gone on leave, so if you feel like trialling any of these ideas, I have a captive audience for you :-) All the best L. PS and as an ex Java developer I'm also sad to agree that java and Linked Data are probably a terrible mix, conceptually speaking at any rate. 
I remember in my first ever Semantic Web application we used Remote Procedure Calls to transfer RDF :-S On Sat, Jul 12, 2014 at 12:02 PM, Hugh Glaser h...@glasers.org wrote: The other day I was asked if I would like to run a Java module for some Physics and Astronomy students. I am so far from plain Java and that sort of thing now that there was almost a cognitive dissonance. But it did cause me to ponder on what I would do for such a requirement, given a blank sheet. For people whose discipline is not primarily technical, what would a syllabus look like around Linked Data as a focus, but also causing them to learn lots about how to just do stuff on computers? How to use a Linked Data store service as schemaless storage: bit of intro to triples as simply a primitive representation format; scripting for data transformation into triples - Ruby, Python, PHP, awk or whatever; scripting for http access for http put, delete to store; simple store query for service access (over http get); scripting for data post-processing, plus interaction with any data analytic tools; scripting for presentation in html or through visualisation tools. It would be interesting for scientists and, even more, social scientists, archeologists, etc (alongside their statistical package stuff or whatever). I think it would be really exciting for them, and they would get a lot of skills on the way - and of course they would learn to access all this Open Data stuff, which is becoming so important. I’m not sure they would go for it ;-) Just some thoughts. And does anyone know of such modules, or even is teaching them? Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: URIs within URIs
On 22 Aug 2014, at 22:43, Ruben Verborgh ruben.verbo...@ugent.be wrote: Hi Hugh, Can you tell me if there is a pattern for the uri= style stuff, where you want everything the service wants to say about the URI, in any position? The current triple pattern fragments spec does not mandate this, but: - each response will give you the controls (links and/or form) to find the other patterns Not very nice if all I want is to get what the service wants to tell me about that URI. - the server is free to include more triples than asked for Sounds better. - future extensions (that are planned) can support this Even better :-) And I guess that raises the question of bnodes as well. My answer to that is always: bnodes are Semantic Web, but not Linked Data. If a node doesn't have a universal identifier, it cannot be addressed. I find this comment strange. If you mean that I can’t query using a bnode, then sure. If you mean that I never get any bnodes back as a result of a Linked Data URI GET, then I think not. But then again, I think my comment was a bit confused itself :-) Cheers That might seem like the simple explanation—because it is— but it's the only satisfying answer I have found so far. I suppose I am looking at LDF from the point of view that it is a way of specifying the invoking URI pattern, and what my services would look like if they were using such patterns to be invoked - although maybe that is misuse? You could do that; that's one way of looking at it. The important thing is that a client doesn't have to guess or know anything about the server. Just by getting one arbitrary response (fragment), it is able to retrieve any other. No URL hacking needed. Best, Ruben PS Something I didn't mention in the earlier mail: it does combine nicely with dereferencing. For instance, the URL http://data.mmlab.be/people/Ruben+Verborgh 303s to http://data.mmlab.be/mmlab?subject=http%3A%2F%2Fdata.mmlab.be%2Fpeople%2FRuben%2BVerborgh. 
-- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: URIs within URIs
Hi Luca, You mean things like http://sameas.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh I think. And for something many years old, and with other flags: http://www.rkbexplorer.com/network/?uri=http://southampton.rkbexplorer.com/id/person-2f876940347fe251382724b34c27346f-cb9c89b02b078212e440a8016915856atype=person-personformat=foafknowsn3 So yes, they are out there (I have lots of other sites and services that do this), but no, I don’t know any research, or even what the topic might be. Actually, we use a more Cool URI/Restful-like invocation now: http://sociam-pub.ecs.soton.ac.uk/sameas/symbols/http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh is much preferable, I think. Hope that helps. Best Hugh On 22 Aug 2014, at 16:44, Luca Matteis lmatt...@gmail.com wrote: Dear LOD community, I'm wondering whether there has been any research regarding the idea of having URIs contain an actual URI, that would then resolve information about what the linked dataset states about the input URI. Example: http://foo.com/alice - returns data about what foo.com has regarding alice http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't just resolve the alice URI above, but returns what bar.com wants to say about the alice URI For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could return: http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice a void:Dataset . http://foo.com/alice #some #data . I know SPARQL endpoints already have this functionality, but was wondering whether any formal research was done towards this direction rather than a full-blown SPARQL endpoint. The reason I'm looking for this sort of thing is because I simply need to ask certain third-party datasets whether they have data about a URI (inbound links). Best, Luca -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
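The uri= pattern in these examples rests entirely on percent-encoding: the inner URI's ':' and '/' must be escaped so they survive as query-parameter data. A small sketch of both directions:

```python
# Building and unpacking a "URI within a URI", as in the sameAs.org
# call above. quote(..., safe="") escapes ':' and '/' as well.
from urllib.parse import quote, unquote

inner = "http://dbpedia.org/resource/Edinburgh"
outer = "http://sameas.org/?uri=" + quote(inner, safe="")
print(outer)
# -> http://sameas.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh

# The receiving server recovers the inner URI by decoding the parameter:
assert unquote(outer.split("uri=", 1)[1]) == inner
```

The "Cool URI" variant Hugh prefers (sociam-pub.ecs.soton.ac.uk/sameas/symbols/...) embeds the same percent-encoded form in the path rather than in a query string.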
Re: URIs within URIs
Hi Ruben, Cool posting. Can you tell me if there is a pattern for the uri= style stuff, where you want everything the service wants to say about the URI, in any position? For a simple site this might look like the SCBD for the URI? And I guess that raises the question of bnodes as well. I have looked a bit at the paper and the spec, but couldn’t find it and I’m feeling lazy - sorry :-) I suppose I am looking at LDF from the point of view of it is a way of specifying the invoking URI pattern, and what my services would look like if they were using such patterns to be invoked - although maybe that is misuse? Best Hugh On 22 Aug 2014, at 17:19, Ruben Verborgh ruben.verbo...@ugent.be wrote: Hi Luca, I'm wondering whether there has been any research regarding the idea of having URIs contain an actual URI, that would then resolve information about what the linked dataset states about the input URI. Example: http://foo.com/alice - returns data about what foo.com has regarding alice http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't just resolve the alice URI above, but returns what bar.com wants to say about the alice URI This specific use case has been one of the motivations behind Triple Pattern Fragments [1][2]. Section 4.3 of our publication on the Linked Data on the Web workshop [2] specifically tackles this issue. The problem with dereferencing is that the URI of a concept only leads to the information about this concept by the particular source that has created this specific URI—even though there might be others. For instance, even if http://example.org/#company was the official URI of the company EXAMPLE, it is unlikely the source of the most objective information about this company. But how can we find that information then? And the problem gets worse with URIs like http://xmlns.com/foaf/0.1/Person. This URI gives you exactly 0 persons, as strange as this might seem to an outsider.
With Triple Pattern Fragments, you can say: “give me all information this particular dataset has about concept X.” For instance, given the resource http://dbpedia.org/resource/Barack_Obama, here is data for this person *in a specific dataset*: http://data.linkeddatafragments.org/dbpedia?subject=http%3A%2F%2Fdbpedia.org%2Fresource%2FBarack_Obama Here is data about http://xmlns.com/foaf/0.1/Person in that same dataset: http://data.linkeddatafragments.org/dbpedia?object=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson Note how these resources are *not* created by hacking URI patterns manually; instead, you can find them through a hypermedia form: - http://data.linkeddatafragments.org/dbpedia This form works for both HTML and RDF clients, thanks to the Hydra Core Vocabulary. In other words, this interface is a hypermedia-driven REST interface through HTTP. This gets us to a deeper difference between (current) Linked Data and the rest of the Web: Linked Data uses only links as hypermedia controls, whereas the remainder of the Web uses links *and forms*. Forms are a much more powerful mechanism to discover information. So part of what we want to achieve with Triple Pattern Fragments is to broaden the usage of Linked Data from links to more expressive hypermedia. This truly allows “anybody to say anything about anything”— and to discover that information, too! I know SPARQL endpoints already have this functionality, but was wondering whether any formal research was done towards this direction rather than a full-blown SPARQL endpoint. The reason I'm looking for this sort of thing is because I simply need to ask certain third-party datasets whether they have data about a URI (inbound links). Consider using a Triple Pattern Fragments server [3]. They're handy and very cheap to host in comparison to SPARQL servers!
Best, Ruben [1] http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/ [2] http://ceur-ws.org/Vol-1184/ldow2014_paper_04.pdf [3] https://github.com/LinkedDataFragments/Server.js -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
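The fragment URLs in Ruben's DBpedia example can be produced as follows. This is only an illustration of the URL shape: a real TPF client would discover the form via the hypermedia controls in a response, not construct the URL by hand.

```python
# Asking a Triple Pattern Fragments server what a dataset says about
# one subject, following the DBpedia example above.
from urllib.parse import urlencode

endpoint = "http://data.linkeddatafragments.org/dbpedia"
params = {"subject": "http://dbpedia.org/resource/Barack_Obama"}
fragment_url = endpoint + "?" + urlencode(params)
print(fragment_url)

# Fetching the fragment as Turtle (commented out; needs network access):
# import urllib.request
# req = urllib.request.Request(fragment_url, headers={"Accept": "text/turtle"})
# turtle = urllib.request.urlopen(req).read().decode("utf-8")
```

Swapping `"subject"` for `"object"` gives the second example in the mail, all incoming links to a resource.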
Re: Updated LOD Cloud Diagram - Missed data sources.
On 16 Aug 2014, at 12:57, David Wood da...@3roundstones.com wrote: On Aug 15, 2014, at 1:55 PM, Mark Baker dist...@acm.org wrote: On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote: Hi, But I wonder where so many other sites (including mine) went ? The problem with crawling the Web of Linked Data is really that it is hard to get the datasets on the edges that set RDF links to other sources but are not the target of links from well-connected sources. I'm curious, why you don't just crawl the whole Web looking for linked data? Or better yet, work with one of the search engines or Open Crawl so you can use their indexes. Well there is possibly a quick answer to this. Google, at least, doesn’t index Linked Data. Well, certainly not the kind that does conneg. See other recent messages on this list about the problem of SEO of Linked Data, which is another side of the same coin. Checking Google: Looking at http://dbpedia.org/resource/Birching If I take a URI from (the RDF I get from) that page, and search for it in Google, I think I would expect it to take me to quite a few RDF documents in various formats. But, for example, https://www.google.com/#filter=0q=%22http://ru.dbpedia.org/resource/Розги%22 (asking for all results in the filter=0), shows no RDF documents at all. Of course, RDF documents would have …/data/… in them, rather than …/resource/… or …/page/… And, in fact, searching for dbpedia/data https://www.google.com/#q=%22dbpedia.org%2Fdata%22 only gives 1.2M hits, which is way short of what it would be. Not my field, so I may have it wrong, but I felt like checking it out on a stormy Sunday afternoon! Best Hugh Regards, Dave -- http://about.me/david_wood Sent from my iPad Mark. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - First draft and last feedback.
Feedback: Awesome, just awesome - no “but”s. I was wondering, if not even doubtful, that the next versions would be useful, because there would be so much. This version is actually possibly more useful than previous ones. Not so much for finding datasets, although it is good for that; in addition, at a distance it gives you a real sense of the different sectors, and how they are connected, while the inter-sector connections are visualised. Of course it helps I have a 30” screen, so I can even read the words while looking at the whole picture, and without my glasses :-) It makes me think that perhaps I was right, and sameAs.org would have spoilt it:- we’ll see next time, I guess. Well done team! On 15 Aug 2014, at 08:07, Christian Bizer ch...@bizer.de wrote: Hi all, on July 24th, we published a Linked Open Data (LOD) Cloud diagram containing crawlable linked datasets and asked the community to point us at further datasets that our crawler has missed [1]. Lots of thanks to everybody that did respond to our call and did enter missing datasets into the DataHub catalog [2]. Based on your feedback, we have now drawn a draft version of the LOD cloud containing: 1.the datasets that our crawler discovered 2.the datasets that did not allow crawling 3.the datasets you pointed us at. The new version of the cloud altogether contains 558 linked datasets which are connected by altogether 2883 link sets. As we were pointed at quite a number of linguistic datasets [3], we added linguistic data as a new category to the diagram. The current draft version of the LOD Cloud diagram is found at: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/extendedLODCloud/extendedCloud.png Please note that we only included datasets that are accessible via dereferencable URIs and are interlinked with other datasets.
It would be great if you could check if we correctly included your datasets into the diagram and whether we missed some link sets pointing from your datasets to other datasets. If we did miss something, it would be great if you could point us at what we have missed and update your entry in the DataHub catalog [2] accordingly. Please send us feedback until August 20th. Afterwards, we will finalize the diagram and publish the final August 2014 version. Cheers, Chris, Max and Heiko -- Prof. Dr. Christian Bizer Data and Web Science Research Group Universität Mannheim, Germany ch...@informatik.uni-mannheim.de www.bizer.de -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - First draft and last feedback. - sameAs.org
Hi Chris, On 15 Aug 2014, at 11:15, Christian Bizer ch...@bizer.de wrote: Hi Hugh, thank you very much for your positive feedback. Richly deserved. Yes, we decided not to include sameAs.org as we understand it to be more a service that works on top of the LOD cloud than an actual dataset that contributes additional data to the cloud. We hope that this interpretation is OK with you. It is certainly OK leaving it out. But I don’t agree it does not contribute additional data to the cloud. It publishes millions of triples that are not available (all the inferred sameAs triples), and they would be very hard for people to construct themselves, as they are cross-domain. It also bridges gaps between different equivalence predicates - although of course some people won’t want that! In that sense the main sameAs.org store is a search engine, and provides discovery that would be practically impossible to do any other way. Anyway, encouraged by Kingsley ( :-) ), I have opened all the sameAs sites up to LDSpider:- so next time the crawl is likely to get a load of them. We’ll get to see what it looks like! Best Hugh Cheers, Chris -Ursprüngliche Nachricht- Von: Hugh Glaser [mailto:h...@glasers.org] Gesendet: Freitag, 15. August 2014 11:57 An: Christian Bizer Cc: public-lod@w3.org Betreff: Re: Updated LOD Cloud Diagram - First draft and last feedback. Feedback: Awesome, just awesome - no “but”s. I was wondering, if not even doubtful, that the next versions would be useful, because there would be so much. This version is actually possibly more useful than previous ones. Not so much for finding datasets, although it is good for that; in addition, at a distance it gives you a real sense of the different sectors, and how they are connected, while the inter-sector connections are visualised. 
Of course it helps I have a 30” screen, so I can even read the words while looking at the whole picture, and without my glasses :-) It makes me think that perhaps I was right, and sameAs.org would have spoilt it:- we’ll see next time, I guess. Well done team! On 15 Aug 2014, at 08:07, Christian Bizer ch...@bizer.de wrote: Hi all, on July 24th, we published a Linked Open Data (LOD) Cloud diagram containing crawlable linked datasets and asked the community to point us at further datasets that our crawler has missed [1]. Lots of thanks to everybody that did respond to our call and did enter missing datasets into the DataHub catalog [2]. Based on your feedback, we have now drawn a draft version of the LOD cloud containing: 1. the datasets that our crawler discovered 2. the datasets that did not allow crawling 3. the datasets you pointed us at. The new version of the cloud altogether contains 558 linked datasets which are connected by altogether 2883 link sets. As we were pointed at quite a number of linguistic datasets [3], we added linguistic data as a new category to the diagram. The current draft version of the LOD Cloud diagram is found at: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/extendedLODCloud/extendedCloud.png Please note that we only included datasets that are accessible via dereferencable URIs and are interlinked with other datasets. It would be great if you could check if we correctly included your datasets into the diagram and whether we missed some link sets pointing from your datasets to other datasets. If we did miss something, it would be great if you could point us at what we have missed and update your entry in the DataHub catalog [2] accordingly. Please send us feedback until August 20th. Afterwards, we will finalize the diagram and publish the final August 2014 version. Cheers, Chris, Max and Heiko -- Prof. Dr.
Christian Bizer Data and Web Science Research Group Universität Mannheim, Germany ch...@informatik.uni-mannheim.de www.bizer.de -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Just what *does* robots.txt mean for a LOD site?
Thanks all. OK, I can live with that. So things like Tabulator, Sig.ma and SemWeb Browsers can be expected to go through a general robots.txt Disallow, which is what I was hoping. Yes, thanks Aidan, I know I can do various User-agents, but I really just wanted to stop anything like googlebot. By the way, have I got my robots.txt right? http://ibm.rkbexplorer.com/robots.txt In particular, is the User-agent: LDSpider correct? Should I worry about case-sensitivity? Thanks again, all. Hugh On 27 Jul 2014, at 19:23, Gannon Dick gannon_d...@yahoo.com wrote: On Sat, 7/26/14, aho...@dcc.uchile.cl aho...@dcc.uchile.cl wrote: The difference in opinion remains to what extent Linked Data agents need to pay attention to the robots.txt file. As many others have suggested, I buy into the idea of any agent not relying document-wise on user input being subject to robots.txt. = +1 Just a comment. Somewhere, sometime, somebody with Yahoo Mail decided that public-lod mail was spam, so every morning I dig it out because I value the content. Of course, I could wish for a Linked Data Agent which does that for me, but that would be to complete a banal or vicious cycle, depending on the circle classification scheme in use. I'm looking for virtuous cycles and in the case of robots.txt, The lady doth protest too much, methinks. --Gannon -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Call for Linked Research
This is of course an excellent initiative. But I worry that it feels like people are talking about building stuff from scratch, or even lashing things together. Is it really the case that a typical research approach to what you are calling Linked Research doesn’t turn up theories and systems that can inform what we do? What I think you are talking about is what I think is commonly called e-Science. And there is a vast body of research on this topic. This initiative also impinges on the Open Archives/Access/Repositories movements, who are deeply concerned about how to capture all research outputs. See for example http://www.openarchives.org/ore/ In e-Science I know of http://www.myexperiment.org, for example, which has been doing what I think is very related stuff for 6 or 7 years now, with significant funding, so is a mature system. And, of course, it is compatible with all our Linked Data goodness (I hope). Eg http://www.myexperiment.org/workflows/59 We could do worse than look to see what they can do for us? And it appears that things can be skinned within the system: http://www.myexperiment.org/packs/106 You are of course right, that it is a social problem, rather than a technical problem; this is why others’ experience in solving the social problem is of great interest. Maybe myExperiment or a related system would do what you want pretty much out of the box? Note that it goes even further than you are suggesting, as it has facilities to allow other researchers to actually run the code/workflows. It would take us years to get anywhere close to this sort of thing, unless we (LD people) could find serious resources. And I suspect we would end up with something that looks very similar! Very best Hugh On 29 Jul 2014, at 10:02, Sarven Capadisli i...@csarven.ca wrote: On 2014-07-29 09:43, Andrea Perego wrote: You might consider including in your call an explicit reference to nanopublications [1] as an example of how to address point (5). 
About source code, there's a project, SciForge [2], working on the idea of making scientific software citable. My two cents... [1]http://nanopub.org/ [2]http://www.gfz-potsdam.de/en/research/organizational-units/technology-transfer-centres/cegit/projects/sciforge/ Thanks for the heads-up, Andrea. The article on my site has an open comment system, which is intended to have an open discussion or have suggestions for the others (like the ones you've proposed). Not that I'm opposed to continuing the discussion here, but you are welcome to contribute there so that the next person that comes along can get a hold of that information. It wasn't my intention to refer to all workshops that play nicely towards open science, vocabularies to use, exact tooling to use, or all efforts out there e.g., nanopublications. You have just cited two hyperlinks in that email. Those URLs are accessible by anything in existence that can make an HTTP GET request. Pardon my ignorance, but, why do we need off-band software when we have something that works remarkably well? -Sarven http://csarven.ca/#i -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Just what *does* robots.txt mean for a LOD site?
Hi. I’m pretty sure this discussion suggests that we (the LD community) should try to come to some consensus of policy on exactly what it means if an agent finds a robots.txt on a Linked Data site. So I have changed the subject line - sorry Chris, it should have been changed earlier. Not an easy thing to come to, I suspect, but it seems to have become significant. Is there a more official forum for this sort of thing? On 26 Jul 2014, at 00:55, Luca Matteis lmatt...@gmail.com wrote: On Sat, Jul 26, 2014 at 1:34 AM, Hugh Glaser h...@glasers.org wrote: That sort of sums up what I want. Indeed. So I agree that robots.txt should probably not establish whether something is a linked dataset or not. To me your data is still linked data even though robots.txt is blocking access of specific types of agents, such as crawlers. Aidan, *) a Linked Dataset behind a robots.txt blacklist is not a Linked Dataset. Isn't that a bit harsh? That would be the case if the only type of agent is a crawler. But as Hugh mentioned, linked datasets can be useful simply by treating URIs as dereferenceable identifiers without following links. In Aidan’s view (I hope I am right here), it is perfectly sensible. If you start from the premise that robots.txt is intended to prohibit access by anything other than a browser with a human at it, then only humans could fetch the RDF documents. Which means that the RDF document is completely useless as a machine-interpretable semantics for the resource, since it would need a human to do some cut and paste or something to get it into a processor. It isn’t really a question of harsh - it is perfectly logical from that view of robots.txt (which isn’t our view, because we think that robots.txt is about “specific types of agents”, as you say). Cheers Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - freebase and :baseKB
Thanks Chris, Great stuff. Maybe I’ll change the robots.txt - but I may need to buy more disk space for caching before I do :-), or flush the cache more aggressively when I know spidering is happening. It is an awesome picture!! Previously I was doubtful whether the next version would give much added value, but it really does. Very best Hugh On 25 Jul 2014, at 11:12, Christian Bizer ch...@bizer.de wrote: Hi Hugh, thank you very much for your feedback :-) Yes, your data sources and all data sources in this list http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/tables/not CrawlableDatasets.tsv will reappear in the final version. Freebase is heavily interlinked from DBpedia and also gives you something back if you dereference their URIs like http://rdf.freebase.com/ns/m.0156q We will check why LDspider did not manage to retrieve data from freebase (Andreas: Thank you for your explanation on the topic) Does anybody know if :baseKB is served via dereferencable URIs and if they set any links pointing at other data sets? If yes, we would love to include them into the final version of the diagram. Cheers, Chris -Ursprüngliche Nachricht- Von: Hugh Glaser [mailto:h...@glasers.org] Gesendet: Freitag, 25. Juli 2014 01:07 An: Mike Liebhold Cc: Christian Bizer; public-lod@w3.org Betreff: Re: Updated LOD Cloud Diagram - Please enter your linked datasets into the datahub.io catalog for inclusion. Awesome achievement, Chris and team! Yes Mike, there is quite a lot missing from the LOD Cloud we have grown to know and love. Some of that is I understand because it says it only has stuff that allowed spidering (that is, robots.txt permitted it, etc.). (I notice this because it means everything I used to have in the LOC Cloud has disappeared!) However, the announcement message says that these sets will re-appear, so that is good. I don’t know if that applies to Freebase; and I think :baseKB is not there either, but maybe that doesn’t have any links. 
I have to say that it is not clear to me that it is good practice to refer to this image as “the current/updated version of the LOD Cloud diagram”. It seems that you didn’t understand the significance of this from Chris’ message, and I suspect that you will not be alone. Best Hugh On 24 Jul 2014, at 23:39, Mike Liebhold m...@well.com wrote: I recall earlier versions of the LOD Cloud diagram included freebase - I don't see it here, - or the google knowledge graph either. am I missing something? ?? On 7/24/14, 5:18 AM, Christian Bizer wrote: Hi all, Max Schmachtenberg, Heiko Paulheim and I have crawled the Web of Linked Data and have drawn an updated LOD Cloud diagram based on the results of the crawl. This diagram showing all linked datasets that our crawler managed to discover in April 2014 is found here: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/LODCloudDiagram.png We also analyzed the compliance of the different datasets with the Linked Data best practices and a paper presenting the results of the analysis is found below. The paper will appear at ISWC 2014 in the Replication, Benchmark, Data and Software Track. http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/SchmachtenbergBizerPaulheim-AdoptionOfLinkedDataBestPractices.pdf The raw data used for our analysis is found on this page: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/ Our crawler did discover 77 datasets that do not allow crawling via their robots.txt files and these datasets were not included into our analysis and are also not included in the current version of the LOD Cloud diagram.
A list of these datasets is found at http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/tables/notCrawlableDatasets.tsv In order to give a comprehensive overview of all Linked Data sets that are currently online, we would like to draw another version of the LOD Cloud diagram including the datasets that our crawler has missed as well as the datasets that do not allow crawling. Thus, if you publish or know about linked datasets that are not in the diagram or in the list of not crawlable datasets yet, please: 1. Enter them into the datahub.io data catalog until August 8th. 2. Tag them in the catalog with the tag ‘lod’ (http://datahub.io/dataset?tags=lod) 3. Send an email to Max and Chris pointing us at the entry in the catalog. We will include all datasets into the updated version of the cloud diagram, that fulfill the following requirements: 1. Data items are accessible via dereferencable URIs. 2. The dataset sets at least 50 RDF links pointing at other datasets or at least one other dataset is setting 50 RDF links pointing at your dataset. Instructions on how to describe your dataset in the catalog are found here: https://www.w3.org/wiki
Re: Updated LOD Cloud Diagram - Missed data sources.
Hi Aidan, I think I probably agree with everything you say, but with one exception: On 25 Jul 2014, at 19:14, aho...@dcc.uchile.cl wrote: found that the crawl encountered many problems accessing the various datasets in the catalogue: robots.txt, 401s, 502s, bad conneg, 404/dead, etc. The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. By the way, the reason this has come up for me is because I was quite happy not to be spidered for the BTC (a conscious decision), but I think that some of my datasets might be useful for people, so would prefer to see them included in the LOD Cloud. I actually didn’t submit a seed list to the BTC; but I had forgotten that we had robots.txt everywhere, so it wouldn’t have done it in any case! :-) Anyway, we just need to get around the problem, if we feel that this is all useful. So… Let’s do something about it. I’m no robots.txt expert, but I have changed the appropriate robots.txt to have:
User-agent: LDSpider
Allow: *

User-agent: *
Sitemap: http://{}.rkbexplorer.com/sitemap.xml
Disallow: /browse/
...
I wonder whether this (or something similar) is useful? I realise that it is now too late for the current activity (I assume), but I’ll just leave it all there for future stuff. Cheers -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
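Whether a robots.txt like the one above actually lets LDSpider in while keeping general crawlers out of /browse/ can be checked with Python's standard-library parser. A sketch, with a placeholder hostname, and using the conventional "Allow: /" (a path) rather than the non-standard "Allow: *":

```python
# Verifying the intent of an LDSpider-friendly robots.txt with
# urllib.robotparser. Note that Allow/Disallow values are path
# prefixes, so "Allow: /" is the usual way to grant full access.
import urllib.robotparser

robots_txt = """\
User-agent: LDSpider
Allow: /

User-agent: *
Disallow: /browse/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("LDSpider", "http://example.rkbexplorer.com/browse/x"))   # True
print(rp.can_fetch("Googlebot", "http://example.rkbexplorer.com/browse/x"))  # False
print(rp.can_fetch("Googlebot", "http://example.rkbexplorer.com/id/x"))      # True
```

User-agent matching here is case-insensitive substring matching, which also answers the earlier case-sensitivity question for this parser at least.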
sameAs.org & The LOD Cloud Diagram - advice please
So sameAs.org never appears in any of this stuff. That’s deliberate. The whole idea of it is that it doesn’t add to the plethora of URIs by generating new ones. But it seems that people do find it useful - I get emails from people about it, especially when it does strange things :-) sameAs.org is a service that consumes LD URIs from other PLDs and delivers the answer as LD. So it is, in fact, a proper Linked Data site. So you can do the stuff you expect with a URI like http://www.sameas.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh As you can see, this is not a Cool URI (I think), and as I said, I really don’t want people to think of this as the ID for a sameAs bundle, although in fact it is! So what should I do? Keep sameAs.org living outside the BTC and LOD Cloud world? Or change things so that it becomes a more normal part of the LOD world? In addition, I have quite a lot of other services that work over LD URIs to produce LD about the URI, but as with sameAs.org are also LD URIs because the service interface itself supports the LD model. Should these be brought into the PR fold, and if so how? It would be very painful to allow these to be spidered, because they are heavy computations, and running the service over all the URIs that could be run would be many years of computation for my server. I suspect that others have similar services, so a policy might be useful. In fact, as LD becomes more mature (!), I think we are finding that it is more a communication mechanism between cooperating services than simply delivering RDF in response to an identifying URI - how do we capture this massive LD resource? I hope that makes some sort of sense. Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - Missed data sources.
Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. snip I think the general interpretation of the robots in robots.txt is any software agent accessing the site automatically (versus a user manually entering a URL). I had never thought this. My understanding of the agents that should respect the robots.txt is what are usually called crawlers or spiders. Primarily search engines, but also including things that aim to automatically get a whole chunk of a site. Of course, there is no de jure standard, but the places I look seem to lean to my view. http://www.robotstxt.org/orig.html “WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.” https://en.wikipedia.org/wiki/Web_robot “Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.” It’s all about scale and query rate. So a php script that fetches one URI now and then is not the target for the restriction - nor indeed is my shell script that daily fetches a common page I want to save on my laptop. So, I confess, when my system trips over a dbpedia (or any other) URI and does follow-your-nose to get the RDF, it doesn’t check that the site robots.txt allows it. And I certainly don’t expect Linked Data consumers doing simple URI resolution to check my robots.txt But you are right, if I am wrong - robots.txt would make no sense in the Linked Data world, since pretty much by definition it will always be an agent doing the access.
But then I think we really need a convention (User-agent: ?) that lets me tell search engines to stay away, while allowing LD apps to access the stuff they want. Best Hugh If we agree on that interpretation, a robots.txt blacklist prevents applications from following links to your site. In that case, my counter-question would be: what is the benefit of publishing your content as Linked Data (with dereferenceable URIs and rich links) if you subsequently prevent machines from discovering and accessing it automatically? Essentially you are requesting that humans (somehow) have to manually enter every URI/URL for every source, which is precisely the document-centric view we're trying to get away from. Put simply, as far as I can see, a dereferenceable URI behind a robots.txt blacklist is no longer a dereferenceable URI ... at least for a respectful software agent. Linked Data behind a robots.txt blacklist is no longer Linked Data. (This is quite clear in my mind but perhaps others might disagree.) Best, Aidan -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - Missed data sources.
Hi, Well, as you might guess, I can’t say I agree. Firstly, as you correctly say, if there is a robots.txt with Disallow / on the RDF on a LD site, then it effectively prohibits any LD app from accessing the LD. So clearly that can’t be what the publisher intended (the idea of publishing RDF for humans to fetch is not a big market). So what did the publisher intend? This should be what the consumer aims to comply with. If you take a pragmatic (rather than perhaps more literal) view of what someone might mean when they put such a robots.txt on a LD site, then it can only mean “please only access my site in the sort of usage patterns that I might expect from a person” or similar. Secondly, I think in discussing robots, it is central to the issue to try to answer the question of “what is a robot?”, which is why I included that discussion, which is linked off the reference to robots on the wikipedia page that you quote, rather than just the page you quote. The systems you describe raise good questions, and I would say that in the end the builders have to decide whether their system is what the publisher might have thought of as a robot. My system (if I recall correctly!) monitors what it is accessing to ensure that it does not make undue demands on the LD sites it accesses; this is just good practice, irrespective of whether there is a Disallow or not, I think. I am guessing we will just have to differ on all this! Best Hugh On 25 Jul 2014, at 22:13, aho...@dcc.uchile.cl wrote: On 25/07/2014 15:54, Hugh Glaser wrote: Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. 
[snip] I think the general interpretation of the robots in robots.txt is any software agent accessing the site automatically (versus a user manually entering a URL). I had never thought this. My understanding of the agents that should respect the robots.txt is what are usually called crawlers or spiders. Primarily search engines, but also including things that aim to automatically get a whole chunk of a site. Of course, there is no de jure standard, but the places I look seem to lean to my view. http://www.robotstxt.org/orig.html “WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.” https://en.wikipedia.org/wiki/Web_robot “Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.” It’s all about scale and query rate. So a php script that fetches one URI now and then is not the target for the restriction - nor indeed is my shell script that daily fetches a common page I want to save on my laptop. So, I confess, when my system trips over a dbpedia (or any other) URI and does follow-your-nose to get the RDF, it doesn’t check that the site robots.txt allows it. And I certainly don’t expect Linked Data consumers doing simple URI resolution to check my robots.txt. But you are right, if I am wrong - robots.txt would make no sense in the Linked Data world, since pretty much by definition it will always be an agent doing the access. But then I think we really need a convention (User-agent: ?) that lets me tell search engines to stay away, while allowing LD apps to access the stuff they want. Then it seems our core disagreement is on the notion of a robot, which is indeed a grey area. With respect to robots only referring to warehouses/search engines, this was indeed the primary use-case for robots.txt, but for me it's just an instance of what robots.txt is used for. 
Rather than focus on what is a robot, I think it's important to look at (some of the commonly quoted reasons) why people use robots.txt and what the robots.txt requests: Charles Stross claims to have provoked Koster to suggest robots.txt, after he wrote a badly-behaved web spider that caused an inadvertent denial of service attack on Koster's server. [1] Note that robots.txt has an optional Crawl-delay primitive. Other reasons: A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data. [1] So moving aside from the definition of a robot, more importantly, I think a domain administrator has
Re: Updated LOD Cloud Diagram - Missed data sources.
Hi Luca, Thanks for asking. I have resources that number in the 100Ms and even 1Bs of resolvable URIs. I even have datasets with effectively infinite numbers of URIs. Some people seem to find them useful, in the sense that they want to look specific things up. These are not static documents - they are RDF documents dynamically generated from SQL, triple or other storage mechanisms. It can be a serious cost to me in terms of server processor, network and disk cost (I do some caching to trade processor cost against disk space) to allow crawlers to try to spider serious parts or all of the dataset. Some of the documents can take several seconds of CPU to generate. (Since all this is unfunded most costs come out of my pocket, by the way.) So it may be that avoiding spiders is the difference between me offering the dataset and not - or at least it means that the service that the “real” users get is not overwhelmed by the bots. So what I want to do is make the datasets available, but I don’t want to bear the costs of having Google, Bing, or anyone else, actually crawling the site. And no, I don’t want to have anything more than URI resolution, by having people register or authenticate - I want access to be as easy as possible - URI resolution. Actually, spidering is what the sitemap (which I put work into building if one is possible) is for. Oh, and I should say that the dynamic nature of the data means that the Last-Modified and similar headers cannot be reliably set, and so bots would find incremental spidering rather challenging. And I do think what I say applies to the web of documents. Would a web site manager really object to me having a script that occasionally got some news or weather and displayed it on a web page? By the way, I see that the standard Drupal instance puts this in the robots.txt: # This file is to prevent the crawling and indexing of certain parts # of your site by web crawlers and spiders run by sites like Yahoo! # and Google. 
By telling these robots where not to go on your site, # you save bandwidth and server resources. That sort of sums up what I want. But now I seem to be repeating myself :-) Best Hugh On 25 Jul 2014, at 23:23, Luca Matteis lmatt...@gmail.com wrote: Robots.txt to me works well for a web of documents. That is, wanting only humans to access certain resources. But for a web of data, why resort to a robots.txt when you could simply not put the resource online in the first place? On Fri, Jul 25, 2014 at 11:54 PM, Hugh Glaser h...@glasers.org wrote: Hi, Well, as you might guess, I can’t say I agree. Firstly, as you correctly say, if there is a robots.txt with Disallow / on the RDF on a LD site, then it effectively prohibits any LD app from accessing the LD. So clearly that can’t be what the publisher intended (the idea of publishing RDF for humans to fetch is not a big market). So what did the publisher intend? This should be what the consumer aims to comply with. If you take a pragmatic (rather than perhaps more literal) view of what someone might mean when they put such a robots.txt on a LD site, then it can only mean “please only access my site in the sort of usage patterns that I might expect from a person” or similar. Secondly, I think in discussing robots, it is central to the issue to try to answer the question of “what is a robot?”, which is why I included that discussion, which is linked off the reference to robots on the wikipedia page that you quote, rather than just the page you quote. The systems you describe raise good questions, and I would say that in the end the builders have to decide whether their system is what the publisher might have thought of as a robot. My system (if I recall correctly!) monitors what it is accessing to ensure that it does not make undue demands on the LD sites it accesses; this is just good practice, irrespective of whether there is a Disallow or not, I think. I am guessing we will just have to differ on all this! 
Best Hugh On 25 Jul 2014, at 22:13, aho...@dcc.uchile.cl wrote: On 25/07/2014 15:54, Hugh Glaser wrote: Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. [snip] I think the general interpretation of the robots in robots.txt is any software agent accessing the site automatically (versus a user manually entering a URL). I had never thought this. My understanding of the agents that should respect the robots.txt is what are usually called crawlers or spiders. Primarily search engines, but also including things that aim to automatically get a whole chunk of a site. Of course
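Wherever one lands in this debate, an agent that wants to be "respectful" in Aidan's sense can do the check cheaply with the Python standard library. A minimal sketch - the user-agent token is a placeholder, not anything from the thread:

```python
# Before dereferencing a Linked Data URI, consult the site's
# robots.txt (the "respectful software agent" behaviour debated above).
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def allowed_to_fetch(uri, user_agent="ExampleLDApp"):
    parts = urlsplit(uri)
    rp = RobotFileParser("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
    try:
        rp.read()  # fetch and parse the site's robots.txt
    except OSError:
        return True  # robots.txt unreachable: assume access is permitted
    return rp.can_fetch(user_agent, uri)
```

An agent doing simple follow-your-nose resolution could call `allowed_to_fetch(uri)` once per host and cache the parser, so the overhead per URI is negligible.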
Re: Education
Thanks Sarven, Sounds like you have put flesh on very much the sort of thing I was thinking of. And in fact you are reporting success too, which is great. And yes, definitely turtle/n3, and even the command line world too! Very best Hugh On 12 Jul 2014, at 13:38, Sarven Capadisli i...@csarven.ca wrote: On 2014-07-12 13:02, Hugh Glaser wrote: The other day I was asked if I would like to run a Java module for some Physics & Astronomy students. I am so far from plain Java and that sort of thing now there was almost a cognitive dissonance. But it did cause me to ponder about what I would do for such a requirement, given a blank sheet. For people whose discipline is not primarily technical, what would a syllabus look like around Linked Data as a focus, but also causing them to learn lots about how to just do stuff on computers? How to use a Linked Data store service as schemaless storage: bit of intro to triples as simply a primitive representation format; scripting for data transformation into triples - Ruby, Python, PHP, awk or whatever; scripting for http access for http put, delete to store; simple store query for service access (over http get); scripting for data post-processing, plus interaction with any data analytic tools; scripting for presentation in html or through visualisation tools. It would be interesting for scientists and, even more, social scientists, archaeologists, etc (alongside their statistical package stuff or whatever). I think it would be really exciting for them, and they would get a lot of skills on the way - and of course they would learn to access all this Open Data stuff, which is becoming so important. I’m not sure they would go for it ;-) Just some thoughts. And does anyone know of such modules, or is even teaching them? Best Hugh Hi Hugh, I teach a few introductory lectures on Linked Data, HTTP, URI, RDF, SPARQL as part of a Web and Internet Technologies course to students in Business IT at the Bern University of Applied Sciences. 
The majority of the students do not have a developer profile. The focus of the lessons is not the inner technical details of these technologies but, via some practical work, what they can take away: understanding some publishing and consuming challenges for data on the Web, and potentially communicating problems and solutions to their colleagues with technical expertise in the future. What I have observed: * Before going any further, examples on the state of things and the potential of what can be accomplished are vital. If they are not remotely excited, it sets the tone for the remainder of the lectures. * At first they do not completely take the importance of HTTP/URI seriously - a “they’ve seen them, they know them” mentality. The exercises around that are about designing their own URI patterns for their site/profile, and repeating the importance of Cool URIs and what that entails over and over. * The majority of the students understand the RDF data model and can express statements (either using human language or one of the formats). I usually bounce back and forth between drawing graphs on the board, and showing, dereferencing, browsing RDF resources, and pointing at people and objects in and outside of the room. * As far as their comprehension of the formats goes, i.e., how to write statements that are mostly syntactically valid, Turtle/N-Triples lead the pack. RDF/XML and RDFa usually turn out to be a disaster. Most do not bother with JSON(-LD). * Once they get the hang of Turtle, they do relatively well in SPARQL. I've noticed that it is via SPARQL examples, trials and errors, that they really get the potential of Linked Data. Along the way, it appears to reassure them that RDF and friends are powerful and will come in handy. IMHO: Although I welcome them to use any format for exercises and whatnot, I encourage them to use Turtle or N-Triples. I tell them that learning Turtle is the best investment because they can use that knowledge towards SPARQL. 
However, Turtle comes with a few syntactical traps and declarations, such that I secretly wish they would use N-Triples instead to learn to create statements, for the sake of simplicity. After all, N-Triples is as WYSIWYG as it gets! With a blank slate: in most cases, I have a strong bias towards the *nix command-line toolbox and shell scripting over alternative programming languages. *Out of the box*, the shell environment is remarkable and indispensable. The documentation is baked in. Working in this environment leads to some design decisions as described in http://www.faqs.org/docs/artu/ch01s06.html. One can do everything from data processing, transformations, inspection, analysis to parallelization here. Besides, it is the perfect glue for everything else. -Sarven http://csarven.ca/#i
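Sarven's "N-Triples is as WYSIWYG as it gets" point is easy to illustrate: here is the same single statement in both serializations (URIs are examples only):

```turtle
# Turtle - prefix declarations and syntax sugar ("a" for rdf:type):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> a foaf:Person .

# N-Triples - one fully spelled-out triple per line, nothing hidden:
<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
```

The Turtle form is what students will write once fluent; the N-Triples form shows them exactly which three terms the statement consists of, which is the pedagogical point above.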
Education
The other day I was asked if I would like to run a Java module for some Physics & Astronomy students. I am so far from plain Java and that sort of thing now there was almost a cognitive dissonance. But it did cause me to ponder about what I would do for such a requirement, given a blank sheet. For people whose discipline is not primarily technical, what would a syllabus look like around Linked Data as a focus, but also causing them to learn lots about how to just do stuff on computers? How to use a Linked Data store service as schemaless storage: bit of intro to triples as simply a primitive representation format; scripting for data transformation into triples - Ruby, Python, PHP, awk or whatever; scripting for http access for http put, delete to store; simple store query for service access (over http get); scripting for data post-processing, plus interaction with any data analytic tools; scripting for presentation in html or through visualisation tools. It would be interesting for scientists and, even more, social scientists, archaeologists, etc (alongside their statistical package stuff or whatever). I think it would be really exciting for them, and they would get a lot of skills on the way - and of course they would learn to access all this Open Data stuff, which is becoming so important. I’m not sure they would go for it ;-) Just some thoughts. And does anyone know of such modules, or is even teaching them? Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
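The "scripting for http access for http put ... to store" step of the syllabus sketch above can be shown in a few lines of stdlib Python. The store endpoint URL is hypothetical, so this sketch only builds the request rather than sending it:

```python
# Sketch of HTTP PUT of a Turtle document into a (hypothetical)
# Linked Data graph store endpoint. Only the request is constructed
# here; a real lesson would pass it to urlopen().
from urllib.request import Request, urlopen

def put_turtle(graph_url, turtle):
    # Build a PUT request carrying the Turtle payload
    return Request(graph_url,
                   data=turtle.encode("utf-8"),
                   method="PUT",
                   headers={"Content-Type": "text/turtle"})

req = put_turtle("http://store.example/graphs/people",
                 "<http://example.org/hugh> "
                 "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
                 "<http://xmlns.com/foaf/0.1/Person> .")
# urlopen(req) would send it; omitted since the endpoint is made up.
```

The same pattern with `method="DELETE"` and a plain GET covers the rest of the HTTP steps in the list, which is part of what makes the syllabus attractive: one small toolset carries the whole pipeline.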
Re: Alternative Linked Data principles
Something like doing Linked Data over P2P networks, that is, using distributed hash tables? You might like to have a look at some of that research (you can probably google better than I can). It weakens the provenance etc of the DNS angle, but of course then enables others to publish Linked Data against identifiers where they don’t have the DNS. There are various people who have been interested in it over the years - I have had interesting discussions with some. I played with putting some RDF on bittorrent using IDs that were Linked Data http URIs a while ago (in fact they may still be out there!), but it seems we have enough problems trying to get the http world working before trying other frameworks :-) Hugh On 28 Apr 2014, at 16:55, Luca Matteis lmatt...@gmail.com wrote: Thanks John but not really. I was specifically looking for research that wasn't based on protocols such as HTTP, URIs and RDF. But that is still in the field of achieving a global interconnected database. I know webby standards are implemented so no need to reinvent the wheel, but I think it's healthy to look at things from a different perspective; who knows, maybe UDP works better for achieving federated queries. Or maybe triples aren't really the only way to represent the real world. Luca On Mon, Apr 28, 2014 at 5:41 PM, John Erickson olyerick...@gmail.com wrote: Luca, I think you are not asking quite the right question; I think what you want to ask is whether the Linked Data Principles can be applied to different... * entity identifiers... * protocols with which to resolve and retrieve information about those entities * protocols with which to retrieve manifestations of resources associated with those named entities... * file formats with which to serialize manifestations of resources... * standards for modelling relationships between entities... The value of the Linked Data Principles as bound to Webby standards is that they are specific and readily implemented; no make believe... 
John On Mon, Apr 28, 2014 at 11:23 AM, Luca Matteis lmatt...@gmail.com wrote: The current Linked Data principles rely on specific standards and protocols such as HTTP, URIs and RDF/SPARQL. Because I think it's healthy to look at things from a different perspective, I was wondering whether the same idea of a global interlinked database (LOD cloud) could be portrayed using other principles, perhaps based on different protocols and mechanisms. Thanks, Luca -- John S. Erickson, Ph.D. Deputy Director, Web Science Research Center Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter Skype: olyerickson -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Encoding an incomplete date as xsd:dateTime
It may be worth reminding ourselves (?) that in RDF you can use “all of the above”. That is, if you want to make your RDF as consumable as possible, you may well represent the same stuff using more than one ontology. This is true of lots of things, such as people’s names or different bibliographic ontologies (although sometimes subproperty will do), but is particularly true for date and time. http://www.w3.org/TR/owl-time/ has an example at the end: :meetingStart a :Instant ; :inDateTime :meetingStartDescription ; :inXSDDateTime "2006-01-01T10:30:00-5:00"^^xsd:dateTime . :meetingStartDescription a :DateTimeDescription ; :unitType :unitMinute ; :minute 30 ; :hour 10 ; :day 1 ; :dayOfWeek :Sunday ; :dayOfYear 1 ; :week 1 ; :month 1 ; :timeZone tz-us:EST ; :year 2006 . and it might well be the case that you would have both (and other representations) in your dataset, and of course you would only assert the bits of the second representation that were appropriate. Of course, that doesn’t answer your original question, Heiko (sorry!), about what the xsd version should look like. Hugh On 10 Feb 2014, at 15:53, Niklas Lindström lindstr...@gmail.com wrote: Hi Heiko, Unless you want to use another ontology (e.g. BIO [1][2] or schema.org [3]), I'd probably go ahead and break that contract, although it is not technically safe (AFAIK, it's a violation of OWL semantics). It depends on the expected consumption of your data. I would say that the vcard ontology formally needs to be fixed to allow for more variation. It actually seems to have been amended somewhat in 2010 [4], to at least not require the exact second (or fraction thereof) of the birth. But that's hardly enough. A lot of the point of datatyped literals in RDF is lost when datatype properties are locked down like this. 
Cheers, Niklas [1]: http://vocab.org/bio/0.1/.html [2]: http://wiki.foaf-project.org/w/BirthdayIssue [3]: http://schema.org/birthDate [4]: http://www.w3.org/Submission/2010/SUBM-vcard-rdf-20100120/ On Mon, Feb 10, 2014 at 3:55 PM, Heiko Paulheim he...@informatik.uni-mannheim.de wrote: Hi Jerven, this looks like a pragmatic solution. But I wonder if it may lead to any conflicts, e.g., the vcard ontology defines the bday property with xsd:dateTime as its range explicitly. Is it safe to simply use an xsd:gYear value as its object? Best, Heiko On 10.02.2014 15:43, Jerven Bolleman wrote: Hi Heiko, http://www.w3.org/TR/xmlschema-2/#gYear and http://www.w3.org/TR/xmlschema-2/#gYearMonth are the datatypes that you should use. Regards, Jerven On 10 Feb 2014, at 15:37, Heiko Paulheim he...@informatik.uni-mannheim.de wrote: Hi all, xsd:dateTime and xsd:date are used frequently for encoding dates in RDF, e.g., for birthdays in the vcard ontology [1]. Is there any best practice to encode incomplete date information, e.g., if only the birth *year* of a person is known? As far as I can see, the XSD spec enforces the provision of all date components [2], but 1997-01-01 seems like a semantically wrong way of expressing that someone is born in 1997, but the author does not know exactly when. Thanks, Heiko [1] http://www.w3.org/2006/vcard/ns [2] http://www.w3.org/TR/xmlschema-2/#dateTime [3] http://www.w3.org/TR/xmlschema-2/#date -- Dr. Heiko Paulheim Research Group Data and Web Science University of Mannheim Phone: +49 621 181 2646 B6, 26, Room C1.08 D-68159 Mannheim Mail: he...@informatik.uni-mannheim.de Web: www.heikopaulheim.com -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
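Setting the vcard range issue aside, Jerven's suggestion combined with Niklas's schema.org pointer looks like this in Turtle - an illustrative sketch only (the subject URI is an example):

```turtle
# Recording only a known birth year, using xsd:gYear rather than a
# fabricated full date like 1997-01-01.
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .

<http://example.org/person/1> schema:birthDate "1997"^^xsd:gYear .
```

If more of the date later becomes known, `xsd:gYearMonth` ("1997-03") and then `xsd:date` slot in the same way, so consumers can distinguish genuine precision from padding.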
Extracting URIs - rapper --trace?
Hi. I wanted to extract the URIs from some rdf, and it struck me that rapper probably did it for me. And yes, that is what the -t/--trace flag says it does, I think. “-t, --trace Print URIs retrieved during parsing. Especially useful for monitoring what the guess and GRDDL parsers are doing.” But I can’t get it to make any difference - am I doing something wrong, please? Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
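While waiting for an answer on --trace, one workaround sketch: have rapper serialize to N-Triples (e.g. `rapper -o ntriples input.rdf > out.nt`) and pull the URIs out of that. This is deliberately naive - it assumes well-formed N-Triples and ignores the corner case of angle brackets inside literals:

```python
# Extract the set of URIs mentioned in an N-Triples document.
import re

def extract_uris(ntriples):
    # In N-Triples, URIs (and only URIs) appear between angle brackets
    return sorted(set(re.findall(r"<([^>]*)>", ntriples)))

sample = '<http://example.org/a> <http://example.org/p> "a literal" .'
print(extract_uris(sample))
# -> ['http://example.org/a', 'http://example.org/p']
```

It loses the distinction between subject, predicate and object positions, but for the "just give me the URIs" use case above that is usually fine.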
Re: HTTPS for RDF URIs?
On 31 Jan 2014, at 11:29, ☮ elf Pavlik ☮ perpetual-trip...@wwelves.org wrote: On 01/30/2014 09:10 PM, Kingsley Idehen wrote: On 1/30/14 1:09 PM, Melvin Carvalho wrote: If not bad, is there any provision for allowing that an HTTPS URI that only differs in the scheme part from an HTTP URI be identified as the same resource? http and https are fundamentally different resources, but you can link them together with owl:sameAs, I think ... Yes. You simply use an http://www.w3.org/2002/07/owl#sameAs relation to indicate that a common entity is denoted [1] by the http: and https: scheme URIs in question. does it make sense then to use https: IRIs if we state that one can treat http: version as equivalent? Yes. Because you get a different description of the NIR back from the other URI. I’m tempted to say that the s after the http is no different to adding an s to the end - they are both valid URIs, and so simply opaque identifiers. But someone will probably tell me that is too sloppy :-) On the other hand, if I was to be pedantic, adding the s to the end of the http does take you out of Linked Data (although it is Semantic Web) according to the Principles. But I have never let a little thing like that bother me. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
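The linkage Kingsley describes, spelled out as a triple (the subject and object URIs are examples only):

```turtle
# Asserting that the http: and https: forms of a URI denote the
# same entity.
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/thing> owl:sameAs <https://example.org/thing> .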
Re: HTTPS for RDF URIs?
On 31 Jan 2014, at 12:46, Alfredo Serafini ser...@gmail.com wrote: Hi all regarding opaque uri: maybe a difference in the scheme could be seen as complementary to a different type extension. If I'm referring for example to the resource http://wiki/page.html or http://wiki/page.rdf I probably expect two different representations of the same resource, from a technical REST-like approach. Should we also interpret those as opaque? Sorry if this is probably a sort of recurring question. If the formats for type extension are acceptable, the best would be to use the scheme in much the same way. For example I suppose that I could also have something like: file://wiki/page.html, for a local copy. Is this acceptable in theory? Well, it is a URI, and as Kingsley says, denotes the resource it denotes, which may or may not be the same as anything else in this world. So you could do this, if you wanted. And you could use mailto for your local URI and ftp or http for the public one for a resource which is an email address - or vice versa. But it is all in your mind (or some more complex RDF and OWL if you want to) - as elf says, these things are all opaque. But I’m afraid you just crossed a line for me. http://www.w3.org/DesignIssues/LinkedData.html says (number 2): “Use HTTP URIs so that people can look up those names.” and all the other versions have something similar. I can happily accept that “HTTP” in this is a shorthand for “HTTP or HTTPS”, since they perform very similarly in terms of why the Principles specify HTTP. But if you move to “file:”, then you have lost all the things that Principle 2 was aiming for. 
Of course, if this discussion was happening on the Semantic Web list, I would not make these comments (or at least not the same way); but this is the LOD list, and I think that globally-recognised identifiers are more de rigueur as a sine qua non, to use a couple of English phrases :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: HTTPS for RDF URIs?
And of course I would be happy to host such triples at sameAs.org :-) (And maybe a separate store that was devoted only to such triples would be a useful idea?) Just send me them, or tell me where they are… Best On 30 Jan 2014, at 18:09, Melvin Carvalho melvincarva...@gmail.com wrote: On 29 January 2014 22:36, Maloney, Christopher (NIH/NLM/NCBI) [C] malon...@ncbi.nlm.nih.gov wrote: Apologies if this topic has come up before (I feel certain that it has) but I've searched the archives and Googled, and can't find anything (maybe too many false positives). What are the current best practice recommendations regarding the use of HTTPS URIs for resources in RDF? Are they bad? No, they are good. We use them over at https://w3id.org/ (one of the reasons that was created) If not bad, is there any provision for allowing that an HTTPS URI that only differs in the scheme part from an HTTP URI be identified as the same resource? http and https are fundamentally different resources, but you can link them together with owl:sameAs, I think ... Thanks! Chris Maloney NIH/NLM/NCBI (Contractor) Building 45, 5AN.24D-22 301-594-2842 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: LOD publishing question
Hi Giovanni, Thank you for the update. I am sorry to hear that Sindice is going into a frozen state, and that circumstances are making that happen, but of course pleased that you are able to keep it going at all. I send you and your team my personal thanks for the service you have provided over the last 5 or so years, and wish you all well. Very best Hugh. On 28 Jan 2014, at 14:19, Giovanni Tummarello g.tummare...@gmail.com wrote: With respect to Sindice, for a number of reasons, the people who originally created it, the former Data Intensive Infrastructure group, are either not working in the original institution hosting it, National University of Ireland Galway (the institute formerly known as DERI), or have been assigned to other tasks. Sindice has been operating for 5+ years, updating its index (though we were never perfect), and we believe supported a lot of work in the field, but it's now time to move on. In the meanwhile the project will continue to answer queries but without updating its index. Apologies for the inconvenience of course, we'll be posting on this soon and update the homepage to reflect the change. Giovanni On Tue, Jan 28, 2014 at 11:27 AM, Hugh Glaser h...@glasers.org wrote: Good question. I’ll report what I found, rather than advising. So I went there when you published that email, looking for stuff to put in my sameas.org site. I tried exploring, and when I went to Browse I only found a few things, so wasn’t encouraged :-) (And, as an aside, Advanced Search didn’t seem to do anything, and the search links at the bottom were not links.) So I decided that it wasn’t really mature enough to make it worth the effort (yet?), even though there should be massive scope for linkage eventually. But the real problem was that I couldn’t find any Linked Data, or even an RDF store. The URIs you use are not very Cool URIs, and I tried to see if there was RDF at the end of them by doing Content Negotiation, but there wasn’t. 
I am thinking of things like http://tundra.csd.sc.edu/rol/view-person.php?id=291 So I went away :-) For people like me, you could put something about how to see the RDF in an About page (or if it is there, make it easier to find). You only get one chance to snare people on the web, after all. Of course as Alfredo says, for spidering search engines, and it would have helped me too, you need robots.txt (which I couldn’t find either), sitemap, sitemap.xml, voiD description. Good luck! Hugh On 28 Jan 2014, at 04:12, WILDER, COLIN wilde...@mailbox.sc.edu wrote: Another question to you very helpful people– and apologies again for semi cross-posting Our LOD working group is having trouble publishing our data (see email below) in RDF form. Our programmer, a master’s student, who is working under the supervision of myself and a computer science professor, has mapped sample data into RDF, has the triplestore on a D2RQ server (software) on our server and has set up a SPARQL end-point on the latter. But he has been unsuccessful so far getting 3 candidate semantic web search engines (Falcons, Swoogle and Sindice) to be able to find our data when he puts a test query in to them. He has tried communicating with the people who run these, but to little avail. Any suggestions about sources of information, pointers, best practices for this actual process of publishing LOD? Or, if you know of problems with any of those three search engines and would suggest a different candidate, that would be great too. Thanks again, Colin Wilder From: WILDER, COLIN [mailto:wilde...@mailbox.sc.edu] Sent: Thursday, January 16, 2014 11:51 AM To: 'public-lod@w3.org' Subject: LOD for historical humanities information about people and texts To the many people who have kindly responded to my recent email: Thanks for your suggestions and clarifying questions. 
To explain a bit better, we have a data curation platform called RL, which is a large, complex web-based MySQL database designed for users to be able to simply input, store and share data about social and textual networks with each other, or to share it globally in RL’s data commons. The data involved are individual data items, such as info about one person’s name, age, a book title, a specific social relationship, etc. The entity types (in the ordinary-language sense of actors and objects, not in the database tabular sense) can be seen at http://tundra.csd.sc.edu/rol/browse.php. The data commons in RL is basically a subset of user data that users have elected (irrevocably) to share with all other users of the system. NB there is a lot of dummy data in the data commons right now because of testing. We are designing an expansion of RL’s functionality so as to publish data from the data commons as LOD, so I am doing some preliminary work to assess feasibility and fit
Re: General tuning for Dbpedia Spotlight
Thank you for the responses, both on- and off-list. So I see perhaps I should recast my question, with maybe wider scope. I have a load of abstract-style text fragments - that is perhaps 100 words each, on a wide variety of topics, although there is a bit of a technical bent. I want to be able to do linkage between them and to other things, based around our lovely Linked Data world. That is, have lots of triples, something like :docIDn :some-pred :conceptURI It would be a bonus to know which words in the text triggered the generation of the triple. Of course, the system doesn’t actually have to generate the triples - I can build them if I get sufficiently sensible output, including the sort of html output that Spotlight does. And because it goes automatically to users, I need quite high precision, even if recall suffers (I think that is the terminology). Oh, and ideally free, although not necessarily. My current preference is for dbpedia or freebase URIs, but wordnet is probably OK too. I think there must be people who have done this (a lot). Or at least there should be. There are certainly quite a lot of systems that can do it, some more or less playing well with Linked Data URIs. I think my problem (apart from laziness) is that the systems I look at seem to want me to care about what they do, or at least engage with tuning and things, which means I need some understanding of what they do, which I don’t have (and I probably don’t care either :-) ). So, does anyone (else) feel they can point me at a system for doing this that I can just use out of the box (possibly having been told some parameters to use)? Of course, maybe I am just asking too much of the technology at the moment, but I can hope! Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
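To make the desired output concrete, the ":docIDn :some-pred :conceptURI" triples might look like this in Turtle. A sketch only: :doc42 and the choice of dc:subject as the predicate are placeholders, not anything a particular annotator actually emits:

```turtle
@prefix :        <http://example.org/docs/> .
@prefix dc:      <http://purl.org/dc/terms/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

# One hypothetical 100-word abstract linked to the concepts found in it.
:doc42 dc:subject dbpedia:Linked_data ;
       dc:subject dbpedia:Photonics .
```

Keeping the provenance of which words triggered each link would need a reification or annotation structure on top of this; the bare triples above are just the linkage itself.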
General tuning for Dbpedia Spotlight
Hi. I am trying to use Dbpedia Spotlight to find stuff in arbitrary English texts. Following the instructions, I found it very easy to download and install the whole shebang on my Mac laptop - thanks! It does pretty well in finding stuff, but gets some strange things wrong for me (choosing people called Monday instead of the day of the week, for example, or Municipalities of Germany for Municipalities). That’s fine - I understand that there is always a precision/recall thing going on. But I want to use it to mark up web pages, so having even a small number of strange links is not too good. So my question is: What are the parameters I should set to get a set of results with high precision (even if low recall) for arbitrary English text? I assume that I need to set Confidence and Annotation Score, and probably some Types. Related to this, I am using the Lucene version. I see there is a Statistical version, but can’t work out what the difference might be. Should I be using that to get more precise results? Sorry if this is somewhere in the docs, but I couldn’t find it easily. My guess is that this is something that quite a few people have been through? I am using it from php via http, if anyone can actually provide the code! :-) Best Hugh
Re: State of Open Source Semantic Web CMS
Hi Christoph, On 27 Dec 2013, at 15:57, Christoph Seelus christoph.see...@fh-potsdam.de wrote: # State of Open Source Semantic Web CMS Hello there, back in October, I asked here for Semantic Web CMS, written in PHP. The response I got on this list and directly via mail was great, so thanks again. At the moment, I'm writing a paper, regarding the state of Open Source CMS with Semantic Web support in general. Again: The final goal is to use our (or any) OWL-based ontology (http://isdc.gfz-potsdam.de/ontology/isdc_1.4.owl) as a knowledge foundation in a content management system, which would enable us to enrich available data with Linked Open Data. My list so far contains the following projects: - Drupal (https://drupal.org/) - OntoWiki (http://ontowiki.net) - Ximdex (http://www.ximdex.com/) - Dspace (http://www.dspace.org/) Can you point me at where the Semantic Web bit on Dspace is documented please, as I can’t find it? Also, you may want to include http://www.eprints.org which does Linked Data against http://www.eprints.org/ontology/ And in fact I have a SPARQL endpoint etc for all the ePrints RDF I can harvest. http://foreign.rkbexplorer.com/ Best Hugh - (DCThera, not released to the public yet) Any suggestions of other systems I overlooked? Thanks and best regards, Christoph -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
UK Photonics Portal
Hi. You might like to be able to point at this new example of a site built around Linked Data from multiple sources: http://www.ukphotonics.org We (Seme4) have built it over the last few months, funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC) Centre for Innovative Manufacturing in Photonics at the University of Southampton. Press release: http://www.southampton.ac.uk/mediacentre/news/2013/dec/13_224.shtml Best Hugh
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
On 2 Dec 2013, at 06:24, Ross Horne ross.ho...@gmail.com wrote: Andy is right (as usual!). With the proposed bnode encoding, the graph becomes fatter each time the same triple is loaded. But how much fatter was the question. RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer and the parser, as summarised by David recently: http://lists.w3.org/Archives/Public/public-lod/2013Nov/0093.html Ah yes, I forgot that everything is rosy now with 1.1 - sorry. Please don't get back into mixing up the lexer and the parser. The lexical spaces of the basic datatypes are disjoint, so in any language we can just write: - 999 instead of "999"^^xsd:integer - 9.99 instead of "9.99"^^xsd:decimal - "WWV" instead of "WWV"^^xsd:string - 2013-06-6T11:00:00+01:00 instead of "2013-06-6T11:00:00+01:00"^^xsd:dateTime As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and turns the characters into a token. A token consists of a string, called an attribute value, plus a token name, e.g. "999"^^xsd:integer. Only a relatively small handful of people writing compilers for languages should have to care about how tokens are represented, not end users of languages. Well personally I prefer the first version I used for my course on this when it came out in 1977, the Dragon Book - Principles of Compiler Design, before Sethi polluted it with all that type-checking stuff :-) Actually, it wasn’t about blurring the lexer and parser - the graph semantics were different. It was closer to having two representations of zero in the machine (as some machines used to have), and having to write code to ensure that you coped with both of them. Of course your examples do raise the issue of multiple representations for the same thing if the user is not careful. 23.4, 23.5, 23.0, 23.2, 23, 23.1, 023.0, 023 all of which are different RDF terms. Would a lexer/parser make 23.00 and 23.000 different RDF terms? I find myself thinking I should know, but don’t - my guess is it should. 
(RDF 1.1 doesn’t seem to give guidance on this.) And I find myself getting strangely interested in your dateTime example. I think most lexers will reject it? Or friendly ones will treat it as the correct lexical form: 2013-06-06T11:00:00+01:00 (You need to pad the day) So maybe we need to get a bit more explicit about the RDF term for dateTime (unless I have missed it)? That the RDF term is always in UTC? - This is what the xsd standard says. That the RDF term always has a fractional second part? - Good question. That the RDF term always has a timezone? - Better question. (See http://www.w3.org/TR/xmlschema-2/#dateTime ) Or are we happy with many different representations of a given dateTime? (Of course xsd:dateTime does get into problems with year zero, but let's not worry about that :-) ) But I guess my friendly RDF parser gnomes (all hail!) already have stories for all this. Best Hugh For language tags, a little simple conventional datatype subtyping (as opposed to rdfs:subClassOf), could help the programmer further [2]. e.g. a programmer that writes regex("WWV2013"@en, "WWV") clearly meant regex("WWV2013", "WWV") and shouldn't have to care about the distinction, unless I am mistaken. Regards, Ross [1] Aho, Sethi and Ullman. Compilers: Principles, Techniques, and Tools. 1986 [2] Local Type Checking for Linked Data Consumers. http://dx.doi.org/10.4204/EPTCS.123.4 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
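Hugh's question about multiple representations can be made concrete. Under RDF 1.1 term equality these are three distinct literals, even though they denote the same xsd:dateTime value (a sketch with illustrative triples; in SPARQL, = compares values while sameTerm compares terms):

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/> .

:x :when "2013-06-06T11:00:00+01:00"^^xsd:dateTime .   # day padded, +01:00 offset
:x :when "2013-06-06T10:00:00Z"^^xsd:dateTime .        # same instant, in UTC
:x :when "2013-06-06T10:00:00.000Z"^^xsd:dateTime .    # same instant, fractional seconds
```

A store that keeps lexical forms as written will hold three triples here; one that canonicalises to a single lexical form per value would hold fewer, which is exactly the ambiguity the email is poking at.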
Re: An
Thanks Andy, Sorry, I had a brain-fart (senior moment?), and forgot that we were dealing with RDF 1.1. I guess I have suffered the pain of unknown presence of datatypes in the RDF terms for literals for so long it takes a while for me to accept that it has been fixed. Thanks so much to the people that did it. Using the bnode solution would be like bringing back the complexity of the optional datatype, which would bring back the pain! Best Hugh On 2 Dec 2013, at 11:04, Andy Seaborne andy.seabo...@epimorphics.com wrote: On 01/12/13 23:02, Hugh Glaser wrote: Hi. Thanks. A bit of help please :-) On 1 Dec 2013, at 17:36, Andy Seaborne andy.seabo...@epimorphics.com wrote: On 01/12/13 12:25, Tim Berners-Lee wrote: On 2013-11 -23, at 12:21, Andy Seaborne wrote: On 23/11/13 17:01, David Booth wrote: [...] This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples. Different. Maybe better, maybe worse. Do you want all your abc to be the same language? abc rdf:lang en or multiple languages: abc rdf:lang cy . abc rdf:lang en . ? Unlikely - so it's bnode time ... :x :p [ rdf:value abc ; rdf:lang en ] . The nice thing about this in a n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins so a datatypes literal can behave just like a bnode with two properties if you want to. But I have always preferred it with not 2 extra triples, just one: :x :p [ lang:en cat ] which allows you also to write things like :x :p [ lang:en cat] , [ lang:fr chat ]. or if you use the ^ back-path syntax of N3 (which was not taken up in turtle), :x :p cat^lang:en, chat^lang:fr . You can do the same with datatypes: :x :q 2013-11-25^xsd:date . instead of :x :q 2013-11-25^xsd:date . 
This seems to bring its own issues. These bnodes seem to be like untidy literals as considered in RDF-2004 WG. :x :p [ lang:en "cat" ] :x :p [ lang:en "cat" ] :x :p [ lang:en "cat" ] is 6 triples. :x :p :q . :x :p :q . :x :p :q . is 1 triple. Repeated read in same file - this already causes confusion. :x :p "cat" . :x :p "cat" . :x :p "cat" . is 1 triple or is it 3 triples because it's really Is it not 1 triple if you take the first view or 6 triples if you take the second? Or probably I don’t understand bnodes properly!? :x :p [ xsd:string "cat" ]. :x :p 123 . :x :p 123 . :x :p 123 . It makes it hard to ask do X and Y have the same value for :p? - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer. Why can't the app writer say find me all things which have a property value less than 45? I see it makes it hard, but I don’t see it as any harder than what we have now, with multiple patterns that do and don’t have ^^xsd:String As I said before, with the ^^xsd you need to consider a bunch of patterns to do the query - again, it is messy, but is it messier? Actually I find { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . } with a possible also { ?s1 ?p ?str . ?s2 ?p ?str . } Let's talk numbers (strings have a lexical form that looks like the value) and have 123 as shorthand for [ xsd:integer 123 ]. And let's ignore rdf:langString. { ?s1 ?p ?x . ?s2 ?p ?x . } does not care whether ?x is a URI or a literal at the moment. Your example is a good one as it's ?p so the engine does not know whether it's a datatype property or an object property. With bnodes this may match, it probably doesn't. It depends on the micro-detail of the data. # No. :x1 :p 123 . :x2 :p 123 . # Yes :s1 :p _:a . :s2 :p _:a . _:a xsd:string "abc" . Sure, if you know it's an integer ?s1 ?p [ xsd:integer ?str ] or even: { ?s1 ?p [ ?dt ?str ] . ?s2 ?p [ ?dt ?str ] . } { ?s1 ?p [ ?dt ?str ] . ?s2 ?p [ ?dt ?str ] . 
} though I think this is shifting an unnecessary cognitive model onto the app writer. I didn't say the access language was SPARQL :-) I meant how people think about accessing the data. Datatype properties are really very bizarre in this world. And this is at the fine grain level. Now apply to real queries that are 10s of lines long. { ?s1 ?p [ xsd:integer 123 ] } { ?s1 ?p 123 } it might be possible to make that bNode infer to the value 123 which would be a win. Making literals value-centric not appearance/struct based would be very nice. And counting. Counting matters to people (e.g. facetted browse) Andy PS I started my first email draft with the argument that it was better to have the more triples form
Understanding datatypes in RDF 1.1 - was various things
Hmm, My head is spinning a bit now - I’m trying to understand something simple - "1"^^xsd:boolean. So my reading says that is a valid lexical form (in the lexical space) for the value ‘true’ (in the value space). (http://www.w3.org/TR/rdf11-concepts/#dfn-lexical-space ) I think that ‘value space’ is where the other documents talk about ‘RDF term’, but I’m not sure. And I also read: “Literal term equality: Two literals are term-equal (the same RDF literal) if and only if the two lexical forms, the two datatype IRIs, and the two language tags (if any) compare equal, character by character.” (http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal ) So the language processor will (must) take my lexical form "1"^^xsd:boolean and make it an RDF term “true”. And then if I ask the store (sorry, I am rather engineering in this) if 2 terms are equal, it will always be comparing two similar terms (from the literal space), (probably, but see below): "true"^^xsd:boolean. And I can expect a sensible querying engine to consider "1"^^xsd:boolean as a shorthand for “true”. It could be confusing, which it was for a bit for me, because the equality constraint says “the two lexical forms”, but in this case there is more than one lexical form for the value form. So I think it means that a processor must always choose the same lexical form for any given value form. I am guessing that processors could consistently choose "1"^^xsd:boolean as the value form for “true”, but that would be pretty perverse. A little further confusion for me arises as to whether the datatype IRI is part of the value space. I have taken off any ^^xsd:boolean from my rendering of the “true” in the value space because the documentation seems to leave it out. (The table says: ‘“true”, xsd:boolean’ and ‘true’ are the literal and value.) So I am left assuming that the datatype IRI is somewhere in the RDF term world, although we know it isn’t in the graph. 
Not something I need to worry about as a consumer, as it is all an internal issue, I think, but I thought I would mention it. Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
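One way to see the lexical-form vs value distinction Hugh is wrestling with is via SPARQL, where = compares values while sameTerm compares RDF terms. A sketch (two ASK queries shown together; filters over an empty pattern are evaluated against the single empty solution):

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Value equality: both lexical forms map to the boolean value true.
ASK { FILTER ( "1"^^xsd:boolean = "true"^^xsd:boolean ) }

# Term equality: the lexical forms differ character by character,
# so these are two different RDF literals.
ASK { FILTER ( sameTerm("1"^^xsd:boolean, "true"^^xsd:boolean) ) }
```

A conformant engine should answer true to the first and false to the second, which is exactly the "more than one lexical form per value" situation the email describes.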
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
Hi. Thanks. A bit of help please :-) On 1 Dec 2013, at 17:36, Andy Seaborne andy.seabo...@epimorphics.com wrote: On 01/12/13 12:25, Tim Berners-Lee wrote: On 2013-11 -23, at 12:21, Andy Seaborne wrote: On 23/11/13 17:01, David Booth wrote: [...] This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples. Different. Maybe better, maybe worse. Do you want all your abc to be the same language? abc rdf:lang en or multiple languages: abc rdf:lang cy . abc rdf:lang en . ? Unlikely - so it's bnode time ... :x :p [ rdf:value abc ; rdf:lang en ] . The nice thing about this in a n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins so a datatypes literal can behave just like a bnode with two properties if you want to. But I have always preferred it with not 2 extra triples, just one: :x :p [ lang:en cat ] which allows you also to write things like :x :p [ lang:en cat] , [ lang:fr chat ]. or if you use the ^ back-path syntax of N3 (which was not taken up in turtle), :x :p cat^lang:en, chat^lang:fr . You can do the same with datatypes: :x :q 2013-11-25^xsd:date . instead of :x :q 2013-11-25^xsd:date . This seems to bring it it's own issues. These bnodes seem to be like untidy literals as considered in RDF-2004 WG. :x :p [ lang:en cat ] :x :p [ lang:en cat ] :x :p [ lang:en cat ] is 6 triples. :x :p :q . :x :p :q . :x :p :q . is 1 triple. Repeated read in same file - this already causes confusion. :x :p cat . :x :p cat . :x :p cat . is 1 triple or is it 3 triples because it's really Is it not 1 triple if you take the first view or 6 triples if you take the second? Or probably I don’t understand bnodes properly!? :x :p [ xsd:string cat ]. :x :p 123 . :x :p 123 . :x :p 123 . 
It makes it hard to ask do X and Y have the same value for :p? - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer. Why can't the app writer say find me all things which a property value less than 45? I see it makes it hard, but I don’t see it as any harder than what we have now, with multiple patterns that do and don’t have ^^xsd:String As I said before, with the ^^xsd you need to consider a bunch of patterns to do the query - again, it is messy, but is it messier? Actually I find { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . } with a possible also { ?s1 ?p ?str . ?s2 ?p ?str . } much easier to work with than something that has this stuff optionally tacked on the end of literals, that isn’t really part of the string but isn’t part of RDF either. Or maybe it is part of the literal but not the string? Surely that should be clear to me? I just don’t see there is a difference in complexity for querying - it is just that the current situation is genuinely messier for consumers because there are two notations in play, whereas if RDF is so good we should have everything in RDF. Not that I would say anything should change :-) it ain’t actually broken, but it could get fixed. (Oh dear, Hugh showing his ignorance of the fancy stuff again) Best Hugh To give that, if we add interpretation of bNodes used in this value form (datatype properties vs object properties ?), so you can ask about shared values, we have made them tidy again. But then it is little different from structured literals with @lang and ^^datatype. Having the data model and the access model different does not gain anything. The data model should reflect the way the data is accessed. Like RDF lists, or seq/alt/bag, encoding values in triples is attractive in its uniformity but the triples nature always shows through somewhere, making something else complicated. 
Andy PS Graph leaning does not help because you can't add data incrementally if leaning is applied at each addition. I suggested way back these properties as a way of putting the info into the graph but my suggestion was not adopted. I think it would have made the model more complete which would have been a good thing, though SPARQL would need to have language-independent query matching as a special case -- but it does now too really. (These are interpretation properties. I must really update http://www.w3.org/DesignIssues/InterpretationProperties.html) Units are fun as properties too. http://www.w3.org/2007/ont/unit Tim Andy -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Dumb SPARQL query problem
It’s the other bit of the pig’s breakfast. Try an @en On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote: Hi, Sorry to bother the list, but I'm stumped by what should be a simple SPARQL query. When applied to the dbpedia end-point [1], this search: PREFIX foaf: http://xmlns.com/foaf/0.1/ PREFIX dbpedia-owl: http://dbpedia.org/ontology/ SELECT * WHERE { ?pers a foaf:Person . ?pers foaf:surname "Malik" . OPTIONAL {?pers dbpedia-owl:birthDate ?dob } OPTIONAL {?pers dbpedia-owl:deathDate ?dod } OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob } OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod } } LIMIT 100 yields no results. Yet if you drop the '?pers foaf:surname "Malik" .' clause, you get a result set which includes a Malik with the desired surname property. I'm clearly being dumb, but in what way? :-) (I've tried adding ^^xsd:string to the literal, but no joy.) Thanks, Richard [1] http://dbpedia.org/sparql -- Richard Light -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
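Spelled out, the @en fix Hugh suggests applied to Richard's first pattern looks like this (a sketch; the assumption, consistent with the thread, is that DBpedia stores surnames as English-tagged literals, so a plain literal never matches):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * WHERE {
  ?pers a foaf:Person ;
        # The stored literal is "Malik"@en, not "Malik",
        # and language-tagged literals only match tagged patterns:
        foaf:surname "Malik"@en .
} LIMIT 100
```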
Re: Dumb SPARQL query problem
Pleasure. Actually, I found this: http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries: “cat” “cat”@en “cat”^^xsd:string “cat”@en^^xsd:string or “cat”^^xsd:string@en - I can’t remember which is right, but I think it’s only one of them :-) Of course if you are matching in SPARQL you can use “… ?o . FILTER (str(?o) = “cat”)…”, but that is likely to be much slower. This means that you may need to do a lot of queries. I built something to look for matching strings (of course! - finding sameAs candidates) where the RDF had been gathered from different sources. Something like SELECT ?a ?b WHERE { ?a ?p1 ?s . ?b ?p2 ?s } would have been nice. I’ll leave it as an exercise to the reader to work out how many queries it takes to genuinely achieve the desired effect without using FILTER and str. Unfortunately it seems that recent developments have not been much help here, but I may be wrong: http://www.w3.org/TR/sparql11-query/#matchingRDFLiterals I guess that the truth is that other people don’t actually build systems that follow your nose to arbitrary Linked Data resources, so they don’t worry about it? Or am I missing something obvious, and people actually have a good way around this? To me the problem all comes because knowledge is being represented outside the triple model. And also because of the XML legacy of RDF, even though everyone keeps saying that is only a serialisation of an abstract model. Ah well, back in my box. Cheers. On 23 Nov 2013, at 11:00, Richard Light rich...@light.demon.co.uk wrote: On 23/11/2013 10:30, Hugh Glaser wrote: It’s the other bit of the pig’s breakfast. Try an @en Magic! Thanks. 
Richard On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote: Hi, Sorry to bother the list, but I'm stumped by what should be a simple SPARQL query. When applied to the dbpedia end-point [1], this search: PREFIX foaf: http://xmlns.com/foaf/0.1/ PREFIX dbpedia-owl: http://dbpedia.org/ontology/ SELECT * WHERE { ?pers a foaf:Person . ?pers foaf:surname Malik . OPTIONAL {?pers dbpedia-owl:birthDate ?dob } OPTIONAL {?pers dbpedia-owl:deathDate ?dod } OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob } OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod } } LIMIT 100 yields no results. Yet if you drop the '?pers foaf:surname Malik .' clause, you get a result set which includes a Malik with the desired surname property. I'm clearly being dumb, but in what way? :-) (I've tried adding ^^xsd:string to the literal, but no joy.) Thanks, Richard [1] http://dbpedia.org/sparql -- Richard Light -- Richard Light -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
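The "try everything" problem from this thread can be written out in SPARQL 1.1. A sketch (the predicate and data are illustrative; note that in RDF 1.1, "cat"^^xsd:string is the same term as the plain "cat", which removes one of the four cases):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Option 1: enumerate the literal forms a publisher might have used.
# Stays index-friendly, but you must guess every language tag.
SELECT ?s WHERE {
  VALUES ?o { "cat" "cat"@en }
  ?s foaf:name ?o .
}
```

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Option 2: compare string values, ignoring tags and datatypes.
# Matches everything, but as Hugh says it is likely much slower,
# since the FILTER defeats direct literal lookup.
SELECT ?s WHERE {
  ?s foaf:name ?o .
  FILTER ( str(?o) = "cat" )
}
```

Which is acceptable depends on the endpoint; for follow-your-nose consumers hitting arbitrary endpoints, neither is entirely satisfying, which is the point of the email.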
Re: OpenRefine
Ah, flesh search still wins, thanks to Ruben. Mind you, I have to say that for once I read all the documentation I could find, and still wasted several very frustrating hours, if not a day. Perhaps someone knows how to update http://openrefine.org or the Github thingy to point at https://groups.google.com/forum/#!msg/openrefine/GARvNqvVlqc/BhQatfKjFRIJ ? On 29 Oct 2013, at 10:06, Sergio Fernández sergio.fernan...@salzburgresearch.at wrote: Exactly that reference I was looking for, but I didn't find it; so your search skills are not so bad after all ;-) On 29/10/13 00:30, Hugh Glaser wrote: Thank you for all the responses. I can report success (and I am pleased to say it doesn’t seem to have been my stupidity, although it may be my lack of web search skills!) Ruben Verborgh communicated quickly and efficiently off list, and pointed me at https://groups.google.com/d/msg/openrefine/GARvNqvVlqc/BhQatfKjFRIJ which explains that things died last February, and I needed to add a replacement reconciliation service. I now have reconciliation! Best Hugh
OpenRefine
Hi. I’m not sure where to ask, so I’ll try my friends here. I was having a go at OpenRefine yesterday, and I can’t get it to reconcile, try as I might - I have even watched the videos again. I’m doing what I remember, but it is a while ago. Are there others currently using it successfully? Or is it possibly a Mavericks (OSX) upgrade thing, which I did recently. Cheers -- Hugh
Re: OpenRefine
Unfortunately I’ve not been a regular user, so it is probably my stupidity. Basically, I go through the Reconcile process using the Freebase Reconcile service, but it doesn’t find anything to reconcile, even though I have fixed it so that there is an entry that has exactly the same text as the Freebase entry title. It just shows as if there are no positive results. I try clicking on the search for match after that, but it never comes back, which makes me wonder. On 28 Oct 2013, at 18:53, John Erickson olyerick...@gmail.com wrote: Hugh, I wonder if you could be more specific regarding the troubles you had with OpenRefine? One of our students also had trouble, and I'm wondering if it might be the same problem. Like you, reconciliation with Refine has worked for me in the past but I haven't tried the same process using OpenRefine... On Mon, Oct 28, 2013 at 2:41 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi. I’m not sure where to ask, so I’ll try my friends here. I was having a go at OpenRefine yesterday, and I can’t get it to reconcile, try as I might - I have even watched the videos again. I’m doing what I remember, but it is a while ago. Are there others currently using it successfully? Or is it possibly a Mavericks (OSX) upgrade thing, which I did recently. Cheers -- Hugh -- John S. Erickson, Ph.D. Director, Web Science Operations Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter Skype: olyerickson -- Hugh 023 8061 5652
Re: OpenRefine
Thank you for all the responses. I can report success (and I am pleased to say it doesn’t seem to have been my stupidity, although it may be my lack of web search skills!) Ruben Verborgh communicated quickly and efficiently off list, and pointed me at https://groups.google.com/d/msg/openrefine/GARvNqvVlqc/BhQatfKjFRIJ which explains that things died last February, and I needed to add a replacement reconciliation service. I now have reconciliation! Best Hugh On 28 Oct 2013, at 19:39, Sergio Fernández sergio.fernan...@salzburgresearch.at wrote: Hi Hugh, which version of OpenRefine, and the Freebase extension are you using? I'm not totally sure, but I think few months ago they've change something in the API. Anyway, for such concrete questions of a tool, I think it is much better to directly ask on its discussion list, in this case: http://groups.google.com/d/forum/openrefine BTW, in verson 0.7.0 of the RDF Refine extension Stanbol-based reconciliation support has been added; so I'd recommend you to give it a try too. Cheers, On 28/10/13 19:59, Hugh Glaser wrote: Unfortunately I’ve not been a regular user, so it is probably my stupidity. Basically, I go through the Reconcile process using the Freebase Reconcile service, but it doesn’t find anything to reconcile, even though I have fixed it so that there is an entry that has exactly the same text as the Freebase entry title. It just shows as if there are no positive results. I try clicking on the search for match after that, but it never comes back, which makes me wonder. On 28 Oct 2013, at 18:53, John Erickson olyerick...@gmail.com wrote: Hugh, I wonder if you could be more specific regarding the troubles you had with OpenRefine? One of our students also had trouble, and I'm wondering if it might be the same problem. Like you, reconciliation with Refine has worked for me in the past but I haven't tried the same process using OpenRefine... On Mon, Oct 28, 2013 at 2:41 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi. 
I’m not sure where to ask, so I’ll try my friends here. I was having a go at OpenRefine yesterday, and I can’t get it to reconcile, try as I might - I have even watched the videos again. I’m doing what I remember, but it is a while ago. Are there others currently using it successfully? Or is it possibly a Mavericks (OSX) upgrade thing, which I did recently. Cheers -- Hugh -- John S. Erickson, Ph.D. Director, Web Science Operations Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter Skype: olyerickson -- Hugh 023 8061 5652 -- Sergio Fernández Senior Researcher Knowledge and Media Technologies Salzburg Research Forschungsgesellschaft mbH Jakob-Haringer-Straße 5/3 | 5020 Salzburg, Austria T: +43 662 2288 318 | M: +43 660 2747 925 sergio.fernan...@salzburgresearch.at http://www.salzburgresearch.at -- Hugh 023 8061 5652
Re: TRank (Ranking Entity Types) pipeline released open-source at ISWC2013
Hi Michele, Looks exciting. I wanted to have a go, but... Can you help me find the documentation please? I am a newbie for quite a bit of this - not a great github user, and never used scala before, so I am probably missing something obvious, but was prompted to try because of the “exhaustive documentation” that would help me! Best Hugh On 23 Oct 2013, at 01:48, Michele Catasta michele.cata...@epfl.ch wrote: TRank is a pipeline that, given a textual/HTML document as input, performs named-entity recognition, entity linking/disambiguation, and entity type ranking/selection from a variety of type hierarchies including DBpedia, YAGO, and schema.org. TRank has been nominated as best paper at ISWC2013. We have now released TRank open-source for others to use: https://github.com/MEM0R1ES/TRank It provides good test coverage, continuous build, and exhaustive documentation. You can use it as is, or easily integrate your own entity type ranking algorithm to compare against or to build on top of TRank. Bug reports and pull requests are welcome! We also recommend to watch/star the GitHub repository, as we will be releasing soon the MapReduce implementation of TRank. -- Best, Michele -- Hugh 023 8061 5652
Re: How to publish SPARQL endpoint limits/metadata?
Hmm. In my mind, a dataset is rather abstract - a collection of data that is being made available. They may use a combination of any or all of SPARQL endpoints, downloads of dumps, and resolvable URIs (Linked Data). They may also make it available in other forms, but we are possibly primarily concerned with RDF here, although it would be a shame if we could not embrace the more abstract concept. As a consumer (always!), I would like to come to where I think the dataset is being published, or look in some aggregator index, and easily find out all the stuff I need to know about the dataset, and how I might use it. That's my starting point. So in our system like many others around I think, for example, when we get a new URI, we hope there is a SPARQL endpoint, as that is our preferred format. We need to use internal information to do this, so we can only do it for known places. If not, then we try to simply resolve it. Failing that, we could look in a cache of dumps we have found, but don't actually at the moment. It would be good, for example, if resolving the URI always told us where there is metadata about a SPARQL endpoint that is recommended as having RDF about this URI. In fact, we do this for co-reference information for our URIs (we use a bespoke predicate, but should probably have been using seeAlso), but should probably do it for SPARQL endpoint as well. The metadata should be at the end of a resolvable URI, and the SPARQL endpoint should hold its own metadata in it, etc. etc.. So having a separation between SPARQL Service Description and voiD would just be plain wrong. They must embrace each other, so that consumers can easily work out how to use what they think of as a dataset. I would also add that if I take a REST-like view of the world, which I do for accessing a SPARQL endpoint (I am simply retrieving a document), the distinction between dataset and service becomes very blurred. 
Even calling it a SPARQL Service Description seems rather old-fashioned to me. Best Hugh On 9 Oct 2013, at 11:04, Barry Norton barrynor...@gmail.com wrote: On Wed, Oct 9, 2013 at 10:55 AM, Frans Knibbe | Geodan frans.kni...@geodan.nl wrote: Shouldn't that be the SPARQL Service Description instead of VoID? In my mind, SPARQL endpoints and datasets are separate entities. +1
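The idea that the Service Description and VoID must "embrace each other" can be sketched in a few triples. A hypothetical example (the example.org URIs are placeholders; sd: is the SPARQL 1.1 Service Description vocabulary and void: is the Vocabulary of Interlinked Datasets), where the two descriptions point at one another so a consumer arriving at either can find the rest:

```turtle
@prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .
@prefix void: <http://rdfs.org/ns/void#> .

# The service describes itself...
<http://example.org/sparql> a sd:Service ;
    sd:endpoint <http://example.org/sparql> ;
    sd:defaultDataset [ a sd:Dataset ; sd:defaultGraph [ a sd:Graph ] ] .

# ...and the dataset description links every access method together:
# endpoint, dump, and (via resolvable URIs) the Linked Data itself.
<http://example.org/dataset> a void:Dataset ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump       <http://example.org/dumps/all.nt.gz> .
```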
Re: How to publish SPARQL endpoint limits/metadata?
On 9 Oct 2013, at 12:46, Barry Norton barrynor...@gmail.com wrote: On Wed, Oct 9, 2013 at 12:15 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: [...] So having a separation between SPARQL Service Description and voiD would just be plain wrong. They must embrace each other, so that consumers can easily work out how to use what they think of as a dataset. I would also add that if I take a REST-like view of the world, which I do for accessing a SPARQL endpoint (I am simply retrieving a document), the distinction between dataset and service becomes very blurred. Even calling it a SPARQL Service Description seems rather old-fashioned to me. Hugh, I tend to agree (certainly about calling them 'service descriptions', ugh). From a REST point of view, void:Datasets, named graphs (capable of RESTful interaction via the Graph Store Protocol) and SPARQL query/update 'endpoints' (ugh again) are all resources that allow one to find other, more specific, resources. That said, if we accept that one needs some up-front guidance on what those resources allow you to get to (a big 'if' in the REST community, but I don't think anyone in ours would be happy with just a media type) then we want them to be self-describing in RDF. Always everything! At the same time, the relationships we want to attach to the query/update endpoints are semi-distinct, no? You'd agree these are different classes of resource? Yes, or perhaps I am saying different sub-classes? Thinking of it that way, I then look at Frans' list of the kind of thing he would like to be able to say about endpoints. It seems that at least the following might be common to almost any delivery mechanism for datasets: • The time period of the next scheduled downtime • (the URI of) a document that contains a human readable SLA or fair use policy for the service • URIs of mirrors So, yes, there are semi-distinctions, but if that implies semi-non-distinctions, there should be very useful mileage in trying to make such things deeply compatible. 
Or at least starting from there? Best Hugh Barry
Re: ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links
Hi. Chris has suggested I send the following to the LOD list, as it may be of interest to several people: Hi Chris. Great stuff! I have a question. Or would you prefer I put it on the LOD list for discussion? It is about url encoding. Dbpedia: http://dbpedia.org/page/Ashford_%28borough%29 is not found http://dbpedia.org/page/Ashford_(borough) works, and redirects to http://dbpedia.org/resource/Borough_of_Ashford Wikipedia: http://en.wikipedia.org/wiki/Ashford_%28borough%29 works http://en.wikipedia.org/wiki/Ashford_(borough) works Both go to the page with content of http://en.wikipedia.org/wiki/Borough_of_Ashford although the URL in the address bar doesn't change. So the problem: I usually find things in wikipedia, and then use the last bit to construct the dbpedia URI - I suspect lots of people do this. But as you can see, the url encoded URI, which can often be found in the wild, won't allow me to do this. There are of course many wikipedia URLs with ( and ) in them - (artist), (programmer), (borough) etc. It is also the same with comma and single quote. I think this may be different from 3.8, but can't be sure - is it intended? Very best Hugh
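For anyone constructing DBpedia URIs from Wikipedia URLs as described above, the mismatch comes down to percent-encoding of characters like parentheses. A minimal sketch (Python here purely for illustration) of the two forms of the Ashford example:

```python
from urllib.parse import quote, unquote

# The Wikipedia-style title used to construct a DBpedia URI.
title = "Ashford_(borough)"

# Percent-encoded form, as often found "in the wild":
# the parentheses become %28 and %29.
encoded = quote(title)
print(encoded)            # Ashford_%28borough%29

# Decoding recovers the plain form that dbpedia.org/page accepts.
print(unquote(encoded))   # Ashford_(borough)
```

Both forms name the same Wikipedia page, which is why the asymmetry on the DBpedia side (only the un-encoded form resolving) is surprising.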
Re: SPARQL results in RDF
You'll get me using CONSTRUCT soon :-) (By the way, Tim's actual CONSTRUCT WHERE query isn't allowed because of the FILTER). In the end, I just wrote a little service to process the XML into turtle, so I get what I want now. The problem is that the only result format I can rely on an endpoint giving is XML: CSV and TSV (the other standards), which would have been easier, are not always supported, it seems. One thing I was trying to do (as Tim distinguished) was have the result set, bindings and all, in RDF, if for nothing else than credibility and PR. Because it is true that people I explain Linked Data, RDF and conneg to, and who then go on to RDF stores, just can't understand how I can tell them about this wonderful RDF, but when they ask or even try to do a conneg to the endpoint they don't get RDF. I think one answer is to ignore SELECT completely, and just talk about CONSTRUCT. It makes a lot more sense - in fact I might do that myself. One fly in the ointment for that is that (as far as I can tell), even though I get RDF turtle or whatever back from an endpoint, it doesn't allow me to conneg for Accept: application/rdf+xml. At least dbpedia seems to give 406 Not Acceptable. Is there some adjustment that could be made here? I know it would be a fudge, but if I request Accept: application/rdf+xml on a SPARQL endpoint, using a CONSTRUCT, would it be so bad to actually return RDFXML? Thanks for all the interesting discussion. Hugh On 25 Sep 2013, at 10:05, Stuart Williams s...@epimorphics.com wrote: On 25/09/2013 00:23, Tim Harsch wrote: That idea seems very similar to the DELETE WHERE already in SPARQL 1.1, so maybe to be consistent with that existing syntax it should be CONSTRUCT WHERE Hmmm... something like: http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#constructWhere Stuart -- On Mon, Sep 23, 2013 at 3:08 PM, Tim Berners-Lee ti...@w3.org mailto:ti...@w3.org wrote: 1) I can see Hugh's frustration that the RDF system is incomplete in a way. 
You tell everyone you have a model which can be used for anything and then make something which doesn't use it. What's wrong with this picture? Standardising/using/adopting http://www.w3.org/2001/sw/DataAccess/tests/result-set would solve that. (The file actually defines terms like http://www.w3.org/2001/sw/DataAccess/tests/result-set#resultVariable without the .n3) 2) Different (I think) from what you want Hugh, but something I have thought would be handy would be a CONSTRUCT * where it returns the sub graphs it matches as turtle, ideally without duplicates. This would be nice for lots of things, such as extracting a subset of a dataset. CONSTRUCT * WHERE { ?x name ?y; age ?a; ?p ?o . FILTER ( ?a > 18 ) } Tim On 2013-09-23, at 07:03, Andy Seaborne wrote: DAWG did at one time work with result sets encoded in RDF for the testing work. As the WG progressed, it was clear that implementation of testing was based on result set comparison, and an impl needed to grok the XML results encoding anyway. Hence the need for the RDF form dwindled but it's still there: http://www.w3.org/2001/sw/DataAccess/tests/result-set.n3 Apache Jena will still produce it if you ask it nicely. Andy -- Epimorphics Ltd www.epimorphics.com Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT Tel: 01275 399069 Epimorphics Ltd. is a limited company registered in England (number 7016688) Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT, UK
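The conneg experiment discussed above - asking a SPARQL endpoint for RDF/XML on a CONSTRUCT query - amounts to a request like the following. A sketch only: it builds the request without sending it (the DBpedia endpoint URL is taken from the thread; whether it honours the Accept header is exactly the open question):

```python
from urllib.parse import urlencode
from urllib.request import Request

endpoint = "http://dbpedia.org/sparql"
query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"

# Content negotiation: ask for RDF/XML rather than the SPARQL XML results format.
url = endpoint + "?" + urlencode({"query": query})
req = Request(url, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # application/rdf+xml
```

Per the thread, DBpedia answers this with 406 Not Acceptable, even though the same CONSTRUCT happily returns Turtle when asked via the `format` parameter.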
SPARQL results in RDF
I was saying to someone the other day that it is bizarre and painful that you can't get SPARQL result sets in RDF, or at least there isn't a standard ontology for them. But it looks like I was wrong. http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+*+where+{%3Fs+%3Fp+%3Fo}+LIMIT+100&format=application%2Frdf%2Bxml happily gives me what I was expecting, and also gives me NTriples if I want them. But the NS is http://www.w3.org/2005/sparql-results# which doesn't give me what I was expecting (it is ordinary XML). I did find what I think is the latest version, but it eschews RDF, and only talks about XML, JSON, CSV and TSV formats. Can anyone shed any light on where things are on all this please? Cheers Hugh
Re: SPARQL results in RDF
Many thanks, William, and for confirming so quickly. (And especially thanks for not telling me that CONSTRUCT does what I want!) I had suddenly got excited that RDF might actually be useable to represent something I wanted to represent, just like we tell other people :-) So it is all non-standard, as I suspected. Ah well, I'll go back to trying to work with XML stuff, instead of using my usual RDF tools :-( Very best Hugh On 21 Sep 2013, at 19:14, William Waites w...@styx.org wrote: Hi Hugh, You can get results in RDF if you use CONSTRUCT -- which is basically a special case of SELECT that returns 3-tuples and uses set semantics (does not allow duplicates), but I imagine that you are aware of this. Returning RDF for SELECT where the result set consists in n-tuples where n != 3 is difficult because there is no direct way to represent it. Also problematic is that there *is* a concept of order in SPARQL query results while there is not with RDF. Also the use of bag semantics allowing duplicates which also does not really work with RDF. These, again, could be kludged with reification, but that is not very elegant. So most SELECT results are not directly representable in RDF. Cheers, -w
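For concreteness, the old DAWG result-set vocabulary mentioned elsewhere in this thread (http://www.w3.org/2001/sw/DataAccess/tests/result-set.n3) kludges exactly the problems William lists. A sketch of one two-variable, two-row SELECT result (the data values are made up; property names are as they appear in the vocabulary file, with rs:index carrying the ordering that plain RDF lacks):

```turtle
@prefix rs: <http://www.w3.org/2001/sw/DataAccess/tests/result-set#> .

[] a rs:ResultSet ;
   rs:resultVariable "name", "age" ;
   rs:solution [
       rs:index 1 ;
       rs:binding [ rs:variable "name" ; rs:value "Alice" ] ;
       rs:binding [ rs:variable "age"  ; rs:value 42 ]
   ] ;
   rs:solution [
       rs:index 2 ;
       rs:binding [ rs:variable "name" ; rs:value "Bob" ] ;
       rs:binding [ rs:variable "age"  ; rs:value 42 ]
   ] .
```

Each n-tuple becomes a blank-node rs:solution with one rs:binding per variable, which is the reification-style encoding William alludes to: workable, but hardly elegant.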
Re: SPARQL results in RDF
Thanks Jerven, you may well be right! SELECT DISTINCT * WHERE { ?s foo:bar ?o } would do. And things like SELECT DISTINCT * WHERE { ?v1 foo:bar ?o . ?v1 ?p1 ?v2 . ?v2 ?p2 ?v3 } and then probably get back an identifier for each result, so that I can find out what are the values of the ?p* and ?v* I think essentially the sort of thing that dbpedia/virtuoso is giving me. (By the way, Kingsley, replying to this has caused me to notice that the rdfxml does not rapper very nicely - sorry to report! rapper: Error - URI file:///home/hg/sparql.rdf:8 - property element 'solution' has multiple object node elements, skipping.) Best Hugh On 21 Sep 2013, at 23:32, Jerven Bolleman jerven.bolle...@isb-sib.ch wrote: Hi Hugh, I think you disregarded the CONSTRUCT queries a bit too quickly. This is what you use when you want to get back triples. If you want back result columns you use SELECT. If you want to describe the concept of result columns in RDF then you are on your own. Maybe if you explain what you want to represent then we can have a bit more of an informed discussion. Regards, Jerven On Sep 21, 2013, at 8:38 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Many thanks, William, and for confirming so quickly. (And especially thanks for not telling me that CONSTRUCT does what I want!) I had suddenly got excited that RDF might actually be useable to represent something I wanted to represent, just like we tell other people :-) So it is all non-standard, as I suspected. Ah well, I'll go back to trying to work with XML stuff, instead of using my usual RDF tools :-( Very best Hugh On 21 Sep 2013, at 19:14, William Waites w...@styx.org wrote: Hi Hugh, You can get results in RDF if you use CONSTRUCT -- which is basically a special case of SELECT that returns 3-tuples and uses set semantics (does not allow duplicates), but I imagine that you are aware of this. 
Returning RDF for SELECT where the result set consists in n-tuples where n != 3 is difficult because there is no direct way to represent it. Also problematic is that there *is* a concept of order in SPARQL query results while there is not with RDF. Also the use of bag semantics allowing duplicates which also does not really work with RDF. These, again, could be kludged with reification, but that is not very elegant. So most SELECT results are not directly representable in RDF. Cheers, -w --- Jerven Bollemanjerven.bolle...@isb-sib.ch SIB Swiss Institute of Bioinformatics Tel: +41 (0)22 379 58 85 CMU, rue Michel Servet 1 Fax: +41 (0)22 379 58 58 1211 Geneve 4, Switzerland www.isb-sib.ch - www.uniprot.org Follow us at https://twitter.com/#!/uniprot ---
Re: Maphub -- RWW meets maps
Hi Andy, Nice. In case you hadn't guessed: http://sameas.org/?uri=http://oxpoints.oucs.ox.ac.uk/id/23232414 :-) On 19 Sep 2013, at 15:03, Andy Turner a.g.d.tur...@leeds.ac.uk wrote: http://www.oucs.ox.ac.uk/oxpoints/ Andy http://www.geog.leeds.ac.uk/people/a.turner/ From: Gannon Dick [mailto:gannon_d...@yahoo.com] Sent: 19 September 2013 13:55 To: Andy Turner; 'Kingsley Idehen'; public-...@w3.org; public-lod@w3.org Cc: chippy2...@gmail.com; suchith.an...@nottingham.ac.uk Subject: Re: Maphub -- RWW meets maps FWIW, the University of Oxford has an 800th Birthday coming up soon. http://www.rustprivacy.org/2012/roadmap/oxford-university-area-map.pdf The geo coordinates, founding dates etc. for the Colleges and Halls are available on the University site. The lo-res sunrise and sunset data is available in spreadsheets at http://www.esrl.noaa.gov/gmd/grad/solcalc/calcdetails.html My offering has some *cough* complete lack of artistic promise and bandwidth crushing size *cough* limitations, but I had fun :-) It would be nice to see this *cough* done well *cough* duplicated. --Gannon From: Andy Turner a.g.d.tur...@leeds.ac.uk To: 'Kingsley Idehen' kide...@openlinksw.com; public-...@w3.org public-...@w3.org; public-lod@w3.org public-lod@w3.org Cc: chippy2...@gmail.com chippy2...@gmail.com; suchith.an...@nottingham.ac.uk suchith.an...@nottingham.ac.uk Sent: Thursday, September 19, 2013 3:36 AM Subject: RE: Maphub -- RWW meets maps Interesting work. It's a way to go for linking OpenStreetMap data and Wikimapia data with Wikipedia and each other etc.. I don't know the state of play with how OpenStreetMap or Wikimapia are currently doing this, but I like to think that someone at the recent Maptember events in Nottingham, UK hopefully does and might provide some feedback... 
Thanks, Andy http://www.geog.leeds.ac.uk/people/a.turner/ -Original Message- From: Kingsley Idehen [mailto:kide...@openlinksw.com] Sent: 18 September 2013 19:44 To: public-...@w3.org; public-lod@w3.org Subject: Re: Maphub -- RWW meets maps On 9/18/13 1:40 PM, Melvin Carvalho wrote: A fantastic open source project maphub which uses linked data to read and write to current and historical maps, using RDF and the open annotations vocab. There's even links to DBPedia! http://maphub.github.io/ A great example of how to use the Read Write Web. The video is well worth watching! Also publishes annotations in Linked Data form [1] :-) [1] http://maphub.herokuapp.com/control_points/4 . -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
http://differentfrom.org
Hi, I mentioned this in an earlier post. I then discovered that I was the only one who could access it! (While I was building it I fixed my private DNS.) Anyway, since I mentioned it, I have now fixed the public DNS, and it should have propagated by now. So feel free to go and (have another?) look, and any feedback welcome. Best Hugh
sameAs.org license - was Re: Linked data sets for evaluating interlinking?
Thanks Ghislain. Sorry, no SPARQL endpoint, as it isn't an RDF store. With respect to a license, it is more difficult. This may be a longer answer than you were expecting. :-) (Firstly, please understand that I'm not very good with this license stuff.) When I started sameAs.org, it only had mostly my rkb stuff in it. So I could do what I liked. Understanding the importance of having some sort of license, I put what I thought was the most liberal one I could find - http://creativecommons.org/publicdomain/zero/1.0/ (which is at the bottom of the page). Take it away and do what you like with it. I would have liked to say "Please attribute if you can, but I understand that may be difficult, so don't worry if you can't", but I couldn't find one like this, and I think having a license that is quickly seen and widely understood is important. Sub-bit on attribution A problem with follow your nose (fyn) Linked Data is that the attribution can be very hard. I may tell you that a owl:sameAs b. The reason I tell you that is that I have found loads of stuff about c, d and e which allowed me to infer that. And some of that data may no longer even be available. So the only safe attribution for every fact I give you would be my entire source attribution - I might as well tell you the attribution is the Web. Correct, but hardly in the spirit of the thing (I am actually more interested in the spirit of fair attribution than the legal side of it!) For one of my users who uses fyn, attribution is probably even harder - at least I know my sources by hand. If they came using fyn, then they may be using a URI that happened to be got by a previous resolution (and so on). So essentially, every time they resolve a URI, they need to do license work. Of course in principle this is what people should be doing - absolutely! But in practice, people are not tooled up for this; so a requirement for attribution would make the data unusable for such people. 
And they are the ones who are *really* using Linked Data, so I want to encourage them! /Sub-bit on attribution Of course, it now has stuff from lots of other sources. Many of these simply sent me the data, or told me I could put it in sameAs.org - but I don't really recall anyone ever discussing license! Since I asked for it for sameAs.org, then I assumed that they agreed to have it out there with the license. Other stuff, I have just gone to a sparql endpoint or download site and taken a bit of their data. So what is the license of stuff on the open web? - No, you don't need to answer that! Essentially sameAs.org is a search engine for the Linked Data web; so I went to Google and Bing to see what license they might put on their data. Answer found I none! [I even found that if you put a search such as Bing license into Bing it barfs! :-) ] There is lots of stuff about what users license them to do with user data, and what they license for their software, but nothing on the results returned from a web search on their site. My sameAs.org about page does list a bunch of places which should provide compliance with any attribution requirements for those sites, but is now seriously out of date, I think. So I just left it at that. I know I don't have the same legal department as Google or Microsoft if there is a problem :-), but I sort of think that I take far less data from sites than they do, and it doesn't seem to be a problem for them. As far as the sub-stores are concerned, I took the license off. But most of them were built in collaboration with the sources, and they have links to the sources, which may or may not have a license, but that probably makes things clearer for those. The bottom line is that there are very few sites, if any, which (like sameAs.org) have as their main purpose the provision of sameAs information. 
On the contrary (like googlejuice SEO) they want any sameAs links to be taken away, so that traffic will come to their sites through the links they have published (like via Google). Thanks for your question - I'm happy to get any advice from anyone, and I hope I can understand it if it comes! Best Hugh On 27 Aug 2013, at 09:09, Ghislain Atemezing auguste.atemez...@eurecom.fr wrote: Hi Hugh, So, for example, if you wanted Adrian's data, then I can give it to you. (I have queried the SPARQL endpoint to put stuff in sameAs.org. Both owl:sameAs and skos:exactMatch.) I have lots of bibliographic ones, especially national libraries, who have often sent me the data. (British, German, US, Japanese, Norwegian, French, Spanish, Hungarian … as best I recall.) I also have the VIAF data. This is all aggregated in http://sameas.org/store/kelle/ and other stuff is kept in some sameAs stores - see http://sameas.org/store/ Nice work!! And a small question….. I was wondering if there is an endpoint in sameAs.org for using SPARQL queries? And for the data sets you receive, do they all have a specific terms of license?
Re: Linked data sets for evaluating interlinking?
Hi, Thanks. Just one comment, relating to the cities example you use. The paper you cite mentions cities and says: For example, the city of Paris is referenced in a number of different Linked Data-sets: ranging from OpenCyc to the New York Times. In DBPedia, a Linked Data export of Wikipedia, these data-sets are connected by owl:sameAs. In particular, dbpedia:Paris is owl:sameAs both opencyc:CityOfParisFrance and opencyc:ParisDepartmentFrance, as OpenCyc distinguishes the department of Paris: ParisDepartmentFrance is a distinct geopolitical entity from CityOfParisFrance, despite the fact that both share the same territory, while Wikipedia does not make this distinction. So even cities (actually especially cities and other geo things) have significant challenges here. Geo-political v. geographic v. the geo-extent v. the nounSynset etc. And we haven't even mentioned temporal aspects. So I do worry about all this. If the dataset is simple enough that you can ignore the problems, then the question is whether the exercise tells you anything useful. If the dataset is more complicated, for example having both geo-political and geographic and wanting to keep them separate, then it is also a question whether the exercise tells you anything useful! But if something is hard and challenging it is more reason to do it, I guess. Good luck. Hugh On 27 Aug 2013, at 16:57, csara...@uni-koblenz.de wrote: Hi Hugh, Hi Cristina, Some interesting issues you raise. One of them is how people publish links (which enables your analysis). There are two ways this happens. 1) People add triples to their dataset that have an equivalence predicate (owl:sameAs, skos:exactMatch, skos:closeMatch, etc.) 2) People use a foreign URI (very commonly a dbpedia URI), because when turning their data into RDF they have decided that the entity they are concerned with is the same as the dbpedia one. The second paragraph of Tom's message describes such a linkage, I think. 
I think these distinctions are behind the comments of Milorad, where he is assuming the type (2) way. Either of these methods should be fodder for you, and you may well find that the type (2) way is used by a dataset that is useful to you. I agree, it is important to distinguish between different types of links. When I refer to interlinking I have in mind triples (s, p, o), where s and o are resources from different data sets, and p is either a property like owl:sameAs or a domain-specific property like foaf:knows. I think this corresponds to what you specified in 1) and 2). I would like to have both kinds of links in my evaluation (if possible). It may be harder for you to process, as the linkage is not so explicit because there is no distinct URI for the resource in the database, different from the foreign one. But any foreign URI is in fact a link. You will find that people have tended towards type (2) linkage because they can shy away from having lots of equivalence predicates in their datasets, not least because there was a time when RDF stores did not comfortably do owl:sameAs inference, and so they do the linking at RDF conversion time, and use foreign URIs. Another interesting issue is more fundamental to your work. You seem to think that there must be a gold standard or reference interlinking for equivalence. As long-time readers of this list will have seen discussed many times (!), it is not a simple matter. It is a complex matter to have such a thing, which is a necessity for you to do your precision/recall statistics. At its most basic, for example, am I as a private citizen the same as me as a member of my University or me as a member of my company? The answer is, of course yes and no. Another field that has spent a lot of time on this is the FRBR world (http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records). If I have a book of the Semantic Web, is it the same as your book of the same name? Perhaps. 
What if it is a different (corrected) edition? An electronic version? Certainly a library will usually consider each book a different thing, but if you are asking how many books the author has published, you want to treat all the books as the same resource. I understand the point, and I find it very interesting, indeed. I guess that it might depend on the context where the data was created / will be used. This reminds me of the paper about the analysis of identity links ( http://www.w3.org/2009/12/rdf-ws/papers/ws21). However, I think that it is possible to evaluate different interlinking techniques, establishing some gold standard (e.g. the links between the cities of a data set describing the population of European cities and a data set describing the cities as tourist attractions), to be able to analyse the results in terms of precision and recall, and say that one tool is able to certain things, while the other not. Regarding the
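The precision/recall evaluation Cristina describes reduces to set comparison between a gold-standard link set and a tool's output. A toy sketch (the link pairs are invented for illustration):

```python
# Hypothetical gold-standard links and links proposed by an interlinking tool,
# each link being a (subject URI, object URI) pair.
gold  = {("ex:Paris", "dbp:Paris"), ("ex:Lyon", "dbp:Lyon"), ("ex:Nice", "dbp:Nice")}
found = {("ex:Paris", "dbp:Paris"), ("ex:Lyon", "dbp:Lyon"), ("ex:Nice", "dbp:Marseille")}

true_positives = len(gold & found)

precision = true_positives / len(found)  # correct links / links proposed (2/3 here)
recall    = true_positives / len(gold)   # correct links / links that exist (2/3 here)
f1 = 2 * precision * recall / (precision + recall)
```

Note this treats all links as symmetric-free pairs; as Hugh's FRBR point shows, the hard part is agreeing on the gold set in the first place, not computing the statistics.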
Pleiades - was Re: Linked data sets for evaluating interlinking?
Hi Tom, I don't know if you are involved with Pleiades, but I have some questions. I found the data at http://atlantides.org/downloads/pleiades/rdf/ - many thanks. It has some sameAs links :-) But I have some worries: It has triples like http://pleiades.stoa.org/places/991318#this owl:sameAs http://pleiades.stoa.org/places/981510#this . The http://pleiades.stoa.org/places/991318#this goes to a page entitled Duplicate Baetica. In that page it says Link from a duplicate to the master Baetica. I worry a bit about this, as it may be saying that the link page is owl:sameAs master page, which would clearly be wrong. More problematic for me (for sameAs.org!) is that the duplicate link is not Linked Data. If I try to get RDF from it, it gives HTTP/1.1 500 Internal Server Error and html. Is all this the intended behaviour? Great resource, of course! Best Hugh On 26 Aug 2013, at 14:16, Tom Elliott tom.elli...@nyu.edu wrote: Hi all: Two humanities datasets of potential interest in this regard: A number of datasets (around 20 different ones I think) related to the study of antiquity have aligned their geographic/toponymic fields with the Pleiades gazetteer (http://pleiades.stoa.org) and published RDF accordingly. Most of this work has been done under the auspices of something called the Pelagios Project, and the alignment processes used by many of the participants are documented in blog posts at http://pelagios-project.blogspot.com/ (most of them a combination of automated and manual). Pleiades itself is also a linked data resource, and has a growing number (still only a small percentage of its content) of outbound links to dbpedia, geonames, and OSM. All of those outbound links are hand-curated. Contributors to Pleiades, where possible, are aligned to VIAF (manually) and bibliography in Pleiades is also beginning to be aligned to the Open Library and Worldcat (again, manually). 
On a much smaller scale, I offer the About Roman Emperors dataset, which rather than minting its own URIs for the Roman emperors, uses the dbpedia resource URIs for each: http://www.paregorios.org/resources/roman-emperors/. The primary purpose of the dataset is to provide a comprehensive list of these for easy access and reuse by third parties, and to associate the dbpedia URIs with corresponding Roman imperial mint and minting authority data in nomisma.org and finds.org.uk, and to a static, late-90s-vintage scholarly encyclopedia of Roman emperors: http://www.roman-emperors.org/ Tom Tom Elliott, Ph.D. Associate Director for Digital Programs and Senior Research Scholar Institute for the Study of the Ancient World (NYU) http://isaw.nyu.edu/people/staff/tom-elliott On Aug 26, 2013, at 6:04 AM, Adrian Stevenson wrote: Hi All As part of the LOCAH and Linking Lives projects, the latter in particular, we've being doing a lot of this auto and manual linking work, mainly to VIAF and DBPedia, with some links to things like LCSH and Geonames. We've been doing a lot of work just recently in fact, and we've published a blog post that's picked up quite a bit of interest on this - http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/. We haven't published our latest run of data yet, but we hope to finish this soon. It'll probably still be about a month or so as a few of us are on holiday soon. We do have quite a few links done semi-automatically in our existing data set accessible via http://data.archiveshub.ac.uk but as I say we are updating this, I'd suggest not taking the URIs and data available there as the final word. 
A good example is http://data.archiveshub.ac.uk/page/person/nra/webbmarthabeatrice1858-1943socialreformer Project URIs: http://archiveshub.ac.uk/locah/ http://archiveshub.ac.uk/linkinglives/ Adrian _ Adrian Stevenson Senior Technical Innovations Coordinator Mimas, The University of Manchester Devonshire House, Oxford Road Manchester M13 9QH Email: adrian.steven...@manchester.ac.uk Tel: +44 (0) 161 275 6065 http://www.mimas.ac.uk http://www.twitter.com/adrianstevenson http://uk.linkedin.com/in/adrianstevenson/ On 22 Aug 2013, at 16:06, Cristina Sarasua wrote: Hi, I am looking for pairs of linked data sets that can be used as gold standard for evaluations. I would need pairs of data sets which have been manually linked, or data sets which have been (semi-)automatically linked with interlinking tools, and afterwards reviewed (to include the links which are not identified by tools). I have looked into the DataHub catalogue and queried VoiD descriptions, but unfortunately the information about how the interlinking process was carried out is often missing. Apart from the data sets which have been used in the OAEI-instance matching track, could anyone recommend (based on past experience) good data sets for evaluating data interlinking processes?
Re: YASGUI: Web-based SPARQL client with bells ‘n wistles
Hi Bernard, And if you are going to change things… I went looking for equivalences (:-)), and found a lot (but not all) of owl:sameAs dbpedia objects that seem to have crept in as strings, e.g. http://rdf.muninn-project.org/ontologies/military#Battalion owl:sameAs dbpedia:Battalion http://lov.okfn.org/endpoint/lov_aggregator?query=PREFIX+owl%3A+++%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0ASELECT+DISTINCT+*+WHERE+%7B+%3Fs+owl%3AsameAs+%3Fo+%7D%0D%0ALIMIT+100&format=HTML Best Hugh On 20 Aug 2013, at 18:23, Barry Norton barry.nor...@ontotext.com wrote: Thanks, Bernard. I get ~5000 instances of rdf:Property (and 1643 of rdfs:Class - and oddly 5 instances of rdfs:Property), but more than three times as many for: SELECT (COUNT(?property) AS ?properties) { SELECT DISTINCT ?property WHERE{ {?property rdfs:domain ?domain} UNION {?property rdfs:range ?range} UNION {?property rdfs:subPropertyOf ?super} UNION {?sub rdfs:subPropertyOf ?property} } } I'm guessing, therefore, no inference in this store? Since OWL-implied properties would require a much more sophisticated query, is it possible to get the dataset and re-index this with inference? Barry On 20/08/2013 18:02, Bernard Vatant wrote: Hello Barry I had a reminder today that I never answered the question below, and I am very late indeed ! Properties and classes of all vocabularies in LOV are aggregated in a triple store whose SPARQL endpoint is at http://lov.okfn.org/endpoint/lov_aggregator This is quite raw data but you should find everything you need in there. Otherwise you can also use the new API http://lov.okfn.org/dataset/lov/api/v1/vocabs which for each vocabulary provides the prefix and link to the last version stored. Hope that helps Bernard From: Barry Norton barry.nor...@ontotext.com Date: Sat, 06 Jul 2013 11:27:46 +0100 Bernard, does LOV keep a cache of properties and classes? 
I'd really like to see resource auto-completion in Web-based tools like YASGUI, but a cache is clearly needed for this to be feasible. Barry
Re: {Disarmed} Re: YASGUI: Web-based SPARQL client with bells ‘n wistles
Thanks Ghislain, the right response :-) (It's not our data; if it gets fixed at source we will re-acquire.) I think I tracked down the email of the person responsible, so have raised the issue.

Best
Hugh

On 20 Aug 2013, at 20:13, Ghislain Atemezing auguste.atemez...@eurecom.fr wrote:

Hi Hugh,

[[ My 2 cents ]]

I went looking for equivalences (:-)), and found that a lot (but not all) of the owl:sameAs dbpedia objects seem to have crept in as strings, e.g.

http://rdf.muninn-project.org/ontologies/military#Battalion owl:sameAs dbpedia:Battalion

I think that comes from the ontology http://rdf.muninn-project.org/ontologies/military.html itself, and you may have a look here: http://rdf.muninn-project.org/ontologies/military.html#linkages

http://lov.okfn.org/endpoint/lov_aggregator?query=PREFIX+owl%3A+++%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0ASELECT+DISTINCT+*+WHERE+%7B+%3Fs+owl%3AsameAs+%3Fo+%7D%0D%0ALIMIT+100format=HTML

Best
Ghislain
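A footnote on the string-valued owl:sameAs triples discussed above: filtering on isLiteral is one quick way to list them at any SPARQL endpoint, since a correct sameAs target should be an IRI. A sketch (the LIMIT is arbitrary):

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?s ?o WHERE {
  ?s owl:sameAs ?o .
  FILTER(isLiteral(?o))  # flags objects that crept in as strings
}
LIMIT 100
```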
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
This is great! Thanks guys.

I normally really, really don't care about all that crypto stuff (it should all happen transparently), but I find all this interesting! So yes, I created a p12 (using http://id.myopenlink.net/certgen/ - you sometimes have to trust someone :-) ) and emailed. I am confident (!) that with Keychain things will be fine, but less sure about Windows. Opened it on a Windows box, and it seems to have taken the thing to heart and put it in some certificate management thing.

I am a little uncertain what I should put at the URL (non-FOAF) I gave it - the final page gave me some options with microdata, RDFa etc - I am guessing I can just wrap any of them in html/body etc? Anyway, I can sort that.

So now all (!!!) I really need to do is make my wordpress site look for the ID thing. Hmmm. Melvin, did you get any response to http://lists.w3.org/Archives/Public/public-webid/2012Aug/0041.html ? Or Kingsley, what did you do on the server side of your photo?

Cheers

On 9 Aug 2013, at 13:25, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 7:47 AM, Norman Gray wrote:

Henry, greetings. [replying only on public-lod]

Bit of an essay, this one, because I've been mulling this over since this message appeared a couple of days ago...

On 2013 Aug 8, at 16:14, Henry Story wrote:

On 7 Aug 2013, at 19:34, Nick Jennings n...@silverbucket.net wrote:

1. Certificate Name: maybe there could be some examples of ways to name your certificate. [...]

That's why it should be done by the server generating the certificate. The details are here: https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/tls-respec.html#the-certificate

I appreciate the logic here, and can see how it works technically smoothly for the anticipated use-case (the one illustrated in the WebID video on the webid.info front page).
I don't think that's enough, however, because I don't think I could convincingly explain what's happening here to a motivated but non-technical friend who wants to understand what they've just achieved when I've walked them through getting their WebID certificate from (something like) the social service illustrated in the video.

People understand what a username and password are (the first is my identity, the second is a secret that proves I am who I claim), and they understand what a door-key is (no identity, but I have this physical token which unlocks a barrier for anyone in possession of the token or a copy). The same is not true of a WebID. Making this a one-click operation is nice (and a Good Thing at some level), but just means that the user knows that it was _this_ click that caused some black magic to happen, and I'm not sure that helps. Therefore...

2. With Firefox, after filling out the form, I get a download dialogue for the cert instead of it installing into the browser. So I saved, then went into preferences and imported... which was successful, with "Successfully restored your security certificate(s) and private key(s)". Previously, with my-profile.eu, this was automatically installed into the browser (I was using Chrome then). Though I guess it's better to have it export/save by default so you can install the same cert on any number of browsers without hassle. Still, it creates more steps and could be confusing for new users.

In the case of WebID certs, downloading the certificate is in fact silly, as you can produce a different one for each browser. So that message is a little misleading. A good UI should warn the user about that.

Thinking about it, and exploring the behaviours again this week, I'm more and more sure that the browser is a problematic place to do this work. _Technically_, it's exactly the right place, of course, and the HTML5 keygen element is v. clever.
But it's killing for users, and coming back to WebIDs and certificates this week, and parachuting into this discussion here, I've been a 'user' this week.

A 'web browser' is a passive thing: it's a window through which you look at the web. It quickly disappears, for all but the most hesitant and disoriented users; in particular it's not a thing which takes _actions_, or where you can store things. That means that the browser creating the key-pair, and storing the server-generated certificate, is literally incomprehensible to the majority of anticipated users.

And even to me. I have an X.509 e-science certificate which needs renewing every year, and every year I stuff up this renewal in one way or another: the certificate isn't in the right place, or I try to retrieve the replacement with a different browser from the one which generated the CSR, or something else which is sufficiently annoying that I purge the experience from my memory. And I understand about certificates and the whole PKI thing -- someone who doesn't is going to find the experience bamboozling, hateful and stressful.

It sounds as if
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks Kingsley,

On 9 Aug 2013, at 15:09, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 9:51 AM, Hugh Glaser wrote:

So now all (!!!) I really need to do is make my wordpress site look for the ID thing. Hmmm. Melvin, did you get any response to http://lists.w3.org/Archives/Public/public-webid/2012Aug/0041.html ? Or Kingsley, what did you do on the server side of your photo?

In my case, I just made an ACL based on a combination of the identity claims that I know are mirrored in the WebID-bearing certificate. In the most basic sense, you can simply start with the basic WebID+TLS test which is part of the basic server side implementation. Thus, I would expect the WordPress plugin to perform the aforementioned test.

Sorry mate, I have little or no idea what you are talking about. What would an ACL look like? What plugin in wordpress do you mean?

It is probably the case that this is now too much detail for the list (I think that the whole discussion has been great for uptake of WebID, which is relevant to Linked Data). And it is probably the case that I am just too ignorant of the whole thing to attempt to do the server side of it, especially when it is not a raw site, but Wordpress. And people have been too polite to tell me.

Thanks for your response Melvin; I guess I got a bit misled (or hopeful!) because Angelo's wp-linked-data plugin has webid as a keyword.

I think I will now consider myself Retired Hurt (http://en.wikipedia.org/wiki/Retired_hurt_(cricket)#Retired_hurt_.28or_not_out.29 )! I hope to return before the end of the innings.

Best
Hugh

BTW -- When you distribute pkcs#12 files, the receiving parties don't actually need to have any knowledge of the actual ACL that you use to protect the resources being shared :-)

Kingsley
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Hugh comes back to play /

Thanks Kingsley, and Melvin and Henry and Norman. So, trying to cut it down to the minimum. (Sorry, I find some/many of the pages about it really hard going.)

If I have a photo on a server, http://example.org/photos/me.jpg, and a WebID at http://example.org/id/you, what files do I need on the server so that http://example.org/id/you#me (and no-one else) can access http://example.org/photos/me.jpg?

I think that is a sensible question (hopefully!)

Cheers
Hugh

On 9 Aug 2013, at 16:30, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 11:09 AM, Hugh Glaser wrote:

Sorry mate, I have little or no idea what you are talking about. What would an ACL look like?

Okay, to be clearer, there are two things in play re. authentication via WebID+TLS:

1. basic identity verification -- this is the relation lookup against your profile document (this is the minimal that must be implemented by a WebID+TLS server)

2. ACLs and Data Access Policies -- this is where, in addition to #1, you set rules such as: only allow identities that are members of a group, or known (i.e., via a foaf:knows relation) by some other identity, etc.

So starting simple, your first step would be #1.

What plugin in wordpress do you mean?
I thought there was a WebID plugin for WordPress. Thus, post-installation, you would be able to achieve step #1, i.e., the plugin turns your WordPress installation into a WebID+TLS compliant server.

It is probably the case that this is now too much detail for the list (I think that the whole discussion has been great for uptake of WebID, which is relevant to Linked Data). And it is probably the case that I am just too ignorant of the whole thing to attempt to do the server side of it, especially when it is not a raw site, but Wordpress. And people have been too polite to tell me.

Also note, if you are hosting WordPress you can make the plugin yourself. It boils down to a SPARQL ASK on the relation that associates a WebID with a Public Key.

Thanks for your response Melvin; I guess I got a bit misled (or hopeful!) because Angelo's wp-linked-data plugin has webid as a keyword.

Yes, that threw me off too.

I think I will now consider myself Retired Hurt (http://en.wikipedia.org/wiki/Retired_hurt_(cricket)#Retired_hurt_.28or_not_out.29 )! I hope to return before the end of the innings.

I really assumed that circa 2013 an interested party would have built a WebID+TLS server-side plugin for WordPress. Ah! Just realized something: there's an OpenID plugin for WordPress [1], which means you can (if you choose) leverage an OpenID+WebID bridge service [2].

Links:

1. http://wordpress.org/plugins/openid/ -- the OpenID plugin for Wordpress (this gives you the authentication functionality for your WordPress instance)

2. http://bit.ly/OcbR8w -- G+ note I posted about the OpenID+WebID proxy service (which you can leverage in this scenario too!)

Kingsley

Best
Hugh

BTW -- When you distribute pkcs#12 files, the receiving parties don't actually need to have any knowledge of the actual ACL that you use to protect the resources being shared :-)

Kingsley
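For readers wondering what Kingsley's "SPARQL ASK on the relation that associates a WebID with a Public Key" looks like in practice: using the cert ontology from the WebID+TLS spec, a verifier takes the RSA modulus and exponent out of the presented client certificate and asks the profile document whether it lists that key. A sketch with placeholder values (the WebID reuses Hugh's example URI; the modulus is made up):

```sparql
PREFIX cert: <http://www.w3.org/ns/auth/cert#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# In a real verifier, the modulus and exponent below come from
# the public key in the client certificate presented over TLS.
ASK {
  <http://example.org/id/you#me> cert:key [
    cert:modulus "cafebabe"^^xsd:hexBinary ;
    cert:exponent 65537
  ] .
}
```

If the ASK against the dereferenced profile document returns true, the client has proved control of the WebID.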
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks. I've looked at quite a bit of this stuff, but still don't see where the ACL document gets stored and used. I am beginning to get the sense that I may have to write some code, other than the ACL RDF, to do this. Surely Apache or something else will do this for me? Can't I just put the ACL in a file (as with htpasswd) and point something at it? I certainly don't want to be writing code to make one photo (or simply a static web site) available. Or is that the delegated service you are talking about? I've got my fingers crossed here.

On 9 Aug 2013, at 17:35, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 12:22 PM, Hugh Glaser wrote:

If I have a photo on a server, http://example.org/photos/me.jpg, and a WebID at http://example.org/id/you, what files do I need on the server so that http://example.org/id/you#me (and no-one else) can access http://example.org/photos/me.jpg? I think that is a sensible question (hopefully!)

You need a Turtle document (other RDF document types will do too) comprised of content that describes your ACL based on http://www.w3.org/ns/auth/acl vocabulary terms. You might find the http://www.w3.org/wiki/WebAccessControl#this wiki document useful too.

My ACL demos leverage the fact that our ODS and Virtuoso platforms have this built in re. Web Server functionality. I need to check if we built a delegated service for WebID+TLS based ACLs; if not (note to self re. new feature zilla), we'll make one :-)
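To sketch an answer to Hugh's concrete question in WebAccessControl terms: the ACL could be a small Turtle document along these lines. This shows only the ACL document's content; where the server expects to find the file (and how it enforces it) is implementation-specific.

```turtle
@prefix acl: <http://www.w3.org/ns/auth/acl#> .

# Grant read access on the photo to exactly one WebID.
[] a acl:Authorization ;
   acl:accessTo <http://example.org/photos/me.jpg> ;
   acl:agent    <http://example.org/id/you#me> ;
   acl:mode     acl:Read .
```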
Re: {Disarmed} RWW-Play was: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks Henry. Well I had looked there, but it all looked quite complicated - I have never cloned a git thingy before, and I don't even know if Java is available on the host :-) But emboldened by your encouragement I went for The Short Version. I was very encouraged, as it seemed to do quite a lot, but it seemed to hang after getting play-2-TLS-e6c58f64585b182f937358fa984474b86984d77d.tar.bz2

But even when I tried to do it by hand (the Longer Version), I eventually got "java was killed for excessive resource usage". By which time it had downloaded 427MB of stuff. I don't think these are the sort of hosting costs I want to have. So I have a sense this is not the solution I was looking for :-)

Very best
Hugh

By the way, the link in "An initial implementation of Linked Data Basic Profile" does a 404.

On 9 Aug 2013, at 18:09, Henry Story henry.st...@bblfish.net wrote:

On 9 Aug 2013, at 18:55, Hugh Glaser h...@ecs.soton.ac.uk wrote:

Thanks. I've looked at quite a bit of this stuff, but still don't see where the ACL document gets stored and used. I am beginning to get the sense that I may have to write some code, other than the ACL RDF, to do this. Surely Apache or something else will do this for me? Can't I just put the ACL in a file (as with htpasswd) and point something at it? I certainly don't want to be writing code to make one photo (or simply a static web site) available. Or is that the delegated service you are talking about? I've got my fingers crossed here.

You can follow the instructions on installing https://github.com/stample/rww-play (it's under the Apache Licence, and patches and contributions are welcome). Then you'll be able to do the following:

An initial implementation of the Linked Data Platform spec is implemented here. In the same way as the Apache httpd server, it serves resources from the file system and maps them to the web. By default we map the test_www directory's content to http://localhost:8443/2013/.
The test_www directory starts with a few files to get you going:

$ cd test_www
$ ls -al
total 48
drwxr-xr-x   4 hjs  admin   340  9 Jul 19:04 .
drwxr-xr-x  15 hjs  admin  1224  9 Jul 19:04 ..
-rw-r--r--   1 hjs  staff   229  1 Jul 08:10 .acl.ttl
-rw-r--r--   1 hjs  admin   109  9 Jul 19:04 .ttl
lrwxr-xr-x   1 hjs  admin     8 27 Jun 20:29 card -> card.ttl
-rw-r--r--   1 hjs  admin   167  7 Jul 22:42 card.acl.ttl
-rw-r--r--   1 hjs  admin   896 27 Jun 21:41 card.ttl
-rw-r--r--   1 hjs  admin   102 27 Jun 22:32 index.ttl
drwxr-xr-x   2 hjs  admin   102 27 Jun 22:56 raw
drwxr-xr-x   3 hjs  admin   204 28 Jun 12:51 test

All files with the same initial name up to the "." are considered to work together (and in the current implementation are taken care of by the same agent). Symbolic links are useful in that they:

• allow one to write and follow linked data that works on the file system without needing to name files by their extensions. For example a statement such as [] wac:agent <card#me> can work on the file system just as well as on the web.
• guide the web agent to which the default representation should be served.
• currently also help the web agent decide which are the resources it should serve.

There are three types of resources in this directory:

• The symbolic links such as card distinguish the default resources that can be found by an HTTP GET on http://localhost:8443/2013/card. Above, card -> card.ttl shows that card has a default turtle representation.
• Each resource also comes with a Web Access Control list, in this example card.acl.ttl, which sets access control restrictions on resources on the file system.
• Directories store extra data (in addition to their contents) in the .ttl file. (TODO: not quite working)
• Directories also have their access control lists, which are published in a file named .acl.ttl.

These conventions are provisional implementation decisions, and improvements are to be expected here.
(TODO:
• updates to the file system are not reflected yet in the server
• allow symbolic links to point to different default formats)

Let us look at some of these files in more detail. The acl for card just includes the acl for the directory/collection. (TODO: wac:include has not yet been defined in the Web Access Control Ontology)

$ cat card.acl.ttl
@prefix wac: <http://www.w3.org/ns/auth/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<> wac:include <.acl> .

The acl for the directory allows access to all resources in the subdirectories of test_www, when accessed from the web as https://localhost:8443/2013/, only to the user authenticated as https://localhost:8443/2013/card#me. (TODO: wac:regex is not defined in its namespace - requires standardisation.)

$ cat .acl.ttl
@prefix acl: http://www.w3.org/ns/auth/acl# .
@prefix foaf: http://xmlns.com/foaf/0.1
Re: {Disarmed} RWW-Play was: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks. Fair enough indeed. And thanks for sticking with me through the process. I know it's a pain when n00bs like me get involved trying to use bleeding edge code :-) I look forward to the consumer version.

In fact, I have a feeling that Kingsley may have found much of what I want at http://dig.csail.mit.edu/2009/mod_authz_webid/README -- ** this might be what you need re. Apache **. In fact I don't have access to the Apache config on the machine I was using, but I will have a go on a machine I do when I have a minute. If I (or someone else) succeed, a report back with the absolute minimum for doing the whole WebID thing that way would be a nice resource.

And of course I would love to see a Wordpress plugin (the Drupal plugin seemed to have too many dependencies for me to even think about writing my first wordpress plugin!)

Best
Hugh

On 9 Aug 2013, at 19:17, Henry Story henry.st...@bblfish.net wrote:

On 9 Aug 2013, at 19:34, Hugh Glaser h...@ecs.soton.ac.uk wrote:

Thanks Henry. Well I had looked there, but it all looked quite complicated - I have never cloned a git thingy before, and I don't even know if Java is available on the host :-) But emboldened by your encouragement I went for The Short Version. I was very encouraged, as it seemed to do quite a lot, but it seemed to hang after getting play-2-TLS-e6c58f64585b182f937358fa984474b86984d77d.tar.bz2 But even when I tried to do it by hand (the Longer Version), I eventually got "java was killed for excessive resource usage". By which time it had downloaded 427MB of stuff. I don't think these are the sort of hosting costs I want to have. So I have a sense this is not the solution I was looking for :-)

Well this is not optimised yet. It's for developers. At the moment you need a powerful modern machine. I am assuming people here on the Linked Data List are interested in working with bleeding edge code, and getting an idea of where things are heading.
If you want the couch potato version, then you need to wait for the consumer version. :-)
FOAF Editor - was Re: WebID Frustration
Norman, hello. Very interesting. Yes, I think that works.

I think I had got misled into thinking the issuer was significant - especially as the one I created calls itself "Key from my-profile.eu", but of course I could change that in Keychain.

I was sort of thinking of a FOAF service, which also just happens to do WebID if you click the WebID button (on by default, since people don't even need to know what it is?). So, essentially the next generation of foaf-a-matic. I'm sure I remember talking about this stuff many years ago :-), but maybe WebID makes it even more useful. In some sense this is a way to get WebID more widely adopted - be in a symbiotic relationship with FOAF. Because it also gets FOAF more widely adopted, because it does the ID thing. I'm guessing the WebID people have had all these discussions.

So the service would create and edit a Personal Profile Document for users. It would look after it itself if you wanted, GET and PUT it on a third party if desired and possible, or give you the edited version to put somewhere yourself. Personally I would love to have something better than vi to edit my FOAF, much as I love it :-)

Best
Hugh

On 6 Aug 2013, at 23:26, Norman Gray nor...@astro.gla.ac.uk wrote:

Hugh, hello.

On 2013 Aug 6, at 22:58, Hugh Glaser h...@ecs.soton.ac.uk wrote:

[...and quoting out of order...]

I looked at quite a few sites before choosing where my OpenID would be.

So did I, but OpenID allows for some indirection, so that the OpenID that I quote -- http://nxg.me.uk/norman/openid -- isn't committed to a particular OpenID provider. I use verisignlabs.com, but could change away from them without disruption. This is relevant because...

Actually, this whole thing seems to me (I now realise) nothing to do with WebID per se. It is about creating and editing FOAF files.

Aha, yes! This is the key thing, I think.
So the question of how to get a WebID may reduce to the question of how to get a certificate which includes a 'good' X.509 Subject Alternative Name, with 'good' here meaning something like 'the FOAF file I (apparently, or to my surprise) already have'.

Now, while there's a very small number who might want to do the whole thing from scratch, there's a larger number of people who might already have a FOAF file somewhere, and a still larger number of people (possibly all of Facebook? -- did they ever actually do this?) who have a FOAF profile but don't know it by that name. As in...

But actually I didn't; what I wanted was a WebID that didn't create an account somewhere (most of the sites I found offer an account that comes with a WebID as a side-effect).

So you want the inverse of this, in some loose sense. What probably would work in this case is a service which allows two steps:

1. You can say: I've got a preexisting account at Network X; can you give me a WebID which will point to that?

2. The service says: yes, they do FOAF, so (a) here's a WebID certificate which points to that, for you to put in your browser, and (b) tell Network X to do ... blah.

Step 1 is probably not too hard (especially if people can say "I've got this FNOF profile thing I've been told to tell you about"). Step 2a is still going to be fiddly (X.509 + browser = baldness), but I imagine that it's the 'blah' in step 2b that will require network-by-network cooperation. Though all it would require is for the user to upload their new WebID certificate to the cooperating service, for it to work out what the WebID is that it should add to the preexisting user's FOAF profile.

So you choose which network gets to edit and serve your FOAF file for you, and only have to mention that on one occasion, when talking to a make-me-a-WebID service. You'd never have to go back to that WebID-creating service again. In other words, unlike OpenID, you don't even need a redirection step.

Does that work?
All the best, Norman -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
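As an aside on what a 'good' Subject Alternative Name amounts to: a WebID certificate is an ordinary X.509 certificate whose subjectAltName extension carries the profile URI. A hypothetical openssl config sketch, reusing the example WebID from earlier in the thread (the CN and URI are placeholders):

```ini
# webid.cnf -- sketch only; CN and URI are placeholders
[ req ]
prompt             = no
distinguished_name = dn
x509_extensions    = webid

[ dn ]
CN = Example User (WebID)

[ webid ]
subjectAltName = URI:http://example.org/id/you#me
```

A self-signed certificate could then be produced with something like: openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes -config webid.cnf. The in-browser keygen flow discussed above achieves the same end, with the server filling in the SAN.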