Re: Where are the Linked Data Driven Smart Agents (Bots) ?
Ah, ’twas always thus, in every field I know. To put it bluntly, that is because much research is about getting papers published, not about moving the field on.

Functional programming (I used to be functional): David Turner (he with a brain the size of a planet) said that his application of combinators to functional language implementation had “cost 10 years of wasted PhD students”. This was because they had all striven to improve in tiny ways on the original implementation. Of course most failed, as it had sprung perfectly formed from his brain and, more importantly, been perfectly engineered by him into the system†. But even when they succeeded, the increment didn't amount to a hill of beans. And again, watching paper after paper purporting to improve some theoretical upper bound on execution of some variant of the λ-calculus, which in fact could never be reached without immense execution overheads, was pretty depressing in terms of wasted research time.

Of course, small increments are not always useless. In Operational Research they are bread and butter. But that is because it is a mature field with clear applications, and if you can improve a search/optimisation technique by a fraction of a percent, you might save millions on the billions cost of something. In the first decades of a field, it is highly unlikely that incremental change will be significant in the long run, not least because entirely new methods and techniques will be discovered, making the base ones redundant and rendering the increment moot.

†One of my favourite comments was in the garbage collector of David's C implementation of his SK-reduction machine: "Now follow everything on the C stack that looks like a pointer".
:-) > On 8 Jul 2016, at 05:01, Ruben Verborgh wrote: > > Hi Krzysztof, > >> this is all about finding the right balance > > Definitely—but I have the feeling the balance > is currently tipped very much to one side > (and perhaps not the side that delivers > the most urgent components for the SemWeb). > >> as we also do not want to have tons of 'ideas' >> papers without any substantial content or proof of concept > > Mere ideas would indeed not be sufficient; > but even papers with substantial content > and/or a proof of concept will have a difficult time > getting accepted if there is no evaluation > that satisfies the reviewers. > (And, lacking a framework to evaluate evaluations, > I see people typically choosing things they know, > hence incremental research gets accepted easily.) > > Best, > > Ruben
CFPs and the lists
Hmmm. So I am enjoying the new regime without CfPs on the LOD list (many thanks, Phil!). However, I now find myself thinking I will unsubscribe from the SemWeb list, since it is almost all CfPs, few, if any, of which I want. I think this may be an unintended consequence (although probably predictable) - losing people from SemWeb. There isn’t really a question here - I just thought I would report it. If there is a question: dare I suggest that now things have settled down, and we can see how things are working, that we might want to revisit the idea of having a separate list for CfPs, and reclaim the SemWeb list for discussion? (Sorry Phil?) Or is the answer that I simply set mail filters and carry on? Best Hugh
[Job Advert] Developer / Data Scientist
Seme4 is looking for people to do exciting things: http://www.seme4.com/wp-content/uploads/2016/05/Seme4-developer-vacancy-May2016.pdf Feel free to email me if you like. Best Hugh -- Hugh Glaser Chief Architect Seme4 Limited International House Southampton International Business Park Southampton Hampshire SO18 2RZ Mobile: +44 7595 334155 Main: +44 20 7060 1590 hugh.gla...@seme4.com www.seme4.com
Re: Deprecating owl:sameAs
And would we also have owl:differentDifferentButSame? The built-in OWL property owl:differentDifferentButSame links things to things. Such an owl:differentDifferentButSame statement indicates that two URI references actually refer to different things but may be the same under some circumstances. > On 1 Apr 2016, at 14:01, Sarven Capadisli wrote: > > There is overwhelming research [1, 2, 3] and I think it is evident at this > point that owl:sameAs is used inarticulately in the LOD cloud. > > The research that I've done makes me conclude that we need to do a massive > sweep of the LOD cloud and adopt owl:sameSameButDifferent. > > I think the terminology is human-friendly enough that there will be minimal > confusion down the line, but for the pedants among us, we can define it > along the lines of: > > > The built-in OWL property owl:sameSameButDifferent links things to things. > Such an owl:sameSameButDifferent statement indicates that two URI references > actually refer to the same thing but may be different under some > circumstances. > > > Thoughts? > > [1] https://www.w3.org/2009/12/rdf-ws/papers/ws21 > [2] http://www.bbc.co.uk/ontologies/coreconcepts#terms_sameAs > [3] http://schema.org/sameAs > > -Sarven > http://csarven.ca/#i >
Re: Survey: Use of this list for Calls for Papers
Hi Phil, Good question. I’m afraid none of the username/password pairs I have for w3.org seem to work. Can you give me a hint at which pair I should be using, or tell me how to retrieve/reset, please? While I’m here… :-) a) I think the idea of allowing CFPs, as long as they clearly have [CFP] or whatever in the subject line, is great. b) We could pick one of the two lists; then we would see less duplication. I would suggest semweb, as that embraces LD. (Maybe I would get to vote that way, but I don’t know what the 4 questions are :-) ) c) I don’t want to have CFPs shortened - I often read my email when I am offline (in fact I keep such emails to read offline), and it is a pain when the information is all “just a click away” but I can’t get it. Best Hugh > On 30 Mar 2016, at 12:21, Phil Archer wrote: > > Dear all, > > A perennial topic at W3C is whether we should allow calls for papers to be > posted to our mailing lists. Many argue, passionately, that we should not > allow any CfPs on any lists. It is now likely that this will be the policy, > with any message detected as being a CfP marked as spam (and therefore > blocked). > > Historically, the semantic-web and public-lod lists have been used for CfPs > and we are happy for this to continue *iff* you want it. > > Last time we asked, the consensus was that CfPs were seen as useful, but it's > time to ask you again. > > Please take a minute to answer the 4-question survey (no free text needed) > at https://www.w3.org/2002/09/wbs/1/1/ > > Thanks > > Phil. > > -- > > > Phil Archer > W3C Data Activity Lead > http://www.w3.org/2013/data/ > > http://philarcher.org > +44 (0)7887 767755 > @philarcher1 >
Re: SEMANTiCS 2016, Leipzig, Sep 12-15, Call for Research & Innovation Papers
Hi. I’m sort of puzzled by this. We used to have: SEMANTiCS 2015 : 11th International Conference on Semantic Systems SEMANTiCS 2014 : 10th International Conference on Semantic Systems etc. > On 18 Jan 2016, at 09:56, Sebastian Hellmann wrote: > > Call for Research & Innovation Papers But now we have: > SEMANTiCS 2016 - The Linked Data Conference It doesn’t look like the call has changed much. Linked (Open) Data is now additionally in a bit of one of the topics, and also in a couple of other places as usual. Has anything significant changed? Obviously I am asking because I have never thought of SEMANTiCS as the go-to place for Linked Data research publication. And when I look at the Verticals, for example, it isn’t immediately obvious that it is the list of things to choose for "The Linked Data Conference”. On the other hand, a conference with all those topics, where there was a *requirement* that authors used Linked Data technologies and practices, would be pretty exciting. For example, would a submission on "Smart Connectivity, Networking & Interlinking” that didn’t use or apply to Linked Data be rejected as out of scope? Is that what’s happening? Best regards Hugh > > Transfer // Engineering // Community > > > 12th International Conference on Semantic Systems > > Leipzig, Germany > > September 12-15, 2016 > > http://2016.semantics.cc > > > Important Dates (Research & Innovation) > > • Abstract Submission Deadline: April 14, 2016 (11:59 pm, Hawaii time) > • Paper Submission Deadline: April 21, 2016 (11:59 pm, Hawaii time) > • Notification of Acceptance: May 26, 2016 (11:59 pm, Hawaii time) > • Camera-Ready Paper: June 16, 2016 (11:59 pm, Hawaii time) > Submissions via EasyChair: > https://easychair.org/conferences/?conf=semantics2016research > > As in previous years, SEMANTiCS’16 proceedings are expected to be published by ACM ICPS.
> > > > The annual SEMANTiCS conference is the meeting place for professionals who > make semantic computing work, who understand its benefits and encounter its > limitations. Every year, SEMANTiCS attracts information managers, > IT-architects, software engineers and researchers from organisations ranging > from NPOs, through public administrations to the largest companies in the > world. Attendees learn from industry experts and top researchers about > emerging trends and topics in the fields of semantic software, enterprise > data, linked data & open data strategies, methodologies in knowledge > modelling and text & data analytics. The SEMANTiCS community is highly > diverse; attendees have responsibilities in interlinking areas like > knowledge management, technical documentation, e-commerce, big data > analytics, enterprise search, document management, business intelligence and > enterprise vocabulary management. > > The success of last year’s conference in Vienna with more than 280 attendees > from 22 countries proves that SEMANTiCS 2016 will continue a long tradition > of bringing together colleagues from around the world. There will be > presentations on industry implementations, use case prototypes, best > practices, panels, papers and posters to discuss semantic systems in > birds-of-a-feather sessions as well as informal settings. SEMANTICS addresses > problems common among information managers, software engineers, IT-architects > and various specialist departments working to develop, implement and/or > evaluate semantic software systems. > > The SEMANTiCS program is a rich mix of technical talks, panel discussions of > important topics and presentations by people who make things work - just like > you. In addition, attendees can network with experts in a variety of fields. > These relationships provide great value to organisations as they encounter > subtle technical issues in any stage of implementation. 
The expertise gained > by SEMANTiCS attendees has a long-term impact on their careers and > organisations. These factors make SEMANTiCS for our community the major > industry related event across Europe. > > > SEMANTiCS 2016 will especially welcome submissions for the following hot > topics: > > • Data Quality Management > • Data Science (Data Mining, Machine Learning, Network Analytics) > • Semantics on the Web, Linked (Open) Data & schema.org > • Corporate Knowledge Graphs > • Knowledge Integration and Language Technologies > • Economics of Data, Data Services and Data Ecosystems > > Following the success of previous years, the ‘horizontals’ (research) and > ‘verticals’ (industries) below are of interest for the conference: > > Horizontals > > • Enterprise Linked Data & Data Integration > • Knowledge Discovery & Intelligent Search > • Business Models, Governance & Data Strategies >
Re: CfP: WWW2016 workshop on Linked Data on the Web (LDOW2016)
Many thanks Chris, very helpful information, and very quickly. And good news too! > On 3 Nov 2015, at 15:45, Christian Bizer wrote: > > Hi Hugh, > >> Hi Chris et al, >> Great stuff. >> Can you tell me please if it will be possible to register for the workshop >> on its own, or will a registration for the full WWW be required to register >> for the workshop? > > The WWW2016 workshop track chairs just confirmed that it will be possible to > register again for the workshop days (not a specific workshop) similar to the > arrangement last year. > > The concrete prices seem not to be set yet, but last year the fees were 410 > Euro just for the workshop days compared to 850 Euro for the full pass. > > See http://www.www2015.it/registrations/ > > Cheers and hope to see you in Montreal, > > Chris > >> >>> On 2 Nov 2015, at 09:06, Christian Bizer wrote: >>> >>> Hi all, >>> >>> Sören Auer, Tim Berners-Lee, Tom Heath, and I are organizing the 9th edition >>> of the Linked Data on the Web workshop at WWW2016 in Montreal, Canada. The >>> paper submission deadline for the workshop is 24 January, 2016. Please find >>> the call for papers below. >>> >>> We are looking forward to having another exciting workshop and to seeing >>> many of you in Montreal. >>> >>> Cheers, >>> >>> Chris, Tim, Sören, and Tom >>> >>> >>> >>> >>> Call for Papers: 9th Workshop on Linked Data on the Web (LDOW2016) >>> >>> >>> Co-located with 25th International World Wide Web Conference >>> April 11 to 15, 2016 in Montreal, Canada >>> >>> >>> http://events.linkeddata.org/ldow2016/ >>> >>> >>> >>> The Web is developing from a medium for publishing textual documents into a >>> medium for sharing structured data. This trend is fueled on the one hand by >>> the adoption of the Linked Data principles by a growing number of data >>> providers. 
On the other hand, large numbers of websites have started to >>> semantically mark up the content of their HTML pages and thus also >>> contribute to the wealth of structured data available on the Web. >>> >>> The 9th Workshop on Linked Data on the Web (LDOW2016) aims to stimulate >>> discussion and further research into the challenges of publishing, >>> consuming, and integrating structured data from the Web as well as mining >>> knowledge from the global Web of Data. The special focus of this year’s LDOW >>> workshop will be Web Data Quality Assessment and Web Data Cleansing. >>> >>> >>> *Important Dates* >>> >>> * Submission deadline: 24 January, 2016 (23:59 Pacific Time) >>> * Notification of acceptance: 10 February, 2016 >>> * Camera-ready versions of accepted papers: 1 March, 2016 >>> * Workshop date: 11-13 April, 2016 >>> >>> >>> *Topics of Interest* >>> >>> Topics of interest for the workshop include, but are not limited to, the >>> following: >>> >>> Web Data Quality Assessment >>> * methods for evaluating the quality and trustworthiness of web data >>> * tracking the provenance of web data >>> * profiling and change tracking of web data sources >>> * cost and benefits of web data quality assessment >>> * web data quality assessment benchmarks >>> >>> Web Data Cleansing >>> * methods for cleansing web data >>> * data fusion and truth discovery >>> * conflict resolution using semantic knowledge >>> * human-in-the-loop and crowdsourcing for data cleansing >>> * cost and benefits of web data cleansing >>> * web data quality cleansing benchmarks >>> >>> Integrating Web Data from Large Numbers of Data Sources >>> * linking algorithms and heuristics, identity resolution >>> * schema matching and clustering >>> * evaluation of linking and schema matching methods >>> >>> Mining the Web of Data >>> * large-scale derivation of implicit knowledge from the Web of Data >>> * using the Web of Data as background knowledge in data mining >>> * techniques and methodologies 
for Linked Data mining and analytics >>> >>> Linked Data Applications >>> * application showcases including Web data browsers and search engines >>> * marketplaces, aggregators and indexes for Web Data >>> * security, access control, and licensing issues of Linked Data >>> * role of Linked Data within enterprise applications (e.g. ERP, SCM, CRM) >>> * Linked Data applications for life-sciences, digital humanities, social >>> sciences etc. >>> >>> >>> *Submissions* >>> >>> We seek two kinds of submissions: >>> >>> 1. Full scientific papers: up to 10 pages in ACM format >>> 2. Short scientific and position papers: up to 5 pages in ACM format >>> >>> Submissions must be formatted using the ACM SIG template available at >>> http://www.acm.org/sigs/publications/proceedings-templates. Accepted papers >>> will be presented at the workshop and included in the CEUR workshop >>> proceedings. At least one author of each paper has to register for the >>> workshop and to present the paper. >>> >>> >>> *Organizing Committee* >>> >>> Christian Bizer,
Re: CfP: WWW2016 workshop on Linked Data on the Web (LDOW2016)
Hi Chris et al, Great stuff. Can you tell me please if it will be possible to register for the workshop on its own, or will a registration for the full WWW be required to register for the workshop? Thanks. Best Hugh > On 2 Nov 2015, at 09:06, Christian Bizer wrote: > > Hi all, > > Sören Auer, Tim Berners-Lee, Tom Heath, and I are organizing the 9th edition > of the Linked Data on the Web workshop at WWW2016 in Montreal, Canada. The > paper submission deadline for the workshop is 24 January, 2016. Please find > the call for papers below. > > We are looking forward to having another exciting workshop and to seeing > many of you in Montreal. > > Cheers, > > Chris, Tim, Sören, and Tom > > > > > Call for Papers: 9th Workshop on Linked Data on the Web (LDOW2016) > > > Co-located with 25th International World Wide Web Conference > April 11 to 15, 2016 in Montreal, Canada > > > http://events.linkeddata.org/ldow2016/ > > > > The Web is developing from a medium for publishing textual documents into a > medium for sharing structured data. This trend is fueled on the one hand by > the adoption of the Linked Data principles by a growing number of data > providers. On the other hand, large numbers of websites have started to > semantically mark up the content of their HTML pages and thus also > contribute to the wealth of structured data available on the Web. > > The 9th Workshop on Linked Data on the Web (LDOW2016) aims to stimulate > discussion and further research into the challenges of publishing, > consuming, and integrating structured data from the Web as well as mining > knowledge from the global Web of Data. The special focus of this year’s LDOW > workshop will be Web Data Quality Assessment and Web Data Cleansing. 
> > > *Important Dates* > > * Submission deadline: 24 January, 2016 (23:59 Pacific Time) > * Notification of acceptance: 10 February, 2016 > * Camera-ready versions of accepted papers: 1 March, 2016 > * Workshop date: 11-13 April, 2016 > > > *Topics of Interest* > > Topics of interest for the workshop include, but are not limited to, the > following: > > Web Data Quality Assessment > * methods for evaluating the quality and trustworthiness of web data > * tracking the provenance of web data > * profiling and change tracking of web data sources > * cost and benefits of web data quality assessment > * web data quality assessment benchmarks > > Web Data Cleansing > * methods for cleansing web data > * data fusion and truth discovery > * conflict resolution using semantic knowledge > * human-in-the-loop and crowdsourcing for data cleansing > * cost and benefits of web data cleansing > * web data quality cleansing benchmarks > > Integrating Web Data from Large Numbers of Data Sources > * linking algorithms and heuristics, identity resolution > * schema matching and clustering > * evaluation of linking and schema matching methods > > Mining the Web of Data > * large-scale derivation of implicit knowledge from the Web of Data > * using the Web of Data as background knowledge in data mining > * techniques and methodologies for Linked Data mining and analytics > > Linked Data Applications > * application showcases including Web data browsers and search engines > * marketplaces, aggregators and indexes for Web Data > * security, access control, and licensing issues of Linked Data > * role of Linked Data within enterprise applications (e.g. ERP, SCM, CRM) > * Linked Data applications for life-sciences, digital humanities, social > sciences etc. > > > *Submissions* > > We seek two kinds of submissions: > > 1. Full scientific papers: up to 10 pages in ACM format > 2. 
Short scientific and position papers: up to 5 pages in ACM format > > Submissions must be formatted using the ACM SIG template available at > http://www.acm.org/sigs/publications/proceedings-templates. Accepted papers > will be presented at the workshop and included in the CEUR workshop > proceedings. At least one author of each paper has to register for the > workshop and to present the paper. > > > *Organizing Committee* > > Christian Bizer, University of Mannheim, Germany > Tom Heath, Open Data Institute, UK > Sören Auer, University of Bonn and Fraunhofer IAIS, Germany > Tim Berners-Lee, W3C/MIT, USA > > > *Contact Information* > > For further information about the workshop, please contact the workshops > chairs at: ldow2...@events.linkeddata.org > > > -- > Prof. Dr. Christian Bizer > Data and Web Science Group > University of Mannheim, Germany > ch...@informatik.uni-mannheim.de > http://dws.informatik.uni-mannheim.de/bizer > > > > >
Re: Discovering a query endpoint associated with a given Linked Data resource
information about the query endpoints. -- dbpedia:Sri_Lanka void:inDataset _:DBpedia . _:DBpedia a void:Dataset ; void:sparqlEndpoint <http://dbpedia.org/sparql> ; void:uriLookupEndpoint <http://fragments.dbpedia.org/2014/en?subject=> . -- or Link: <http://dbpedia.org/void/Dataset>; rel="http://rdfs.org/ns/void#inDataset" Best Regards, Nandana [1] http://www.w3.org/TR/void/#discovery-links On Wed, Aug 26, 2015 at 11:05 AM, Miel Vander Sande miel.vandersa...@ugent.be wrote: Hi Nandana, I guess VoID would be the best fit. In case of LDF you could use ... void:uriLookupEndpoint <http://fragments.dbpedia.org/2014/en?subject=> But whether these exist in practice? Probably not. I'd leave it up to the dereference publisher to provide this triple in the response, rather than doing the .well-known thing. Best, Miel On 26 Aug 2015, at 10:57, Víctor Rodríguez Doncel vrodrig...@fi.upm.es wrote: Well, you might try to look in this folder location: .well-known/void And possibly find a void:sparqlEndpoint. But this would be too good to be true. Regards, Víctor On 26/08/2015 10:45, Nandana Mihindukulasooriya wrote: Hi, Is there a standard or widely used way of discovering a query endpoint (SPARQL/LDF) associated with a given Linked Data resource? I know that a client can use follow-your-nose and related link-traversal approaches such as [1], but I wonder if it is possible to have a hybrid approach in which dereferenceable Linked Data resources optionally advertise query endpoint(s) in a standard way, so that clients can perform queries on related data. To clarify the use case a bit: when a client dereferences a resource URI it gets a set of triples (an RDF graph) [2]. In some cases, it might be that the returned graph is a subgraph of a named graph / default graph of an RDF dataset. The client wants to discover a query endpoint that exposes the relevant dataset, if one is available. 
For example, something like the following using the search link relation [3]. -- HEAD /resource/Sri_Lanka Host: dbpedia.org -- 200 OK Link: <http://dbpedia.org/sparql>; rel="search"; type="sparql", <http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf" ... other headers ... -- Best Regards, Nandana [1] http://swsa.semanticweb.org/sites/g/files/g524521/f/201507/DissertationOlafHartig_0.pdf [2] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-rdf-graph [3] http://www.iana.org/assignments/link-relations/link-relations.xhtml -- Víctor Rodríguez-Doncel D3205 - Ontology Engineering Group (OEG) Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Campus de Montegancedo s/n Boadilla del Monte-28660 Madrid, Spain Tel. (+34) 91336 3672 Skype: vroddon3 -- Prof. Dr. Heiko Paulheim Data and Web Science Group University of Mannheim Phone: +49 621 181 2646 B6, 26, Room C1.08 D-68159 Mannheim Mail: he...@informatik.uni-mannheim.de Web: www.heikopaulheim.com -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
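The discovery idioms in this thread need very little client machinery: a client that receives the Link header proposed above only has to pull out the rel="search" targets. A minimal, dependency-free sketch (the rel="search" and type parameters follow the example in the thread, not a registered convention, and the naive comma-split assumes no commas inside the URIs):

```python
# Sketch: extract candidate query endpoints from an HTTP Link header.
def parse_link_header(value):
    """Parse a Link header into (target, params) pairs."""
    links = []
    for part in value.split(","):          # assumes no commas inside URIs
        pieces = part.strip().split(";")
        target = pieces[0].strip().lstrip("<").rstrip(">")
        params = {}
        for p in pieces[1:]:
            if "=" in p:
                k, v = p.split("=", 1)
                params[k.strip()] = v.strip().strip('"')
        links.append((target, params))
    return links

header = ('<http://dbpedia.org/sparql>; rel="search"; type="sparql", '
          '<http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf"')
endpoints = [t for t, p in parse_link_header(header) if p.get("rel") == "search"]
print(endpoints)
```

A real client would of course take the header from the HEAD response rather than a literal string, and could then dispatch on the type parameter (sparql vs. ldf) to pick a query mechanism.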
Re: Discovering a query endpoint associated with a given Linked Data resource
Thanks. Yeah, pretty close, although I doubt that the RDF returned would have anything of type dcat:Distribution in it to let me form the triple. Sorry, looking at my email I realise I was having a brain fart on seeAlso - of course endpoints are Resources (unless I am also having a senior moment :-) ). Also, sorry, by the way, Laurens, I think backlink services such as LOD Laundromat are great, and an important part of the LD world. It is just that I think it should be simpler for consumers in this case (and more efficient). Perhaps I should have suggested: <http://sws.geonames.org/3405870/> void:inDataset :DBpedia . :DBpedia void:sparqlEndpoint <http://dbpedia.org/sparql> . to be delivered with the RDF. I think I could cope with 2 triples! Unfortunately, void:inDataset has foaf:Document as the range. So that would mean I would need to do a bit more stuff, and it is getting more complicated again. And also: http://www.w3.org/TR/void/ 6.3 says “Providing metadata about the entire dataset in such a scenario should not be done by including VoID details in every document. Rather, a single VoID description of the entire dataset should be published, and individual documents should point to this description via backlinks.” (My brain really hurts from trying to remember all this stuff from many years ago now.) However, I take that to mean that I shouldn’t put *all* the VoID stuff in each document. If I have licence stuff etc., then it should be in a common void.ttl or whatever. I don’t think there is any harm putting some selected bits of VoID in the document - in fact, if the VoID data is in the store itself (as I assume it should be), then some of it will arrive as a natural part of the SCBD. Is there some simple idiom we could all use to carry the info? I dunno, something like: :DBpedia void:uriRegexPattern "^http://sws\\.geonames\\.org/3405870/" . :DBpedia void:sparqlEndpoint <http://dbpedia.org/sparql> . But that does require some processing. Any better ideas? 
I suspect I am just being thick. Would that do what I want? (Would it also do what Nandana wants? :-) ) Cheers On 26 Aug 2015, at 13:44, Ghislain Atemezing auguste.atemez...@eurecom.fr wrote: Hi Hugh, On 26 Aug 2015, at 14:23, Hugh Glaser h...@glasers.org wrote: Another major reason is that the publisher may not have the rights to publish .well-known and its ilk. And if it comes with the RDF we can be really confident of the provenance and trust of who has recommended it. Also, it is a damn sight easier to maintain than to rebuild the VoID document every time something changes. At the data level, DCAT [1] already defines a property http://www.w3.org/ns/dcat#accessURL which points to “a landing page, feed, SPARQL endpoint or any other type of resource that gives access to the distribution of the dataset.” Of course, the question remains whether anyone uses this property, at least at the dataset/data-catalog level. Best, Ghislain [1] http://www.w3.org/TR/vocab-dcat/ -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
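The void:uriRegexPattern idiom floated above does require some processing on the consumer side, but not much. A minimal sketch of what a client might do, with the VoID statements flattened into plain Python dicts rather than parsed from Turtle; the patterns and endpoints are illustrative, and the geonames pattern is generalised from the thread's single-resource example to a whole namespace:

```python
import re

# Hypothetical VoID descriptions, flattened to dicts for the sketch:
# each entry pairs a void:uriRegexPattern with a void:sparqlEndpoint.
DATASETS = [
    {"pattern": r"^http://sws\.geonames\.org/",
     "endpoint": "http://dbpedia.org/sparql"},
    {"pattern": r"^http://data\.example\.org/",
     "endpoint": "http://data.example.org/sparql"},
]

def endpoint_for(uri):
    """Return the SPARQL endpoint whose uriRegexPattern matches the URI."""
    for ds in DATASETS:
        if re.match(ds["pattern"], uri):
            return ds["endpoint"]
    return None

print(endpoint_for("http://sws.geonames.org/3405870/"))  # http://dbpedia.org/sparql
```

In practice the dict would be populated from the couple of VoID triples delivered with the resource, which is exactly the "2 triples I could cope with" trade-off discussed above.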
Re: [ANN] nature.com/ontologies - July 2015 Release
Hi Tony, On 8 Aug 2015, at 10:19, Hammond, Tony tony.hamm...@macmillan.com wrote: Hi Hugh: Many thanks for the comments. This is exactly the kind of thing we need to hear. Thank you for your responses, and the manner in which you received my comments. So, I think you may have raised four separate points which I'll try to answer in turn: == 1. Examples You are right. We've been sloppy. These were intended more for reading than parsing. So we took liberties with omitting common namespaces, abbreviating strings, etc. Punctuation, we were just careless. Sure thing. But I agree there is real value in making these examples complete. We will address this in our next release. 2. Dereference We really have no defence here. **We do not support dereference at this time.** The datasets are outputs from our production systems and HTTP URIs are used for namespacing only. We need to figure out a strategy for supporting dereference. So, if that means we are Bad Guys for violating Principle 2, then so be it. We are not trying to claim that this is true Linked Data - it's only common-or-garden linked data (of the RDF kind). That's not to say that we are not interested in adding in dereference. Only that these things take time to implement and we are proceeding with our data publishing in an incremental manner. Again, sure thing. What you have done is lovely Semantic Web stuff - it is only the Linked Data bit that seems to be missing. However, looking at the site, I think you may be being hard on yourself. Were you to have used the term Semantic Web instead of Linked Data on http://www.nature.com/ontologies/ and the other explanation pages, I think it would have been less misleading. I certainly would not have then gone to the data with the expectation of URIs resolving. So, for now, sorry! 3. URNs, etc Note that the RDF you obtained from dereferencing the DOI is from CrossRef - not from ourselves. So we cannot properly answer for something retrieved from a third party. 
That said, CrossRef are also in the early stages of data publishing, and may not themselves have reached the Linked Data standard. Again, seems like it's only RDF at this time. OK - got that. Not your RDF. (I would rename your DOCS dir to be VoID, by the way, as it makes it more attractive to people like me :-) ) 4. Mappings Am a little perplexed as to the distinction between mappings and links, although maybe I can see where you're coming from. Note that we're anyway planning to decouple our ontology mappings and put those in separate files and list them under Mappings. Sounds good. A personal view: A mapping is something that says two things are pretty much the same (in some sense we won’t go into); a link is the use of a URI that comes from a different source, such as your use of the dbpedia URIs. So I would say that something like <http://dx.doi.org/10.1038/003022a0> <http://xmlns.com/foaf/0.1/topic> <http://dbpedia.org/resource/Thomas_Henry_Huxley> is a link from your dataset to dbpedia. If you also had a URI for http://dbpedia.org/resource/Thomas_Henry_Huxley that you wanted to say was the same, then you would say so using a skos or owl predicate, and that would be a mapping. Our core and domain ontologies generally have SKOS mappings, i.e. we use skos:closeMatch, skos:broadMatch, skos:exactMatch, skos:relatedMatch, etc. This feels appropriate for the ontology and the taxonomies. I guess we are cautiously feeling our way forward and want to be a little careful about using owl:sameAs. I’m happy to get those as well :-) == So, I hope we've clarified some things here. There are a couple of obvious things we can do/are doing (examples, mappings). Some other things are out of our hands (DOI dereference). And some will need more time for us to implement (dereference generally). Anyway, many thanks again for all your comments. It's really good to hear back from real users. Otherwise it can feel like we are whistling in the wind. Pleasure. 
Hugh Tony On 07/08/2015 12:48, Hugh Glaser h...@glasers.org wrote: Hi Tony, Great stuff! So I start exploring, looking for more fodder for sameAs.org … :-) It may be that my questions are too specific for the list - feel free to go off-list in response, and then we can summarise. And there is rather a lot here, I'm afraid. Some possible problemettes I hit: http://www.nature.com/ontologies/datasets/articles/#data_example might be confusing for people (and awkward when I tried to rapper it), since quite a few prefixes are not declared, most notably one of yours: npg, but also the usual suspects (xsd, dc, bibo, foaf and also prism). There is also a missing foaf:homepage that causes a syntax error, and some semi-colons missing off the last few lines. A slightly more challenging problem is that the URI for that example doesn't resolve. It unqualifies to http://ns.nature.com/articles/nrg3870 (I
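The link-versus-mapping distinction discussed in this thread is easy to make mechanical. A small hypothetical sketch: treat triples whose predicate is an equivalence property (owl:sameAs or the skos:*Match family mentioned above) as mappings, and triples whose object URI lives outside the publisher's namespace as links; the home_namespace parameter and the "internal" category are assumptions added for illustration:

```python
# Predicates that assert (near-)equivalence; anything else pointing at a
# foreign URI is treated as a plain link.
MAPPING_PREDICATES = {
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://www.w3.org/2004/02/skos/core#closeMatch",
    "http://www.w3.org/2004/02/skos/core#broadMatch",
    "http://www.w3.org/2004/02/skos/core#relatedMatch",
}

def classify(triple, home_namespace):
    """Label a triple as a 'mapping', a 'link', or 'internal' to the dataset."""
    s, p, o = triple
    if p in MAPPING_PREDICATES:
        return "mapping"
    if o.startswith("http") and not o.startswith(home_namespace):
        return "link"
    return "internal"

# The foaf:topic example from the thread: a link from the dataset to DBpedia.
t1 = ("http://dx.doi.org/10.1038/003022a0",
      "http://xmlns.com/foaf/0.1/topic",
      "http://dbpedia.org/resource/Thomas_Henry_Huxley")
print(classify(t1, "http://dx.doi.org/"))  # link
```

The same triple with skos:exactMatch as its predicate would come back as a mapping, which matches the personal-view definition given in the thread.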
Re: [ANN] nature.com/ontologies - July 2015 Release
-- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
UK Open Data data vignette
The ODI (http://theodi.org) has published a research report, Research: Open data means business (http://theodi.org/open-data-means-business), along with a Google sheet of Open Data Companies (https://docs.google.com/spreadsheets/d/1xwxNIaxXSEMLktb-oo1UoYTd5WMPFkRLytRVVBr5OMA). It seemed a nice idea to map the companies and make the data accessible as 5 * Open Data at http://opendatacompanies.data.seme4.com So we have loaded the company data into a Linked Data store with a SPARQL endpoint (http://opendatacompanies.data.seme4.com/sparql/), and made the URIs resolve, such as http://opendatacompanies.data.seme4.com/id/company/od001 We have also used the UK postcodes (and OS services) to plot the companies on a map: http://opendatacompanies.data.seme4.com/services/map/ There you go. This is all a little rough-and-ready, just to see what it looks like, and see if anyone wants to use it. If you do, and want any changes, please ask. Actually, if anyone wanted to produce a similar GSheet for other places, we could suck that in too. Best Hugh
Re: UK Open Data data vignette
Yeah, the Seme4 data ain't too good either :-) Try Tom (Tom Heath tom.he...@theodi.org) at the ODI. On 04/08/2015 17:00, Kingsley Idehen wrote: On 8/4/15 11:35 AM, Hugh Glaser wrote: The ODI (http://theodi.org) has published a research report, Research: Open data means business (http://theodi.org/open-data-means-business), along with a Google sheet of Open Data Companies (https://docs.google.com/spreadsheets/d/1xwxNIaxXSEMLktb-oo1UoYTd5WMPFkRLytRVVBr5OMA). It seemed a nice idea to map the companies and make the data accessible as 5 * Open Data at http://opendatacompanies.data.seme4.com So we have loaded the company data into a Linked Data store with a SPARQL endpoint (http://opendatacompanies.data.seme4.com/sparql/), and made the URIs resolve, such as http://opendatacompanies.data.seme4.com/id/company/od001 We have also used the UK postcodes (and OS services) to plot the companies on a map: http://opendatacompanies.data.seme4.com/services/map/ There you go. This is all a little rough-and-ready, just to see what it looks like, and see if anyone wants to use it. If you do, and want any changes, please ask. Actually, if anyone wanted to produce a similar GSheet for other places, we could suck that in too. Best Hugh Nice work! BTW -- who actually handles editing of the original spreadsheet? The information on OpenLink is really messed up. The ultimate demonstration of identity and identifiers gone very wrong. Half of the description is based on OpenLink Financials and the other half OpenLink Software :(
Open Position: Developer / Data Scientist at Seme4
Seme4 Ltd, a leading Linked Data company founded by ECS Professors Sir Nigel Shadbolt and Dame Wendy Hall, is looking to recruit experienced developers to join the technical team. For the right candidate this poses an exciting opportunity to work with cutting edge technology alongside experts in the field, in an interesting, varied and rewarding role. http://www.seme4.com/jobs/ Please see attached for full details. Best Hugh Seme4-job-vacancy.pdf Description: Adobe PDF document
Re: DBpedia-based RDF dumps for Wikidata
Thanks Dimitris - well done to the whole team. In case it helps anyone, I have brought up a sameAs store for the sameAs relations in this dataset alone: http://sameas.org/store/wikidata_dbpedia/ In passing, it is interesting to note that the example URI, http://wikidata.dbpedia.org/resource/Q586 , has 110 sameAs URIs in this dataset alone. What price now the old view that everybody would use the same URIs for Things?! Best Hugh On 15 May 2015, at 11:28, Dimitris Kontokostas kontokos...@informatik.uni-leipzig.de wrote: Dear all, Following up on the early prototype we announced earlier [1] we are happy to announce a consolidated Wikidata RDF dump based on DBpedia. (Disclaimer: this work is not related or affiliated with the official Wikidata RDF dumps) We provide: * sample data for preview http://wikidata.dbpedia.org/downloads/sample/ * a complete dump with over 1 Billion triples: http://wikidata.dbpedia.org/downloads/20150330/ * a SPARQL endpoint: http://wikidata.dbpedia.org/sparql * a Linked Data interface: http://wikidata.dbpedia.org/resource/Q586 Using the wikidata dump from March we were able to retrieve more than 1B triples, 8.5M typed things according to the DBpedia ontology along with 48M transitive types, 6.4M coordinates and 1.5M depictions. A complete report for this effort can be found here: http://svn.aksw.org/papers/2015/ISWC_Wikidata2DBpedia/public.pdf The extraction code is now fully integrated in the DBpedia Information Extraction Framework. 
We are eagerly waiting for your feedback and your help in improving the DBpedia to Wikidata mapping coverage http://mappings.dbpedia.org/server/ontology/wikidata/missing/ Best, Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian Hellmann [1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg06936.html -- Dimitris Kontokostas Department of Computer Science, University of Leipzig DBpedia Association Projects: http://dbpedia.org, http://aligned-project.eu Homepage: http://aksw.org/DimitrisKontokostas Research Group: http://aksw.org -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
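The 110-URIs-per-bundle observation above is easy to reproduce in miniature: a sameAs store essentially computes connected components over owl:sameAs pairs. A minimal union-find sketch; the pairs below are illustrative, not drawn from the actual dump:

```python
parent = {}

def find(x):
    # Find the representative of x's component, with path halving
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(x, y):
    # Merge the components containing x and y
    parent[find(x)] = find(y)

# Each pair is one owl:sameAs assertion
pairs = [
    ("http://wikidata.dbpedia.org/resource/Q586", "http://www.wikidata.org/entity/Q586"),
    ("http://www.wikidata.org/entity/Q586", "http://dbpedia.org/resource/Bonn"),
    ("http://de.dbpedia.org/resource/Bonn", "http://dbpedia.org/resource/Bonn"),
]
for a, b in pairs:
    union(a, b)

# The bundle for the example URI: all URIs in its component
bundle = {u for u in parent
          if find(u) == find("http://wikidata.dbpedia.org/resource/Q586")}
print(len(bundle))  # 4 URIs in this toy bundle
```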
Re: Ontology to link food and diseases
One of the datasets at http://data.totl.net is Causes of Cancer The Daily Mail has put a huge amount of effort in instructing the UK public about what causes and prevents cancer. Here we have marked up the causes and preventions where possible linking to a valid dbpedia URI. Also includes references to the relevant news stories. Browse: http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fdata.totl.net%2Fcancer_causes.rdf It has a fair bit of data, but not a very rich ontology (hardly any), so possibly not a huge help. But certainly good reading. Hugh On 3 May 2015, at 22:20, Marco Brandizi brand...@ebi.ac.uk wrote: Hi all, I'm looking for an ontology/controlled vocabulary/alike that links food ingredients/substances/dishes to human diseases/conditions, like intolerances, allergies, diabetes etc. Examples of information I'd like to find coded (please assume they're true, I'm no expert): - gluten must be avoided by people affected by coeliac disease - omega-3 is good for people with high cholesterol - sugar should be avoided by people with diabetes risk I also would like linked data about commercial food products, but even an ontology without 'instances' would be useful. So far, I've found an amount of literature (eg, [1-3]) and vocabularies like AGROVOC[4], but nothing like the above. Thanks in advance for any help! 
Marco [1] http://fruct.org/publications/abstract14/files/Kol_21.pdf [2] http://www.researchgate.net/publication/224331263_FOODS_A_Food-Oriented_Ontology-Driven_System [3] http://www.hindawi.com/journals/tswj/aip/475410/ [4] http://tinyurl.com/ndtdhwn -- === Marco Brandizi, PhD brand...@ebi.ac.uk, http://www.marcobrandizi.info Functional Genomics Group - Sr Software Engineer http://www.ebi.ac.uk/microarray European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom Office V2-26, Phone: +44 (0)1223 492 613, Fax: +44 (0)1223 492 620 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Survey on Faceted Browsers for RDF data ?
Nice. On 28 Apr 2015, at 08:33, Michael Brunnbauer bru...@netestate.de wrote: Hello Hugh, On Mon, Apr 27, 2015 at 03:24:57PM +0100, Hugh Glaser wrote: one probably needs to materialize data in some other more facets-friendly system (e.g. solr, elastic search) to get good performance (I might be wrong but this is what my - limited - experience told me). Woah there! I would say that is exactly where a faceted browser stops. And where a faceted search starts - which usually uses a faceted browser and may even be called faceted browser by some people. Not sure where to draw lines here. Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Enterprise information system
Thank you all. Sigh… On 26 Feb 2015, at 22:44, Jean-Marc Vanel jeanmarc.va...@gmail.com wrote: Hi Hugh 2015-02-25 23:06 GMT+01:00 Hugh Glaser h...@glasers.org: But what if you start from scratch? So, the company wants to base all its stuff around Linked Data technologies, starting with information about employees, what they did and are doing, projects, etc., ... Is there a solution out of the box for all the data capture from individuals, and reports, queries, etc.? There is no out of the box solution; if one existed you would know about it :) , it would have the features, plus the LD advantages. BUT there are people like me working toward the Semantic Enterprise information system. Even if there is no need to share and publish your data, let me remind you of the advantages over traditional development (traditional, i.e. SQL or MongoDB): • Data models and data sources are available (Linked Open Data) • 40 implementations of graph databases supporting the W3C standard SPARQL • RDF data models are more flexible than SQL in terms of cardinality • simple inferences out of the box (inheritance) • easy to have interconnected yet independent applications by sharing URIs of common objects • no need of Object-RDF mapping, there are DSLs to express business logic in terms of RDF • Easily customized open source generic applications Point 7 is where progress is being made. There is not yet the equivalent of Ruby on Rails, Symfony, or Django for the Semantic Web and SPARQL databases, but work is being done. The strategic item is the input form management. Some frameworks exist that facilitate the creation of applications with form specifications in RDF, leveraging RDF vocabularies, and storing in RDF. 
I have written a review of semantic-based frameworks: http://svn.code.sf.net/p/eulergui/code/trunk/eulergui/html/semantic_based_apps_review.html of which the most promising seem to be: • semantic_forms : https://github.com/jmvanel/semantic_forms • Vitro https://github.com/vivo-project/Vitro -- Jean-Marc Vanel Déductions SARL - Consulting, services, training, Rule-based programming, Semantic Web http://deductions-software.com/ +33 (0)6 89 16 29 52 Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Enterprise information system
So, here’s a thing. Usually you talk to a company about introducing Linked Data technologies to their existing IT infrastructure, emphasising that you can add stuff to work with existing systems (low risk, low cost etc.) to improve all sorts of stuff (silo breakdown, comprehensive dashboards, etc..) But what if you start from scratch? So, the company wants to base all its stuff around Linked Data technologies, starting with information about employees, what they did and are doing, projects, etc., and moving on to embrace the whole gamut. (Sort of like a typical personnel management core, plus a load of other related DBs.) Let’s say for an organisation of a few thousand, roughly none of whom are technical, of course. It’s a pretty standard thing to need, and gives great value. Is there a solution out of the box for all the data capture from individuals, and reports, queries, etc.? Or would they end up with a team of developers having to build bespoke things? Or, heaven forfend!, would they end up using conventional methods for all the interface management, and then have the usual LD extra system? Any thoughts? -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Quick Poll - Results
Hi, Thanks to all the responders (although there were not an awful lot!) I think there was a slight leaning towards putting the “home” URI in the subject, but only slight. The reason I asked this, by the way, is for sameAs import (of course!). sameAs services recommend a “canon” to be used. The canon can be set explicitly, but if it is not, then a decision needs to be made, since it has to have something. So we currently use the subject of the last sameAs triple we got for the bundle. I was trying to work out if there was a better way, in terms of subject or object. (We can of course do other stuff, such as the shortest, or alpha order, or from a priority list of domains, etc. but we are talking default behaviour.) Best Hugh On 23 Jan 2015, at 11:39, Hugh Glaser h...@glasers.org wrote: I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your base/domain. And you have reconciled some of your URIs against some other stuff, such as dbpedia, freebase, geonames... Now, visualise the owl:sameAs (or skos:whatever) triples you have made to represent that. Q1: Where are your URIs? a) subject, b) object, c) both Q2: Do all the triples have one of your URIs in them? a) yes, b) no It’s just for a choice I have about the input format for sameAs services, so I thought I would ask :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
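The candidate default canon strategies mentioned above (last-seen subject, shortest URI, priority list of domains) can each be sketched in a few lines. Function names, the bundle, and the geonames identifier are all hypothetical, not the sameAs.org API:

```python
from urllib.parse import urlparse

def canon_last_subject(triples):
    # (a) current default: subject of the last sameAs triple received
    return triples[-1][0]

def canon_shortest(bundle):
    # (b) shortest URI, alphabetical order as tie-break
    return min(bundle, key=lambda u: (len(u), u))

def canon_by_domain(bundle, priority):
    # (c) first URI whose host appears earliest in a priority list
    def rank(u):
        host = urlparse(u).hostname or ""
        return priority.index(host) if host in priority else len(priority)
    return min(bundle, key=rank)

bundle = [
    "http://dbpedia.org/resource/Eastleigh",
    "http://sws.geonames.org/2649808/",
    "http://example.org/id/place/eastleigh",
]
print(canon_shortest(bundle))
print(canon_by_domain(bundle, ["dbpedia.org", "sws.geonames.org"]))
```

Note the strategies can disagree: here the shortest URI is the geonames one, while a dbpedia-first priority list picks the dbpedia URI, which is exactly why a default has to be chosen deliberately.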
Re: Quick Poll
Thanks Stian and Alasdair, Just going back to the original question for a moment, I’ll try another way of putting it for people who don’t have their own URIs. When you want to create a set of links (of the sorts of properties you are talking about, but only the symmetric ones), you have often started with candidate URIs, and then found other URIs that have that relationship. When you create the triples to record this valuable information, does the original candidate appear as a) subject, b) object, or c) just whatever, or d) maybe you assert 2 triples both ways? It’s a very qualitative and woolly question, I realise :-) Thanks. On 25 Jan 2015, at 10:45, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk wrote: For properties you would need to use owl:equivalentProperty or rdfs:subPropertyOf in either direction. SKOS is very useful here as an alternative when the logic gets dirty due to loose term definition. As an example see this SKOS mapping from PAV onto Dublin Core Terms (which are notoriously underspecified and vague): http://www.jbiomedsem.com/content/4/1/37/table/T5 http://www.jbiomedsem.com/content/4/1/37 (Results) Actual SKOS: http://purl.org/pav/mapping/dcterms Here we found SKOS as a nice way to do the mapping independently (and justified) as the inferences from OWL make DC Term incompatible with any causal provenance ontology like PROV and PAV. On 23 Jan 2015 17:59, Hugh Glaser h...@glasers.org wrote: Thanks, and thanks for all the answers so far. On 23 Jan 2015, at 16:23, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk wrote: Not sure where you are going, but you are probably interested in linksets - as a way to package equivalence relations - typically in a graph of its own. Thanks - I have a lot of linksets :-) http://www.w3.org/TR/void/#describing-linksets To answer the questions: Q1: d) in subject, property, object, or multiple of those. 
I don’t understand where property comes in for using owl:sameAs (or whatever) in stating equivalence between URIs, so I’ll read that as c) Q2: No. We already reuse existing vocabularies and external identifiers, and there could be a nested structure which is only indirectly connected to our URIs. I realise that this second question wasn’t as clear as it might have been. What I meant was concerned with the sameAs triples only (as was explicit for Q1). So, to elaborate, if you have decided that: http://mysite.com/foo, http://dbpedia.org/resource/foo, http://rdf.freebase.com/ns/m.05195d8 are aligned (the same), then what do the triples describing that look like? In particular, do you have any that look like http://dbpedia.org/resource/foo owl:sameAs http://rdf.freebase.com/ns/m.05195d8 . (or vice versa), or do you equate everything to a “mysite” URI? But I guess for OpenPHACTS this doesn’t apply, since I understand from what you say below that you never mint a URI of your own where you know there is an external one. Although it does beg the question, perhaps, of what you do when you later find equivalences. Best Hugh http://example.com/our/own pav:authoredBy http://orcid.org/-0001-9842-9718 . http://orcid.org/-0001-9842-9718 foaf:name Stian Soiland-Reyes . It's true you would also get the second triple from ORCID (remember content negotiation!), but it's very useful for presentation and query purposes to include these directly, e.g. in a VOID file. In most cases we do, however, not have any of our own URIs except for provenance statements. But perhaps Open PHACTS is special in that regard as we are integrating other people's datasets and shouldn't be making up any data of our own. :) Perhaps also of interest: In the Open PHACTS project http://www.openphacts.org/ we use this extensively - we let the end-user choose which linksets of weak and strong equivalences they want to apply when a query is made. 
Such a collection of linksets and their application we call a lense - so you apply lenses to merge/unmerge your data. See http://www.slideshare.net/alasdair_gray/gray-compcoref In our identity mapping service http://www.openphacts.org/about-open-phacts/how-does-open-phacts-work/identities-within-open-phacts we pass in several parameters - the minimal is the URI to map. See http://openphacts.cs.man.ac.uk:9092/QueryExpander/mapURI and use http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_2443 as the URI. We also have a piece of magic that can rewrite a SPARQL query to use the mapped URIs for a given variable (adding FILTER statements) try - http://openphacts.cs.man.ac.uk:9092/QueryExpander/ On 23 January 2015 at 11:39, Hugh Glaser h...@glasers.org wrote: I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your
Re: Quick Poll
Thanks, and thanks for all the answers so far. On 23 Jan 2015, at 16:23, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk wrote: Not sure where you are going, but you are probably interested in linksets - as a way to package equivalence relations - typically in a graph of its own. Thanks - I have a lot of linksets :-) http://www.w3.org/TR/void/#describing-linksets To answer the questions: Q1: d) in subject, property, object, or multiple of those. I don’t understand where property comes in for using owl:sameAs (or whatever) in stating equivalence between URIs, so I’ll read that as c) Q2: No. We already reuse existing vocabularies and external identifiers, and there could be a nested structure which is only indirectly connected to our URIs. I realise that this second question wasn’t as clear as it might have been. What I meant was concerned with the sameAs triples only (as was explicit for Q1). So, to elaborate, if you have decided that: http://mysite.com/foo, http://dbpedia.org/resource/foo, http://rdf.freebase.com/ns/m.05195d8 are aligned (the same), then what do the triples describing that look like? In particular, do you have any that look like http://dbpedia.org/resource/foo owl:sameAs http://rdf.freebase.com/ns/m.05195d8 . (or vice versa), or do you equate everything to a “mysite” URI? But I guess for OpenPHACTS this doesn’t apply, since I understand from what you say below that you never mint a URI of your own where you know there is an external one. Although it does beg the question, perhaps, of what you do when you later find equivalences. Best Hugh http://example.com/our/own pav:authoredBy http://orcid.org/-0001-9842-9718 . http://orcid.org/-0001-9842-9718 foaf:name Stian Soiland-Reyes . It's true you would also get the second triple from ORCID (remember content negotiation!), but it's very useful for presentation and query purposes to include these directly, e.g. in a VOID file. 
In most cases we do, however, not have any of our own URIs except for provenance statements. But perhaps Open PHACTS is special in that regard as we are integrating other people's datasets and shouldn't be making up any data of our own. :) Perhaps also of interest: In the Open PHACTS project http://www.openphacts.org/ we use this extensively - we let the end-user choose which linksets of weak and strong equivalences they want to apply when a query is made. Such a collection of linksets and their application we call a lens - so you apply lenses to merge/unmerge your data. See http://www.slideshare.net/alasdair_gray/gray-compcoref In our identity mapping service http://www.openphacts.org/about-open-phacts/how-does-open-phacts-work/identities-within-open-phacts we pass in several parameters - the minimal is the URI to map. See http://openphacts.cs.man.ac.uk:9092/QueryExpander/mapURI and use http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_2443 as the URI. We also have a piece of magic that can rewrite a SPARQL query to use the mapped URIs for a given variable (adding FILTER statements) try - http://openphacts.cs.man.ac.uk:9092/QueryExpander/ On 23 January 2015 at 11:39, Hugh Glaser h...@glasers.org wrote: I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your base/domain. And you have reconciled some of your URIs against some other stuff, such as dbpedia, freebase, geonames... Now, visualise the owl:sameAs (or skos:whatever) triples you have made to represent that. Q1: Where are your URIs? a) subject, b) object, c) both Q2: Do all the triples have one of your URIs in them? 
a) yes, b) no It’s just for a choice I have about the input format for sameAs services, so I thought I would ask :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Stian Soiland-Reyes, eScience Lab School of Computer Science The University of Manchester http://soiland-reyes.com/stian/work/ http://orcid.org/-0001-9842-9718 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
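The "star versus pairwise" choice discussed in this thread can be made concrete: with n equivalent URIs, a star around one home URI asserts n-1 sameAs triples, while the full pairwise form asserts n*(n-1)/2 (taking one direction per pair). A small sketch using the example URIs from the thread; `owl:sameAs` is written as a plain string for brevity:

```python
uris = [
    "http://mysite.com/foo",  # the "home" URI
    "http://dbpedia.org/resource/foo",
    "http://rdf.freebase.com/ns/m.05195d8",
]

def star(hub, bundle):
    # Everything equated to the home URI: n-1 triples
    return [(hub, "owl:sameAs", o) for o in bundle if o != hub]

def pairwise(bundle):
    # Every pair asserted directly: n*(n-1)/2 triples
    return [(a, "owl:sameAs", b)
            for i, a in enumerate(bundle) for b in bundle[i + 1:]]

print(len(star(uris[0], uris)))  # 2
print(len(pairwise(uris)))       # 3
```

The difference matters for the poll: under the star form every triple contains one of your URIs, while the pairwise form produces triples like dbpedia sameAs freebase that never mention your domain at all.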
Re: Survey on Faceted Browsers for RDF data ?
On 23 Jan 2015, at 11:42, Christian Morbidoni christian.morbid...@gmail.com wrote: Hi all, I'm doing some research to get a comprehensive (as much as possible) view on what faceted browsers are out there today for RDF data and what features they offer. I collected a lot of links to papers, web sites and demos... but I found very few comparison/survey papers about this specific topic. [1] contains a section on faceted browsers, but is not exhaustive; [2] mentions some interesting systems but is a bit outdated. So, my questions are: 1) Does someone know a better paper/resource I can look at for a survey? 2) Is someone currently working on a survey like this? 3) Does someone have notable additions to my list? (pasted at the end of the mail) I think http://www.dotac.info/explorer/ might fit your definition, as might the older http://www.rkbexplorer.com Best Hugh At this stage I'm interested in both: automatic and configuration based browsers, free and commercial products, hierarchical and flat facets, simple and pivoting. 
thank you in advance best, Christian [1] Survey of linked data based exploration systems (2014) http://ceur-ws.org/Vol-1279/iesd14_8.pdf [2] From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web http://hcil2.cs.umd.edu/trs/2008-06/2008-06.pdf My current, randomly ordered list: tFacets - http://www.visualdataweb.org/tfacet.php Exhibit (3) + Babel Virtuoso built-in search + faceted browser RDF-faceted-browser - Blog post: https://shr.wordpress.com/2012/02/08/a-faceted-browser-over-sparql-endpoints/ Facete - http://aksw.org/Projects/Facete.html PivotBrowser - http://www.sindicetech.com/pivotbrowser.html Rhizomik - http://rhizomik.net/html/ /facets Paper: http://homepages.cwi.nl/~media/publications/iswc06.pdf gFacets - Paper: http://www.sfb716.uni-stuttgart.de/uploads/tx_vispublications/eswc10-heimErtlZiegler.pdf Flamenco Nested Facets Browser - Demo: http://people.csail.mit.edu/dfhuynh/projects/nfb/ Humboldt mSpace -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Quick Poll
I would be really interested to know, please. I suggest answers by email, and I’ll report back eventually. Here goes: Imagine you have some of your own RDF using URIs on your base/domain. And you have reconciled some of your URIs against some other stuff, such as dbpedia, freebase, geonames... Now, visualise the owl:sameAs (or skos:whatever) triples you have made to represent that. Q1: Where are your URIs? a) subject, b) object, c) both Q2: Do all the triples have one of your URIs in them? a) yes, b) no It’s just for a choice I have about the input format for sameAs services, so I thought I would ask :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Semantic Web Dogfood
Does anyone have the data? I (or someone else) could at least stuff it in a browsable store if someone can get it to me? It is all rather an embarrassment now, I would say - maybe it should be switched off, if we can’t access or update it? Best Hugh On 21 Dec 2014, at 00:54, Andreas Harth andr...@harth.org wrote: Hi, I have a similar problem when accessing RDF files (e.g., [1]). ad...@data.semanticweb.org still bounces. It would be great to get access to these files again. Cheers, Andreas. [1] http://data.semanticweb.org/workshop/cold/2010/PC/rdf *** I'm getting error messages when accessing RDF files of workshops and conferences. $ wget http://data.semanticweb.org/workshop/cold/2010/PC/rdf; $ more rdf Fatal error: Call to a member function writeRdfToString() on a non-object in /var/www/drupal-6.22/sites/all/modules/dogfood/dogfood.module on line 174 $ On 2012-03-28 07:56, Hugh Glaser wrote: Sorry, I have been here before, and can't remember who to email (ad...@data.semanticweb.org bounces). And I know some brave people were trying to sort it out. Anyway: Hi there, Sorry to report, but it seems things are a bit broken. 
Eg Resource URI on the dog food server: http://data.semanticweb.org/person/dan-brickley Email Hash: 748934f32135cfcf6f8c06e253c53442721e15e7 Eg transcript: hg@cohen [2012-03-28T15:43:32] acm.rkbexplorer.com/acquisition rdfget http://data.semanticweb.org/person/libby-miller HTTP/1.1 303 See Other Date: Wed, 28 Mar 2012 16:15:23 GMT Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c X-Powered-By: PHP/5.2.0-8+etch16 Set-Cookie: SESS002fbfc63133341c13dbc400422ca44a=40e15aa64d8febbf4530d9d3bd778487; expires=Fri, 20 Apr 2012 19:48:43 GMT; path=/; domain=.data.semanticweb.org Expires: Sun, 19 Nov 1978 05:00:00 GMT Last-Modified: Wed, 28 Mar 2012 16:15:23 GMT Cache-Control: store, no-cache, must-revalidate Cache-Control: post-check=0, pre-check=0 Location: http://data.semanticweb.org/person/libby-miller/rdf Access-Control-Allow-Origin: * Transfer-Encoding: chunked Content-Type: text/html; charset=utf-8 HTTP/1.1 200 OK Date: Wed, 28 Mar 2012 16:15:23 GMT Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c X-Powered-By: PHP/5.2.0-8+etch16 Set-Cookie: SESS002fbfc63133341c13dbc400422ca44a=a6cd8a43718d688ec6192079abe7a400; expires=Fri, 20 Apr 2012 19:48:43 GMT; path=/; domain=.data.semanticweb.org Expires: Sun, 19 Nov 1978 05:00:00 GMT Last-Modified: Wed, 28 Mar 2012 16:15:23 GMT Cache-Control: store, no-cache, must-revalidate Cache-Control: post-check=0, pre-check=0 Access-Control-Allow-Origin: * Content-Length: 186 Content-Type: application/rdf+xml; charset=utf-8 Fatal error: Call to a member function writeRdfToString() on a non-object in /var/www/drupal-6.22/sites/all/modules/dogfood/dogfood.module on line 171 It only gives the 200 response after a very looong time. Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Microsoft OLE
Thanks Paul, On 15 Dec 2014, at 19:07, Paul Houle ontolo...@gmail.com wrote: Most Windows programmers would instantiate OLE objects in the applications and query them to get results; Ah, the first problem - I’m not a Windows programmer :-) In fact, I want to access OLE-published stuff without any need to have knowledge of Windows at all. http resolution to IIS or Apache running on the Windows machine seemed like a good choice. commonly people write XML or JSON APIs, but writing RDF wouldn't be too different. The next step up is to have a theory that converts OLE data structures to and from RDF either in general or in a specific case with help from a schema. Microsoft invested a lot in making SOAP work well with OLE, so you might do best with a SOAP to RDF mapping. So yes - a service that did some mapping from the retrieved OLE data structure to RDF; and a general one was what I was thinking of. The incoming URI would be interpretable as an OLE data object (I guess with some server config), which then got fetched and converted to RDF. In fact, it seems an obvious way of exposing Word docs, Excel spreadsheets and even Access DBs live, but there is probably some stuff I don’t understand that means it is crazy. I suspect the silence (except you and Barry) means that this isn’t something anyone has done, at least yet. Best Hugh This caught my eye though, because I've been looking at the relationships between RDF and OMG, a distant outpost of standardization. You can find competitive products on the market, one based on UML and another based on RDF, OWL, SKOS and so forth. The products do more or less the same thing, but described in such different language and vocabulary that it's hard to believe that they compete for any sales. There is lots of interesting stuff there, but the big theme is ISO Common logic, which adds higher-arity predicates and a foundation for inference that people will actually want to use. 
It's not hard to convince the enterprise that first-order logic is ready for the big time because banks and larger corporations all use FOL-based systems on production rules to automate decisions. On Sat, Dec 13, 2014 at 7:30 AM, Hugh Glaser h...@glasers.org wrote: Anyone know of any work around exposing OLE linked objects as RDF? I could envisage a proxy that gave me URIs and metadata for embedded objects. Is that even a sensible question? :-) -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254, paul.houle on Skype ontolo...@gmail.com http://legalentityidentifier.info/lei/lookup -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
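The proxy idea sketched in this thread resolves an incoming URI to an OLE data object and converts it to RDF. A purely hypothetical sketch of the conversion half: the OLE/COM fetch itself (e.g. via pywin32 on the Windows side) is not shown, and the vocabulary URI is made up for illustration.

```python
def cells_to_ntriples(doc_uri, cells):
    """Map a spreadsheet-like structure {(row, col): value} -- as might be
    pulled out of an Excel workbook over OLE -- to N-Triples."""
    lines = []
    for (row, col), value in sorted(cells.items()):
        subject = f"<{doc_uri}#cell-{row}-{col}>"
        # escape backslashes first, then quotes, per N-Triples literal rules
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <http://example.org/vocab#value> "{escaped}" .')
    return "\n".join(lines)
```

A real service would sit behind IIS or Apache, interpret the request URI as an OLE object reference, call the conversion, and return the triples with an RDF content type.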
Microsoft OLE
Anyone know of any work around exposing OLE linked objects as RDF? I could envisage a proxy that gave me URIs and metadata for embedded objects. Is that even a sensible question? :-) -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: How to model valid time of resource properties?
Hi, On 15 Oct 2014, at 23:02, John Walker john.wal...@semaku.com wrote: Hi On October 15, 2014 at 2:59 PM Kingsley Idehen kide...@openlinksw.com wrote: On 10/15/14 8:36 AM, Frans Knibbe | Geodan wrote: ... Personally I would not use this approach for foaf:age and foaf:based_near as these capture a certain snapshot/state of (the information about) a resource. Having some representation where the foaf:age triple could be entailed could lead to having multiple conflicting statements with no easy way to find the truth. Having a clear understanding of the questions you want to ask of your knowledge base should help steer modelling choices. This is undoubtedly true, and very important - is the modelling fit for purpose? Proper engineering. In the cases known to me that require the recording of history of resources, all resource properties (except for the identifier) are things that can change in time. If this pattern were applied, it would have to be applied to all properties, leading to vocabularies exploding and becoming unwieldy, as described in the Discussion paragraph. I think that the desire to annotate statements with things like valid time is very common. Wouldn't it be funny if the best solution to such a common and relatively straightforward requirement is to create large custom vocabularies? If you want to be able to capture historical states of a resource, using named graphs to provide that context would be my first thought. However, there is a downside to this. If all that is happening is that Frans is gathering his own data into a store, and then using that data for some understood application of his, then this will be fine. Then he knows exactly the structure to impose on his RDF using named Graphs. But this is Linked Open Data, right? So what happens about use by other people? Or if Frans wants to build other queries over the same data?
If he hasn’t foreseen the other structure, and therefore ensured that the required Named Graphs exist, then it won't be possible to make the statements required about the RDF. The problem is that in choosing the Named Graph structure, the data publisher makes very deep assumptions and even decisions about how the dataset will be used. This is not really good practice in an Open world - in fact, one of the claimed advantages of Semantic Web technologies is that such assumptions (such as the choice of tables in a typical database) are no longer required! I’m not saying that Named Graphs aren’t useful and often appropriate, but choosing to use Named Graphs can really make the data hard to consume. And if they are used, the choice of how really needs to be considered very much with the modelling. (This is particularly important in the absence of any ability to nest Named Graphs.) Cheers If that resource consists of just one triple, then RDF reification of that statement would also work as Kingsley mentions. Regards, Frans Frans, How about reified RDF statements? I think discounting RDF reification vocabulary is yet another act of premature optimization, in regards to the Semantic Web meme :) Some examples: [1] http://bit.ly/utterances-since-sept-11-2014 -- List of statements made from a point in time. [2] http://linkeddata.uriburner.com/c/8EPG33 -- About Connotation -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
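As an illustration of the named-graph approach discussed in this thread (with all of Hugh's caveats about baking in assumptions): the snapshot triples go in their own graph, and the valid-time annotations are made about that graph. This TriG fragment is invented for the example - the ex: vocabulary and the age value are assumptions, and in practice one might prefer an existing vocabulary such as PROV-O:

```trig
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The state of the resource as recorded during 2013, in its own graph
ex:snapshot2013 {
    ex:frans foaf:age 42 .
}

# Statements about that graph: the interval over which it was valid
ex:snapshot2013 ex:validFrom  "2013-01-01"^^xsd:date ;
                ex:validUntil "2014-01-01"^^xsd:date .
```

The downside raised above applies directly: a consumer has to know this graph-per-snapshot structure to query the data, and since named graphs cannot nest, no further layer of context can be added the same way.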
Re: scientific publishing process (was Re: Cost and access)
On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
Hi Alexander, On 5 Oct 2014, at 15:57, Alexander Garcia Castro alexgarc...@gmail.com wrote: metadata, sure. it is a must. BUT good and thought for the web of data. not designed for paper based collections. From my experience it is not so much about representing everything from the paper as triplets. there will be statements that won't be representable, also, such approach may not be efficient. why don't we just go a little bit further up from the lowest hanging fruit and start talking about self describing documents? well annotated documents with well structured metadata that are interoperable. this is easy, achievable, requires little tooling, does not put any burden on the author, delivers interoperability beyond just simple hyperlinks, it is much more elegant than adhering to HTML, etc. You lost me here. Who or what does the “well annotated documents and well structured metadata”, if it isn’t any burden for the authors? Easy and little tooling - I wonder what methods and tools you have in mind? These have proved to be hard problems - otherwise we wouldn’t be having this painful discussion. Best Hugh On Sun, Oct 5, 2014 at 3:19 AM, Hugh Glaser h...@glasers.org wrote: On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Alexander Garcia http://www.alexandergarcia.name/ http://www.usefilm.com/photographer/75943.html http://www.linkedin.com/in/alexgarciac -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
is the math part. And this is the saddest story of all: MathML has been around for a long time, and it is, actually, part of ePUB as well, but authoring proper mathematics is the toughest with the tools out there. Sigh... P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that space... [1] https://atlas.oreilly.com [2] http://metrodigi.com [3] https://www.inkling.com On 04 Oct 2014, at 04:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote: As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before. Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions. Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since. And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated ones (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually semantic in some sense is still, in my view, a research topic, not a routine reality. Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing. I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences. 
Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)). Just my personal 2c Daniel On Oct 3, 2014, at 12:50 - 03/10/14, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display. In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF. If a document is available in both HTML and PDF I almost always choose to view it in PDF. This is the case even though I have particular preferences in how I view documents. If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me. If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me. I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers. So go ahead, create one. But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose. So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML. If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available? peter On 10/03/2014 08:02 AM, Phillip Lord wrote: [...] As it stands, the only statement that the semantic web community are making is that web formats are too poor for scientific usage. [...] Phil Daniel Schwabe Dept. 
de Informatica, PUC-Rio Tel:+55-21-3527 1500 r. 4356R. M. de S. Vicente, 225 Fax: +55-21-3527 1530 Rio de Janeiro, RJ 22453-900, Brasil http://www.inf.puc-rio.br/~dschwabe Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
sense is still, in my view, a research topic, not a routine reality. Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing. I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences. Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)). Just my personal 2c Daniel On Oct 3, 2014, at 12:50 - 03/10/14, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display. In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF. If a document is available in both HTML and PDF I almost always choose to view it in PDF. This is the case even though I have particular preferences in how I view documents. If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me. If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me. I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers. So go ahead, create one. But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose. So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML. 
If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available? peter On 10/03/2014 08:02 AM, Phillip Lord wrote: [...] As it stands, the only statement that the semantic web community are making is that web formats are too poor for scientific usage. [...] Phil Daniel Schwabe Dept. de Informatica, PUC-Rio Tel:+55-21-3527 1500 r. 4356R. M. de S. Vicente, 225 Fax: +55-21-3527 1530 Rio de Janeiro, RJ 22453-900, Brasil http://www.inf.puc-rio.br/~dschwabe -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Searching for references to a certain URI
Me? Well, obviously I use http://sameas.org/ :-) Eg http://sameas.org/?uri=http%3A%2F%2Fd-nb.info%2Fgnd%2F120273152 Best Hugh On 25 Sep 2014, at 09:59, Neubert Joachim j.neub...@zbw.eu wrote: What strategies do you use to find all references to a certain URI, e.g. http://d-nb.info/gnd/120273152, on the (semantic) web? I used Sindice for this, but sadly the service is discontinued, and the data becomes more and more outdated. Google link:/info: queries (e.g. https://en.wikipedia.org/wiki/Horst_Siebert) are excluded by rel=nofollow links, and pure RDF links (e.g. from dbpedia) don’t show up at all. Cheers, Joachim -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
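The only fiddly part of the sameas.org lookup above is that the target URI must be percent-encoded to sit inside the uri= query parameter. A small sketch, assuming Python (the helper name is invented here):

```python
from urllib.parse import quote

def sameas_lookup_url(uri):
    """Build a sameas.org bundle-lookup URL for a resource URI.
    safe="" forces ':' and '/' to be percent-encoded as well."""
    return "http://sameas.org/?uri=" + quote(uri, safe="")
```

This reproduces the URL in the message above: passing http://d-nb.info/gnd/120273152 yields http://sameas.org/?uri=http%3A%2F%2Fd-nb.info%2Fgnd%2F120273152.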
Position Advertised - Web Developer
We are keen to have someone who can do Linked Data technologies, although it would be hard to make it a requirement, as it would restrict the market too much. http://www.timewisejobs.co.uk/job/6726/webmaster-developer-part-time-/ Best Hugh -- Hugh Glaser Partner Ethos Valuable Outcomes +44 23 8061 5652 | hugh.gla...@ethosvo.org | Skype: hugh_glaser | www.ethosvo.org Solving complex problems through collaboration, trust and moderation
Re: testable properties of repositories that could be used to rate them
Thanks Stuart, On 14 Sep 2014, at 22:49, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: On 15/09/14 09:25, Hugh Glaser wrote: I've greyed out the 'everything' requirement, since I'm not sure that 'everything' is script-testable. Yes, I was puzzling over that (how it could be made scriptable). Certainly quite a lot of the other things in the list make assumptions about repository identifiers being available - otherwise how can you get started, or ask if dc:title is used, for example? So how do you find the repository identifiers in a scriptable manner? Let’s assume that there is no OAI-PMH support, for example. In the community in which I am working (and which this list grew out of) 'repository' is effectively defined as a document-full website with a working OAI-PMH feed and the backing of a long-lived institution or organisation. Without an OAI-PMH feed, the answer is 'get an OAI-PMH feed.’ Seems sensible to me! So for this, maybe I could move it to after number 3 (where we know there is RDF) and then I could list the predicates that must have URIs (rather than strings)? I've grey'ed out the content negotiation requirements since I'm not aware that any repositories or prototypes that try and do this (I'm happy to be corrected). The standard ePrints 3 software supports content negotiation - e.g. http://oro.open.ac.uk/id/eprint/40795 I've un-greyed this item. (I confess that most of the input into the document so-far has come from the dspace world) Great. I've recast most of this in the document. I've not gone for exact reflection of what the design doc says, but script-testable easily-understandable items that encourage useful steps towards best practice. I’ll make some more suggestions to try to capture a crucial thing - that authors are identified by URI. Best Hugh cheers stuart -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: testable properties of repositories that could be used to rate them
Hi. On 14 Sep 2014, at 22:06, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: The initial aim of this was to counter an apparently arbitrary repository ranking algorithm (which I won't deign link to) with a set of web standards that we (repository developers and maintainers) can collectively work towards, with an emphasis on breadth of different standards that could be applied. Sure. -- I've greyed out the 'everything' requirement, since I'm not sure that 'everything' is script-testable. Yes, I was puzzling over that (how it could be made scriptable). Certainly quite a lot of the other things in the list make assumptions about repository identifiers being available - otherwise how can you get started, or ask if dc:title is used, for example? So how do you find the repository identifiers in a scriptable manner? Let’s assume that there is no OAI-PMH support, for example. So for this, maybe I could move it to after number 3 (where we know there is RDF) and then I could list the predicates that must have URIs (rather than strings)? I've grey'ed out the content negotiation requirements since I'm not aware that any repositories or prototypes that try and do this (I'm happy to be corrected). That actually seems rather a strange statement - if you had said that there was no interest in it, then that would be fine. But surely your rating should list anything useful that a repository might offer? Is there nothing else in your list that is not currently supported? Is RDFa supported anywhere? But fear not, there are many examples in the wild! The standard ePrints 3 software supports content negotiation - e.g. http://oro.open.ac.uk/id/eprint/40795 I see it does rdf+xml and text/n3 - I haven’t tried any others. I've found a better URL for the RDFa requirement. Nice. cheers Cheers Hugh stuart On 13/09/14 22:58, Hugh Glaser wrote: The messages below should make sense. Stuart is trying to make a doc for rating repositories. 
I’ve added some stuff about Linked Data: From http://www.w3.org/DesignIssues/LinkedData.html (Linked Data Principles) Everything has a URI - publications, documents, people, organisations, categories, ... These URIs are HTTP or HTTPS When RDF is requested, the URIs return RDF metadata RDF/XML supported N3 supported Turtle supported JSON-LD supported There are URIs that are not from this repository There are URIs from other repositories There is a SPARQL endpoint RDFa is embedded in the HTML Is there somewhere I could have taken this from that would be suitable? Anyone care to contribute? It seems like it is a really useful thing to have (modulo a bit of specialisation for any particular domain). (I didn’t want to go over the top on formats, by the way.) Cheers Begin forwarded message: From: Stuart Yeates stuart.yea...@vuw.ac.nz Subject: RE: testable properties of repositories that could be used to rate them Date: 13 September 2014 10:31:36 BST To: Hugh Glaser h...@ecs.soton.ac.uk Cc: jisc-repositor...@jiscmail.ac.uk jisc-repositor...@jiscmail.ac.uk I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? If there's something that's recommended by some standard / recommendation and is script-testable, you're welcome to add it. So for example does it provide RDF at all? It has a question based on http://validator.w3.org/feed/ which validates RSS, which in turn is either RDF (v1.0) or can trivially be converted to it (v2.0/atom). I've added a note that this is RSS. cheers stuart Begin forwarded message: From: Hugh Glaser h...@ecs.soton.ac.uk Subject: Re: testable properties of repositories that could be used to rate them Date: 12 September 2014 14:05:34 BST To: jisc-repositor...@jiscmail.ac.uk Reply-To: Hugh Glaser h...@ecs.soton.ac.uk Very interesting (and impressive!) I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? 
Well, actually there is Semantic Web:- right up at the start there is a Cool URI reference, which is the W3C “Cool URIs for the Semantic Web” note! Perhaps there should be a section on this - maybe starting with whether it is 5* Linked Data. http://en.wikipedia.org/wiki/Linked_data http://www.w3.org/DesignIssues/LinkedData.html But it is probably useful to unpick some of this in a less structured way. So for example does it provide RDF at all? Formats? RDF, N3, JSON-LD… Best Hugh On 12 Sep 2014, at 03:29, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: A couple of us have drawn up a bit of a list of script-testable properties of repositories that could be used to rate them. We’ve tried to avoid both arbitrary judgements and the implication that every repository should meet every item: https://docs.google.com/document/d
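Several of the checklist items in this thread are mechanically testable. For instance, "These URIs are HTTP or HTTPS" reduces to a one-liner; a hedged sketch (the function name is invented here, and gathering the identifiers in the first place is the harder part discussed above):

```python
def non_http_uris(uris):
    """Checklist item 'These URIs are HTTP or HTTPS': return the
    identifiers that fail, so an empty list means the item passes."""
    return [u for u in uris
            if not (u.startswith("http://") or u.startswith("https://"))]
```

A rating script could run a battery of such checks and report which items each repository passes.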
Fwd: testable properties of repositories that could be used to rate them
The messages below should make sense. Stuart is trying to make a doc for rating repositories. I’ve added some stuff about Linked Data: From http://www.w3.org/DesignIssues/LinkedData.html (Linked Data Principles) Everything has a URI - publications, documents, people, organisations, categories, ... These URIs are HTTP or HTTPS When RDF is requested, the URIs return RDF metadata RDF/XML supported N3 supported Turtle supported JSON-LD supported There are URIs that are not from this repository There are URIs from other repositories There is a SPARQL endpoint RDFa is embedded in the HTML Is there somewhere I could have taken this from that would be suitable? Anyone care to contribute? It seems like it is a really useful thing to have (modulo a bit of specialisation for any particular domain). (I didn’t want to go over the top on formats, by the way.) Cheers Begin forwarded message: From: Stuart Yeates stuart.yea...@vuw.ac.nz Subject: RE: testable properties of repositories that could be used to rate them Date: 13 September 2014 10:31:36 BST To: Hugh Glaser h...@ecs.soton.ac.uk Cc: jisc-repositor...@jiscmail.ac.uk jisc-repositor...@jiscmail.ac.uk I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? If there's something that's recommended by some standard / recommendation and is script-testable, you're welcome to add it. So for example does it provide RDF at all? It has a question based on http://validator.w3.org/feed/ which validates RSS, which in turn is either RDF (v1.0) or can trivially be converted to it (v2.0/atom). I've added a note that this is RSS. cheers stuart Begin forwarded message: From: Hugh Glaser h...@ecs.soton.ac.uk Subject: Re: testable properties of repositories that could be used to rate them Date: 12 September 2014 14:05:34 BST To: jisc-repositor...@jiscmail.ac.uk Reply-To: Hugh Glaser h...@ecs.soton.ac.uk Very interesting (and impressive!) 
I notice there is nothing about Linked Data and Semantic Web - would it be sensible to have something on this? Well, actually there is Semantic Web:- right up at the start there is a Cool URI reference, which is the W3C “Cool URIs for the Semantic Web” note! Perhaps there should be a section on this - maybe starting with whether it is 5* Linked Data. http://en.wikipedia.org/wiki/Linked_data http://www.w3.org/DesignIssues/LinkedData.html But it is probably useful to unpick some of this in a less structured way. So for example does it provide RDF at all? Formats? RDF, N3, JSON-LD… Best Hugh On 12 Sep 2014, at 03:29, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: A couple of us have drawn up a bit of a list of script-testable properties of repositories that could be used to rate them. We’ve tried to avoid both arbitrary judgements and the implication that every repository should meet every item: https://docs.google.com/document/d/1sEDqPS2bfAcbunpjNzHwB56f5CY1SxJunSBLFtom3IM/edit cheers stuart
Re: URIs within URIs
Nice. That enumerates the choices, I think. In a world where the services are themselves being used as LD URIs (because everything is a LD URI, of course!) there is the orthogonal question of whether the URI needs to be URLEncoded. And in fact I think all the prefixing patterns fail that test? If you are still updating patterns, you might like to add a note? Cheers On 28 Aug 2014, at 15:12, Leigh Dodds le...@ldodds.com wrote: Hi, I documented all the variations of this form of URI construction I was aware of in the Rebased URI pattern: http://patterns.dataincubator.org/book/rebased-uri.html This covers generating one URI from another. What that new URI returns is a separate concern. Cheers, L. On Fri, Aug 22, 2014 at 4:56 PM, Bill Roberts b...@swirrl.com wrote: Hi Luca We certainly find a need for that kind of feature (as do many other linked data publishers) and our choice in our PublishMyData platform has been the URL pattern {domain}/resource?uri={url-encoded external URI} to expose info in our databases about URIs in other domains. If there was a standard URL route for this scenario, we'd be glad to implement it Best regards Bill On 22 Aug 2014, at 16:44, Luca Matteis lmatt...@gmail.com wrote: Dear LOD community, I'm wondering whether there has been any research regarding the idea of having URIs contain an actual URI, that would then resolve information about what the linked dataset states about the input URI. Example: http://foo.com/alice - returns data about what foo.com has regarding alice http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't just resolve the alice URI above, but returns what bar.com wants to say about the alice URI For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could return: http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice a void:Dataset . http://foo.com/alice #some #data . 
I know SPARQL endpoints already have this functionality, but was wondering whether any formal research was done towards this direction rather than a full-blown SPARQL endpoint. The reason I'm looking for this sort of thing is because I simply need to ask certain third-party datasets whether they have data about a URI (inbound links). Best, Luca -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
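On the URL-encoding question raised above: in the query-parameter flavour of the Rebased URI pattern, the inner URI has to be percent-encoded, otherwise its own '?' and '&' would be parsed as part of the outer URL. A small demonstration, assuming Python, using the made-up bar.com/foo.com services from Luca's example:

```python
from urllib.parse import quote, unquote

def rebase(service, uri):
    """Rebased URI, query-parameter flavour: percent-encode the whole
    target URI so it survives as a single opaque query value."""
    return service + "?uri=" + quote(uri, safe="")

outer = rebase("http://bar.com/endpoint", "http://foo.com/alice?name=a&b=c")
inner = unquote(outer.split("uri=", 1)[1])  # round-trips to the original
```

This is also why the prefixing (path-concatenation) flavours are fragile: appending the raw inner URI to a path means its query string and fragment get swallowed by the outer URL's parsing unless they too are encoded.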
Re: Education
Hi Leif, I’m not sure you meant to do Reply-all. :-) But a Reply-all from me would have said that you have exactly the point. It is entirely appropriate that more than half the course, or even more, would be on scripting itself. And that the students would start from essentially no knowledge - that is the target audience. The course is about the students learning scripting and data stuff - it just happens that the examples used are Linked data related, giving useful added value. By the way, someone else is going to do the course that prompted this, and will use Python with data processing and stats stuff as the subject, so not so far off. Best Hugh On 23 Aug 2014, at 20:06, Leif Isaksen leif...@googlemail.com wrote: Hi Hugh sorry for a slow reply. I was away and have been digging through my email backlog ever since. I think this is a really interesting question, although I suspect you are setting the bar a bit high for humanists (and probably social scientists too). The majority of them have no experience in scripting at all (although some do, and many are willing to try). I think you'd probably need to spend at least half the course (or more) dealing with the basic principles of scripting before you could start touching on these topics. Having said that, if you can get them inspired by the possibilities, I've found they are often willing to invest a lot of their own time learning the skills. Of course, you can't really write that into the syllabus... As it happens, my colleague with whom I co-teach our Masters module on 'Web technologies in the Humanities' has just gone on leave, so if you feel like trialling any of these ideas, I have a captive audience for you :-) All the best L. PS and as an ex Java developer I'm also sad to agree that java and Linked Data are probably a terrible mix, conceptually speaking at any rate. 
I remember in my first ever Semantic Web application we used Remote Procedure Calls to transfer RDF :-S On Sat, Jul 12, 2014 at 12:02 PM, Hugh Glaser h...@glasers.org wrote: The other day I was asked if I would like to run a Java module for some Physics and Astronomy students. I am so far from plain Java and that sort of thing now that there was almost a cognitive dissonance. But it did cause me to ponder on what I would do for such a requirement, given a blank sheet. For people whose discipline is not primarily technical, what would a syllabus look like around Linked Data as a focus, but also causing them to learn lots about how to just do stuff on computers? How to use a Linked Data store service as schemaless storage: bit of intro to triples as simply a primitive representation format; scripting for data transformation into triples - Ruby, Python, PHP, awk or whatever; scripting for http access for http put, delete to store; simple store query for service access (over http get); scripting for data post-processing, plus interaction with any data analytic tools; scripting for presentation in html or through visualisation tools. It would be interesting for scientists and, even more, social scientists, archeologists, etc (alongside their statistical package stuff or whatever). I think it would be really exciting for them, and they would get a lot of skills on the way - and of course they would learn to access all this Open Data stuff, which is becoming so important. I’m not sure they would go for it ;-) Just some thoughts. And does anyone know of such modules, or even is teaching them? Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: URIs within URIs
On 22 Aug 2014, at 22:43, Ruben Verborgh ruben.verbo...@ugent.be wrote: Hi Hugh, Can you tell me if there is a pattern for the uri= style stuff, where you want everything the service wants to say about the URI, in any position? The current triple pattern fragments spec does not mandate this, but: - each response will give you the controls (links and/or form) to find the other patterns Not very nice if all I want is to get what the service wants to tell me about that URI. - the server is free to include more triples than asked for Sounds better. - future extensions (that are planned) can support this Even better :-) And I guess that raises the question of bnodes as well. My answer to that is always: bnodes are Semantic Web, but not Linked Data. If a node doesn't have a universal identifier, it cannot be addressed. I find this comment strange. If you mean that I can’t query using a bnode, then sure. If you mean that I never get any bnodes back as a result of a Linked Data URI GET, then I think not. But then again, I think my comment was a bit confused itself :-) Cheers That might seem like the simple explanation—because it is— but it's the only satisfying answer I have found so far. I suppose I am looking at LDF from the point of view that it is a way of specifying the invoking URI pattern, and what my services would look like if they were using such patterns to be invoked - although maybe that is misuse? You could do that; that's one way of looking at it. The important thing is that a client doesn't have to guess or know anything about the server. Just by getting one arbitrary response (fragment), it is able to retrieve any other. No URL hacking needed. Best, Ruben PS Something I didn't mention in the earlier mail: it does combine nicely with dereferencing. For instance, the URL http://data.mmlab.be/people/Ruben+Verborgh 303s to http://data.mmlab.be/mmlab?subject=http%3A%2F%2Fdata.mmlab.be%2Fpeople%2FRuben%2BVerborgh. 
-- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: URIs within URIs
Hi Luca, You mean things like http://sameas.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh I think. And for something many years old, and with other flags: http://www.rkbexplorer.com/network/?uri=http://southampton.rkbexplorer.com/id/person-2f876940347fe251382724b34c27346f-cb9c89b02b078212e440a8016915856atype=person-personformat=foafknowsn3 So yes, they are out there (I have lots of other sites and services that do this), but no, I don’t know any research, or even what the topic might be. Actually, we use a more Cool URI/Restful-like invocation now: http://sociam-pub.ecs.soton.ac.uk/sameas/symbols/http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh is much preferable, I think. Hope that helps. Best Hugh On 22 Aug 2014, at 16:44, Luca Matteis lmatt...@gmail.com wrote: Dear LOD community, I'm wondering whether there has been any research regarding the idea of having URIs contain an actual URI, that would then resolve information about what the linked dataset states about the input URI. Example: http://foo.com/alice - returns data about what foo.com has regarding alice http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't just resolve the alice URI above, but returns what bar.com wants to say about the alice URI For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could return: http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice a void:Dataset . http://foo.com/alice #some #data . I know SPARQL endpoints already have this functionality, but was wondering whether any formal research was done towards this direction rather than a full-blown SPARQL endpoint. The reason I'm looking for this sort of thing is because I simply need to ask certain third-party datasets whether they have data about a URI (inbound links). Best, Luca -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
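The uri= pattern in these examples rests entirely on percent-encoding: the inner URI's ':' and '/' must be escaped so they survive as query-parameter data. A small sketch of both directions:

```python
# Building and unpacking a "URI within a URI", as in the sameAs.org
# call above. quote(..., safe="") escapes ':' and '/' as well.
from urllib.parse import quote, unquote

inner = "http://dbpedia.org/resource/Edinburgh"
outer = "http://sameas.org/?uri=" + quote(inner, safe="")
print(outer)
# -> http://sameas.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh

# The receiving server recovers the inner URI by decoding the parameter:
assert unquote(outer.split("uri=", 1)[1]) == inner
```

The "Cool URI" variant Hugh prefers (sociam-pub.ecs.soton.ac.uk/sameas/symbols/...) embeds the same percent-encoded form in the path rather than in a query string.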
Re: URIs within URIs
Hi Ruben, Cool posting. Can you tell me if there is a pattern for the uri= style stuff, where you want everything the service wants to say about the URI, in any position? For a simple site this might look like the SCBD for the URI? And I guess that raises the question of bnodes as well. I have looked a bit at the paper and the spec, but couldn’t find it and I’m feeling lazy - sorry :-) I suppose I am looking at LDF from the point of view of it is a way of specifying the invoking URI pattern, and what my services would look like if they were using such patterns to be invoked - although maybe that is misuse? Best Hugh On 22 Aug 2014, at 17:19, Ruben Verborgh ruben.verbo...@ugent.be wrote: Hi Luca, I'm wondering whether there has been any research regarding the idea of having URIs contain an actual URI, that would then resolve information about what the linked dataset states about the input URI. Example: http://foo.com/alice - returns data about what foo.com has regarding alice http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't just resolve the alice URI above, but returns what bar.com wants to say about the alice URI This specific use case has been one of the motivations behind Triple Pattern Fragments [1][2]. Section 4.3 of our publication on the Linked Data on the Web workshop [2] specifically tackles this issue. The problem with dereferencing is that the URI of a concept only leads to the information about this concept by the particular source that has created this specific URI—even though there might be others. For instance, even if http://example.org/#company was the official URI of the company EXAMPLE, it is unlikely the source of the most objective information about this company. But how can we find that information then? And the problem gets worse with URIs like http://xmlns.com/foaf/0.1/Person. This URI gives you exactly 0 persons, as strange as this might seem to an outsider.
With Triple Pattern Fragments, you can say: “give me all information this particular dataset has about concept X.” For instance, given the resource http://dbpedia.org/resource/Barack_Obama, here is data for this person *in a specific dataset*: http://data.linkeddatafragments.org/dbpedia?subject=http%3A%2F%2Fdbpedia.org%2Fresource%2FBarack_Obama Here is data about http://xmlns.com/foaf/0.1/Person in that same dataset: http://data.linkeddatafragments.org/dbpedia?object=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson Note how these resources are *not* created by hacking URI patterns manually; instead, you can find them through a hypermedia form: - http://data.linkeddatafragments.org/dbpedia This form works for both HTML and RDF clients, thanks to the Hydra Core Vocabulary. In other words, this interface is a hypermedia-driven REST interface through HTTP. This gets us to a deeper difference between (current) Linked Data and the rest of the Web: Linked Data uses only links as hypermedia controls, whereas the remainder of the Web uses links *and forms*. Forms are a much more powerful mechanism to discover information. So part of what we want to achieve with Triple Pattern Fragments is to broaden the usage of Linked Data from links to more expressive hypermedia. This truly allows “anybody to say anything about anything”— and to discover that information, too! I know SPARQL endpoints already have this functionality, but was wondering whether any formal research was done towards this direction rather than a full-blown SPARQL endpoint. The reason I'm looking for this sort of thing is because I simply need to ask certain third-party datasets whether they have data about a URI (inbound links). Consider using a Triple Pattern Fragments server [3]. They're handy and very cheap to host in comparison to SPARQL servers!
Best, Ruben [1] http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/ [2] http://ceur-ws.org/Vol-1184/ldow2014_paper_04.pdf [3] https://github.com/LinkedDataFragments/Server.js -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
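The fragment URLs in Ruben's DBpedia example can be produced as follows. This is only an illustration of the URL shape: a real TPF client would discover the form via the hypermedia controls in a response, not construct the URL by hand.

```python
# Asking a Triple Pattern Fragments server what a dataset says about
# one subject, following the DBpedia example above.
from urllib.parse import urlencode

endpoint = "http://data.linkeddatafragments.org/dbpedia"
params = {"subject": "http://dbpedia.org/resource/Barack_Obama"}
fragment_url = endpoint + "?" + urlencode(params)
print(fragment_url)

# Fetching the fragment as Turtle (commented out; needs network access):
# import urllib.request
# req = urllib.request.Request(fragment_url, headers={"Accept": "text/turtle"})
# turtle = urllib.request.urlopen(req).read().decode("utf-8")
```

Swapping `"subject"` for `"object"` gives the second example in the mail, all incoming links to a resource.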
Re: Updated LOD Cloud Diagram - Missed data sources.
On 16 Aug 2014, at 12:57, David Wood da...@3roundstones.com wrote: On Aug 15, 2014, at 1:55 PM, Mark Baker dist...@acm.org wrote: On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote: Hi, But I wonder where so many other sites (including mine) went ? The problem with crawling the Web of Linked Data is really that it is hard to get the datasets on the edges that set RDF links to other sources but are not the target of links from well-connected sources. I'm curious, why you don't just crawl the whole Web looking for linked data? Or better yet, work with one of the search engines or Open Crawl so you can use their indexes. Well there is possibly a quick answer to this. Google, at least, doesn’t index Linked Data. Well, certainly not the kind that does conneg. See other recent messages on this list about the problem of SEO of Linked Data, which is another side of the same coin. Checking Google: Looking at http://dbpedia.org/resource/Birching If I take a URI from (the RDF I get from) that page, and search for it in Google, I think I would expect it to take me to quite a few RDF documents in various formats. But, for example, https://www.google.com/#filter=0q=%22http://ru.dbpedia.org/resource/Розги%22 (asking for all results in the filter=0), shows no RDF documents at all. Of course, RDF documents would have …/data/… in them, rather than …/resource/… or …/page/… And, in fact, searching for dbpedia/data https://www.google.com/#q=%22dbpedia.org%2Fdata%22 only gives 1.2M hits, which is way short of what it would be. Not my field, so I may have it wrong, but I felt like checking it out on a stormy Sunday afternoon! Best Hugh Regards, Dave -- http://about.me/david_wood Sent from my iPad Mark. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - First draft and last feedback.
Feedback: Awesome, just awesome - no “but”s. I was wondering, if not even doubtful, that the next versions would be useful, because there would be so much. This version is actually possibly more useful than previous ones. Not so much for finding datasets, although it is good for that; in addition, at a distance it gives you a real sense of the different sectors, and how they are connected, while the inter-sector connections are visualised. Of course it helps I have a 30” screen, so I can even read the words while looking at the whole picture, and without my glasses :-) It makes me think that perhaps I was right, and sameAs.org would have spoilt it:- we’ll see next time, I guess. Well done team! On 15 Aug 2014, at 08:07, Christian Bizer ch...@bizer.de wrote: Hi all, on July 24th, we published a Linked Open Data (LOD) Cloud diagram containing crawlable linked datasets and asked the community to point us at further datasets that our crawler has missed [1]. Lots of thanks to everybody that did respond to our call and did enter missing datasets into the DataHub catalog [2]. Based on your feedback, we have now drawn a draft version of the LOD cloud containing: 1.the datasets that our crawler discovered 2.the datasets that did not allow crawling 3.the datasets you pointed us at. The new version of the cloud altogether contains 558 linked datasets which are connected by altogether 2883 link sets. As we were pointed at quite a number of linguistic datasets [3], we added linguistic data as a new category to the diagram. The current draft version of the LOD Cloud diagram is found at: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/extendedLODCloud/extendedCloud.png Please note that we only included datasets that are accessible via dereferencable URIs and are interlinked with other datasets.
It would be great if you could check if we correctly included your datasets into the diagram and whether we missed some link sets pointing from your datasets to other datasets. If we did miss something, it would be great if you could point us at what we have missed and update your entry in the DataHub catalog [2] accordingly. Please send us feedback until August 20th. Afterwards, we will finalize the diagram and publish the final August 2014 version. Cheers, Chris, Max and Heiko -- Prof. Dr. Christian Bizer Data and Web Science Research Group Universität Mannheim, Germany ch...@informatik.uni-mannheim.de www.bizer.de -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - First draft and last feedback. - sameAs.org
Hi Chris, On 15 Aug 2014, at 11:15, Christian Bizer ch...@bizer.de wrote: Hi Hugh, thank you very much for your positive feedback. Richly deserved. Yes, we decided not to include sameAs.org as we understand it to be more a service that works on top of the LOD cloud than an actual dataset that contributes additional data to the cloud. We hope that this interpretation is OK with you. It is certainly OK leaving it out. But I don’t agree it does not contribute additional data to the cloud. It publishes millions of triples that are not available (all the inferred sameAs triples), and they would be very hard for people to construct themselves, as they are cross-domain. It also bridges gaps between different equivalence predicates - although of course some people won’t want that! In that sense the main sameAs.org store is a search engine, and provides discovery that would be practically impossible to do any other way. Anyway, encouraged by Kingsley ( :-) ), I have opened all the sameAs sites up to LDSpider:- so next time the crawl is likely to get a load of them. We’ll get to see what it looks like! Best Hugh Cheers, Chris -Ursprüngliche Nachricht- Von: Hugh Glaser [mailto:h...@glasers.org] Gesendet: Freitag, 15. August 2014 11:57 An: Christian Bizer Cc: public-lod@w3.org Betreff: Re: Updated LOD Cloud Diagram - First draft and last feedback. Feedback: Awesome, just awesome - no “but”s. I was wondering, if not even doubtful, that the next versions would be useful, because there would be so much. This version is actually possibly more useful than previous ones. Not so much for finding datasets, although it is good for that; in addition, at a distance it gives you a real sense of the different sectors, and how they are connected, while the inter-sector connections are visualised. 
Of course it helps I have a 30” screen, so I can even read the words while looking at the whole picture, and without my glasses :-) It makes me think that perhaps I was right, and sameAs.org would have spoilt it:- we’ll see next time, I guess. Well done team! On 15 Aug 2014, at 08:07, Christian Bizer ch...@bizer.de wrote: Hi all, on July 24th, we published a Linked Open Data (LOD) Cloud diagram containing crawlable linked datasets and asked the community to point us at further datasets that our crawler has missed [1]. Lots of thanks to everybody that did respond to our call and did enter missing datasets into the DataHub catalog [2]. Based on your feedback, we have now drawn a draft version of the LOD cloud containing: 1. the datasets that our crawler discovered 2. the datasets that did not allow crawling 3. the datasets you pointed us at. The new version of the cloud altogether contains 558 linked datasets which are connected by altogether 2883 link sets. As we were pointed at quite a number of linguistic datasets [3], we added linguistic data as a new category to the diagram. The current draft version of the LOD Cloud diagram is found at: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/extendedLODCloud/extendedCloud.png Please note that we only included datasets that are accessible via dereferencable URIs and are interlinked with other datasets. It would be great if you could check if we correctly included your datasets into the diagram and whether we missed some link sets pointing from your datasets to other datasets. If we did miss something, it would be great if you could point us at what we have missed and update your entry in the DataHub catalog [2] accordingly. Please send us feedback until August 20th. Afterwards, we will finalize the diagram and publish the final August 2014 version. Cheers, Chris, Max and Heiko -- Prof. Dr.
Christian Bizer Data and Web Science Research Group Universität Mannheim, Germany ch...@informatik.uni-mannheim.de www.bizer.de -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Just what *does* robots.txt mean for a LOD site?
Thanks all. OK, I can live with that. So things like Tabulator, Sig.ma and SemWeb Browsers can be expected to go through a general robots.txt Disallow, which is what I was hoping. Yes, thanks Aidan, I know I can do various User-agents, but I really just wanted to stop anything like googlebot. By the way, have I got my robots.txt right? http://ibm.rkbexplorer.com/robots.txt In particular, is the User-agent: LDSpider correct? Should I worry about case-sensitivity? Thanks again, all. Hugh On 27 Jul 2014, at 19:23, Gannon Dick gannon_d...@yahoo.com wrote: On Sat, 7/26/14, aho...@dcc.uchile.cl aho...@dcc.uchile.cl wrote: The difference in opinion remains to what extent Linked Data agents need to pay attention to the robots.txt file. As many others have suggested, I buy into the idea of any agent not relying document-wise on user input being subject to robots.txt. = +1 Just a comment. Somewhere, sometime, somebody with Yahoo Mail decided that public-lod mail was spam, so every morning I dig it out because I value the content. Of course, I could wish for a Linked Data Agent which does that for me, but that would be to complete a banal or vicious cycle, depending on the circle classification scheme in use. I'm looking for virtuous cycles and in the case of robots.txt, The lady doth protest too much, methinks. --Gannon -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Call for Linked Research
This is of course an excellent initiative. But I worry that it feels like people are talking about building stuff from scratch, or even lashing things together. Is it really the case that a typical research approach to what you are calling Linked Research doesn’t turn up theories and systems that can inform what we do? What I think you are talking about is what I think is commonly called e-Science. And there is a vast body of research on this topic. This initiative also impinges on the Open Archives/Access/Repositories movements, who are deeply concerned about how to capture all research outputs. See for example http://www.openarchives.org/ore/ In e-Science I know of http://www.myexperiment.org, for example, which has been doing what I think is very related stuff for 6 or 7 years now, with significant funding, so is a mature system. And, of course, it is compatible with all our Linked Data goodness (I hope). Eg http://www.myexperiment.org/workflows/59 We could do worse than look to see what they can do for us? And it appears that things can be skinned within the system: http://www.myexperiment.org/packs/106 You are of course right, that it is a social problem, rather than a technical problem; this is why others’ experience in solving the social problem is of great interest. Maybe myExperiment or a related system would do what you want pretty much out of the box? Note that it goes even further than you are suggesting, as it has facilities to allow other researchers to actually run the code/workflows. It would take us years to get anywhere close to this sort of thing, unless we (LD people) could find serious resources. And I suspect we would end up with something that looks very similar! Very best Hugh On 29 Jul 2014, at 10:02, Sarven Capadisli i...@csarven.ca wrote: On 2014-07-29 09:43, Andrea Perego wrote: You might consider including in your call an explicit reference to nanopublications [1] as an example of how to address point (5). 
About source code, there's a project, SciForge [2], working on the idea of making scientific software citable. My two cents... [1]http://nanopub.org/ [2]http://www.gfz-potsdam.de/en/research/organizational-units/technology-transfer-centres/cegit/projects/sciforge/ Thanks for the heads-up, Andrea. The article on my site has an open comment system, which is intended to have an open discussion or have suggestions for the others (like the ones you've proposed). Not that I'm opposed to continuing the discussion here, but you are welcome to contribute there so that the next person that comes along can get a hold of that information. It wasn't my intention to refer to all workshops that play nicely towards open science, vocabularies to use, exact tooling to use, or all efforts out there e.g., nanopublications. You have just cited two hyperlinks in that email. Those URLs are accessible by anything in existence that can make an HTTP GET request. Pardon my ignorance, but, why do we need off-band software when we have something that works remarkably well? -Sarven http://csarven.ca/#i -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Just what *does* robots.txt mean for a LOD site?
Hi. I’m pretty sure this discussion suggests that we (the LD community) should try to come to some consensus of policy on exactly what it means if an agent finds a robots.txt on a Linked Data site. So I have changed the subject line - sorry Chris, it should have been changed earlier. Not an easy thing to come to, I suspect, but it seems to have become significant. Is there a more official forum for this sort of thing? On 26 Jul 2014, at 00:55, Luca Matteis lmatt...@gmail.com wrote: On Sat, Jul 26, 2014 at 1:34 AM, Hugh Glaser h...@glasers.org wrote: That sort of sums up what I want. Indeed. So I agree that robots.txt should probably not establish whether something is a linked dataset or not. To me your data is still linked data even though robots.txt is blocking access of specific types of agents, such as crawlers. Aidan, *) a Linked Dataset behind a robots.txt blacklist is not a Linked Dataset. Isn't that a bit harsh? That would be the case if the only type of agent is a crawler. But as Hugh mentioned, linked datasets can be useful simply by treating URIs as dereferenceable identifiers without following links. In Aidan’s view (I hope I am right here), it is perfectly sensible. If you start from the premise that robots.txt is intended to prohibit access by anything other than a browser with a human at it, then only humans could fetch the RDF documents. Which means that the RDF document is completely useless as a machine-interpretable semantics for the resource, since it would need a human to do some cut and paste or something to get it into a processor. It isn’t really a question of harsh - it is perfectly logical from that view of robots.txt (which isn’t our view, because we think that robots.txt is about “specific types of agents”, as you say). Cheers Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - freebase and :baseKB
Thanks Chris, Great stuff. Maybe I’ll change the robots.txt - but I may need to buy more disk space for caching before I do :-), or flush the cache more aggressively when I know spidering is happening. It is an awesome picture!! Previously I was doubtful whether the next version would give much added value, but it really does. Very best Hugh On 25 Jul 2014, at 11:12, Christian Bizer ch...@bizer.de wrote: Hi Hugh, thank you very much for your feedback :-) Yes, your data sources and all data sources in this list http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/tables/not CrawlableDatasets.tsv will reappear in the final version. Freebase is heavily interlinked from DBpedia and also gives you something back if you dereference their URIs like http://rdf.freebase.com/ns/m.0156q We will check why LDspider did not manage to retrieve data from freebase (Andreas: Thank you for your explanation on the topic) Does anybody know if :baseKB is served via dereferencable URIs and if they set any links pointing at other data sets? If yes, we would love to include them into the final version of the diagram. Cheers, Chris -Ursprüngliche Nachricht- Von: Hugh Glaser [mailto:h...@glasers.org] Gesendet: Freitag, 25. Juli 2014 01:07 An: Mike Liebhold Cc: Christian Bizer; public-lod@w3.org Betreff: Re: Updated LOD Cloud Diagram - Please enter your linked datasets into the datahub.io catalog for inclusion. Awesome achievement, Chris and team! Yes Mike, there is quite a lot missing from the LOD Cloud we have grown to know and love. Some of that is I understand because it says it only has stuff that allowed spidering (that is, robots.txt permitted it, etc.). (I notice this because it means everything I used to have in the LOC Cloud has disappeared!) However, the announcement message says that these sets will re-appear, so that is good. I don’t know if that applies to Freebase; and I think :baseKB is not there either, but maybe that doesn’t have any links. 
I have to say that it is not clear to me that it is good practice to refer to this image as “the current/updated version of the LOD Cloud diagram”. It seems that you didn’t understand the significance of this from Chris’ message, and I suspect that you will not be alone. Best Hugh On 24 Jul 2014, at 23:39, Mike Liebhold m...@well.com wrote: I recall earlier versions of the LOD Cloud diagram included freebase - I don't see it here, - or the google knowledge graph either. am I missing something? ?? On 7/24/14, 5:18 AM, Christian Bizer wrote: Hi all, Max Schmachtenberg, Heiko Paulheim and I have crawled the Web of Linked Data and have drawn an updated LOD Cloud diagram based on the results of the crawl. This diagram showing all linked datasets that our crawler managed to discover in April 2014 is found here: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/LODCloudDiagram.png We also analyzed the compliance of the different datasets with the Linked Data best practices and a paper presenting the results of the analysis is found below. The paper will appear at ISWC 2014 in the Replication, Benchmark, Data and Software Track. http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/SchmachtenbergBizerPaulheim-AdoptionOfLinkedDataBestPractices.pdf The raw data used for our analysis is found on this page: http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/ Our crawler did discover 77 datasets that do not allow crawling via their robots.txt files and these datasets were not included into our analysis and are also not included in the current version of the LOD Cloud diagram.
A list of these datasets is found at http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/tables/notCrawlableDatasets.tsv In order to give a comprehensive overview of all Linked Data sets that are currently online, we would like to draw another version of the LOD Cloud diagram including the datasets that our crawler has missed as well as the datasets that do not allow crawling. Thus, if you publish or know about linked datasets that are not in the diagram or in the list of not crawlable datasets yet, please: 1. Enter them into the datahub.io data catalog until August 8th. 2. Tag them in the catalog with the tag ‘lod’ (http://datahub.io/dataset?tags=lod) 3. Send an email to Max and Chris pointing us at the entry in the catalog. We will include all datasets into the updated version of the cloud diagram, that fulfill the following requirements: 1. Data items are accessible via dereferencable URIs. 2. The dataset sets at least 50 RDF links pointing at other datasets or at least one other dataset is setting 50 RDF links pointing at your dataset. Instructions on how to describe your dataset in the catalog are found here: https://www.w3.org/wiki
Re: Updated LOD Cloud Diagram - Missed data sources.
Hi Aidan, I think I probably agree with everything you say, but with one exception: On 25 Jul 2014, at 19:14, aho...@dcc.uchile.cl wrote: found that the crawl encountered many problems accessing the various datasets in the catalogue: robots.txt, 401s, 502s, bad conneg, 404/dead, etc. The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. By the way, the reason this has come up for me is because I was quite happy not to be spidered for the BTC (a conscious decision), but I think that some of my datasets might be useful for people, so would prefer to see them included in the LOD Cloud. I actually didn’t submit a seed list to the BTC; but I had forgotten that we had robots.txt everywhere, so it wouldn’t have done it in any case! :-) Anyway, we just need to get around the problem, if we feel that this is all useful. So… Let’s do something about it. I’m no robots.txt expert, but I have changed the appropriate robots.txt to have:
User-agent: LDSpider
Allow: *

User-agent: *
Sitemap: http://{}.rkbexplorer.com/sitemap.xml
Disallow: /browse/
...
I wonder whether this (or something similar) is useful? I realise that it is now too late for the current activity (I assume), but I’ll just leave it all there for future stuff. Cheers -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
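Whether a robots.txt like the one above actually lets LDSpider in while keeping general crawlers out of /browse/ can be checked with Python's standard-library parser. A sketch, with a placeholder hostname, and using the conventional "Allow: /" (a path) rather than the non-standard "Allow: *":

```python
# Verifying the intent of an LDSpider-friendly robots.txt with
# urllib.robotparser. Note that Allow/Disallow values are path
# prefixes, so "Allow: /" is the usual way to grant full access.
import urllib.robotparser

robots_txt = """\
User-agent: LDSpider
Allow: /

User-agent: *
Disallow: /browse/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("LDSpider", "http://example.rkbexplorer.com/browse/x"))   # True
print(rp.can_fetch("Googlebot", "http://example.rkbexplorer.com/browse/x"))  # False
print(rp.can_fetch("Googlebot", "http://example.rkbexplorer.com/id/x"))      # True
```

User-agent matching here is case-insensitive substring matching, which also answers the earlier case-sensitivity question for this parser at least.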
sameAs.org & The LOD Cloud Diagram - advice please
So sameAs.org never appears in any of this stuff. That’s deliberate. The whole idea of it is that it doesn’t add to the plethora of URIs by generating new ones. But it seems that people do find it useful - I get emails from people about it, especially when it does strange things :-) sameAs.org is a service that consumes LD URIs from other PLDs and delivers the answer as LD. So it is, in fact, a proper Linked Data site. So you can do the stuff you expect with a URI like http://www.sameas.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FEdinburgh As you can see, this is not a Cool URI (I think), and as I said, I really don’t want people to think of this as the ID for a sameAs bundle, although in fact it is! So what should I do? Keep sameAs.org living outside the BTC and LOD Cloud world? Or change things so that it becomes a more normal part of the LOD world? In addition, I have quite a lot of other services that work over LD URIs to produce LD about the URI, but as with sameAs.org are also LD URIs because the service interface itself supports the LD model. Should these be brought into the PR fold, and if so how? It would be very painful to allow these to be spidered, because they are heavy computations, and running the service over all the URIs that could be run would be many years of computation for my server. I suspect that others have similar services, so a policy might be useful. In fact, as LD becomes more mature (!), I think we are finding that it is more a communication mechanism between cooperating services than simply delivering RDF in response to an identifying URI - how do we capture this massive LD resource? I hope that makes some sort of sense. Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - Missed data sources.
Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. snip I think the general interpretation of the robots in robots.txt is any software agent accessing the site automatically (versus a user manually entering a URL). I had never thought this. My understanding of the agents that should respect the robots.txt is what are usually called crawlers or spiders. Primarily search engines, but also including things that aim to automatically get a whole chunk of a site. Of course, there is no de jure standard, but the places I look seem to lean to my view. http://www.robotstxt.org/orig.html “WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.” https://en.wikipedia.org/wiki/Web_robot “Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.” It’s all about scale and query rate. So a php script that fetches one URI now and then is not the target for the restriction - nor indeed is my shell script that daily fetches a common page I want to save on my laptop. So, I confess, when my system trips over a dbpedia (or any other) URI and does follow-your-nose to get the RDF, it doesn’t check that the site robots.txt allows it. And I certainly don’t expect Linked Data consumers doing simple URI resolution to check my robots.txt But you are right, if I am wrong - robots.txt would make no sense in the Linked Data world, since pretty much by definition it will always be an agent doing the access.
But then I think we really need a convention (User-agent: ?) that lets me tell search engines to stay away, while allowing LD apps to access the stuff they want. Best Hugh If we agree on that interpretation, a robots.txt blacklist prevents applications from following links to your site. In that case, my counter-question would be: what is the benefit of publishing your content as Linked Data (with dereferenceable URIs and rich links) if you subsequently prevent machines from discovering and accessing it automatically? Essentially you are requesting that humans (somehow) have to manually enter every URI/URL for every source, which is precisely the document-centric view we're trying to get away from. Put simply, as far as I can see, a dereferenceable URI behind a robots.txt blacklist is no longer a dereferenceable URI ... at least for a respectful software agent. Linked Data behind a robots.txt blacklist is no longer Linked Data. (This is quite clear in my mind but perhaps others might disagree.) Best, Aidan -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Updated LOD Cloud Diagram - Missed data sources.
Hi, Well, as you might guess, I can’t say I agree. Firstly, as you correctly say, if there is a robots.txt with Disallow / on the RDF on a LD site, then it effectively prohibits any LD app from accessing the LD. So clearly that can’t be what the publisher intended (the idea of publishing RDF for humans to fetch is not a big market). So what did the publisher intend? This should be what the consumer aims to comply with. If you take a pragmatic (rather than perhaps more literal) view of what someone might mean when they put such a robots.txt on a LD site, then it can only mean “please only access my site in the sort of usage patterns that I might expect from a person” or similar. Secondly, I think in discussing robots, it is central to the issue to try to answer the question of “what is a robot?”, which is why I included that discussion, which is linked off the reference to robots on the wikipedia page that you quote, rather than just the page you quote. The systems you describe raise good questions, and I would say that in the end the builders have to decide whether their system is what the publisher might have thought of as a robot. My system (if I recall correctly!) monitors what it is accessing to ensure that it does not make undue demands on the LD sites it accesses; this is just good practice, irrespective of whether there is a Disallow or not, I think. I am guessing we will just have to differ on all this! Best Hugh On 25 Jul 2014, at 22:13, aho...@dcc.uchile.cl wrote: On 25/07/2014 15:54, Hugh Glaser wrote: Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. 
[snip] I think the general interpretation of the robots in robots.txt is any software agent accessing the site automatically (versus a user manually entering a URL). I had never thought this. My understanding of the agents that should respect the robots.txt is what are usually called crawlers or spiders. Primarily search engines, but also including things that aim to automatically get a whole chunk of a site. Of course, there is no de jure standard, but the places I look seem to lean to my view. http://www.robotstxt.org/orig.html “WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.” https://en.wikipedia.org/wiki/Web_robot “Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.” It’s all about scale and query rate. So a php script that fetches one URI now and then is not the target for the restriction - nor indeed is my shell script that daily fetches a common page I want to save on my laptop. So, I confess, when my system trips over a dbpedia (or any other) URI and does follow-your-nose to get the RDF, it doesn’t check that the site robots.txt allows it. And I certainly don’t expect Linked Data consumers doing simple URI resolution to check my robots.txt. But you are right, if I am wrong - robots.txt would make no sense in the Linked Data world, since pretty much by definition it will always be an agent doing the access. But then I think we really need a convention (User-agent: ?) that lets me tell search engines to stay away, while allowing LD apps to access the stuff they want. Then it seems our core disagreement is on the notion of a robot, which is indeed a grey area. With respect to robots only referring to warehouses/search engines, this was indeed the primary use-case for robots.txt, but for me it's just an instance of what robots.txt is used for. 
Rather than focus on what is a robot, I think it's important to look at (some of the commonly quoted reasons) why people use robots.txt and what the robots.txt requests: Charles Stross claims to have provoked Koster to suggest robots.txt, after he wrote a badly-behaved web spider that caused an inadvertent denial of service attack on Koster's server. [1] Note that robots.txt has an optional Crawl-delay primitive. Other reasons: A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data. [1] So moving aside from the definition of a robot, more importantly, I think a domain administrator has
Re: Updated LOD Cloud Diagram - Missed data sources.
Hi Luca, Thanks for asking. I have resources that number in the 100Ms and even 1Bs of resolvable URIs. I even have datasets with effectively infinite numbers of URIs. Some people seem to find them useful, in the sense that they want to look specific things up. These are not static documents - they are RDF documents dynamically generated from SQL, triple or other storage mechanisms. It can be a serious cost to me in terms of server processor, network and disk cost (I do some caching to trade processor cost against disk space) to allow crawlers to try to spider serious parts or all of the dataset. Some of the documents can take several seconds of CPU to generate. (Since all this is unfunded most costs come out of my pocket, by the way.) So it may be that avoiding spiders is the difference between me offering the dataset and not - or at least it means that the service that the “real” users get is not overwhelmed by the bots. So what I want to do is make the datasets available, but I don’t want to bear the costs of having Google, Bing, or anyone else, actually crawling the site. And no, I don’t want to have anything more than URI resolution, by having people register or authenticate - I want access to be as easy as possible - URI resolution. Actually, spidering is what the sitemap (which I put work into building if one is possible) is for. Oh, and I should say that the dynamic nature of the data means that the Last-Modified and similar headers cannot be reliably set, and so bots would find incremental spidering rather challenging. And I do think what I say applies to the web of documents. Would a web site manager really object to me having a script that occasionally got some news or weather and displayed it on a web page? By the way, I see that the standard Drupal instance puts this in the robots.txt: # This file is to prevent the crawling and indexing of certain parts # of your site by web crawlers and spiders run by sites like Yahoo! # and Google. 
By telling these robots where not to go on your site, # you save bandwidth and server resources. That sort of sums up what I want. But now I seem to be repeating myself :-) Best Hugh On 25 Jul 2014, at 23:23, Luca Matteis lmatt...@gmail.com wrote: Robots.txt to me works well for a web of documents. That is, wanting only humans to access certain resources. But for a web of data, why resort to a robots.txt when you could simply not put the resource online in the first place? On Fri, Jul 25, 2014 at 11:54 PM, Hugh Glaser h...@glasers.org wrote: Hi, Well, as you might guess, I can’t say I agree. Firstly, as you correctly say, if there is a robots.txt with Disallow / on the RDF on a LD site, then it effectively prohibits any LD app from accessing the LD. So clearly that can’t be what the publisher intended (the idea of publishing RDF for humans to fetch is not a big market). So what did the publisher intend? This should be what the consumer aims to comply with. If you take a pragmatic (rather than perhaps more literal) view of what someone might mean when they put such a robots.txt on a LD site, then it can only mean “please only access my site in the sort of usage patterns that I might expect from a person” or similar. Secondly, I think in discussing robots, it is central to the issue to try to answer the question of “what is a robot?”, which is why I included that discussion, which is linked off the reference to robots on the wikipedia page that you quote, rather than just the page you quote. The systems you describe raise good questions, and I would say that in the end the builders have to decide whether their system is what the publisher might have thought of as a robot. My system (if I recall correctly!) monitors what it is accessing to ensure that it does not make undue demands on the LD sites it accesses; this is just good practice, irrespective of whether there is a Disallow or not, I think. I am guessing we will just have to differ on all this! 
Best Hugh On 25 Jul 2014, at 22:13, aho...@dcc.uchile.cl wrote: On 25/07/2014 15:54, Hugh Glaser wrote: Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of the web sites we all use on a daily basis are problematic. [snip] I think the general interpretation of the robots in robots.txt is any software agent accessing the site automatically (versus a user manually entering a URL). I had never thought this. My understanding of the agents that should respect the robots.txt is what are usually called crawlers or spiders. Primarily search engines, but also including things that aim to automatically get a whole chunk of a site. Of course
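Wherever one lands in this debate, an agent that wants to be "respectful" in Aidan's sense can do the check cheaply with the Python standard library. A minimal sketch - the user-agent token is a placeholder, not anything from the thread:

```python
# Before dereferencing a Linked Data URI, consult the site's
# robots.txt (the "respectful software agent" behaviour debated above).
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def allowed_to_fetch(uri, user_agent="ExampleLDApp"):
    parts = urlsplit(uri)
    rp = RobotFileParser("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
    try:
        rp.read()  # fetch and parse the site's robots.txt
    except OSError:
        return True  # robots.txt unreachable: assume access is permitted
    return rp.can_fetch(user_agent, uri)
```

An agent doing simple follow-your-nose resolution could call `allowed_to_fetch(uri)` once per host and cache the parser, so the overhead per URI is negligible.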
Re: Education
Thanks Sarven, Sounds like you have put flesh on very much the sort of thing I was thinking of. And in fact you are reporting success too, which is great. And yes, definitely turtle/n3, and even the command line world too! Very best Hugh On 12 Jul 2014, at 13:38, Sarven Capadisli i...@csarven.ca wrote: On 2014-07-12 13:02, Hugh Glaser wrote: The other day I was asked if I would like to run a Java module for some Physics & Astronomy students. I am so far from plain Java and that sort of thing now there was almost a cognitive dissonance. But it did cause me to ponder about what I would do for such a requirement, given a blank sheet. For people whose discipline is not primarily technical, what would a syllabus look like around Linked Data as a focus, but also causing them to learn lots about how to just do stuff on computers? How to use a Linked Data store service as schemaless storage: bit of intro to triples as simply a primitive representation format; scripting for data transformation into triples - Ruby, Python, PHP, awk or whatever; scripting for http access for http put, delete to store; simple store query for service access (over http get); scripting for data post-processing, plus interaction with any data analytic tools; scripting for presentation in html or through visualisation tools. It would be interesting for scientists and, even more, social scientists, archaeologists, etc (alongside their statistical package stuff or whatever). I think it would be really exciting for them, and they would get a lot of skills on the way - and of course they would learn to access all this Open Data stuff, which is becoming so important. I’m not sure they would go for it ;-) Just some thoughts. And does anyone know of such modules, or is even teaching them? Best Hugh Hi Hugh, I teach a few introductory lectures on Linked Data, HTTP, URI, RDF, SPARQL as part of a Web and Internet Technologies course to students in Business IT at the Bern University of Applied Sciences. 
The majority of the students do not have a developer profile. The focus of the lessons is not the inner technical details of these technologies but, via some practical work, what they can take away: understanding some publishing and consuming challenges for data on the Web, and potentially communicating problems and solutions to their colleagues with technical expertise in the future. What I have observed: * Before going any further, examples on the state of things and the potential of what can be accomplished are vital. If they are not remotely excited, it sets the tone for the remainder of the lectures. * At first they do not completely take the importance of HTTP/URI seriously - a “they’ve seen them, they know them” mentality. The exercises around that are about designing their own URI patterns for their site/profile, and repeating the importance of Cool URIs and what that entails over and over. * The majority of the students understand the RDF data model and can express statements (either using human language or one of the formats). I usually bounce back and forth between drawing graphs on the board, and showing, dereferencing, browsing RDF resources, and pointing at people and objects in and outside of the room. * As far as their comprehension of the formats goes, i.e., how to write statements that are mostly syntactically valid, Turtle/N-Triples lead the pack. RDF/XML and RDFa usually turn out to be a disaster. Most do not bother with JSON(-LD). * Once they get the hang of Turtle, they do relatively well in SPARQL. I've noticed that it is via SPARQL examples, trials and errors, that they really get the potential of Linked Data. Along the way, it appears to reassure them that RDF and friends are powerful and will come in handy. IMHO: Although I welcome them to use any format for exercises and whatnot, I encourage them to use Turtle or N-Triples. I tell them that learning Turtle is the best investment because they can use that knowledge towards SPARQL. 
However, Turtle comes with a few syntactical traps and declarations, such that I secretly wish they would use N-Triples instead to learn to create statements, for the sake of simplicity. After all, N-Triples is as WYSIWYG as it gets! With a blank slate: in most cases, I have a strong bias towards the *nix command-line toolbox and shell scripting over alternative programming languages. *Out of the box*, the shell environment is remarkable and indispensable. The documentation is baked in. Working in this environment leads to some design decisions as described in http://www.faqs.org/docs/artu/ch01s06.html. One can do everything from data processing, transformations, inspection, analysis to parallelization here. Besides, it is the perfect glue for everything else. -Sarven http://csarven.ca/#i
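Sarven's "N-Triples is as WYSIWYG as it gets" point is easy to illustrate: here is the same single statement in both serializations (URIs are examples only):

```turtle
# Turtle - prefix declarations and syntax sugar ("a" for rdf:type):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> a foaf:Person .

# N-Triples - one fully spelled-out triple per line, nothing hidden:
<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
```

The Turtle form is what students will write once fluent; the N-Triples form shows them exactly which three terms the statement consists of, which is the pedagogical point above.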
Education
The other day I was asked if I would like to run a Java module for some Physics & Astronomy students. I am so far from plain Java and that sort of thing now there was almost a cognitive dissonance. But it did cause me to ponder about what I would do for such a requirement, given a blank sheet. For people whose discipline is not primarily technical, what would a syllabus look like around Linked Data as a focus, but also causing them to learn lots about how to just do stuff on computers? How to use a Linked Data store service as schemaless storage: bit of intro to triples as simply a primitive representation format; scripting for data transformation into triples - Ruby, Python, PHP, awk or whatever; scripting for http access for http put, delete to store; simple store query for service access (over http get); scripting for data post-processing, plus interaction with any data analytic tools; scripting for presentation in html or through visualisation tools. It would be interesting for scientists and, even more, social scientists, archaeologists, etc (alongside their statistical package stuff or whatever). I think it would be really exciting for them, and they would get a lot of skills on the way - and of course they would learn to access all this Open Data stuff, which is becoming so important. I’m not sure they would go for it ;-) Just some thoughts. And does anyone know of such modules, or is even teaching them? Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
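The "scripting for http access for http put ... to store" step of the syllabus sketch above can be shown in a few lines of stdlib Python. The store endpoint URL is hypothetical, so this sketch only builds the request rather than sending it:

```python
# Sketch of HTTP PUT of a Turtle document into a (hypothetical)
# Linked Data graph store endpoint. Only the request is constructed
# here; a real lesson would pass it to urlopen().
from urllib.request import Request, urlopen

def put_turtle(graph_url, turtle):
    # Build a PUT request carrying the Turtle payload
    return Request(graph_url,
                   data=turtle.encode("utf-8"),
                   method="PUT",
                   headers={"Content-Type": "text/turtle"})

req = put_turtle("http://store.example/graphs/people",
                 "<http://example.org/hugh> "
                 "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
                 "<http://xmlns.com/foaf/0.1/Person> .")
# urlopen(req) would send it; omitted since the endpoint is made up.
```

The same pattern with `method="DELETE"` and a plain GET covers the rest of the HTTP steps in the list, which is part of what makes the syllabus attractive: one small toolset carries the whole pipeline.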
Re: Alternative Linked Data principles
Something like doing Linked Data over P2P networks, that is, using distributed hash tables? You might like to have a look at some of that research (you can probably google better than I can). It weakens the provenance etc of the DNS angle, but of course then enables others to publish Linked Data against identifiers where they don’t have the DNS. There are various people who have been interested in it over the years - I have had interesting discussions with some. I played with putting some RDF on bittorrent using IDs that were Linked Data http URIs a while ago (in fact they may still be out there!), but it seems we have enough problems trying to get the http world working before trying other frameworks :-) Hugh On 28 Apr 2014, at 16:55, Luca Matteis lmatt...@gmail.com wrote: Thanks John but not really. I was specifically looking for research that wasn't based on protocols such as HTTP, URIs and RDF. But that is still in the field of achieving a global interconnected database. I know webby standards are implemented so no need to reinvent the wheel, but I think it's healthy to look at things from a different perspective; who knows, maybe UDP works better for achieving federated queries. Or maybe triples aren't really the only way to represent the real world. Luca On Mon, Apr 28, 2014 at 5:41 PM, John Erickson olyerick...@gmail.com wrote: Luca, I think you are not asking quite the right question; I think what you want to ask is whether the Linked Data Principles can be applied to different... * entity identifiers... * protocols with which to resolve and retrieve information about those entities * protocols with which to retrieve manifestations of resources associated with those named entities... * file formats with which to serialize manifestations of resources... * standards for modelling relationships between entities... The value of the Linked Data Principles as bound to Webby standards is that they are specific and readily implemented; no make believe... 
John On Mon, Apr 28, 2014 at 11:23 AM, Luca Matteis lmatt...@gmail.com wrote: The current Linked Data principles rely on specific standards and protocols such as HTTP, URIs and RDF/SPARQL. Because I think it's healthy to look at things from a different perspective, I was wondering whether the same idea of a global interlinked database (LOD cloud) could be portrayed using other principles, perhaps based on different protocols and mechanisms. Thanks, Luca -- John S. Erickson, Ph.D. Deputy Director, Web Science Research Center Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter Skype: olyerickson -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Encoding an incomplete date as xsd:dateTime
It may be worth reminding ourselves (?) that in RDF you can use “all of the above”. That is, if you want to make your RDF as consumable as possible, you may well represent the same stuff using more than one ontology. This is true of lots of things, such as people’s names or different bibliographic ontologies (although sometimes subproperty will do), but is particularly true for date and time. http://www.w3.org/TR/owl-time/ has an example at the end: :meetingStart a :Instant ; :inDateTime :meetingStartDescription ; :inXSDDateTime "2006-01-01T10:30:00-5:00"^^xsd:dateTime . :meetingStartDescription a :DateTimeDescription ; :unitType :unitMinute ; :minute 30 ; :hour 10 ; :day 1 ; :dayOfWeek :Sunday ; :dayOfYear 1 ; :week 1 ; :month 1 ; :timeZone tz-us:EST ; :year 2006 . and it might well be the case that you would have both (and other representations) in your dataset, and of course you would only assert the bits of the second representation that were appropriate. Of course, that doesn’t answer your original question, Heiko (sorry!), about what the xsd version should look like. Hugh On 10 Feb 2014, at 15:53, Niklas Lindström lindstr...@gmail.com wrote: Hi Heiko, Unless you want to use another ontology (e.g. BIO [1][2] or schema.org [3]), I'd probably go ahead and break that contract, although it is not technically safe (AFAIK, it's a violation of OWL semantics). It depends on the expected consumption of your data. I would say that the vcard ontology formally needs to be fixed to allow for more variation. It actually seems to have been amended somewhat in 2010 [4], to at least not require the exact second (or fraction thereof) of the birth. But that's hardly enough. A lot of the point of datatyped literals in RDF is lost when datatype properties are locked down like this. 
Cheers, Niklas [1]: http://vocab.org/bio/0.1/.html [2]: http://wiki.foaf-project.org/w/BirthdayIssue [3]: http://schema.org/birthDate [4]: http://www.w3.org/Submission/2010/SUBM-vcard-rdf-20100120/ On Mon, Feb 10, 2014 at 3:55 PM, Heiko Paulheim he...@informatik.uni-mannheim.de wrote: Hi Jerven, this looks like a pragmatic solution. But I wonder if it may lead to any conflicts, e.g., the vcard ontology defines the bday property with xsd:dateTime as its range explicitly. Is it safe to simply use an xsd:gYear value as its object? Best, Heiko On 10.02.2014 15:43, Jerven Bolleman wrote: Hi Heiko, http://www.w3.org/TR/xmlschema-2/#gYear and http://www.w3.org/TR/xmlschema-2/#gYearMonth are the datatypes that you should use. Regards, Jerven On 10 Feb 2014, at 15:37, Heiko Paulheim he...@informatik.uni-mannheim.de wrote: Hi all, xsd:dateTime and xsd:date are used frequently for encoding dates in RDF, e.g., for birthdays in the vcard ontology [1]. Is there any best practice to encode incomplete date information, e.g., if only the birth *year* of a person is known? As far as I can see, the XSD spec enforces the provision of all date components [2], but 1997-01-01 seems like a semantically wrong way of expressing that someone is born in 1997, but the author does not know exactly when. Thanks, Heiko [1] http://www.w3.org/2006/vcard/ns [2] http://www.w3.org/TR/xmlschema-2/#dateTime [3] http://www.w3.org/TR/xmlschema-2/#date -- Dr. Heiko Paulheim Research Group Data and Web Science University of Mannheim Phone: +49 621 181 2646 B6, 26, Room C1.08 D-68159 Mannheim Mail: he...@informatik.uni-mannheim.de Web: www.heikopaulheim.com -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
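Setting the vcard range issue aside, Jerven's suggestion combined with Niklas's schema.org pointer looks like this in Turtle - an illustrative sketch only (the subject URI is an example):

```turtle
# Recording only a known birth year, using xsd:gYear rather than a
# fabricated full date like 1997-01-01.
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .

<http://example.org/person/1> schema:birthDate "1997"^^xsd:gYear .
```

If more of the date later becomes known, `xsd:gYearMonth` ("1997-03") and then `xsd:date` slot in the same way, so consumers can distinguish genuine precision from padding.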
Extracting URIs - rapper --trace?
Hi. I wanted to extract the URIs from some rdf, and it struck me that rapper probably did it for me. And yes, that is what the -t/--trace flag says it does, I think. “-t, --trace Print URIs retrieved during parsing. Especially useful for monitoring what the guess and GRDDL parsers are doing.” But I can’t get it to make any difference - am I doing something wrong, please? Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
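While waiting for an answer on --trace, one workaround sketch: have rapper serialize to N-Triples (e.g. `rapper -o ntriples input.rdf > out.nt`) and pull the URIs out of that. This is deliberately naive - it assumes well-formed N-Triples and ignores the corner case of angle brackets inside literals:

```python
# Extract the set of URIs mentioned in an N-Triples document.
import re

def extract_uris(ntriples):
    # In N-Triples, URIs (and only URIs) appear between angle brackets
    return sorted(set(re.findall(r"<([^>]*)>", ntriples)))

sample = '<http://example.org/a> <http://example.org/p> "a literal" .'
print(extract_uris(sample))
# -> ['http://example.org/a', 'http://example.org/p']
```

It loses the distinction between subject, predicate and object positions, but for the "just give me the URIs" use case above that is usually fine.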
Re: HTTPS for RDF URIs?
On 31 Jan 2014, at 11:29, ☮ elf Pavlik ☮ perpetual-trip...@wwelves.org wrote: On 01/30/2014 09:10 PM, Kingsley Idehen wrote: On 1/30/14 1:09 PM, Melvin Carvalho wrote: If not bad, is there any provision for allowing that an HTTPS URI that only differs in the scheme part from an HTTP URI be identified as the same resource? http and https are fundamentally different resources, but you can link them together with owl:sameAs, I think ... Yes. You simply use an http://www.w3.org/2002/07/owl#sameAs relation to indicate that a common entity is denoted [1] by the http: and https: scheme URIs in question. does it make sense then to use https: IRIs if we state that one can treat http: version as equivalent? Yes. Because you get a different description of the NIR back from the other URI. I’m tempted to say that the s after the http is no different to adding an s to the end - they are both valid URIs, and so simply opaque identifiers. But someone will probably tell me that is too sloppy :-) On the other hand, if I was to be pedantic, adding the s to the end of the http does take you out of Linked Data (although it is Semantic Web) according to the Principles. But I have never let a little thing like that bother me. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
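The linkage Kingsley describes, spelled out as a triple (the subject and object URIs are examples only):

```turtle
# Asserting that the http: and https: forms of a URI denote the
# same entity.
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/thing> owl:sameAs <https://example.org/thing> .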
Re: HTTPS for RDF URIs?
On 31 Jan 2014, at 12:46, Alfredo Serafini ser...@gmail.com wrote: Hi all regarding opaque uri: maybe a difference in the scheme could be seen as complementary to a different type extension. If I'm referring for example to the resource http://wiki/page.html or http://wiki/page.rdf I probably expect two different representations of the same resource, from a technical REST-like approach. Should we also interpret those as opaque? Sorry if this is probably a sort of recurring question. If the formats for type extension are acceptable, the best would be to use the scheme in much the same way. For example I suppose that I could also have something like: file://wiki/page.html, for a local copy. Is this acceptable in theory? Well, it is a URI, and as Kingsley says, denotes the resource it denotes, which may or may not be the same as anything else in this world. So you could do this, if you wanted. And you could use mailto for your local URI and ftp or http for the public one for a resource which is an email address - or vice versa. But it is all in your mind (or some more complex RDF and OWL if you want to) - as elf says, these things are all opaque. But I’m afraid you just crossed a line for me. http://www.w3.org/DesignIssues/LinkedData.html says (number 2): “Use HTTP URIs so that people can look up those names.” and all the other versions have something similar. I can happily accept that “HTTP” in this is a shorthand for “HTTP or HTTPS”, since they perform very similarly in terms of why the Principles specify HTTP. But if you move to “file:”, then you have lost all the things that Principle 2 was aiming for. 
Of course, if this discussion was happening on the Semantic Web list, I would not make these comments (or at least not the same way); but this is the LOD list, and I think that globally-recognised identifiers are more de rigueur as a sine qua non, to use a couple of English phrases :-) Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: HTTPS for RDF URIs?
And of course I would be happy to host such triples at sameAs.org :-) (And maybe a separate store that was devoted only to such triples would be a useful idea?) Just send me them, or tell me where they are… Best On 30 Jan 2014, at 18:09, Melvin Carvalho melvincarva...@gmail.com wrote: On 29 January 2014 22:36, Maloney, Christopher (NIH/NLM/NCBI) [C] malon...@ncbi.nlm.nih.gov wrote: Apologies if this topic has come up before (I feel certain that it has) but I've searched the archives and Googled, and can't find anything (maybe too many false positives). What are the current best practice recommendations regarding the use of HTTPS URIs for resources in RDF? Are they bad? No, they are good. We use them over at https://w3id.org/ (one of the reasons that was created) If not bad, is there any provision for allowing that an HTTPS URI that only differs in the scheme part from an HTTP URI be identified as the same resource? http and https are fundamentally different resources, but you can link them together with owl:sameAs, I think ... Thanks! Chris Maloney NIH/NLM/NCBI (Contractor) Building 45, 5AN.24D-22 301-594-2842 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: LOD publishing question
Hi Giovanni, Thank you for the update. I am sorry to hear that Sindice is going into a frozen state, and that circumstances are making that happen, but of course pleased that you are able to keep it going at all. I send you and your team my personal thanks for the service you have provided over the last 5 or so years, and wish you all well. Very best Hugh. On 28 Jan 2014, at 14:19, Giovanni Tummarello g.tummare...@gmail.com wrote: With respect to Sindice, for a number of reasons, the people who originally created it, the former Data Intensive Infrastructure group, are either not working in the original institution hosting it, National University of Ireland Galway (the institute formerly known as DERI), or have been assigned to other tasks. Sindice has been operating for 5+ years, updating its index (though we were never perfect), and we believe supported a lot of work in the field, but it's now time to move on. In the meanwhile the project will continue to answer queries but without updating its index. Apologies for the inconvenience of course, we'll be posting on this soon and update the homepage to reflect the change. Giovanni On Tue, Jan 28, 2014 at 11:27 AM, Hugh Glaser h...@glasers.org wrote: Good question. I’ll report what I found, rather than advising. So I went there when you published that email, looking for stuff to put in my sameas.org site. I tried exploring, and when I went to Browse I only found a few things, so wasn’t encouraged :-) (And, as an aside, Advanced Search didn’t seem to do anything, and the search links at the bottom were not links.) So I decided that it wasn’t really mature enough to make it worth the effort (yet?), even though there should be massive scope for linkage eventually. But the real problem was that I couldn’t find any Linked Data, or even an RDF store. The URIs you use are not very Cool URIs, and I tried to see if there was RDF at the end of them by doing Content Negotiation, but there wasn’t. 
I am thinking of things like http://tundra.csd.sc.edu/rol/view-person.php?id=291 So I went away :-) For people like me, you could put something about how to see the RDF in an About page (or if it is there, make it easier to find). You only get one chance to snare people on the web, after all. Of course as Alfredo says, for spidering search engines, and it would have helped me too, you need robots.txt (which I couldn’t find either), sitemap, sitemap.xml, voiD description. Good luck! Hugh On 28 Jan 2014, at 04:12, WILDER, COLIN wilde...@mailbox.sc.edu wrote: Another question to you very helpful people– and apologies again for semi cross-posting Our LOD working group is having trouble publishing our data (see email below) in RDF form. Our programmer, a master’s student, who is working under the supervision of myself and a computer science professor, has mapped sample data into RDF, has the triplestore on a D2RQ server (software) on our server and has set up a SPARQL end-point on the latter. But he has been unsuccessful so far getting 3 candidate semantic web search engines (Falcons, Swoogle and Sindice) to be able to find our data when he puts a test query in to them. He has tried communicating with the people who run these, but to little avail. Any suggestions about sources of information, pointers, best practices for this actual process of publishing LOD? Or, if you know of problems with any of those three search engines and would suggest a different candidate, that would be great too. Thanks again, Colin Wilder From: WILDER, COLIN [mailto:wilde...@mailbox.sc.edu] Sent: Thursday, January 16, 2014 11:51 AM To: 'public-lod@w3.org' Subject: LOD for historical humanities information about people and texts To the many people who have kindly responded to my recent email: Thanks for your suggestions and clarifying questions. 
To explain a bit better, we have a data curation platform called RL, which is a large, complex web-based MySQL database designed for users to be able to simply input, store and share data about social and textual networks with each other, or to share it globally in RL’s data commons. The data involved are individual data items, such as info about one person’s name, age, a book title, a specific social relationship, etc. The entity types (in the ordinary-language sense of actors and objects, not in the database tabular sense) can be seen at http://tundra.csd.sc.edu/rol/browse.php. The data commons in RL is basically a subset of user data that users have elected (irrevocably) to share with all other users of the system. NB there is a lot of dummy data in the data commons right now because of testing. We are designing an expansion of RL’s functionality so as to publish data from the data commons as LOD, so I am doing some preliminary work to assess feasibility and fit
Re: General tuning for Dbpedia Spotlight
Thank you for the responses, both on- and off-list. So I see perhaps I should recast my question, with maybe wider scope. I have a load of abstract-style text fragments - that is perhaps 100 words each, on a wide variety of topics, although there is a bit of a technical bent. I want to be able to do linkage between them and to other things, based around our lovely Linked Data world. That is, have lots of triples, something like :docIDn :some-pred :conceptURI It would be a bonus to know which words in the text triggered the generation of the triple. Of course, the system doesn’t actually have to generate the triples - I can build them if I get sufficiently sensible output, including the sort of html output that Spotlight does. And because it goes automatically to users, I need quite high precision, even if recall suffers (I think that is the terminology). Oh, and ideally free, although not necessarily. My current preference is for dbpedia or freebase URIs, but wordnet is probably OK too. I think there must be people who have done this (a lot). Or at least there should be. There are certainly quite a lot of systems that can do it, some more or less playing well with Linked Data URIs. I think my problem (apart from laziness) is that the systems I look at seem to want me to care about what they do, or at least engage with tuning and things, which means I need some understanding of what they do, which I don’t have (and I probably don’t care either :-) ). So, does anyone (else) feel they can point me at a system for doing this that I can just use out of the box (possibly having been told some parameters to use)? Of course, maybe I am just asking too much of the technology at the moment, but I can hope! Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
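To make the desired output concrete, the ":docIDn :some-pred :conceptURI" triples might look like this in Turtle. A sketch only: :doc42 and the choice of dc:subject as the predicate are placeholders, not anything a particular annotator actually emits:

```turtle
@prefix :        <http://example.org/docs/> .
@prefix dc:      <http://purl.org/dc/terms/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

# One hypothetical 100-word abstract linked to the concepts found in it.
:doc42 dc:subject dbpedia:Linked_data ;
       dc:subject dbpedia:Photonics .
```

Keeping the provenance of which words triggered each link would need a reification or annotation structure on top of this; the bare triples above are just the linkage itself.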
General tuning for Dbpedia Spotlight
Hi. I am trying to use Dbpedia Spotlight to find stuff in arbitrary English texts. Following the instructions, I found it very easy to download and install the whole shebang on my Mac laptop - thanks! It does pretty well in finding stuff, but gets some strange things wrong for me (choosing people called Monday instead of the day of the week, for example, or Municipalities of Germany for Municipalities). That’s fine - I understand that there is always a precision/recall thing going on. But I want to use it to mark up web pages, so having even a small number of strange links is not too good. So my question is: What are the parameters I should set to get a set of results with high precision (even if low recall) for arbitrary English text? I assume that I need to set Confidence and Annotation Score, and probably some Types. Related to this, I am using the Lucene version. I see there is a Statistical version, but can’t work out what the difference might be. Should I be using that to get more precise results? Sorry if this is somewhere in the docs, but I couldn’t find it easily. My guess is that this is something that quite a few people have been through? I am using it from php via http, if anyone can actually provide the code! :-) Best Hugh
Re: State of Open Source Semantic Web CMS
Hi Christoph, On 27 Dec 2013, at 15:57, Christoph Seelus christoph.see...@fh-potsdam.de wrote: # State of Open Source Semantic Web CMS Hello there, back in October, I asked here for Semantic Web CMS, written in PHP. The response I got on this list and directly via mail was great, so thanks again. At the moment, I'm writing a paper, regarding the state of Open Source CMS with Semantic Web support in general. Again: The final goal is to use our (or any) OWL-based ontology (http://isdc.gfz-potsdam.de/ontology/isdc_1.4.owl) as a knowledge foundation in a content management system, which would enable us to enrich available data with Linked Open Data. My list so far contains the following projects: - Drupal (https://drupal.org/) - OntoWiki (http://ontowiki.net) - Ximdex (http://www.ximdex.com/) - Dspace (http://www.dspace.org/) Can you point me at where the Semantic Web bit on Dspace is documented please, as I can’t find it? Also, you may want to include http://www.eprints.org which does Linked Data against http://www.eprints.org/ontology/ And in fact I have a SPARQL endpoint etc for all the ePrints RDF I can harvest. http://foreign.rkbexplorer.com/ Best Hugh - (DCThera, not released to the public yet) Any suggestions of other systems I overlooked? Thanks and best regards, Christoph -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
UK Photonics Portal
Hi. You might like to be able to point at this new example of a site built around Linked Data from multiple sources: http://www.ukphotonics.org We (Seme4) have built it over the last few months, funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC) Centre for Innovative Manufacturing in Photonics at the University of Southampton. Press release: http://www.southampton.ac.uk/mediacentre/news/2013/dec/13_224.shtml Best Hugh
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
On 2 Dec 2013, at 06:24, Ross Horne ross.ho...@gmail.com wrote: Andy is right (as usual!). With the proposed bnode encoding, the graph becomes fatter each time the same triple is loaded. But how much fatter was the question. RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer and the parser, as summarised by David recently: http://lists.w3.org/Archives/Public/public-lod/2013Nov/0093.html Ah yes, I forgot that everything is rosy now with 1.1 - sorry. Please don't get back into mixing up the lexer and the parser. The lexical spaces of the basic datatypes are disjoint, so in any language we can just write: - 999 instead of "999"^^xsd:integer - 9.99 instead of "9.99"^^xsd:decimal - "WWV" instead of "WWV"^^xsd:string - 2013-06-6T11:00:00+01:00 instead of "2013-06-6T11:00:00+01:00"^^xsd:dateTime As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and turns the characters into a token. A token consists of a string, called an attribute value, plus a token name, e.g. "999"^^xsd:integer. Only a relatively small handful of people writing compilers for languages should have to care about how tokens are represented, not end users of languages. Well personally I prefer the first version I used for my course on this when it came out in 1977, the Dragon Book - Principles of Compiler Design, before Sethi polluted it with all that type-checking stuff :-) Actually, it wasn’t about blurring the lexer and parser - the graph semantics were different. It was closer to having two representations of zero in the machine (as some machines used to have), and having to write code to ensure that you coped with both of them. Of course your examples do raise the issue of multiple representations for the same thing if the user is not careful. 23.4, 23.5, 23.0, 23.2, 23, 23.1, 023.0, 023 all of which are different RDF terms. Would a lexer/parser make 23.00 and 23.000 different RDF terms? I find myself thinking I should know, but don’t - my guess is it should. 
(RDF 1.1 doesn’t seem to give guidance on this.) And I find myself getting strangely interested in your dateTime example. I think most lexers will reject it? Or friendly ones will treat it as the correct lexical form: 2013-06-06T11:00:00+01:00 (You need to pad the day) So maybe we need to get a bit more explicit about the RDF term for dateTime (unless I have missed it)? That the RDF term is always in UTC? - This is what the xsd standard says. That the RDF term always has a fractional second part? - Good question. That the RDF term always has a timezone? - Better question. (See http://www.w3.org/TR/xmlschema-2/#dateTime ) Or are we happy with many different representations of a given dateTime? (Of course xsd:dateTime does get into problems with year zero, but let's not worry about that :-) ) But I guess my friendly RDF parser gnomes (all hail!) already have stories for all this. Best Hugh For language tags, a little simple conventional datatype subtyping (as opposed to rdfs:subClassOf), could help the programmer further [2]. e.g. a programmer that writes regex("WWV2013"@en, "WWV") clearly meant regex("WWV2013", "WWV") and shouldn't have to care about the distinction, unless I am mistaken. Regards, Ross [1] Aho, Sethi and Ullman. Compilers: Principles, Techniques, and Tools. 1986 [2] Local Type Checking for Linked Data Consumers. http://dx.doi.org/10.4204/EPTCS.123.4 -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
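Hugh's question about multiple representations can be made concrete. Under RDF 1.1 term equality these are three distinct literals, even though they denote the same xsd:dateTime value (a sketch with illustrative triples; in SPARQL, = compares values while sameTerm compares terms):

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/> .

:x :when "2013-06-06T11:00:00+01:00"^^xsd:dateTime .   # day padded, +01:00 offset
:x :when "2013-06-06T10:00:00Z"^^xsd:dateTime .        # same instant, in UTC
:x :when "2013-06-06T10:00:00.000Z"^^xsd:dateTime .    # same instant, fractional seconds
```

A store that keeps lexical forms as written will hold three triples here; one that canonicalises to a single lexical form per value would hold fewer, which is exactly the ambiguity the email is poking at.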
Re: An
Thanks Andy, Sorry, I had a brain-fart (senior moment?), and forgot that we were dealing with RDF 1.1. I guess I have suffered the pain of unknown presence of datatypes in the RDF terms for literals for so long it takes a while for me to accept that it has been fixed. Thanks so much to the people that did it. Using the bnode solution would be like bringing back the complexity of the optional datatype, which would bring back the pain! Best Hugh On 2 Dec 2013, at 11:04, Andy Seaborne andy.seabo...@epimorphics.com wrote: On 01/12/13 23:02, Hugh Glaser wrote: Hi. Thanks. A bit of help please :-) On 1 Dec 2013, at 17:36, Andy Seaborne andy.seabo...@epimorphics.com wrote: On 01/12/13 12:25, Tim Berners-Lee wrote: On 2013-11 -23, at 12:21, Andy Seaborne wrote: On 23/11/13 17:01, David Booth wrote: [...] This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples. Different. Maybe better, maybe worse. Do you want all your abc to be the same language? abc rdf:lang en or multiple languages: abc rdf:lang cy . abc rdf:lang en . ? Unlikely - so it's bnode time ... :x :p [ rdf:value abc ; rdf:lang en ] . The nice thing about this in a n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins so a datatypes literal can behave just like a bnode with two properties if you want to. But I have always preferred it with not 2 extra triples, just one: :x :p [ lang:en cat ] which allows you also to write things like :x :p [ lang:en cat] , [ lang:fr chat ]. or if you use the ^ back-path syntax of N3 (which was not taken up in turtle), :x :p cat^lang:en, chat^lang:fr . You can do the same with datatypes: :x :q 2013-11-25^xsd:date . instead of :x :q 2013-11-25^xsd:date . 
This seems to bring its own issues. These bnodes seem to be like untidy literals as considered in RDF-2004 WG. :x :p [ lang:en "cat" ] :x :p [ lang:en "cat" ] :x :p [ lang:en "cat" ] is 6 triples. :x :p :q . :x :p :q . :x :p :q . is 1 triple. Repeated read in same file - this already causes confusion. :x :p "cat" . :x :p "cat" . :x :p "cat" . is 1 triple or is it 3 triples because it's really Is it not 1 triple if you take the first view or 6 triples if you take the second? Or probably I don’t understand bnodes properly!? :x :p [ xsd:string "cat" ]. :x :p 123 . :x :p 123 . :x :p 123 . It makes it hard to ask do X and Y have the same value for :p? - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer. Why can't the app writer say find me all things which have a property value less than 45? I see it makes it hard, but I don’t see it as any harder than what we have now, with multiple patterns that do and don’t have ^^xsd:String As I said before, with the ^^xsd you need to consider a bunch of patterns to do the query - again, it is messy, but is it messier? Actually I find { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . } with a possible also { ?s1 ?p ?str . ?s2 ?p ?str . } Let's talk numbers (strings have a lexical form that looks like the value) and have 123 as shorthand for [ xsd:integer 123 ]. And let's ignore rdf:langString. { ?s1 ?p ?x . ?s2 ?p ?x . } does not care whether ?x is a URI or a literal at the moment. Your example is a good one as it's ?p so the engine does not know whether it's a datatype property or an object property. With bnodes this may match, it probably doesn't. It depends on the micro-detail of the data. # No. :x1 :p 123 . :x2 :p 123 . # Yes :s1 :p _:a . :s2 :p _:a . _:a xsd:string "abc" . Sure, if you know it's an integer ?s1 ?p [ xsd:integer ?str ] or even: { ?s1 ?p [ ?dt ?str ] . ?s2 ?p [ ?dt ?str ] . } { ?s1 ?p [ ?dt ?str ] . ?s2 ?p [ ?dt ?str ] . 
} though I think this is shifting an unnecessary cognitive model onto the app writer. I didn't say the access language was SPARQL :-) I meant how people think about accessing the data. Datatype properties are really very bizarre in this world. And this is at the fine grain level. Now apply to real queries that are 10s of lines long. { ?s1 ?p [ xsd:integer 123 ] } { ?s1 ?p 123 } it might be possible to make that bNode infer to the value 123 which would be a win. Making literals value-centric not appearance/struct based would be very nice. And counting. Counting matters to people (e.g. facetted browse) Andy PS I started my first email draft with the argument that it was better to have the more triples form
Understanding datatypes in RDF 1.1 - was various things
Hmm, My head is spinning a bit now - I’m trying to understand something simple - "1"^^xsd:boolean. So my reading says that is a valid lexical form (in the lexical space) for the value ‘true’ (in the value space). (http://www.w3.org/TR/rdf11-concepts/#dfn-lexical-space ) I think that ‘value space’ is where the other documents talk about ‘RDF term’, but I’m not sure. And I also read: “Literal term equality: Two literals are term-equal (the same RDF literal) if and only if the two lexical forms, the two datatype IRIs, and the two language tags (if any) compare equal, character by character.” (http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal ) So the language processor will (must) take my lexical form "1"^^xsd:boolean and make it an RDF term “true”. And then if I ask the store (sorry, I am rather engineering in this) if 2 terms are equal, it will always be comparing two similar terms (from the literal space), (probably, but see below): "true"^^xsd:boolean. And I can expect a sensible querying engine to consider "1"^^xsd:boolean as a shorthand for “true”. It could be confusing, which it was for a bit for me, because the equality constraint says “the two lexical forms”, but in this case there is more than one lexical form for the value form. So I think it means that a processor must always choose the same lexical form for any given value form. I am guessing that processors could consistently choose "1"^^xsd:boolean as the value form for “true”, but that would be pretty perverse. A little further confusion for me arises as to whether the datatype IRI is part of the value space. I have taken off any ^^xsd:boolean from my rendering of the “true” in the value space because the documentation seems to leave it out. (The table says: ‘“true”, xsd:boolean’ and ‘true’ are the literal and value.) So I am left assuming that the datatype IRI is somewhere in the RDF term world, although we know it isn’t in the graph. 
Not something I need to worry about as a consumer, as it is all an internal issue, I think, but I thought I would mention it. Best Hugh -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
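One way to see the lexical-form vs value distinction Hugh is wrestling with is via SPARQL, where = compares values while sameTerm compares RDF terms. A sketch (two ASK queries shown together; filters over an empty pattern are evaluated against the single empty solution):

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Value equality: both lexical forms map to the boolean value true.
ASK { FILTER ( "1"^^xsd:boolean = "true"^^xsd:boolean ) }

# Term equality: the lexical forms differ character by character,
# so these are two different RDF literals.
ASK { FILTER ( sameTerm("1"^^xsd:boolean, "true"^^xsd:boolean) ) }
```

A conformant engine should answer true to the first and false to the second, which is exactly the "more than one lexical form per value" situation the email describes.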
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
Hi. Thanks. A bit of help please :-) On 1 Dec 2013, at 17:36, Andy Seaborne andy.seabo...@epimorphics.com wrote: On 01/12/13 12:25, Tim Berners-Lee wrote: On 2013-11 -23, at 12:21, Andy Seaborne wrote: On 23/11/13 17:01, David Booth wrote: [...] This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples. Different. Maybe better, maybe worse. Do you want all your abc to be the same language? abc rdf:lang en or multiple languages: abc rdf:lang cy . abc rdf:lang en . ? Unlikely - so it's bnode time ... :x :p [ rdf:value abc ; rdf:lang en ] . The nice thing about this in a n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins so a datatypes literal can behave just like a bnode with two properties if you want to. But I have always preferred it with not 2 extra triples, just one: :x :p [ lang:en cat ] which allows you also to write things like :x :p [ lang:en cat] , [ lang:fr chat ]. or if you use the ^ back-path syntax of N3 (which was not taken up in turtle), :x :p cat^lang:en, chat^lang:fr . You can do the same with datatypes: :x :q 2013-11-25^xsd:date . instead of :x :q 2013-11-25^xsd:date . This seems to bring it it's own issues. These bnodes seem to be like untidy literals as considered in RDF-2004 WG. :x :p [ lang:en cat ] :x :p [ lang:en cat ] :x :p [ lang:en cat ] is 6 triples. :x :p :q . :x :p :q . :x :p :q . is 1 triple. Repeated read in same file - this already causes confusion. :x :p cat . :x :p cat . :x :p cat . is 1 triple or is it 3 triples because it's really Is it not 1 triple if you take the first view or 6 triples if you take the second? Or probably I don’t understand bnodes properly!? :x :p [ xsd:string cat ]. :x :p 123 . :x :p 123 . :x :p 123 . 
It makes it hard to ask do X and Y have the same value for :p? - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer. Why can't the app writer say find me all things which a property value less than 45? I see it makes it hard, but I don’t see it as any harder than what we have now, with multiple patterns that do and don’t have ^^xsd:String As I said before, with the ^^xsd you need to consider a bunch of patterns to do the query - again, it is messy, but is it messier? Actually I find { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . } with a possible also { ?s1 ?p ?str . ?s2 ?p ?str . } much easier to work with than something that has this stuff optionally tacked on the end of literals, that isn’t really part of the string but isn’t part of RDF either. Or maybe it is part of the literal but not the string? Surely that should be clear to me? I just don’t see there is a difference in complexity for querying - it is just that the current situation is genuinely messier for consumers because there are two notations in play, whereas if RDF is so good we should have everything in RDF. Not that I would say anything should change :-) it ain’t actually broken, but it could get fixed. (Oh dear, Hugh showing his ignorance of the fancy stuff again) Best Hugh To give that, if we add interpretation of bNodes used in this value form (datatype properties vs object properties ?), so you can ask about shared values, we have made them tidy again. But then it is little different from structured literals with @lang and ^^datatype. Having the data model and the access model different does not gain anything. The data model should reflect the way the data is accessed. Like RDF lists, or seq/alt/bag, encoding values in triples is attractive in its uniformity but the triples nature always shows through somewhere, making something else complicated. 
Andy PS Graph leaning does not help because you can't add data incrementally if leaning is applied at each addition. I suggested way back these properties as a way of putting the info into the graph but my suggestion was not adopted. I think it would have made the model more complete which would have been a good thing, though SPARQL would need to have language-independent query matching as a special case -- but it does now too really. (These are interpretation properties. I must really update http://www.w3.org/DesignIssues/InterpretationProperties.html) Units are fun as properties too. http://www.w3.org/2007/ont/unit Tim Andy -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: Dumb SPARQL query problem
It’s the other bit of the pig’s breakfast. Try an @en On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote: Hi, Sorry to bother the list, but I'm stumped by what should be a simple SPARQL query. When applied to the dbpedia end-point [1], this search: PREFIX foaf: http://xmlns.com/foaf/0.1/ PREFIX dbpedia-owl: http://dbpedia.org/ontology/ SELECT * WHERE { ?pers a foaf:Person . ?pers foaf:surname "Malik" . OPTIONAL {?pers dbpedia-owl:birthDate ?dob } OPTIONAL {?pers dbpedia-owl:deathDate ?dod } OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob } OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod } } LIMIT 100 yields no results. Yet if you drop the '?pers foaf:surname "Malik" .' clause, you get a result set which includes a Malik with the desired surname property. I'm clearly being dumb, but in what way? :-) (I've tried adding ^^xsd:string to the literal, but no joy.) Thanks, Richard [1] http://dbpedia.org/sparql -- Richard Light -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
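Spelled out, the @en fix Hugh suggests applied to Richard's first pattern looks like this (a sketch; the assumption, consistent with the thread, is that DBpedia stores surnames as English-tagged literals, so a plain literal never matches):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * WHERE {
  ?pers a foaf:Person ;
        # The stored literal is "Malik"@en, not "Malik",
        # and language-tagged literals only match tagged patterns:
        foaf:surname "Malik"@en .
} LIMIT 100
```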
Re: Dumb SPARQL query problem
Pleasure. Actually, I found this: http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries: “cat” “cat”@en “cat”^^xsd:string “cat”@en^^xsd:string or “cat”^^xsd:string@en - I can’t remember which is right, but I think it’s only one of them :-) Of course if you are matching in SPARQL you can use “… ?o . FILTER (str(?o) = “cat”)…”, but that is likely to be much slower. This means that you may need to do a lot of queries. I built something to look for matching strings (of course! - finding sameAs candidates) where the RDF had been gathered from different sources. Something like SELECT ?a ?b WHERE { ?a ?p1 ?s . ?b ?p2 ?s } would have been nice. I’ll leave it as an exercise to the reader to work out how many queries it takes to genuinely achieve the desired effect without using FILTER and str. Unfortunately it seems that recent developments have not been much help here, but I may be wrong: http://www.w3.org/TR/sparql11-query/#matchingRDFLiterals I guess that the truth is that other people don’t actually build systems that follow your nose to arbitrary Linked Data resources, so they don’t worry about it? Or am I missing something obvious, and people actually have a good way around this? To me the problem all comes because knowledge is being represented outside the triple model. And also because of the XML legacy of RDF, even though everyone keeps saying that is only a serialisation of an abstract model. Ah well, back in my box. Cheers. On 23 Nov 2013, at 11:00, Richard Light rich...@light.demon.co.uk wrote: On 23/11/2013 10:30, Hugh Glaser wrote: It’s the other bit of the pig’s breakfast. Try an @en Magic! Thanks. 
Richard On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote: Hi, Sorry to bother the list, but I'm stumped by what should be a simple SPARQL query. When applied to the dbpedia end-point [1], this search: PREFIX foaf: http://xmlns.com/foaf/0.1/ PREFIX dbpedia-owl: http://dbpedia.org/ontology/ SELECT * WHERE { ?pers a foaf:Person . ?pers foaf:surname Malik . OPTIONAL {?pers dbpedia-owl:birthDate ?dob } OPTIONAL {?pers dbpedia-owl:deathDate ?dod } OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob } OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod } } LIMIT 100 yields no results. Yet if you drop the '?pers foaf:surname Malik .' clause, you get a result set which includes a Malik with the desired surname property. I'm clearly being dumb, but in what way? :-) (I've tried adding ^^xsd:string to the literal, but no joy.) Thanks, Richard [1] http://dbpedia.org/sparql -- Richard Light -- Richard Light -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
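The "try everything" problem from this thread can be written out in SPARQL 1.1. A sketch (the predicate and data are illustrative; note that in RDF 1.1, "cat"^^xsd:string is the same term as the plain "cat", which removes one of the four cases):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Option 1: enumerate the literal forms a publisher might have used.
# Stays index-friendly, but you must guess every language tag.
SELECT ?s WHERE {
  VALUES ?o { "cat" "cat"@en }
  ?s foaf:name ?o .
}
```

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Option 2: compare string values, ignoring tags and datatypes.
# Matches everything, but as Hugh says it is likely much slower,
# since the FILTER defeats direct literal lookup.
SELECT ?s WHERE {
  ?s foaf:name ?o .
  FILTER ( str(?o) = "cat" )
}
```

Which is acceptable depends on the endpoint; for follow-your-nose consumers hitting arbitrary endpoints, neither is entirely satisfying, which is the point of the email.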
Re: OpenRefine
Ah, flesh search still wins, thanks to Ruben. Mind you, I have to say that for once I read all the documentation I could find, and still wasted several very frustrating hours, if not a day. Perhaps someone knows how to update http://openrefine.org or the Github thingy to point at https://groups.google.com/forum/#!msg/openrefine/GARvNqvVlqc/BhQatfKjFRIJ ? On 29 Oct 2013, at 10:06, Sergio Fernández sergio.fernan...@salzburgresearch.at wrote: Exactly that reference I was looking for, but I didn't find it; so your search skills are not so bad after all ;-) On 29/10/13 00:30, Hugh Glaser wrote: Thank you for all the responses. I can report success (and I am pleased to say it doesn’t seem to have been my stupidity, although it may be my lack of web search skills!) Ruben Verborgh communicated quickly and efficiently off list, and pointed me at https://groups.google.com/d/msg/openrefine/GARvNqvVlqc/BhQatfKjFRIJ which explains that things died last February, and I needed to add a replacement reconciliation service. I now have reconciliation! Best Hugh
OpenRefine
Hi. I’m not sure where to ask, so I’ll try my friends here. I was having a go at OpenRefine yesterday, and I can’t get it to reconcile, try as I might - I have even watched the videos again. I’m doing what I remember, but it is a while ago. Are there others currently using it successfully? Or is it possibly a Mavericks (OSX) upgrade thing, which I did recently. Cheers -- Hugh
Re: OpenRefine
Unfortunately I’ve not been a regular user, so it is probably my stupidity. Basically, I go through the Reconcile process using the Freebase Reconcile service, but it doesn’t find anything to reconcile, even though I have fixed it so that there is an entry that has exactly the same text as the Freebase entry title. It just shows as if there are no positive results. I try clicking on the search for match after that, but it never comes back, which makes me wonder. On 28 Oct 2013, at 18:53, John Erickson olyerick...@gmail.com wrote: Hugh, I wonder if you could be more specific regarding the troubles you had with OpenRefine? One of our students also had trouble, and I'm wondering if it might be the same problem. Like you, reconciliation with Refine has worked for me in the past but I haven't tried the same process using OpenRefine... On Mon, Oct 28, 2013 at 2:41 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi. I’m not sure where to ask, so I’ll try my friends here. I was having a go at OpenRefine yesterday, and I can’t get it to reconcile, try as I might - I have even watched the videos again. I’m doing what I remember, but it is a while ago. Are there others currently using it successfully? Or is it possibly a Mavericks (OSX) upgrade thing, which I did recently. Cheers -- Hugh -- John S. Erickson, Ph.D. Director, Web Science Operations Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter Skype: olyerickson -- Hugh 023 8061 5652
Re: OpenRefine
Thank you for all the responses. I can report success (and I am pleased to say it doesn’t seem to have been my stupidity, although it may be my lack of web search skills!) Ruben Verborgh communicated quickly and efficiently off list, and pointed me at https://groups.google.com/d/msg/openrefine/GARvNqvVlqc/BhQatfKjFRIJ which explains that things died last February, and I needed to add a replacement reconciliation service. I now have reconciliation! Best Hugh On 28 Oct 2013, at 19:39, Sergio Fernández sergio.fernan...@salzburgresearch.at wrote: Hi Hugh, which version of OpenRefine, and the Freebase extension are you using? I'm not totally sure, but I think few months ago they've change something in the API. Anyway, for such concrete questions of a tool, I think it is much better to directly ask on its discussion list, in this case: http://groups.google.com/d/forum/openrefine BTW, in verson 0.7.0 of the RDF Refine extension Stanbol-based reconciliation support has been added; so I'd recommend you to give it a try too. Cheers, On 28/10/13 19:59, Hugh Glaser wrote: Unfortunately I’ve not been a regular user, so it is probably my stupidity. Basically, I go through the Reconcile process using the Freebase Reconcile service, but it doesn’t find anything to reconcile, even though I have fixed it so that there is an entry that has exactly the same text as the Freebase entry title. It just shows as if there are no positive results. I try clicking on the search for match after that, but it never comes back, which makes me wonder. On 28 Oct 2013, at 18:53, John Erickson olyerick...@gmail.com wrote: Hugh, I wonder if you could be more specific regarding the troubles you had with OpenRefine? One of our students also had trouble, and I'm wondering if it might be the same problem. Like you, reconciliation with Refine has worked for me in the past but I haven't tried the same process using OpenRefine... On Mon, Oct 28, 2013 at 2:41 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi. 
I’m not sure where to ask, so I’ll try my friends here. I was having a go at OpenRefine yesterday, and I can’t get it to reconcile, try as I might - I have even watched the videos again. I’m doing what I remember, but it is a while ago. Are there others currently using it successfully? Or is it possibly a Mavericks (OSX) upgrade thing, which I did recently. Cheers -- Hugh -- John S. Erickson, Ph.D. Director, Web Science Operations Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter Skype: olyerickson -- Hugh 023 8061 5652 -- Sergio Fernández Senior Researcher Knowledge and Media Technologies Salzburg Research Forschungsgesellschaft mbH Jakob-Haringer-Straße 5/3 | 5020 Salzburg, Austria T: +43 662 2288 318 | M: +43 660 2747 925 sergio.fernan...@salzburgresearch.at http://www.salzburgresearch.at -- Hugh 023 8061 5652
Re: TRank (Ranking Entity Types) pipeline released open-source at ISWC2013
Hi Michele, Looks exciting. I wanted to have a go, but... Can you help me find the documentation please? I am a newbie for quite a bit of this - not a great github user, and never used scala before, so I am probably missing something obvious, but was prompted to try because of the “exhaustive documentation” that would help me! Best Hugh On 23 Oct 2013, at 01:48, Michele Catasta michele.cata...@epfl.ch wrote: TRank is a pipeline that, given a textual/HTML document as input, performs named-entity recognition, entity linking/disambiguation, and entity type ranking/selection from a variety of type hierarchies including DBpedia, YAGO, and schema.org. TRank has been nominated as best paper at ISWC2013. We have now released TRank open-source for others to use: https://github.com/MEM0R1ES/TRank It provides good test coverage, continuous build, and exhaustive documentation. You can use it as is, or easily integrate your own entity type ranking algorithm to compare against or to build on top of TRank. Bug reports and pull requests are welcome! We also recommend to watch/star the GitHub repository, as we will be releasing soon the MapReduce implementation of TRank. -- Best, Michele -- Hugh 023 8061 5652
Re: How to publish SPARQL endpoint limits/metadata?
Hmm. In my mind, a dataset is rather abstract - a collection of data that is being made available. They may use a combination of any or all of SPARQL endpoints, downloads of dumps, and resolvable URIs (Linked Data). They may also make it available in other forms, but we are possibly primarily concerned with RDF here, although it would be a shame if we could not embrace the more abstract concept. As a consumer (always!), I would like to come to where I think the dataset is being published, or look in some aggregator index, and easily find out all the stuff I need to know about the dataset, and how I might use it. That's my starting point. So in our system like many others around I think, for example, when we get a new URI, we hope there is a SPARQL endpoint, as that is our preferred format. We need to use internal information to do this, so we can only do it for known places. If not, then we try to simply resolve it. Failing that, we could look in a cache of dumps we have found, but don't actually at the moment. It would be good, for example, if resolving the URI always told us where there is metadata about a SPARQL endpoint that is recommended as having RDF about this URI. In fact, we do this for co-reference information for our URIs (we use a bespoke predicate, but should probably have been using seeAlso), but should probably do it for SPARQL endpoint as well. The metadata should be at the end of a resolvable URI, and the SPARQL endpoint should hold its own metadata in it, etc. etc.. So having a separation between SPARQL Service Description and voiD would just be plain wrong. They must embrace each other, so that consumers can easily work out how to use what they think of as a dataset. I would also add that if I take a REST-like view of the world, which I do for accessing a SPARQL endpoint (I am simply retrieving a document), the distinction between dataset and service becomes very blurred. 
Even calling it a SPARQL Service Description seems rather old-fashioned to me. Best Hugh On 9 Oct 2013, at 11:04, Barry Norton barrynor...@gmail.com wrote: On Wed, Oct 9, 2013 at 10:55 AM, Frans Knibbe | Geodan frans.kni...@geodan.nl wrote: Shouldn't that be the SPARQL Service Description instead of VoID? In my mind, SPARQL endpoints and datasets are separate entities. +1
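The idea that the Service Description and VoID must "embrace each other" can be sketched in a few triples. A hypothetical example (the example.org URIs are placeholders; sd: is the SPARQL 1.1 Service Description vocabulary and void: is the Vocabulary of Interlinked Datasets), where the two descriptions point at one another so a consumer arriving at either can find the rest:

```turtle
@prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .
@prefix void: <http://rdfs.org/ns/void#> .

# The service describes itself...
<http://example.org/sparql> a sd:Service ;
    sd:endpoint <http://example.org/sparql> ;
    sd:defaultDataset [ a sd:Dataset ; sd:defaultGraph [ a sd:Graph ] ] .

# ...and the dataset description links every access method together:
# endpoint, dump, and (via resolvable URIs) the Linked Data itself.
<http://example.org/dataset> a void:Dataset ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump       <http://example.org/dumps/all.nt.gz> .
```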
Re: How to publish SPARQL endpoint limits/metadata?
On 9 Oct 2013, at 12:46, Barry Norton barrynor...@gmail.com wrote: On Wed, Oct 9, 2013 at 12:15 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: [...] So having a separation between SPARQL Service Description and voiD would just be plain wrong. They must embrace each other, so that consumers can easily work out how to use what they think of as a dataset. I would also add that if I take a REST-like view of the world, which I do for accessing a SPARQL endpoint (I am simply retrieving a document), the distinction between dataset and service becomes very blurred. Even calling it a SPARQL Service Description seems rather old-fashioned to me. Hugh, I tend to agree (certainly about calling them 'service descriptions', ugh). From a REST point of view, void:Datasets, named graphs (capable of RESTful interaction via the Graph Store Protocol) and SPARQL query/update 'endpoints' (ugh again) are all resources that allow one to find other, more specific, resources. That said, if we accept that one needs some up-front guidance on what those resources allow you to get to (a big 'if' in the REST community, but I don't think anyone in ours would be happy with just a media type) then we want them to be self-describing in RDF. Always everything! At the same time, the relationships we want to attach to the query/update endpoints are semi-distinct, no? You'd agree these are different classes of resource? Yes, or perhaps I am saying different sub-classes? Thinking of it that way, I then look at Frans' list of the kind of thing he would like to be able to say about endpoints. It seems that at least the following might be common to almost any delivery mechanism for datasets: • The time period of the next scheduled downtime • (the URI of) a document that contains a human readable SLA or fair use policy for the service • URIs of mirrors So, yes, there are semi-distinctions, but if that implies semi-non-distinctions, there should be very useful mileage in trying to make such things deeply compatible. 
Or at least starting from there? Best Hugh Barry
Re: ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links
Hi. Chris has suggested I send the following to the LOD list, as it may be of interest to several people: Hi Chris. Great stuff! I have a question. Or would you prefer I put it on the LOD list for discussion? It is about url encoding. Dbpedia: http://dbpedia.org/page/Ashford_%28borough%29 is not found http://dbpedia.org/page/Ashford_(borough) works, and redirects to http://dbpedia.org/resource/Borough_of_Ashford Wikipedia: http://en.wikipedia.org/wiki/Ashford_%28borough%29 works http://en.wikipedia.org/wiki/Ashford_(borough) works Both go to the page with content of http://en.wikipedia.org/wiki/Borough_of_Ashford although the URL in the address bar doesn't change. So the problem: I usually find things in wikipedia, and then use the last bit to construct the dbpedia URI - I suspect lots of people do this. But as you can see, the url encoded URI, which can often be found in the wild, won't allow me to do this. There are of course many wikipedia URLs with ( and ) in them - (artist), (programmer), (borough) etc. It is also the same with comma and single quote. I think this may be different from 3.8, but can't be sure - is it intended? Very best Hugh
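For anyone constructing DBpedia URIs from Wikipedia URLs as described above, the mismatch comes down to percent-encoding of characters like parentheses. A minimal sketch (Python here purely for illustration) of the two forms of the Ashford example:

```python
from urllib.parse import quote, unquote

# The Wikipedia-style title used to construct a DBpedia URI.
title = "Ashford_(borough)"

# Percent-encoded form, as often found "in the wild":
# the parentheses become %28 and %29.
encoded = quote(title)
print(encoded)            # Ashford_%28borough%29

# Decoding recovers the plain form that dbpedia.org/page accepts.
print(unquote(encoded))   # Ashford_(borough)
```

Both forms name the same Wikipedia page, which is why the asymmetry on the DBpedia side (only the un-encoded form resolving) is surprising.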
Re: SPARQL results in RDF
You'll get me using CONSTRUCT soon :-) (By the way, Tim's actual CONSTRUCT WHERE query isn't allowed because of the FILTER). In the end, I just wrote a little service to process the XML into turtle, so I get what I want now. The problem is that the only result format I can rely on an endpoint giving is XML: CSV and TSV (the other standards), which would have been easier, are not always supported, it seems. One thing I was trying to do (as Tim distinguished) was have the result set, bindings and all, in RDF, if for nothing else than credibility and PR. Because it is true that people I explain Linked Data, RDF and conneg to, and who then go on to RDF stores, just can't understand how I can tell them about this wonderful RDF, but when they ask or even try to do a conneg to the endpoint they don't get RDF. I think one answer is to ignore SELECT completely, and just talk about CONSTRUCT. It makes a lot more sense - in fact I might do that myself. One fly in the ointment for that is that (as far as I can tell), even though I get RDF turtle or whatever back from an endpoint, it doesn't allow me to conneg for Accept: application/rdf+xml. At least dbpedia seems to give 406 Not Acceptable. Is there some adjustment that could be made here? I know it would be a fudge, but if I request Accept: application/rdf+xml on a SPARQL endpoint, using a CONSTRUCT, would it be so bad to actually return RDFXML? Thanks for all the interesting discussion. Hugh On 25 Sep 2013, at 10:05, Stuart Williams s...@epimorphics.com wrote: On 25/09/2013 00:23, Tim Harsch wrote: That idea seems very similar to the DELETE WHERE already in SPARQL 1.1, so maybe to be consistent with that existing syntax it should be CONSTRUCT WHERE Hmmm... something like: http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#constructWhere Stuart -- On Mon, Sep 23, 2013 at 3:08 PM, Tim Berners-Lee ti...@w3.org mailto:ti...@w3.org wrote: 1) I can see Hugh's frustration that the RDF system is incomplete in a way. 
You tell everyone you have a model which can be used for anything and then make something which doesn't use it. What's wrong with this picture? Standardising/using/adopting http://www.w3.org/2001/sw/DataAccess/tests/result-set would solve that. (The file actually defines terms like http://www.w3.org/2001/sw/DataAccess/tests/result-set#resultVariable without the .n3) 2) Different (I think) from what you want Hugh, but something I have thought would be handy would be a CONSTRUCT * where it returns the sub graphs it matches as turtle, ideally without duplicates. This would be nice for lots of things, such as extracting a subset of a dataset. CONSTRUCT * WHERE { ?x name ?y; age ?a; ?p ?o . FILTER ( ?a > 18 ) } Tim On 2013-09-23, at 07:03, Andy Seaborne wrote: DAWG did at one time work with result sets encoded in RDF for the testing work. As the WG progressed, it was clear that implementation of testing was based on result set comparison, and an impl needed to grok the XML results encoding anyway. Hence the need for the RDF form dwindled but it's still there: http://www.w3.org/2001/sw/DataAccess/tests/result-set.n3 Apache Jena will still produce it if you ask it nicely. Andy -- Epimorphics Ltd www.epimorphics.com Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT Tel: 01275 399069 Epimorphics Ltd. is a limited company registered in England (number 7016688) Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT, UK
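The conneg experiment discussed above - asking a SPARQL endpoint for RDF/XML on a CONSTRUCT query - amounts to a request like the following. A sketch only: it builds the request without sending it (the DBpedia endpoint URL is taken from the thread; whether it honours the Accept header is exactly the open question):

```python
from urllib.parse import urlencode
from urllib.request import Request

endpoint = "http://dbpedia.org/sparql"
query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"

# Content negotiation: ask for RDF/XML rather than the SPARQL XML results format.
url = endpoint + "?" + urlencode({"query": query})
req = Request(url, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # application/rdf+xml
```

Per the thread, DBpedia answers this with 406 Not Acceptable, even though the same CONSTRUCT happily returns Turtle when asked via the `format` parameter.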
SPARQL results in RDF
I was saying to someone the other day that it is bizarre and painful that you can't get SPARQL result sets in RDF, or at least there isn't a standard ontology for them. But it looks like I was wrong. http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+*+where+{%3Fs+%3Fp+%3Fo}+LIMIT+100&format=application%2Frdf%2Bxml happily gives me what I was expecting, and also gives me NTriples if I want them. But the NS is http://www.w3.org/2005/sparql-results# which doesn't give me what I was expecting (it is ordinary XML). I did find what I think is the latest version, but it eschews RDF, and only talks about XML, JSON, CSV and TSV formats. Can anyone shed any light on where things are on all this please? Cheers Hugh
Re: SPARQL results in RDF
Many thanks, William, and for confirming so quickly. (And especially thanks for not telling me that CONSTRUCT does what I want!) I had suddenly got excited that RDF might actually be useable to represent something I wanted to represent, just like we tell other people :-) So it is all non-standard, as I suspected. Ah well, I'll go back to trying to work with XML stuff, instead of using my usual RDF tools :-( Very best Hugh On 21 Sep 2013, at 19:14, William Waites w...@styx.org wrote: Hi Hugh, You can get results in RDF if you use CONSTRUCT -- which is basically a special case of SELECT that returns 3-tuples and uses set semantics (does not allow duplicates), but I imagine that you are aware of this. Returning RDF for SELECT where the result set consists in n-tuples where n != 3 is difficult because there is no direct way to represent it. Also problematic is that there *is* a concept of order in SPARQL query results while there is not with RDF. Also the use of bag semantics allowing duplicates which also does not really work with RDF. These, again, could be kludged with reification, but that is not very elegant. So most SELECT results are not directly representable in RDF. Cheers, -w
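For concreteness, the old DAWG result-set vocabulary mentioned elsewhere in this thread (http://www.w3.org/2001/sw/DataAccess/tests/result-set.n3) kludges exactly the problems William lists. A sketch of one two-variable, two-row SELECT result (the data values are made up; property names are as they appear in the vocabulary file, with rs:index carrying the ordering that plain RDF lacks):

```turtle
@prefix rs: <http://www.w3.org/2001/sw/DataAccess/tests/result-set#> .

[] a rs:ResultSet ;
   rs:resultVariable "name", "age" ;
   rs:solution [
       rs:index 1 ;
       rs:binding [ rs:variable "name" ; rs:value "Alice" ] ;
       rs:binding [ rs:variable "age"  ; rs:value 42 ]
   ] ;
   rs:solution [
       rs:index 2 ;
       rs:binding [ rs:variable "name" ; rs:value "Bob" ] ;
       rs:binding [ rs:variable "age"  ; rs:value 42 ]
   ] .
```

Each n-tuple becomes a blank-node rs:solution with one rs:binding per variable, which is the reification-style encoding William alludes to: workable, but hardly elegant.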
Re: SPARQL results in RDF
Thanks Jerven, you may well be right! SELECT DISTINCT * WHERE { ?s foo:bar ?o } would do. And things like SELECT DISTINCT * WHERE { ?v1 foo:bar ?o . ?v1 ?p1 ?v2 . ?v2 ?p2 ?v3 } and then probably get back an identifier for each result, so that I can find out what are the values of the ?p* and ?v* I think essentially the sort of thing that dbpedia/virtuoso is giving me. (By the way, Kingsley, replying to this has caused me to notice that the rdfxml does not rapper very nicely - sorry to report! rapper: Error - URI file:///home/hg/sparql.rdf:8 - property element 'solution' has multiple object node elements, skipping.) Best Hugh On 21 Sep 2013, at 23:32, Jerven Bolleman jerven.bolle...@isb-sib.ch wrote: Hi Hugh, I think you disregarded the CONSTRUCT queries a bit too quickly. This is what you use when you want to get back triples. If you want back result columns you use SELECT. If you want to describe the concept of result columns in RDF then you are on your own. Maybe if you explain what you want to represent then we can have a bit more of an informed discussion. Regards, Jerven On Sep 21, 2013, at 8:38 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Many thanks, William, and for confirming so quickly. (And especially thanks for not telling me that CONSTRUCT does what I want!) I had suddenly got excited that RDF might actually be useable to represent something I wanted to represent, just like we tell other people :-) So it is all non-standard, as I suspected. Ah well, I'll go back to trying to work with XML stuff, instead of using my usual RDF tools :-( Very best Hugh On 21 Sep 2013, at 19:14, William Waites w...@styx.org wrote: Hi Hugh, You can get results in RDF if you use CONSTRUCT -- which is basically a special case of SELECT that returns 3-tuples and uses set semantics (does not allow duplicates), but I imagine that you are aware of this. 
Returning RDF for SELECT where the result set consists in n-tuples where n != 3 is difficult because there is no direct way to represent it. Also problematic is that there *is* a concept of order in SPARQL query results while there is not with RDF. Also the use of bag semantics allowing duplicates which also does not really work with RDF. These, again, could be kludged with reification, but that is not very elegant. So most SELECT results are not directly representable in RDF. Cheers, -w --- Jerven Bollemanjerven.bolle...@isb-sib.ch SIB Swiss Institute of Bioinformatics Tel: +41 (0)22 379 58 85 CMU, rue Michel Servet 1 Fax: +41 (0)22 379 58 58 1211 Geneve 4, Switzerland www.isb-sib.ch - www.uniprot.org Follow us at https://twitter.com/#!/uniprot ---
Re: Maphub -- RWW meets maps
Hi Andy, Nice. In case you hadn't guessed: http://sameas.org/?uri=http://oxpoints.oucs.ox.ac.uk/id/23232414 :-) On 19 Sep 2013, at 15:03, Andy Turner a.g.d.tur...@leeds.ac.uk wrote: http://www.oucs.ox.ac.uk/oxpoints/ Andy http://www.geog.leeds.ac.uk/people/a.turner/ From: Gannon Dick [mailto:gannon_d...@yahoo.com] Sent: 19 September 2013 13:55 To: Andy Turner; 'Kingsley Idehen'; public-...@w3.org; public-lod@w3.org Cc: chippy2...@gmail.com; suchith.an...@nottingham.ac.uk Subject: Re: Maphub -- RWW meets maps FWIW, the University of Oxford has an 800th Birthday coming up soon. http://www.rustprivacy.org/2012/roadmap/oxford-university-area-map.pdf The geo coordinates, founding dates etc. for the Colleges and Halls are available on the University site. The lo-res sunrise and sunset data is available in spreadsheets at http://www.esrl.noaa.gov/gmd/grad/solcalc/calcdetails.html My offering has some *cough* complete lack of artistic promise and bandwidth crushing size *cough* limitations, but I had fun :-) It would be nice to see this *cough* done well *cough* duplicated. --Gannon From: Andy Turner a.g.d.tur...@leeds.ac.uk To: 'Kingsley Idehen' kide...@openlinksw.com; public-...@w3.org public-...@w3.org; public-lod@w3.org public-lod@w3.org Cc: chippy2...@gmail.com chippy2...@gmail.com; suchith.an...@nottingham.ac.uk suchith.an...@nottingham.ac.uk Sent: Thursday, September 19, 2013 3:36 AM Subject: RE: Maphub -- RWW meets maps Interesting work. It's a way to go for linking OpenStreetMap data and Wikimapia data with Wikipedia and each other etc.. I don't know the state of play with how OpenStreetMap or Wikimapia are currently doing this, but I like to think that someone at the recent Maptember events in Nottingham, UK hopefully does and might provide some feedback... 
Thanks, Andy http://www.geog.leeds.ac.uk/people/a.turner/ -Original Message- From: Kingsley Idehen [mailto:kide...@openlinksw.com] Sent: 18 September 2013 19:44 To: public-...@w3.org; public-lod@w3.org Subject: Re: Maphub -- RWW meets maps On 9/18/13 1:40 PM, Melvin Carvalho wrote: A fantastic open source project maphub which uses linked data to read and write to current and historical maps, using RDF and the open annotations vocab. There's even links to DBPedia! http://maphub.github.io/ A great example of how to use the Read Write Web. The video is well worth watching! Also publishes annotations in Linked Data form [1] :-) [1] http://maphub.herokuapp.com/control_points/4 . -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
http://differentfrom.org
Hi, I mentioned this in an earlier post. I then discovered that I was the only one who could access it! (While I was building it I fixed my private DNS.) Anyway, since I mentioned it, I have now fixed the public DNS, and it should have propagated by now. So feel free to go and (have another?) look, and any feedback welcome. Best Hugh
sameAs.org license - was Re: Linked data sets for evaluating interlinking?
Thanks Ghislain. Sorry, no SPARQL endpoint, as it isn't an RDF store. With respect to a license, it is more difficult. This may be a longer answer than you were expecting. :-) (Firstly, please understand that I'm not very good with this license stuff.) When I started sameAs.org, it only had mostly my rkb stuff in it. So I could do what I liked. Understanding the importance of having some sort of license, I put what I thought was the most liberal one I could find - http://creativecommons.org/publicdomain/zero/1.0/ (which is at the bottom of the page). Take it away and do what you like with it. I would have liked to say "Please attribute if you can, but I understand that may be difficult, so don't worry if you can't", but I couldn't find one like this, and I think having a license that is quickly seen and widely understood is important. Sub-bit on attribution A problem with follow your nose (fyn) Linked Data is that the attribution can be very hard. I may tell you that a owl:sameAs b. The reason I tell you that is that I have found loads of stuff about c, d and e which allowed me to infer that. And some of that data may no longer even be available. So the only safe attribution for every fact I give you would be my entire source attribution - I might as well tell you the attribution is the Web. Correct, but hardly in the spirit of the thing (I am actually more interested in the spirit of fair attribution than the legal side of it!) For one of my users who uses fyn, attribution is probably even harder - at least I know my sources by hand. If they came using fyn, then they may be using a URI that happened to be got by a previous resolution (and so on). So essentially, every time they resolve a URI, they need to do license work. Of course in principle this is what people should be doing - absolutely! But in practice, people are not tooled up for this; so a requirement for attribution would make the data unusable for such people. 
And they are the ones who are *really* using Linked Data, so I want to encourage them! /Sub-bit on attribution Of course, it now has stuff from lots of other sources. Many of these simply sent me the data, or told me I could put it in sameAs.org - but I don't really recall anyone ever discussing license! Since I asked for it for sameAs.org, then I assumed that they agreed to have it out there with the license. Other stuff, I have just gone to a sparql endpoint or download site and taken a bit of their data. So what is the license of stuff on the open web? - No, you don't need to answer that! Essentially sameAs.org is a search engine for the Linked Data web; so I went to Google and Bing to see what license they might put on their data. Answer found I none! [I even found that if you put a search such as Bing license into Bing it barfs! :-) ] There is lots of stuff about what users license them to do with user data, and what they license for their software, but nothing on the results returned from a web search on their site. My sameAs.org about page does list a bunch of places which should provide compliance with any attribution requirements for those sites, but is now seriously out of date, I think. So I just left it at that. I know I don't have the same legal department as Google or Microsoft if there is a problem :-), but I sort of think that I take far less data from sites than they do, and it doesn't seem to be a problem for them. As far as the sub-stores are concerned, I took the license off. But most of them were built in collaboration with the sources, and they have links to the sources, which may or may not have a license, but that probably makes things clearer for those. The bottom line is that there are very few sites, if any, which (like sameAs.org) have as their main purpose the provision of sameAs information. 
On the contrary (like googlejuice SEO) they want any sameAs links to be taken away, so that traffic will come to their sites through the links they have published (like via Google). Thanks for your question - I'm happy to get any advice from anyone, and I hope I can understand it if it comes! Best Hugh On 27 Aug 2013, at 09:09, Ghislain Atemezing auguste.atemez...@eurecom.fr wrote: Hi Hugh, So, for example, if you wanted Adrian's data, then I can give it to you. (I have queried the SPARQL endpoint to put stuff in sameAs.org. Both owl:sameAs and skos:exactMatch.) I have lots of bibliographic ones, especially national libraries, who have often sent me the data. (British, German, US, Japanese, Norwegian, French, Spanish, Hungarian … as best I recall.) I also have the VIAF data. This is all aggregated in http://sameas.org/store/kelle/ and other stuff is kept in some sameAs stores - see http://sameas.org/store/ Nice work!! And a small question….. I was wondering if there is an endpoint in sameAs.org for using SPARQL queries? And for the data sets you receive, do they all have a specific terms of license?
Re: Linked data sets for evaluating interlinking?
Hi, Thanks. Just one comment, relating to the cities example you use. The paper you cite mentions cities and says: For example, the city of Paris is referenced in a number of different Linked Data-sets: ranging from OpenCyc to the New York Times. In DBPedia, a Linked Data export of Wikipedia, these data-sets are connected by owl:sameAs. In particular, dbpedia:Paris is owl:sameAs both opencyc:CityOfParisFrance and opencyc:ParisDepartmentFrance, as OpenCyc distinguishes the department of Paris: ParisDepartmentFrance is a distinct geopolitical entity from CityOfParisFrance, despite the fact that both share the same territory, while Wikipedia does not make this distinction. So even cities (actually especially cities and other geo things) have significant challenges here. Geo-political v. geographic v. the geo-extent v. the nounSynset etc. And we haven't even mentioned temporal aspects. So I do worry about all this. If the dataset is simple enough that you can ignore the problems, then the question is whether the exercise tells you anything useful. If the dataset is more complicated, for example having both geo-political and geographic and wanting to keep them separate, then it is also a question whether the exercise tells you anything useful! But if something is hard and challenging it is more reason to do it, I guess. Good luck. Hugh On 27 Aug 2013, at 16:57, csara...@uni-koblenz.de wrote: Hi Hugh, Hi Cristina, Some interesting issues you raise. One of them is how people publish links (which enables your analysis). There are two ways this happens. 1) People add triples to their dataset that have an equivalence predicate (owl:sameAs, skos:exactMatch, skos:closeMatch, etc.) 2) People use a foreign URI (very commonly a dbpedia URI), because when turning their data into RDF they have decided that the entity they are concerned with is the same as the dbpedia one. The second paragraph of Tom's message describes such a linkage, I think. 
I think these distinctions are behind the comments of Milorad, where he is assuming the type (2) way. Either of these methods should be fodder for you, and you may well find that the type (2) way is used by a dataset that is useful to you. I agree, it is important to distinguish between different types of links. When I refer to interlinking I have in mind triples (s, p, o), where s and o are resources from different data sets, and p is either a property like owl:sameAs or a domain-specific property like foaf:knows. I think this corresponds to what you specified in 1) and 2). I would like to have both kinds of links in my evaluation (if possible). It may be harder for you to process, as the linkage is not so explicit because there is no distinct URI for the resource in the database, different from the foreign one. But any foreign URI is in fact a link. You will find that people have tended towards type (2) linkage because they can shy away from having lots of equivalence predicates in their datasets, not least because there was a time when RDF stores did not comfortably do owl:sameAs inference, and so they do the linking at RDF conversion time, and use foreign URIs. Another interesting issue is more fundamental to your work. You seem to think that there must be a gold standard or reference interlinking for equivalence. As long-time readers of this list will have seen discussed many times (!), it is not a simple matter. It is a complex matter to have such a thing, which is a necessity for you to do your precision/recall statistics. At its most basic, for example, am I as a private citizen the same as me as a member of my University or me as a member of my company? The answer is, of course yes and no. Another field that has spent a lot of time on this is the FRBR world (http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records). If I have a book of the Semantic Web, is it the same as your book of the same name? Perhaps. 
What if it is a different (corrected) edition? An electronic version? Certainly a library will usually consider each book a different thing, but if you are asking how many books the author has published, you want to treat all the books as the same resource. I understand the point, and I find it very interesting, indeed. I guess that it might depend on the context where the data was created / will be used. This reminds me of the paper about the analysis of identity links ( http://www.w3.org/2009/12/rdf-ws/papers/ws21). However, I think that it is possible to evaluate different interlinking techniques, establishing some gold standard (e.g. the links between the cities of a data set describing the population of European cities and a data set describing the cities as tourist attractions), to be able to analyse the results in terms of precision and recall, and say that one tool is able to certain things, while the other not. Regarding the
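The precision/recall evaluation Cristina describes reduces to set comparison between a gold-standard link set and a tool's output. A toy sketch (the link pairs are invented for illustration):

```python
# Hypothetical gold-standard links and links proposed by an interlinking tool,
# each link being a (subject URI, object URI) pair.
gold  = {("ex:Paris", "dbp:Paris"), ("ex:Lyon", "dbp:Lyon"), ("ex:Nice", "dbp:Nice")}
found = {("ex:Paris", "dbp:Paris"), ("ex:Lyon", "dbp:Lyon"), ("ex:Nice", "dbp:Marseille")}

true_positives = len(gold & found)

precision = true_positives / len(found)  # correct links / links proposed (2/3 here)
recall    = true_positives / len(gold)   # correct links / links that exist (2/3 here)
f1 = 2 * precision * recall / (precision + recall)
```

Note this treats all links as symmetric-free pairs; as Hugh's FRBR point shows, the hard part is agreeing on the gold set in the first place, not computing the statistics.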
Pleiades - was Re: Linked data sets for evaluating interlinking?
Hi Tom, I don't know if you are involved with Pleiades, but I have some questions. I found the data at http://atlantides.org/downloads/pleiades/rdf/ - many thanks. It has some sameAs links :-) But I have some worries: It has triples like http://pleiades.stoa.org/places/991318#this owl:sameAs http://pleiades.stoa.org/places/981510#this . The http://pleiades.stoa.org/places/991318#this goes to a page entitled Duplicate Baetica. In that page it says Link from a duplicate to the master Baetica. I worry a bit about this, as it may be saying that the link page is owl:sameAs master page, which would clearly be wrong. More problematic for me (for sameAs.org!) is that the duplicate link is not Linked Data. If I try to get RDF from it, it gives HTTP/1.1 500 Internal Server Error and html. Is all this the intended behaviour? Great resource, of course! Best Hugh On 26 Aug 2013, at 14:16, Tom Elliott tom.elli...@nyu.edu wrote: Hi all: Two humanities datasets of potential interest in this regard: A number of datasets (around 20 different ones I think) related to the study of antiquity have aligned their geographic/toponymic fields with the Pleiades gazetteer (http://pleiades.stoa.org) and published RDF accordingly. Most of this work has been done under the auspices of something called the Pelagios Project, and the alignment processes used by many of the participants are documented in blog posts at http://pelagios-project.blogspot.com/ (most of them a combination of automated and manual). Pleiades itself is also a linked data resource, and has a growing number (still only a small percentage of its content) of outbound links to dbpedia, geonames, and OSM. All of those outbound links are hand-curated. Contributors to Pleiades, where possible, are aligned to VIAF (manually) and bibliography in Pleiades is also beginning to be aligned to the Open Library and Worldcat (again, manually). 
On a much smaller scale, I offer the About Roman Emperors dataset, which rather than minting its own URIs for the Roman emperors, uses the dbpedia resource URIs for each: http://www.paregorios.org/resources/roman-emperors/. The primary purpose of the dataset is to provide a comprehensive list of these for easy access and reuse by third parties, and to associate the dbpedia URIs with corresponding Roman imperial mint and minting authority data in nomisma.org and finds.org.uk, and to a static, late-90s-vintage scholarly encyclopedia of Roman emperors: http://www.roman-emperors.org/ Tom Tom Elliott, Ph.D. Associate Director for Digital Programs and Senior Research Scholar Institute for the Study of the Ancient World (NYU) http://isaw.nyu.edu/people/staff/tom-elliott On Aug 26, 2013, at 6:04 AM, Adrian Stevenson wrote: Hi All As part of the LOCAH and Linking Lives projects, the latter in particular, we've being doing a lot of this auto and manual linking work, mainly to VIAF and DBPedia, with some links to things like LCSH and Geonames. We've been doing a lot of work just recently in fact, and we've published a blog post that's picked up quite a bit of interest on this - http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/. We haven't published our latest run of data yet, but we hope to finish this soon. It'll probably still be about a month or so as a few of us are on holiday soon. We do have quite a few links done semi-automatically in our existing data set accessible via http://data.archiveshub.ac.uk but as I say we are updating this, I'd suggest not taking the URIs and data available there as the final word. 
A good example is http://data.archiveshub.ac.uk/page/person/nra/webbmarthabeatrice1858-1943socialreformer Project URIs: http://archiveshub.ac.uk/locah/ http://archiveshub.ac.uk/linkinglives/ Adrian _ Adrian Stevenson Senior Technical Innovations Coordinator Mimas, The University of Manchester Devonshire House, Oxford Road Manchester M13 9QH Email: adrian.steven...@manchester.ac.uk Tel: +44 (0) 161 275 6065 http://www.mimas.ac.uk http://www.twitter.com/adrianstevenson http://uk.linkedin.com/in/adrianstevenson/ On 22 Aug 2013, at 16:06, Cristina Sarasua wrote: Hi, I am looking for pairs of linked data sets that can be used as gold standard for evaluations. I would need pairs of data sets which have been manually linked, or data sets which have been (semi-)automatically linked with interlinking tools, and afterwards reviewed (to include the links which are not identified by tools). I have looked into the DataHub catalogue and queried VoiD descriptions, but unfortunately the information about how the interlinking process was carried out is often missing. Apart from the data sets which have been used in the OAEI-instance matching track, could anyone recommend (based on past experience) good data sets for evaluating data interlinking processes?
Re: YASGUI: Web-based SPARQL client with bells ‘n wistles
Hi Bernard, And if you are going to change things… I went looking for equivalences (:-)), and found a lot (but not all) of owl:sameAs dbpedia objects that seem to have crept in as strings, e.g. http://rdf.muninn-project.org/ontologies/military#Battalion owl:sameAs dbpedia:Battalion http://lov.okfn.org/endpoint/lov_aggregator?query=PREFIX+owl%3A+++%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0ASELECT+DISTINCT+*+WHERE+%7B+%3Fs+owl%3AsameAs+%3Fo+%7D%0D%0ALIMIT+100&format=HTML Best Hugh On 20 Aug 2013, at 18:23, Barry Norton barry.nor...@ontotext.com wrote: Thanks, Bernard. I get ~5000 instances of rdf:Property (and 1643 of rdfs:Class - and oddly 5 instances of rdfs:Property), but more than three times as many for: SELECT (COUNT(?property) AS ?properties) { SELECT DISTINCT ?property WHERE{ {?property rdfs:domain ?domain} UNION {?property rdfs:range ?range} UNION {?property rdfs:subPropertyOf ?super} UNION {?sub rdfs:subPropertyOf ?property} } } I'm guessing, therefore, no inference in this store? Since OWL-implied properties would require a much more sophisticated query, is it possible to get the dataset and re-index this with inference? Barry On 20/08/2013 18:02, Bernard Vatant wrote: Hello Barry I had a reminder today that I never answered the question below, and I am very late indeed ! Properties and classes of all vocabularies in LOV are aggregated in a triple store whose SPARQL endpoint is at http://lov.okfn.org/endpoint/lov_aggregator This is quite raw data but you should find everything you need in there. Otherwise you can also use the new API http://lov.okfn.org/dataset/lov/api/v1/vocabs which for each vocabulary provides the prefix and link to the last version stored. Hope that helps Bernard From: Barry Norton barry.nor...@ontotext.com Date: Sat, 06 Jul 2013 11:27:46 +0100 Bernard, does LOV keep a cache of properties and classes? 
I'd really like to see resource auto-completion in Web-based tools like YASGUI, but a cache is clearly needed for this to be feasible. Barry
Re: {Disarmed} Re: YASGUI: Web-based SPARQL client with bells ‘n wistles
Thanks Ghislain, the right response :-) (It's not our data; if it gets fixed at source we will re-acquire.) I think I tracked down the email of the person responsible, so have raised the issue.

Best
Hugh

On 20 Aug 2013, at 20:13, Ghislain Atemezing auguste.atemez...@eurecom.fr wrote:

Hi Hugh,

[[ My 2 cents ]]

I went looking for equivalences (:-)), and found that a lot (but not all) of the owl:sameAs dbpedia objects seem to have crept in as strings, e.g.

http://rdf.muninn-project.org/ontologies/military#Battalion owl:sameAs dbpedia:Battalion

I think that comes from the ontology http://rdf.muninn-project.org/ontologies/military.html itself, and you may have a look here: http://rdf.muninn-project.org/ontologies/military.html#linkages

http://lov.okfn.org/endpoint/lov_aggregator?query=PREFIX+owl%3A+++%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0ASELECT+DISTINCT+*+WHERE+%7B+%3Fs+owl%3AsameAs+%3Fo+%7D%0D%0ALIMIT+100format=HTML

Best
Ghislain
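A footnote on the string-valued owl:sameAs triples discussed above: filtering on isLiteral is one quick way to list them at any SPARQL endpoint, since a correct sameAs target should be an IRI. A sketch (the LIMIT is arbitrary):

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?s ?o WHERE {
  ?s owl:sameAs ?o .
  FILTER(isLiteral(?o))  # flags objects that crept in as strings
}
LIMIT 100
```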
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
This is great! Thanks guys.

I normally really, really don't care about all that crypto stuff (it should all happen transparently), but I find all this interesting! So yes, I created a p12 (using http://id.myopenlink.net/certgen/ - you sometimes have to trust someone :-) ) and emailed. I am confident (!) that with Keychain things will be fine, but less sure about Windows. Opened it on a Windows box, and it seems to have taken the thing to heart and put it in some certificate management thing.

I am a little uncertain what I should put at the URL (non-FOAF) I gave it - the final page gave me some options with microdata, RDFa etc - I am guessing I can just wrap any of them in html/body etc? Anyway, I can sort that.

So now all (!!!) I really need to do is make my wordpress site look for the ID thing. Hmmm. Melvin, did you get any response to http://lists.w3.org/Archives/Public/public-webid/2012Aug/0041.html ? Or Kingsley, what did you do on the server side of your photo?

Cheers

On 9 Aug 2013, at 13:25, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 7:47 AM, Norman Gray wrote:

Henry, greetings. [replying only on public-lod]

Bit of an essay, this one, because I've been mulling this over since this message appeared a couple of days ago...

On 2013 Aug 8, at 16:14, Henry Story wrote:

On 7 Aug 2013, at 19:34, Nick Jennings n...@silverbucket.net wrote:

1. Certificate Name: maybe there could be some examples of ways to name your certificate. [...]

That's why it should be done by the server generating the certificate. The details are here: https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/tls-respec.html#the-certificate

I appreciate the logic here, and can see how it works technically smoothly for the anticipated use-case (the one illustrated in the WebID video on the webid.info front page).
I don't think that's enough, however, because I don't think I could convincingly explain what's happening here to a motivated but non-technical friend who wants to understand what they've just achieved when I've walked them through getting their WebID certificate from (something like) the social service illustrated in the video.

People understand what a username and password are (the first is my identity, the second is a secret that proves I am who I claim), and they understand what a door-key is (no identity, but I have this physical token which unlocks a barrier for anyone in possession of the token or a copy). The same is not true of a WebID. Making this a one-click operation is nice (and a Good Thing at some level), but just means that the user knows that it was _this_ click that caused some black magic to happen, and I'm not sure that helps. Therefore...

2. With Firefox, after filling out the form, I get a download dialogue for the cert instead of it installing into the browser. So I saved, then went into preferences and imported... which was successful, with "Successfully restored your security certificate(s) and private key(s)". Previously, with my-profile.eu, this was automatically installed into the browser (I was using Chrome then). Though I guess it's better to have it export/save by default so you can install the same cert on any number of browsers without hassle. Still, it creates more steps and could be confusing for new users.

In the case of WebID certs, downloading the certificate is in fact silly, as you can produce a different one for each browser. So that message is a little misleading. A good UI should warn the user about that.

Thinking about it, and exploring the behaviours again this week, I'm more and more sure that the browser is a problematic place to do this work. _Technically_, it's exactly the right place, of course, and the HTML5 keygen element is v. clever.
But it's killing for users, and coming back to WebIDs and certificates this week, and parachuting into this discussion here, I've been a 'user' this week.

A 'web browser' is a passive thing: it's a window through which you look at the web. It quickly disappears, for all but the most hesitant and disoriented users; in particular it's not a thing which takes _actions_, or where you can store things. That means that the browser creating the key-pair, and storing the server-generated certificate, is literally incomprehensible to the majority of anticipated users.

And even to me. I have an X.509 e-science certificate which needs renewing every year, and every year I stuff up this renewal in one way or another: the certificate isn't in the right place, or I try to retrieve the replacement with a different browser from the one which generated the CSR, or something else which is sufficiently annoying that I purge the experience from my memory. And I understand about certificates and the whole PKI thing -- someone who doesn't is going to find the experience bamboozling, hateful and stressful.

It sounds as if
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks Kingsley,

On 9 Aug 2013, at 15:09, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 9:51 AM, Hugh Glaser wrote:

So now all (!!!) I really need to do is make my wordpress site look for the ID thing. Hmmm. Melvin, did you get any response to http://lists.w3.org/Archives/Public/public-webid/2012Aug/0041.html ? Or Kingsley, what did you do on the server side of your photo?

In my case, I just made an ACL based on a combination of the identity claims that I know are mirrored in the WebID-bearing certificate. In the most basic sense, you can simply start with the basic WebID+TLS test which is part of the basic server side implementation. Thus, I would expect the WordPress plugin to perform the aforementioned test.

Sorry mate, I have little or no idea what you are talking about. What would an ACL look like? What plugin in wordpress do you mean?

It is probably the case that this is now too much detail for the list (I think that the whole discussion has been great for uptake of WebID, which is relevant to Linked Data). And it is probably the case that I am just too ignorant of the whole thing to attempt to do the server side of it, especially when it is not a raw site, but Wordpress. And people have been too polite to tell me.

Thanks for your response Melvin; I guess I got a bit misled (or hopeful!) because Angelo's wp-linked-data plugin has webid as a keyword.

I think I will now consider myself Retired Hurt (http://en.wikipedia.org/wiki/Retired_hurt_(cricket)#Retired_hurt_.28or_not_out.29 )! I hope to return before the end of the innings.

Best
Hugh

BTW -- When you distribute pkcs#12 files, the receiving parties don't actually need to have any knowledge of the actual ACL that you use to protect the resources being shared :-)

Kingsley
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Hugh comes back to play /

Thanks Kingsley, and Melvin and Henry and Norman. So, trying to cut it down to the minimum. (Sorry, I find some/many of the pages about it really hard going.)

If I have a photo on a server, http://example.org/photos/me.jpg, and a WebID at http://example.org/id/you, what files do I need on the server so that http://example.org/id/you#me (and no-one else) can access http://example.org/photos/me.jpg?

I think that is a sensible question (hopefully!)

Cheers
Hugh

On 9 Aug 2013, at 16:30, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 11:09 AM, Hugh Glaser wrote:

Sorry mate, I have little or no idea what you are talking about. What would an ACL look like?

Okay, to be clearer, there are two things in play re. authentication via WebID+TLS:

1. basic identity verification -- this is the relation lookup against your profile document (this is the minimal that must be implemented by a WebID+TLS server)

2. ACLs and Data Access Policies -- this is where, in addition to #1, you set rules such as: only allow identities that are members of a group, or known (i.e., via a foaf:knows relation) by some other identity, etc.

So starting simple, your first step would be #1.

What plugin in wordpress do you mean?
I thought there was a WebID plugin for WordPress. Thus, post-installation, you would be able to achieve step #1, i.e., the plugin turns your WordPress installation into a WebID+TLS compliant server.

It is probably the case that this is now too much detail for the list (I think that the whole discussion has been great for uptake of WebID, which is relevant to Linked Data). And it is probably the case that I am just too ignorant of the whole thing to attempt to do the server side of it, especially when it is not a raw site, but Wordpress. And people have been too polite to tell me.

Also note, if you are hosting WordPress you can make the plugin yourself. It boils down to a SPARQL ASK on the relation that associates a WebID with a Public Key.

Thanks for your response Melvin; I guess I got a bit misled (or hopeful!) because Angelo's wp-linked-data plugin has webid as a keyword.

Yes, that threw me off too.

I think I will now consider myself Retired Hurt (http://en.wikipedia.org/wiki/Retired_hurt_(cricket)#Retired_hurt_.28or_not_out.29 )! I hope to return before the end of the innings.

I really assumed that circa 2013 an interested party would have built a WebID+TLS server-side plugin for WordPress. Ah! Just realized something: there's an OpenID plugin for WordPress [1], which means you can (if you choose) leverage an OpenID+WebID bridge service [2].

Links:

1. http://wordpress.org/plugins/openid/ -- the OpenID plugin for Wordpress (this gives you the authentication functionality for your WordPress instance)

2. http://bit.ly/OcbR8w -- G+ note I posted about the OpenID+WebID proxy service (which you can leverage in this scenario too!)

Kingsley

Best
Hugh

BTW -- When you distribute pkcs#12 files, the receiving parties don't actually need to have any knowledge of the actual ACL that you use to protect the resources being shared :-)

Kingsley
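For readers wondering what Kingsley's "SPARQL ASK on the relation that associates a WebID with a Public Key" looks like in practice: using the cert ontology from the WebID+TLS spec, a verifier takes the RSA modulus and exponent out of the presented client certificate and asks the profile document whether it lists that key. A sketch with placeholder values (the WebID reuses Hugh's example URI; the modulus is made up):

```sparql
PREFIX cert: <http://www.w3.org/ns/auth/cert#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# In a real verifier, the modulus and exponent below come from
# the public key in the client certificate presented over TLS.
ASK {
  <http://example.org/id/you#me> cert:key [
    cert:modulus "cafebabe"^^xsd:hexBinary ;
    cert:exponent 65537
  ] .
}
```

If the ASK against the dereferenced profile document returns true, the client has proved control of the WebID.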
Re: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks. I've looked at quite a bit of this stuff, but still don't see where the ACL document gets stored and used. I am beginning to get the sense that I may have to write some code, other than the ACL RDF, to do this. Surely Apache or something else will do this for me? Can't I just put the ACL in a file (as with htpasswd) and point something at it? I certainly don't want to be writing code to make one photo (or simply a static web site) available. Or is that the delegated service you are talking about? I've got my fingers crossed here.

On 9 Aug 2013, at 17:35, Kingsley Idehen kide...@openlinksw.com wrote:

On 8/9/13 12:22 PM, Hugh Glaser wrote:

If I have a photo on a server, http://example.org/photos/me.jpg, and a WebID at http://example.org/id/you, what files do I need on the server so that http://example.org/id/you#me (and no-one else) can access http://example.org/photos/me.jpg? I think that is a sensible question (hopefully!)

You need a Turtle document (other RDF document types will do too) comprised of content that describes your ACL based on http://www.w3.org/ns/auth/acl vocabulary terms. You might find the http://www.w3.org/wiki/WebAccessControl#this wiki document useful too.

My ACL demos leverage the fact that our ODS and Virtuoso platforms have this built in re. Web Server functionality. I need to check if we built a delegated service for WebID+TLS based ACLs; if not (note to self re. new feature zilla), we'll make one :-)
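To sketch an answer to Hugh's concrete question in WebAccessControl terms: the ACL could be a small Turtle document along these lines. This shows only the ACL document's content; where the server expects to find the file (and how it enforces it) is implementation-specific.

```turtle
@prefix acl: <http://www.w3.org/ns/auth/acl#> .

# Grant read access on the photo to exactly one WebID.
[] a acl:Authorization ;
   acl:accessTo <http://example.org/photos/me.jpg> ;
   acl:agent    <http://example.org/id/you#me> ;
   acl:mode     acl:Read .
```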
Re: {Disarmed} RWW-Play was: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks Henry. Well I had looked there, but it all looked quite complicated - I have never cloned a git thingy before, and I don't even know if Java is available on the host :-) But emboldened by your encouragement I went for The Short Version. I was very encouraged, as it seemed to do quite a lot, but it seemed to hang after getting play-2-TLS-e6c58f64585b182f937358fa984474b86984d77d.tar.bz2

But even when I tried to do it by hand (the Longer Version), I eventually got "java was killed for excessive resource usage". By which time it had downloaded 427MB of stuff. I don't think these are the sort of hosting costs I want to have. So I have a sense this is not the solution I was looking for :-)

Very best
Hugh

By the way, the link in "An initial implementation of Linked Data Basic Profile" does a 404.

On 9 Aug 2013, at 18:09, Henry Story henry.st...@bblfish.net wrote:

On 9 Aug 2013, at 18:55, Hugh Glaser h...@ecs.soton.ac.uk wrote:

Thanks. I've looked at quite a bit of this stuff, but still don't see where the ACL document gets stored and used. I am beginning to get the sense that I may have to write some code, other than the ACL RDF, to do this. Surely Apache or something else will do this for me? Can't I just put the ACL in a file (as with htpasswd) and point something at it? I certainly don't want to be writing code to make one photo (or simply a static web site) available. Or is that the delegated service you are talking about? I've got my fingers crossed here.

You can follow the instructions on installing https://github.com/stample/rww-play (it's under the Apache Licence, and patches and contributions are welcome). Then you'll be able to do the following:

An initial implementation of the Linked Data Platform spec is implemented here. In the same way as the Apache httpd server, it serves resources from the file system and maps them to the web. By default we map the test_www directory's content to http://localhost:8443/2013/.
The test_www directory starts with a few files to get you going:

$ cd test_www
$ ls -al
total 48
drwxr-xr-x   4 hjs  admin   340  9 Jul 19:04 .
drwxr-xr-x  15 hjs  admin  1224  9 Jul 19:04 ..
-rw-r--r--   1 hjs  staff   229  1 Jul 08:10 .acl.ttl
-rw-r--r--   1 hjs  admin   109  9 Jul 19:04 .ttl
lrwxr-xr-x   1 hjs  admin     8 27 Jun 20:29 card -> card.ttl
-rw-r--r--   1 hjs  admin   167  7 Jul 22:42 card.acl.ttl
-rw-r--r--   1 hjs  admin   896 27 Jun 21:41 card.ttl
-rw-r--r--   1 hjs  admin   102 27 Jun 22:32 index.ttl
drwxr-xr-x   2 hjs  admin   102 27 Jun 22:56 raw
drwxr-xr-x   3 hjs  admin   204 28 Jun 12:51 test

All files with the same initial name up to the "." are considered to work together (and in the current implementation are taken care of by the same agent). Symbolic links are useful in that they:

• allow one to write and follow linked data that works on the file system without needing to name files by their extensions. For example a statement such as [] wac:agent <card#me> can work on the file system just as well as on the web.
• guide the web agent to which the default representation should be served.
• currently also help the web agent decide which are the resources it should serve.

There are three types of resources in this directory:

• The symbolic links such as card distinguish the default resources that can be found by an HTTP GET on http://localhost:8443/2013/card. Above, card -> card.ttl shows that card has a default turtle representation.
• Each resource also comes with a Web Access Control list, in this example card.acl.ttl, which sets access control restrictions on resources on the file system.
• Directories store extra data (in addition to their contents) in the .ttl file. (TODO: not quite working)
• Directories also have their access control lists, which are published in a file named .acl.ttl.

These conventions are provisional implementation decisions, and improvements are to be expected here.
(TODO:
• updates to the file system are not reflected yet in the server
• allow symbolic links to point to different default formats)

Let us look at some of these files in more detail. The acl for card just includes the acl for the directory/collection. (TODO: wac:include has not yet been defined in the Web Access Control Ontology)

$ cat card.acl.ttl
@prefix wac: <http://www.w3.org/ns/auth/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<> wac:include <.acl> .

The acl for the directory allows access to all resources in the subdirectories of test_www, when accessed from the web as https://localhost:8443/2013/, only to the user authenticated as https://localhost:8443/2013/card#me. (TODO: wac:regex is not defined in its namespace - requires standardisation.)

$ cat .acl.ttl
@prefix acl: http://www.w3.org/ns/auth/acl# .
@prefix foaf: http://xmlns.com/foaf/0.1
Re: {Disarmed} RWW-Play was: Simple WebID, WebID+TLS Protocol, and ACL Dogfood Demo
Thanks. Fair enough indeed. And thanks for sticking with me through the process. I know it's a pain when n00bs like me get involved trying to use bleeding edge code :-) I look forward to the consumer version.

In fact, I have a feeling that Kingsley may have found much of what I want at http://dig.csail.mit.edu/2009/mod_authz_webid/README -- ** this might be what you need re. Apache **. In fact I don't have access to the Apache config on the machine I was using, but I will have a go on a machine I do when I have a minute. If I (or someone else) succeed, a report back with the absolute minimum for doing the whole WebID thing that way would be a nice resource.

And of course I would love to see a Wordpress plugin (the Drupal plugin seemed to have too many dependencies for me to even think about writing my first wordpress plugin!)

Best
Hugh

On 9 Aug 2013, at 19:17, Henry Story henry.st...@bblfish.net wrote:

On 9 Aug 2013, at 19:34, Hugh Glaser h...@ecs.soton.ac.uk wrote:

Thanks Henry. Well I had looked there, but it all looked quite complicated - I have never cloned a git thingy before, and I don't even know if Java is available on the host :-) But emboldened by your encouragement I went for The Short Version. I was very encouraged, as it seemed to do quite a lot, but it seemed to hang after getting play-2-TLS-e6c58f64585b182f937358fa984474b86984d77d.tar.bz2 But even when I tried to do it by hand (the Longer Version), I eventually got "java was killed for excessive resource usage". By which time it had downloaded 427MB of stuff. I don't think these are the sort of hosting costs I want to have. So I have a sense this is not the solution I was looking for :-)

Well this is not optimised yet. It's for developers. At the moment you need a powerful modern machine. I am assuming people here on the Linked Data List are interested in working with bleeding edge code, and getting an idea of where things are heading.
If you want the couch potato version, then you need to wait for the consumer version. :-)
FOAF Editor - was Re: WebID Frustration
Norman, hello. Very interesting. Yes, I think that works.

I think I had got misled into thinking the issuer was significant - especially as the one I created calls itself "Key from my-profile.eu", but of course I could change that in Keychain.

I was sort of thinking of a FOAF service, which also just happens to do WebID if you click the WebID button (on by default, since people don't even need to know what it is?). So, essentially the next generation of foaf-a-matic. I'm sure I remember talking about this stuff many years ago :-), but maybe WebID makes it even more useful. In some sense this is a way to get WebID more widely adopted - be in a symbiotic relationship with FOAF. Because it also gets FOAF more widely adopted, because it does the ID thing. I'm guessing the WebID people have had all these discussions.

So the service would create and edit a Personal Profile Document for users. It would look after it itself if you wanted, GET and PUT it on a third party if desired and possible, or give you the edited version to put somewhere yourself. Personally I would love to have something better than vi to edit my FOAF, much as I love it :-)

Best
Hugh

On 6 Aug 2013, at 23:26, Norman Gray nor...@astro.gla.ac.uk wrote:

Hugh, hello.

On 2013 Aug 6, at 22:58, Hugh Glaser h...@ecs.soton.ac.uk wrote:

[...and quoting out of order...]

I looked at quite a few sites before choosing where my OpenID would be.

So did I, but OpenID allows for some indirection, so that the OpenID that I quote -- http://nxg.me.uk/norman/openid -- isn't committed to a particular OpenID provider. I use verisignlabs.com, but could change away from them without disruption. This is relevant because...

Actually, this whole thing seems to me (I now realise) nothing to do with WebID per se. It is about creating and editing FOAF files.

Aha, yes! This is the key thing, I think.
So the question of how to get a WebID may reduce to the question of how to get a certificate which includes a 'good' X.509 Subject Alternative Name, with 'good' here meaning something like 'the FOAF file I (apparently, or to my surprise) already have'.

Now, while there's a very small number who might want to do the whole thing from scratch, there's a larger number of people who might already have a FOAF file somewhere, and a still larger number of people (possibly all of Facebook? -- did they ever actually do this?) who have a FOAF profile but don't know it by that name. As in...

But actually I didn't; what I wanted was a WebID that didn't create an account somewhere (most of the sites I found offer an account that comes with a WebID as a side-effect).

So you want the inverse of this, in some loose sense. What probably would work in this case is a service which allows two steps:

1. You can say: I've got a preexisting account at Network X; can you give me a WebID which will point to that?

2. The service says: yes, they do FOAF, so (a) here's a WebID certificate which points to that, for you to put in your browser, and (b) tell Network X to do ... blah.

Step 1 is probably not too hard (especially if people can say "I've got this FNOF profile thing I've been told to tell you about"). Step 2a is still going to be fiddly (X.509 + browser = baldness), but I imagine that it's the 'blah' in step 2b that will require network-by-network cooperation. Though all it would require is for the user to upload their new WebID certificate to the cooperating service, for it to work out what the WebID is that it should add to the preexisting user's FOAF profile.

So you choose which network gets to edit and serve your FOAF file for you, and only have to mention that on one occasion, when talking to a make-me-a-WebID service. You'd never have to go back to that WebID-creating service again. In other words, unlike OpenID, you don't even need a redirection step.

Does that work?
All the best, Norman -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
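As an aside on what a 'good' Subject Alternative Name amounts to: a WebID certificate is an ordinary X.509 certificate whose subjectAltName extension carries the profile URI. A hypothetical openssl config sketch, reusing the example WebID from earlier in the thread (the CN and URI are placeholders):

```ini
# webid.cnf -- sketch only; CN and URI are placeholders
[ req ]
prompt             = no
distinguished_name = dn
x509_extensions    = webid

[ dn ]
CN = Example User (WebID)

[ webid ]
subjectAltName = URI:http://example.org/id/you#me
```

A self-signed certificate could then be produced with something like: openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes -config webid.cnf. The in-browser keygen flow discussed above achieves the same end, with the server filling in the SAN.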