RE: See Other

john.nj.davies Wed, 28 Mar 2012 01:37:47 -0700

"I'm doing this because of promise of interestingly unexpected re-use"

Dan - you are not convincing me of the cost-benefit trade off here...!

-----Original Message-----
From: Dan Brickley [mailto:[email protected]]
Sent: 28 March 2012 02:31
To: Melvin Carvalho
Cc: Jeni Tennison; [email protected]; public-lod community
Subject: See Other

On 27 March 2012 20:23, Melvin Carvalho <[email protected]> wrote:

> I'm curious as to why this is difficult to explain.  Especially since
> I also have difficulties explaining the benefits of linked data.
> However, normally the road block I hit is explaining why URIs are important.

Alice: So, you want to share your in-house thesaurus in the Web as 'Linked 
Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks 
easy enough. We've exported it as RDF/XML SKOS already. Here, take a look...

[data stick changes hands]

Alice: Cool! And... yup, it's well-formed XML, and here, see, I parsed it with a
real RDF parser (made by Dave Beckett who worked on the last W3C spec for this 
stuff, beats me actually checking it myself) and it didn't complain. So looks 
fine! Ok so we'll need to chunk this up somehow so there's one little record 
per term from your thesaurus, and links between them... ...and it's generally 
good to make human facing pages as well as machine-oriented RDF ones too.

Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa, right? 
Which version?

Alice: well RDFa yes, microdata is a kind of cousin, a mix of thinking from
RDFa and microformats communities. But I meant that you'd make a version
of each page for computers to use (RDF/XML like your test export here), ... and 
you'd make some kind of HTML page for more human readers also. The stuff you 
mention is more about doing both within the same format...

Bob: Great. Which one's the most standard?  What should I use?

Alice: Well I guess it depends what you mean by standard.
[skips digression about whatwg and w3c etc notions of standards process] [skips 
digression about XHTML vs XML-ish polyglot HTML vs resolutely non-XML HTML5 
flavours] [skips digression about qnames in HTML and RDFa 1.1 versus 1.0]

...you might care to look at using a basic HTML5 document with, say, the Lite
version of RDFa 1.1 (which is pretty much finished but not an official stable 
standard yet at W3C)

Bob: [makes a note]. Ok, but that's just the human-facing page, anyway. We'd 
put up RDF/XML for machines too, right? Well maybe that's not necessary I 
guess. I was reading something about GRDDL and XSLT that automates the 
conversion, ... should we maybe generate the RDF/XML from the HTML+RDFa or vice 
versa? or just have some php hack generate both from MySQL since that's where 
the stuff ultimately lives right now anyway...?

Alice: Um, well it's pretty much your choice. Do you need RDF/XML too?
Well..... maybe, not sure... it depends. There are more RDF/XML parsers around, 
they're more mature, ... but increasingly tools will consume all kinds of data 
as RDF. So it might not matter. Depends why you're doing this, really.

Bob: Er ok, maybe we ought to do both for now, ... belt-and-braces, ... maybe 
watch the stats and see what's being picked up? I'm doing this because of 
promise of interestingly unexpected re-use and so on, which makes details hard 
to predict by definition.

Alice: Sounds like a plan. Ok, so each node in your RDF graph, ...
we'll need to give it a URI. You know that's like the new word for URL, but 
that includes identifiers for real world things too.

Bob: Sure sure, I read that. Makes sense. And I can have a URI, my homepage can 
have a URI, I'm not my home page blah-de-blah?

Alice: You got it.

Bob: Ok, so what URLs should I give the concepts in this thesaurus?
They've got all kinds of strings attached, but we've also got nicely managed 
numeric IDs too.

Alice: Right, so maybe something short (URLs can never be too short...), ... so
maybe if you host at your example.com server,
http://example.com/demothes/c1, then the same but /c2, /c3 etc.

... or well you could use #c1 or #c2 etc. That's pretty much up to you. There 
are pros-and-cons in both directions.

Bob: whatever's easiest. It's a pretty plain apache2 setup, with php if we want 
it, or we can batch create files if that makes more sense; this data doesn't 
change much.

Alice: Well how big is the thesaurus...?

Bob: a couple thousand terms, each with a few relations and bits of text; maybe 
more if we dig out the translations (humm should we language negotiate those 
somehow?)

Alice: Let's talk about that another day, maybe?

Bob:  And hmm the translations are versioned a bit differently? Should we put 
version numbers in somewhere so it's unambiguous which version of the 
translation we're using?

Alice: Let's talk about that another day, too.

Bob: OK, where were we? http://example.com/demothes/c1 ... sure, that sounds 
fine.

... we'd put some content negotiated apache thing there, and make c1 send HTML 
if there's a browser, or rdf/xml if they want that stuff instead? Default to 
the browser / HTML version maybe?
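Bob's rule (HTML for browsers, RDF/XML for tools that ask for it, HTML by default) boils down to reading the HTTP Accept header. Below is a minimal Python sketch of that choice, not a full RFC-compliant negotiator: it only understands the q parameter, prefers an exact media type over type/* over */*, and falls back to HTML when nothing matches. OFFERS and the function names are made up for illustration.

```python
# Hypothetical helper: choose HTML or RDF/XML from an HTTP Accept header.
# Simplified on purpose: only the q parameter is understood.

OFFERS = ["text/html", "application/rdf+xml"]  # first entry is the default


def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def quality(offer, ranges):
    """q-value of the most specific range matching `offer` (exact > type/* > */*)."""
    major = offer.split("/")[0]
    best_rank, q = -1, 0.0
    for media, media_q in ranges:
        if media == offer:
            rank = 2
        elif media == major + "/*":
            rank = 1
        elif media == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, q = rank, media_q
    return q


def choose_format(accept_header):
    """Pick the best offer; ties and 'nothing acceptable' fall back to HTML.

    (A stricter server might answer 406 Not Acceptable instead of falling back.)
    """
    if not accept_header:
        return OFFERS[0]
    ranges = parse_accept(accept_header)
    # Highest q wins; on equal q the earlier entry in OFFERS wins (hence -index).
    q, _, offer = max((quality(o, ranges), -i, o) for i, o in enumerate(OFFERS))
    return offer if q > 0 else OFFERS[0]
```

A typical browser header such as text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 yields text/html, while a tool sending application/rdf+xml gets RDF/XML.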

Alice: something like that could work. There are some howtos around.
Oh, but if c1 isn't an information resource, you'll need to redirect with a 303 
HTTP code. It's like you said with people and homepages, to make clear which is 
which.
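For the slash-style URIs, the 303 advice can be sketched as plain request-routing logic. This is a hypothetical Python function, not Apache configuration, and it assumes the documents about each concept live at /demothes/cN.html and /demothes/cN.rdf (the suffixes are just one choice; keeping them out of URLs is also common). It follows the general pattern of the W3C note "Cool URIs for the Semantic Web": the concept URI itself never answers 200, it answers 303 See Other with a Location pointing at a document about the concept.

```python
import re

# Hypothetical URL layout, assumed for illustration:
#   /demothes/c<N>          the concept itself (not an information resource)
#   /demothes/c<N>.html     human-facing page about the concept
#   /demothes/c<N>.rdf      RDF/XML description of the concept
CONCEPT = re.compile(r"^/demothes/c(\d+)$")
DOCUMENT = re.compile(r"^/demothes/c(\d+)\.(html|rdf)$")
MEDIA_TYPES = {"html": "text/html", "rdf": "application/rdf+xml"}


def route(path, wants_rdf):
    """Return (status, headers) for a GET of `path`.

    `wants_rdf` is assumed to come from content negotiation on the Accept header.
    """
    concept = CONCEPT.match(path)
    if concept:
        # Serve no body for the concept URI; point at a document about it instead.
        ext = "rdf" if wants_rdf else "html"
        return 303, {
            "Location": f"/demothes/c{concept.group(1)}.{ext}",
            "Vary": "Accept",  # the redirect target depends on the Accept header
        }
    document = DOCUMENT.match(path)
    if document:
        # The documents are ordinary information resources: serve them directly.
        return 200, {"Content-Type": MEDIA_TYPES[document.group(2)]}
    return 404, {}
```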

Bob: Oh-kay... so in our SKOS graph, it's a mix of things, the bulk is a load 
of descriptions of skos:Concept and there's a bit of metadata in there about 
some docs, and the admin contact info, ...  but yeah it's mostly the concepts 
(which seems to be the skos way to talk about thesaurus terms, sort of 
abstracted a bit to make translations easier, right?)

Alice: Yup. Well, ... remember we're breaking up your graph into bits... like 
one chunk per page?

Bob: Ah right, so is that one node in the graph per page? per ... erm, how do 
they call it? [counts on fingers] subject-predicate-object...
er subject, right? Each object in my graph, er like OO object I mean, entity, 
thingy...

Alice: -thingy is good-

Bob: Each thing in the graph, goes in one page, more or less?

Alice: more or less. It's up to you, I guess there are best practices, roughly 
the bulk of it, one page per concept, ... and then the metadata etc you might 
do differently
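Chunking the graph into "one page per thingy" amounts to grouping triples by their subject. A rough Python sketch, with made-up toy triples (only the labels come from this conversation) and plain strings standing in for both URIs and literals:

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.com/demothes/"  # hypothetical base, as in the discussion

# Toy (subject, predicate, object) triples, invented for illustration.
triples = [
    (BASE + "c1", SKOS + "prefLabel", "allergy to pine nuts"),
    (BASE + "c1", SKOS + "broader", BASE + "c2"),
    (BASE + "c2", SKOS + "prefLabel", "pine nuts"),
]


def chunk_by_subject(triples):
    """Group triples into {subject: [(predicate, object), ...]}, one entry per page."""
    pages = defaultdict(list)
    for subject, predicate, obj in triples:
        pages[subject].append((predicate, obj))
    return dict(pages)


for subject, statements in chunk_by_subject(triples).items():
    print(subject, len(statements))
```

Each key then becomes one page. Note that this naive split only puts a concept's outgoing statements on its page: the c1-to-c2 link ends up on c1's page only.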

Bob: Ok, so c1 is one concept, c2 is another, ... they'd have links to each 
other in the ... the RDF/XML files, right? And I guess the HTML too, sure

Alice: Sure
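On the machine-oriented side, each concept's chunk could be written as a small RDF/XML document in which skos:broader points at the other concept's URI. A minimal sketch using Python's standard xml.etree.ElementTree; the function name and the example.com URIs are hypothetical, and a real export would more likely use an RDF library than hand-built XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Serialize with the familiar rdf: and skos: prefixes instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def concept_rdfxml(uri, label, broader=()):
    """Return an RDF/XML document describing one skos:Concept."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = label
    for target in broader:
        # rdf:resource makes the object a URI (a link), not a literal string.
        ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": target})
    return ET.tostring(root, encoding="unicode")


print(concept_rdfxml("http://example.com/demothes/c1", "allergy to pine nuts",
                     ["http://example.com/demothes/c2"]))
```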

Bob: so the html rdfa stuff would be <a href='c2'>something and rel='broader' 
if c1 was broader than c2?

Alice: er it might be broaderTerm, or broaderConcept, I forget... [searches]

Bob: ah look, yeah skos:broader, ... ok so if c2 is more broad, er broader, 
more general, than c1, ... we put in the c1 HTML page a link over to c2, and 
add some RDFa too, to say what the link means in semantic rdf speak as well as 
clickable-link?

Alice: [tips head on side], ... sorry I always get this stuff back to front. 
Ok, slowly. c2 is broader than c1, ... 'broader' points to the one that's 
broader, like you know more general, ... so let's say c1 is the specific, 
detailed one. In the c1 HTML page, we'd ...

Bob: [interrupting] would that be c1.html? like concept ID dot h t m l, as a 
pattern?

Alice: yes, you could call it that, ... it's up to you really but obviously 
it's sort of conventional. But then there's another convention of keeping the 
file types out of URLs

Bob: So in the filesystem they might be a bunch of batch-generated HTML files 
called c1.html c2.html etc, but I'd keep that secret or obscure or hide it with 
apache config somehow?

Alice: For example, yes. But ok, so c1.html would be like "blah blah blah", and
then a paragraph describing concept c1 from your thesaurus, ... which is (we
say) some pretty specific topic, like er, say 'allergy to pine nuts'... and
maybe c2 is just 'pine nuts'.

Bob: Well it's an engineering terminology thesaurus, but sure. I get the idea. 
So we'd do <a href='/demothes/c2' rel='broader' ...

Alice: in rdfa 1.1 lite that's property='broader', erm property='skos:broader', 
... but sure, something like that. you might put the relationship first, it 
reads better. I think it means the same formally.
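Putting that together, the human-facing page for c1 might carry the link as RDFa 1.1 Lite markup roughly like the output of this Python sketch. The function, the page layout and the explicit prefix declaration are assumptions for illustration. The resource attribute names the concept (/demothes/c1), not the page about it; and in RDFa 1.1, property on an element that has an href (and no content attribute) takes the href as a link, so the <a> element states that c1 skos:broader c2.

```python
from html import escape


def concept_html(concept_path, label, broader):
    """Render a hypothetical HTML+RDFa Lite page for one concept.

    `broader` is a list of (path, label) pairs for the skos:broader targets.
    """
    links = "\n".join(
        f'      <li><a property="skos:broader" href="{escape(path)}">{escape(text)}</a></li>'
        for path, text in broader
    )
    return f"""<!DOCTYPE html>
<html>
<head><title>{escape(label)}</title></head>
<body prefix="skos: http://www.w3.org/2004/02/skos/core#">
  <div resource="{escape(concept_path)}" typeof="skos:Concept">
    <h1 property="skos:prefLabel">{escape(label)}</h1>
    <p>Broader:</p>
    <ul>
{links}
    </ul>
  </div>
</body>
</html>"""


print(concept_html("/demothes/c1", "allergy to pine nuts", [("/demothes/c2", "pine nuts")]))
```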

Bob: right right, ... and in c2 HTML page, we'd do the link back the other way? 
is there a skos word for the opposite of broader, skos:narrower? [searches] ... 
ok looks like it, ... so I'd use that?
it's sort of redundant I guess if you crawl all the pages, ... but you have to 
find the pages and links somehow, ... what if I started with some linked data 
agent thing on c2.html, how would it find c1.html to find that c1.html says 
that c2.html is broader?
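SKOS defines skos:narrower as the inverse of skos:broader, so the back-links for c2's page don't have to be maintained by hand: they can be derived from the broader links when the pages are generated, and then an agent that starts on c2's page sees them too. A small Python sketch, with a hypothetical dict-of-lists input keyed by concept ID:

```python
def narrower_from_broader(broader):
    """Invert {concept: [broader concepts]} into {concept: [narrower concepts]}."""
    narrower = {}
    for concept, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(concept)
    return narrower


# Hypothetical IDs: c1 'allergy to pine nuts' and a made-up c3 are both narrower than c2 'pine nuts'.
broader = {"c1": ["c2"], "c3": ["c2"], "c2": []}
print(narrower_from_broader(broader))  # {'c2': ['c1', 'c3']}
```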

Alice: Good point. We can work some of this out later. There are also sitemap 
files, so in-page links aren't the only way to find stuff.
It's all sort of emerging best practices territory. Lots of early adopters 
figuring things out, if we get this working, maybe you could write up a case 
study?
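A sitemap is one way for crawlers that read sitemaps to discover c1 without following in-page links. A sketch of generating one with Python's standard library in the sitemaps.org 0.9 format; which URLs to list (here, the hypothetical human-facing cN.html pages) is an assumption. A couple of thousand entries is well under the protocol's limit of 50,000 URLs per sitemap file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize with a default xmlns, no prefix


def build_sitemap(page_urls):
    """Return sitemap XML listing each absolute URL in `page_urls`."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in page_urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page URLs for a couple of thousand concepts.
pages = [f"http://example.com/demothes/c{n}.html" for n in range(1, 2001)]
xml_text = build_sitemap(pages)
```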

Bob: Or you could just tell me what to do. Hey, whatever happened to rev= ... 
is that still in XHTML?

Alice: Which version? I mean, ... can we talk about this later?

Bob: Right right. But couldn't I put "rev='skos:broader'" in c2.html, ...

Alice: [patiently] ... you could, yes. Or both... there's a lot of flexibility 
in this system. In many ways it's a huge strength...

Bob: Oh hang on, I found
http://www.w3schools.com/TAGS/att_link_rev.asp and it says rev isn't supported 
by browsers; is that a problem?

Alice: We're getting off the point a bit, ... Anyway I think Hixie took it out 
of HTML5 because it wasn't being used and people found it confusing. Or last 
time I looked anyway, I think it was gone.

Bob: Righto. I can see that. So anyway, we'll make a load of HTML pages that 
describe our concepts...

Alice: Yup, and we'll redirect /demothes/c2 to a page about c2, ... so things 
don't mix up information resources with non information resources. Oh and I'm 
not sure w3schools is always the best reference on this stuff...

Bob: things on the Web and things that aren't on the Web. Ok, if not w3schools, 
where should I check?

Alice: [ignoring w3schools question] ... exactly. things that aren't _on_ the 
Web. Or _in_ if you prefer. Like your concepts are a kind of abstraction so 
they're not really on the Web, ... they're just _described_ in the Web.

Bob: so we redirect to c1.html etc?

Alice: Sure we could do that, or if you want to keep the suffix out of the URL, 
which is considered good hygiene by some, you might for example use
/demothes/about_c1 ... that's quite clean

Bob: And if we get a content negotiated request for rdf/xml ...? ...
send that instead, ... no redirecting

Alice: something like that, I'll check the docs for you later. It's a bit 
fiddly but there are some examples around we can copy from, httpd.conf etc

Bob: Great. And if someone asks for the rdf/xml version of about_c1?

Alice: Not sure, I'll have to think a bit, but ... well sending the rdf along 
sounds ok. It's not quite the same as asking for c1 but ...
well sure. Why not?

Bob: What was the other option? #c1 ? No messing around with redirects there? 
Easier to bookmark?

Alice: Well yeah, ... and to link to, ... but your data isn't tiny, ... a few 
thousand concepts you said. Could be a big page fetch each time.

Bob: Is that a problem? How big is too big? We can cache internally so it's not 
hitting the db, right? Will intelligent agents and so on be reading this a lot? 
Do they choke on big files?

Alice: Well, maybe not so intelligent. But the way URIs and URLs work, when 
there's a # in them, ... that part doesn't get sent to the server, and so the server
doesn't see the #c1 or #c2 or #c9999 bit, ... so it can only really send you 
the whole lot and the consuming code has to make sense of it by remembering 
what it asked for...
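The point about # is easy to check: the fragment is split off on the client before the HTTP request is made, so every #cN URI turns into a fetch of the same document. Python's urllib.parse.urldefrag performs that split; the request_target wrapper is a hypothetical name for illustration.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Split a URI into (URL actually fetched over HTTP, fragment kept client-side)."""
    fetched, fragment = urldefrag(uri)
    return fetched, fragment


for uri in ("http://example.com/demothes#c1", "http://example.com/demothes#c9999"):
    print(request_target(uri))
# ('http://example.com/demothes', 'c1')
# ('http://example.com/demothes', 'c9999')
```

Both URIs fetch the same whole document, which is why the client has to remember which fragment it was after.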

Bob: ...well maybe this is still easier. And we can content negotiate still, 
right?

Alice: sure. HTML+RDFa or RDF/XML or ... you heard of turtle and ntriples and 
there's this thing called json-ld ... but don't worry about that for now. Let's 
just think about RDF/XML and HTML+RDFa today, eh?

Alice: [thinking...] well maybe just one of those would do, ... but it's not 
hard to generate both.

Bob: Alright, so one big HTML+RDFa file with the thesaurus in it, in SKOS 
triples but prettied up a bit with CSS? Sounds ok...

Alice: and a big RDF/XML doc too, if they ask for that instead

Bob: got it. So ... hang on, back up a bit, ... if we're in one big HTML page, 
and I'm at the er what did you say, 'allergy to pine nuts'
section, ... and I want to link to show that this concept has a ... a broader 
one which is just 'pine nuts', ... I put in '<a href='c1'
property="skos:broader"> within the c2 bit?

Alice: c1 was the broader one, I forget?

Bob: er c2 was broader, general ... Pine Nuts only. So yup, within pine nuts 
section of this big HTML page at /demothes, we'd link up (or down, guess it 
doesn't matter the page order?) to the #c1 section.
Remind me, I always mix up, is that <a name="c1"> or <a id="c2">?

Alice: it's a little bit complicated [searches] but 
http://stackoverflow.com/questions/484719/html-anchors-with-name-or-id
seems to cover it... ...er but look it's a bit fiddly this way, never mind the 
HTML attribute name for now we can look that up ... you don't want to call it 
c1 exactly, because that's the name of your concept

Bob: And concepts aren't information resources?

Alice: well obviously they sort of are _informational_ so that's why some 
people don't like that terminology, ... but that doesn't matter, the thing is 
they're not ... you know HTTP endpointy things, ... like data objects attached 
to a Web server, ... they're more abstract

Bob: and so also they're not bits of an HTML page either? Right? So if I go 
linking with <a href="/demothes#c1" blabla, that's implying that
c1 is a bit of a Web page... so that's an information resource, ...
and really it's not because it's a thesaurus concept which is more a sort of 
social entity or conceptual or mental or something, ... not inside my server or 
page like a concrete information object?

Alice: Y...es. Well you're mixing two things here a bit. Or three.
Hang on. Two. Right:
  First. We slipped from talking about the target of HTML hyperlinks (the 
id/name attribute stuff) to the markup at the start end of the link. <a 
href="/demothes#c1" is fine, so long as you're not really pointing at a page 
that has a section with name (or id) of 'c1'. It's the name end, the target 
stuff, that you can't put the thingy's URI into. It's ok to point because ... 
you're sort of saying something.
But if you write the target markup, you're saying that c2 is part of the page. 
Which it isn't.

Bob: o...k. Seems oddly asymmetrical somehow. But the '
href="/demothes#c1" ' HTML ... it's pointing at a page, right? And if we go to 
it in a browser (unless it has a bunch of funny extensions, ... I got in a mess 
one time with Firefox addons I was trying, ...) ... we go to it in a browser 
we'll get an HTML page. And there'll be a bit of the page describing our
concepts c1 and c2 ... and in theory links can jump you down the classic way, 
to where you want to read?
That's nice to have in documentation.

Alice: Yes yes, ... just we don't name the page target parts, erm anchors, with 
that same name. As the skos thing, concept, I mean.

Bob: Because it's not a webby thing it's a real worldy thing, even if it's 
still sort of about information? Like a book in my hand also is?

Alice: Exactly

Bob: [beams]

Alice: So, ... right, ... we don't name the in-page targets the same as the 
things those bits of page describe?

Bob: Ok, so like with the other design, we could call the page bits 
#c2_bit_of_the_page or something less verbose, just not #c2 because we already 
used that ID for something 'off Web', the concept itself?

Alice: Yes.
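Alice's rule for the one-big-page design (the concept keeps #cN, the bit of page about it gets a different id) can be written down in RDFa 1.1 Lite. In this Python sketch the id suffix "_section", the function name and the layout are hypothetical. It leans on the RDFa 1.1 rule that resource takes precedence over href: the <a> states that c1 skos:broader c2 (via resource="#c2"), while a browser click still scrolls to #c2_section (via href), and no HTML id ever equals a concept's fragment.

```python
from html import escape


def big_page(concepts):
    """Render one HTML+RDFa page for the whole thesaurus.

    `concepts` is a list of (concept_id, label, broader_ids) tuples.
    Concept URIs are <page>#<concept_id>; page sections use <concept_id>_section.
    """
    labels = {cid: label for cid, label, _ in concepts}
    sections = []
    for cid, label, broader in concepts:
        links = " ".join(
            # resource= gives the RDF object (the concept itself);
            # href= is only where a browser click scrolls to.
            f'<a property="skos:broader" resource="#{escape(b)}" '
            f'href="#{escape(b)}_section">{escape(labels.get(b, b))}</a>'
            for b in broader
        )
        sections.append(
            f'<div id="{escape(cid)}_section" resource="#{escape(cid)}" typeof="skos:Concept">\n'
            f'  <h2 property="skos:prefLabel">{escape(label)}</h2>\n'
            f"  {links}\n"
            f"</div>"
        )
    body = "\n".join(sections)
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>Demo thesaurus</title></head>\n"
        '<body prefix="skos: http://www.w3.org/2004/02/skos/core#">\n'
        f"{body}\n</body>\n</html>"
    )


print(big_page([("c1", "allergy to pine nuts", ["c2"]), ("c2", "pine nuts", [])]))
```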

Bob: Doesn't that screw up scrolling?

Alice: Well you could use some jquery thing I found and that's quite nice
actually because it scrolls smoothly and degrades gracefully and ... wait wait 
I'm talking nonsense, ... sorry. It's fine. You just put in two anchors and two 
links?

Bob: They're called anchors at both ends of the link, right? Sort of nautical 
idea... ... in RDF links too?

Alice: Er yeah yes. It's <a>. We don't really talk about anchors so much in the 
abstract rdf model. But it's a similar idea, hence the Linked Data thing?

Bob: So the subject is an anchor, the predicate is like a kind of link (that's 
a 'rel' or whatever?) and the object in the triple is an anchor too?

Alice: Well not exactly. You're ... well, sure. Yes. If that helps you think 
about it, ... but what I meant to say was forget about jquery, it'll still
scroll and stuff in the browser, so long as you have anchors like <a 
name="c2_bit_of_page" and <a name="c2">. Then for each semantic link, you can 
link to the semantic target - like c2 - with that, ... and in the human facing 
link, ... link to c2_bit_of_page.
Hmm no wait, you'd want to hide the human one because when you click it it 
won't go to the part of the page, and you shouldn't put in a name="c2". Sorry, 
I'm tired.

Alice: Ok sure, look is scrolling to the bit of the page important for you? 
Maybe the jquery thing could work? Or you can do something in CSS. It can't be 
that hard. There are lots of ways to do things with RDFa. If you get stuck we 
can ask in IRC or Twitter, people are really helpful (though not everyone 
agrees about this hash stuff and 303s)

Bob: So which is simplest, really? Really big files: bad with #, ...
but # lets you bookmark, ... but makes linking down to the right bit of the 
page somehow confusing...?

...could I rewrite the links in Javascript maybe? Ok ok, ... how's this! How 
about we make the HTML+RDFa page all nice and semantic, but put in a javascript 
that when the page loads --- only in browsers --- it rewrites all the links to 
be #bit_of_page links, ... then when clicked ... boom you hop to the right bit 
of the page. And still there's a big RDF/XML file content negotiable for older 
tools that don't read the HTML+RDFa .... everyone's happy!

Alice: You still need to update all the URIs in your test RDF/XML file to be
this new pattern we agreed, ... and that javascript thing is half sick and half 
clever, ... maybe it'd be fine. There might be browser addons that get confused 
if javascript is messing with stuff. But we got distracted. I'm not sure of 
anything using this stuff much, you might check in Tabulator at least to see
what it does.

But we were talking about <a href="/demothes#c1" blabla,

... and I said you'd slipped from talking about the 'target' end of the anchor 
link, to the 'source' end. And that the source in RDFa could mention things
that weren't (and shouldn't be) actual HTML page targets. But also you were
mixing up a bit, ... the idea of an information resource like a thing that's up 
there via some Web server and giving you content negotiated formats, ... with 
the idea of bits of a page being an information resource.

 ... but at least we agreed that your skos concepts aren't either of those; 
they're abstractions. So the Web pages sort of describe them, ... and the bits 
of a big rdfa html page if we go with the # option

Bob: ... we didn't really get on to the RDF/XML version

Alice: that's a bit simpler, in a way... because only machines look at it and 
it doesn't care about prettyness or usability or browser behaviour.

Bob: couldn't we put in some XSLT, ... and make it work for both? like a
stylesheet to make it into html+rdfa, ... browsers do that now don't they?

Alice: In theory maybe; in practice that's not really something people seem to 
do. But look, we're going with the 'put it all in one big page' # version, so 
we just make the rdf/xml use those as the URIs and ... well we're done.

Bob: Easy! So I just change some URLs and upload the RDF/XML, ... then write 
some script and make an HTML page with the right kinds of links.
....
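Changing the URIs in the exported RDF/XML can be scripted. A sketch with Python's standard ElementTree that rewrites rdf:about and rdf:resource values from an old base to the new hash-style one; the old base http://intranet.example/thes/ and the sample document are invented for the example. It is deliberately naive: it ignores rdf:ID, xml:base and relative URIs, drops XML comments, and may rename namespace prefixes it wasn't told about, so the result is worth re-checking with a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", "http://www.w3.org/2004/02/skos/core#")

# Attributes whose values are URIs in this simple export.
URI_ATTRS = (f"{{{RDF}}}about", f"{{{RDF}}}resource")


def rebase_uris(rdfxml, old_base, new_base):
    """Replace `old_base` with `new_base` at the start of rdf:about/rdf:resource values."""
    root = ET.fromstring(rdfxml)
    for elem in root.iter():
        for attr in URI_ATTRS:
            value = elem.get(attr)
            if value is not None and value.startswith(old_base):
                elem.set(attr, new_base + value[len(old_base):])
    return ET.tostring(root, encoding="unicode")


# Hypothetical export with old, internal URIs.
sample = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:skos="http://www.w3.org/2004/02/skos/core#">'
    '<skos:Concept rdf:about="http://intranet.example/thes/c1">'
    '<skos:broader rdf:resource="http://intranet.example/thes/c2"/>'
    "</skos:Concept></rdf:RDF>"
)
print(rebase_uris(sample, "http://intranet.example/thes/", "http://example.com/demothes#"))
```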

Alice: And we'll figure out some way to make it jump to bits of the page? 
Maybe...

Bob: Can we do backlinks inside the page?

Alice: like broader concept links back to the narrower?

Bob: I can find some RDFa parser and put that in dynamically? or better to 
spell it out in the actual markup so the triples are there?

Alice: yes, maybe better

Bob: but ... that'll make the doc even bigger, ... if every broader term triple 
has an inverse link too.... guess it doesn't matter, webservers zip stuff on 
the fly don't they?

Alice: they can, yeah. but look you can always do the http 303 thing if you're 
worried about size of the file,... chunk it up. Do the slash thing. Are you 
expecting your thesaurus to grow at all?
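On Bob's compression question: repeated markup like the inverse links compresses very well, which is why on-the-fly gzip (for example Apache's mod_deflate) takes much of the sting out of a bigger page. A quick way to see the effect with Python's gzip module, on a made-up page of a few thousand near-identical link rows (the markup is invented and no real sizes are claimed):

```python
import gzip

# Invented markup: one RDFa link row per concept, ~3000 concepts.
ROW = '<li><a property="skos:narrower" href="/demothes/c{0}">concept {0}</a></li>\n'
page = "".join(ROW.format(n) for n in range(1, 3001)).encode("utf-8")

compressed = gzip.compress(page)
print(f"{len(page)} bytes raw, {len(compressed)} bytes gzipped")
```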

Bob: it can be nice having one page per thingy, after all

Alice: they're both easier in different ways

Bob: thanks, you've been a big help. Would it be alright to just upload the 
skos dump file for now, ... maybe I'll zip it

Alice: we ought to fix those URIs sometime

Bob: maybe tomorrow?

Alice: maybe tomorrow...
