Re: Document fragment vocabulary

Michael Hausenblas Mon, 15 Aug 2011 22:14:27 -0700

> It is not really LinkedData friendly.

Why?

> @Michael: is there some standardisation regarding URIs for text going on?

As you've rightly identified, an RFC already exists. What would this new standardisation activity be chartered for?

As an aside, this reminds me a bit of http://xkcd.com/927/

> The approach by Wilde and Dürst [1] seems to lack stability.

I don't know what you mean by this. Lack of take-up, yes. Stability, what's that?

> Do you think we could do such standardisation for document fragments and text fragments within the Media Fragments Group [3]?

No. Disclaimer: I'm an MF WG member. Look at our charter [1] ...


Maybe this thread should slowly be moved over to the W3C URI mailing list [2]?


Cheers,
        Michael

[1] http://www.w3.org/2008/01/media-fragments-wg.html
[2] http://lists.w3.org/Archives/Public/uri/
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 16 Aug 2011, at 05:40, Sebastian Hellmann wrote:

Hi Michael and Alex,
sorry to answer so late; I was on holiday in France.
I looked at the three provided resources [1,2,3] and I still have some comments and questions.
1. The part after the # is actually not sent to the server. Are there any solutions for this? It is not really LinkedData friendly.
Compare
http://linkedgeodata.org/triplify/near/51.033333,13.733333/1000/class/Amenity
(currently not working, but it returns all points within a 1000 m radius).
With a # fragment, the client would instead have to compute the addressed subset of triples from the full resource itself.
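To make point 1 concrete: an HTTP client drops the fragment before it sends a request, so the server only ever sees the path and query. A minimal sketch with Python's standard `urllib.parse`; the helper name `request_target` is hypothetical, and the two URIs are the ones quoted in this message.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(uri: str) -> str:
    """Hypothetical helper: the request-target an HTTP client would send.

    Everything after '#' is removed client-side, so only the path and
    the query string ever reach the server.
    """
    without_fragment, _fragment = urldefrag(uri)
    parts = urlsplit(without_fragment)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


offset_uri = "http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418"
path_uri = "http://linkedgeodata.org/triplify/near/51.033333,13.733333/1000/class/Amenity"

print(request_target(offset_uri))      # /DesignIssues/LinkedData.html
print(urldefrag(offset_uri).fragment)  # offset__14406-14418 (stays on the client)
print(request_target(path_uri))        # the whole selector reaches the server
```

The contrast is the point of the LinkedGeoData example: a path-based selector lets the server compute the subset, while a fragment-based one leaves that work to the client.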
2. [1] is quite basic: it essentially uses character positions and line numbers. I made a qualitative comparison of different fragment identifier approaches for text in [4], slide 7. I was wondering if anybody has researched such properties of URI fragments. Currently, I am benchmarking the stability of these URIs using Wikipedia changes.
Has such work been done before?
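The stability concern in point 2 can be illustrated with a toy edit: an offset-based fragment keeps pointing at the same character positions, so any insertion earlier in the text silently changes what it selects. A minimal sketch; the sample sentences and the `select` helper are made up for illustration and say nothing about the Wikipedia benchmark itself.

```python
def select(text: str, start: int, end: int) -> str:
    """Hypothetical helper: resolve a character range as text[start:end]
    (0-based, end-exclusive)."""
    return text[start:end]


before = "Linked Data uses URIs as names for things."
start = before.index("URIs")
end = start + len("URIs")  # the fragment would record (17, 21)

# A later revision inserts one word near the beginning of the text.
after = "Linked Open Data uses URIs as names for things."

print(select(before, start, end))  # URIs
print(select(after, start, end))   # uses  (same offsets, different text)
```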
3. @Alex: In my opinion, your proposed fragment ontology can only be used to provide documentation for different fragments.
I would rather propose to use just one triple:
<http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418> a <http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString> .
The ontology I made for Strings [5] might be generalized to formats other than text. One triple is much shorter. As you can see, I also tried to encode the type of fragment ("offset") directly into the fragment identifier, although a notation like "type=offset" might be better.
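The fragment in the triple above packs both the fragment type and its parameters into one string. A sketch of how a client might split it, assuming the `type__start-end` layout seen in `offset__14406-14418`, and assuming (the message does not say) that the numbers are 0-based, end-exclusive character offsets. `TYPED_FRAGMENT` and `parse_typed_fragment` are hypothetical names, not part of the string ontology [5].

```python
import re
from urllib.parse import urldefrag

# Hypothetical pattern for fragments like "offset__14406-14418":
# a type name, a double underscore, then "start-end".
TYPED_FRAGMENT = re.compile(r"^(?P<type>[A-Za-z]+)__(?P<start>\d+)-(?P<end>\d+)$")


def parse_typed_fragment(uri: str) -> tuple[str, str, int, int]:
    """Hypothetical helper: split a URI into (document, type, start, end)."""
    document, fragment = urldefrag(uri)
    match = TYPED_FRAGMENT.match(fragment)
    if match is None:
        raise ValueError(f"unrecognised fragment: {fragment!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("range start is after range end")
    return document, match["type"], start, end


doc, kind, start, end = parse_typed_fragment(
    "http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418"
)
# Under the 0-based, end-exclusive assumption this would select
# text[14406:14418] of the retrieved document, i.e. 12 characters.
print(doc, kind, start, end, end - start)
```

Because the type travels inside the fragment, a single `rdf:type` triple is enough for a consumer to know which resolver to apply.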
4. @Michael: is there some standardisation regarding URIs for text going on? I heard there would be a W3C Language Technology group. The approach by Wilde and Dürst [1] seems to lack stability. Do you think we could do such standardisation for document fragments and text fragments within the Media Fragments Group [3]? I really thought the liveUrl project was quite good, but it seems dead [6].
In LOD2 [7] and NIF [8] we will need fragment identifiers to standardize NLP tools for the LOD2 stack. It would be great to reuse existing work instead of starting from scratch. For example, I had to extend [1] because it did not produce stable URIs and did not record the type of algorithm used to produce the URI.
All the best,
Sebastian


[1] http://tools.ietf.org/html/rfc5147
[2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
[3] http://www.w3.org/TR/media-frags/
[4] http://www.slideshare.net/kurzum/nif-nlp-interchange-format
[5] http://nlp2rdf.lod2.eu/schema/string/
[6] http://liveurls.mozdev.org/index.html
[7] http://lod2.eu
[8] http://aksw.org/Projects/NIF

On 04.08.2011 22:37, Michael Hausenblas wrote:
Alex,
> Has something already done this? Is it even (mostly?) sane?
Sane yes, IMO. Done, sort of, see:

+ URI Fragment Identifiers for the text/plain Media Type [1]
+ URI Fragment Identifiers for the text/csv Media Type [2]
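For reference, RFC 5147 [1] defines `char=` and `line=` schemes for text/plain, where positions count the gaps between characters (or lines) starting at 0, so `#char=10,20` selects ten characters. A simplified resolver sketch; it ignores the optional integrity checks (`;length=`, `;md5=`), and the function name `resolve_text_fragment` is hypothetical.

```python
def resolve_text_fragment(text: str, fragment: str) -> str:
    """Hypothetical helper: resolve a simplified RFC 5147 fragment such as
    "char=10,20" or "line=1,2".

    Positions are 0-based and sit between characters (or lines), so
    "char=10,20" covers text[10:20]. Integrity checks are not handled.
    """
    scheme, _, spec = fragment.partition("=")
    if scheme not in ("char", "line"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if "," in spec:
        first, _, second = spec.partition(",")
        start = int(first) if first else 0     # ",20" -> from the beginning
        end = int(second) if second else None  # "10," -> to the end
    else:
        start = end = int(spec)                # a bare position is an empty span
    if end is not None and end < start:
        raise ValueError("range end precedes range start")
    if scheme == "char":
        return text[start:end]
    # Line boundaries come from str.splitlines, which recognises LF, CR and
    # CRLF (plus a few other separators); lines keep their terminators.
    lines = text.splitlines(keepends=True)
    return "".join(lines[start:end])


text = "line one\nline two\nline three\n"
print(repr(resolve_text_fragment(text, "char=0,4")))  # 'line'
print(repr(resolve_text_fragment(text, "line=1,2")))  # 'line two\n'
print(repr(resolve_text_fragment(text, "line=1,")))   # 'line two\nline three\n'
```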

Cheers,
    Michael

[1] http://tools.ietf.org/html/rfc5147
[2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment


On 4 Aug 2011, at 14:22, Alexander Dutton wrote:

Hi all,
Say I have an XML document, <http://example.org/something.xml>, and I want to talk about some part of it in RDF. As this is XML, being able to point into it using XPath sounds ideal, leading to something like:
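@prefix fragment: <http://example.org/fragment#> .  # hypothetical namespace for the proposed vocabulary
<#fragment> a fragment:Fragment ;
 fragment:within <http://example.org/something.xml> ;
 fragment:locator "/some/path[1]"^^fragment:xpath .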

(For now we can ignore whether we wanted a nodeset or a single node,
and how to handle XML namespaces.)
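As a rough check that an `"/some/path[1]"^^fragment:xpath` locator can be resolved with off-the-shelf tools, here is a sketch using Python's `xml.etree.ElementTree`, which supports only a limited XPath subset (child steps and `[n]` position predicates, among others). The absolute path is made relative to the root element, and, as in the caveat above, namespaces and the nodeset-versus-node question are ignored. `resolve_xpath_locator` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def resolve_xpath_locator(xml_text: str, locator: str) -> Optional[ET.Element]:
    """Hypothetical helper: resolve a simple absolute path such as
    "/some/path[1]" to a single element, or None if nothing matches."""
    root = ET.fromstring(xml_text)
    first, _, rest = locator.strip("/").partition("/")
    # The first step must name the document element itself.
    if first.split("[", 1)[0] != root.tag:
        return None
    if not rest:
        return root
    # ElementTree evaluates the remaining steps relative to the root;
    # find() returns the first matching element only.
    return root.find(rest)


doc = "<some><path>first</path><path>second</path></some>"
print(resolve_xpath_locator(doc, "/some/path[1]").text)  # first
print(resolve_xpath_locator(doc, "/some/path[2]").text)  # second
print(resolve_xpath_locator(doc, "/other/path[1]"))      # None
```

For anything beyond this subset (axes, functions, namespaces) a full XPath engine would be needed.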

More generally, we might want other ways of locating fragments
(probably with a datatype for each):

* character offsets / ranges
* byte offsets / ranges
* line numbers / ranges
* some sub-rectangle of an image
* XML node IDs
* page ranges of a paginated document

Some of these will be IMT-specific and may need some more thinking
about, but the idea is there.
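The first three locator kinds in the list are easy to conflate, and they diverge as soon as the text contains non-ASCII characters, which is one reason a separate datatype per kind seems worthwhile. A small sketch, assuming UTF-8 for byte offsets and 0-based, end-exclusive ranges; the `Locator` enum and `extract` function are hypothetical stand-ins for such datatypes, not an existing vocabulary.

```python
from enum import Enum


class Locator(Enum):
    # Hypothetical datatypes, one per locator kind from the list above.
    CHAR_RANGE = "char"
    BYTE_RANGE = "byte"
    LINE_RANGE = "line"


def extract(text: str, kind: Locator, start: int, end: int, encoding: str = "utf-8") -> str:
    """Return the part of `text` selected by a 0-based, end-exclusive range."""
    if kind is Locator.CHAR_RANGE:
        return text[start:end]
    if kind is Locator.BYTE_RANGE:
        # Byte offsets depend on the encoding; a range that splits a
        # multi-byte character raises UnicodeDecodeError here.
        return text.encode(encoding)[start:end].decode(encoding)
    if kind is Locator.LINE_RANGE:
        return "".join(text.splitlines(keepends=True)[start:end])
    raise ValueError(f"unsupported locator: {kind}")


text = "Wilde and Dürst\nRFC 5147\n"
print(extract(text, Locator.CHAR_RANGE, 10, 15))        # Dürst
print(extract(text, Locator.BYTE_RANGE, 10, 16))        # Dürst (ü is two bytes in UTF-8)
print(repr(extract(text, Locator.LINE_RANGE, 1, 2)))    # 'RFC 5147\n'
```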


Has something already done this? Is it even (mostly?) sane?


Yours,

Alex


NB. Our actual use-case is having pointers into an NLM XML file
(embodying a journal article) so we can hook up our in-text reference
pointer¹ URIs to the original XML elements (<xref/>s) they were
generated from. This will allow us to work out the context of each
citation for use in further analysis of the relationship between the
citing and cited articles.
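For the NLM use case, the inverse direction is also needed: given the parsed article, mint one XPath-style locator per `<xref/>` so each in-text reference pointer can be linked back to the element it was generated from. A sketch with `xml.etree.ElementTree`; the toy article and the function name `xref_locators` are hypothetical, real NLM files contain many more elements, and namespaces are again ignored.

```python
import xml.etree.ElementTree as ET


def xref_locators(xml_text: str) -> list[str]:
    """Hypothetical helper: return an absolute positional path such as
    "/article/body[1]/p[2]/xref[1]" for every <xref> element, in document order."""
    root = ET.fromstring(xml_text)
    locators: list[str] = []

    def walk(element: ET.Element, path: str) -> None:
        counts: dict[str, int] = {}
        for child in element:
            # Position among siblings with the same tag, 1-based as in XPath.
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
            if child.tag == "xref":
                locators.append(child_path)
            walk(child, child_path)

    walk(root, "/" + root.tag)
    return locators


article = """<article><body>
  <p>See <xref rid="b1">1</xref>.</p>
  <p>Compare <xref rid="b2">2</xref> and <xref rid="b3">3</xref>.</p>
</body></article>"""

for locator in xref_locators(article):
    print(locator)
```

Positional paths are simple to mint but share the stability problem discussed earlier in the thread: they only remain valid for an unchanged copy of the XML file.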

¹ See
<http://opencitations.wordpress.com/2011/07/01/nomenclature-for-citations-and-references/>
for an explanation of the terminology.

--
Alexander Dutton
Developer, data.ox.ac.uk, InfoDev, Oxford University Computing Services
          Open Citations Project, Department of Zoology, University of Oxford
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
