I am not completely familiar with DOI. Am I right that it provides more
or less the same service as http://purl.org ?
DOIs link at the resource level. You would still need fragment IDs to
link to parts.
Firefox can actually handle this:
http://dx.doi.org/10.1038%2Fscientificamerican1210-80#atl
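To make the split between resource and part concrete, here is a minimal Python sketch that rebuilds the link above from a DOI plus a fragment ID and then takes it apart again. The helper name doi_url is hypothetical, and the resolver prefix is simply copied from the link above. As far as I know, the fragment is never sent to the resolver; the browser keeps it and re-applies it after the redirect.

```python
from urllib.parse import quote, unquote, urlsplit


def doi_url(doi: str, fragment: str = "") -> str:
    """Build a resolver URL for a DOI, optionally with a fragment ID.

    doi_url is a hypothetical helper, not part of any DOI library.
    """
    # Percent-encode the whole DOI, including the "/" separator.
    url = "http://dx.doi.org/" + quote(doi, safe="")
    if fragment:
        # The fragment addresses a part inside the resolved resource.
        url += "#" + quote(fragment, safe="")
    return url


url = doi_url("10.1038/scientificamerican1210-80", "atl")
print(url)

parts = urlsplit(url)
print(unquote(parts.path.lstrip("/")))  # the DOI: resource-level identifier
print(parts.fragment)                   # the fragment ID: part-level identifier
```

The DOI identifies one resource; everything after the # has to be defined by the media type of whatever the DOI resolves to.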
If I am right, DOI also wouldn't be able to provide links to the 40
million mentions contained in the Wikilinks corpus:
http://techcrunch.com/2013/03/08/google-research-releases-wikilinks-corpus-with-40m-mentions-and-3m-entities/
That's 40 million DOIs ....
See the data excerpt below.
All the best,
Sebastian
URL ftp://217.219.170.14/Computer%20Group/Faani/vaset%20fani/second/sattari/word/2007/source/s%20crt.docx
MENTION vacuum tube 421 http://en.wikipedia.org/wiki/Vacuum_tube
MENTION vacuum tubes 10838 http://en.wikipedia.org/wiki/Vacuum_tube
MENTION electron gun 598 http://en.wikipedia.org/wiki/Electron_gun
MENTION fluorescent 790 http://en.wikipedia.org/wiki/Fluorescent
MENTION oscilloscope 1307 http://en.wikipedia.org/wiki/Oscilloscope
MENTION computer monitor 1503 http://en.wikipedia.org/wiki/Computer_monitor
MENTION computer monitors 3066 http://en.wikipedia.org/wiki/Computer_monitor
MENTION radar 1657 http://en.wikipedia.org/wiki/Radar
MENTION plasma screens 2162 http://en.wikipedia.org/wiki/Plasma_screen
Each file is in the following format:
-------
URL\t<url>\n
MENTION\t<mention>\t<byte_offset>\t<target_url>\n
MENTION\t<mention>\t<byte_offset>\t<target_url>\n
MENTION\t<mention>\t<byte_offset>\t<target_url>\n
...
TOKEN\t<token>\t<byte_offset>\n
TOKEN\t<token>\t<byte_offset>\n
TOKEN\t<token>\t<byte_offset>\n
...
\n\n
URL\t<url>\n
...
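
The format above can be read with a few lines of Python. This is only a sketch under assumptions: the names Page, Mention, Token and parse_records are made up for illustration, the sample URL and the TOKEN line are invented, and malformed lines are silently skipped rather than reported.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Mention:
    text: str
    byte_offset: int
    target_url: str


@dataclass
class Token:
    text: str
    byte_offset: int


@dataclass
class Page:
    url: str
    mentions: List[Mention] = field(default_factory=list)
    tokens: List[Token] = field(default_factory=list)


def parse_records(lines: Iterable[str]) -> Iterator[Page]:
    """Yield one Page per URL record; fields are tab-separated."""
    page: Optional[Page] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # blank lines only separate records
        fields = line.split("\t")
        kind = fields[0]
        if kind == "URL" and len(fields) == 2:
            if page is not None:
                yield page  # a new URL line closes the previous record
            page = Page(url=fields[1])
        elif kind == "MENTION" and len(fields) == 4 and page is not None:
            page.mentions.append(Mention(fields[1], int(fields[2]), fields[3]))
        elif kind == "TOKEN" and len(fields) == 3 and page is not None:
            page.tokens.append(Token(fields[1], int(fields[2])))
        # Anything else is skipped (assumption: tolerate malformed lines).
    if page is not None:
        yield page


# Illustrative input: the URL and the TOKEN line are invented for this sketch.
sample = (
    "URL\thttp://example.org/doc.html\n"
    "MENTION\tvacuum tube\t421\thttp://en.wikipedia.org/wiki/Vacuum_tube\n"
    "MENTION\tradar\t1657\thttp://en.wikipedia.org/wiki/Radar\n"
    "TOKEN\ttube\t428\n"
    "\n\n"
)
pages = list(parse_records(sample.splitlines(keepends=True)))
for m in pages[0].mentions:
    print(m.byte_offset, m.text, m.target_url)
```

Each mention then carries a source URL plus a byte offset, which is exactly the kind of part-level address a resource-level identifier cannot express on its own.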
On 02.05.2013 22:36, Dawson, Laura wrote:
Short DOIs for fragment IDs?
From: Sebastian Hellmann <[email protected]>
Date: Thursday, May 2, 2013 4:33 PM
To: Paul Groth <[email protected]>
Cc: Steve Pettifer <[email protected]>, Sarven Capadisli
<[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Final CFP: In-Use Track ISWC 2013
Resent-From: "[email protected]" <[email protected]>
Resent-Date: Thursday, May 2, 2013 4:34 PM
Open Annotation is great: a really powerful and well-designed ontology
and model. It doesn't replace fragment IDs, however. Both are necessary:
fragment IDs to link with in simple use cases (e.g. HTML), and Open
Annotation to annotate properly.
A bridge between them would be nice.
All the best,
Sebastian
On 02.05.2013 18:00, Paul Groth wrote:
Hi Sebastian,
I use LaTeX as well. Utopia is a PDF reader.
But Utopia does support referencing bits of the PDF. As I understand it,
they are moving toward extending the Open Annotation ontology. I've cc'd
Steve Pettifer, who created Utopia and who will know the ins and outs.
Currently, they store all the annotations separately.
Thanks
Paul
On Thu, May 2, 2013 at 5:21 PM, Sebastian Hellmann
<[email protected]> wrote:
Hi Paul,
Personally, LaTeX works best for me, because it has good editors
and support for description logic formulas. Plus, it is widely
used and quite good for PDF typesetting.
It would be really swell to be able to address content within a PDF
with identifiers. Did Utopia solve that problem?
I am asking along the lines of
- mediafragments [1]
- RFC 5147 text fragment identifier (see the example at the
bottom of [2])
- xpointer/xpath [3]
If yes, I would like to use it immediately. There are plans to
convert the Google Mention corpus (which includes PDFs) to NIF [2].
The PDF Open Parameters provided by [4] are way too simple.
All the best,
Sebastian
[1] http://www.w3.org/TR/media-frags/
[2] http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core
(the example is at the bottom of the .ttl file)
[3] e.g. http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/span[1]/text()[1])
[4] http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf#page=7
On 02.05.2013 12:55, Paul Groth wrote:
Hi Sarven,
For me, Beyond the PDF means moving beyond the current research
communication system, as highlighted by the Force 11 manifesto
(http://www.force11.org/white_paper). This includes adopting
technologies that augment or extend (i.e. go beyond) existing
technologies: for example, making data easily accessible and
citable, providing links to online content, making multiple
perspectives on content available, exposing provenance, and using
altmetrics. I'm very influenced by the work on Utopia
(http://utopiadocs.com), which is why I think using PDFs is
fine: you can do a lot with them as they stand, and for a
certain form of communication (long-form written text) they work
well. As technologists, we need to make sure that these new
technologies work well in the environment and connect to other
things.
cheers
Paul
On Thu, May 2, 2013 at 12:32 PM, Sarven Capadisli
<[email protected]> wrote:
On 05/02/2013 12:23 PM, Paul Groth wrote:
I think Harry makes the point better than I can.
Paul, I have one last question for you if you don't mind,
because it seems like you are not interested in playing this
out and I don't want to bother you further: what does
"beyond the PDF" mean to you?
-Sarven
--
-----------------------------------------------------------------------------------
Dr. Paul Groth ([email protected])
http://www.few.vu.nl/~pgroth/
Assistant Professor
- Web & Media Group | Department of Computer Science
- The Network Institute
VU University Amsterdam
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013
(http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)
Come to Germany as a PhD:
http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org