Hi Sarven,
PDF has several big advantages:
- easy to produce with LaTeX, thanks to good editors
- I can be sure how it looks in 99% of PDF viewers
- there aren't any incentives for me to switch (personal benefits seem
marginal)
Let's be honest: HTML is not perfect and it doesn't have all the
advantages you would like it to have. As you might know, HTML 5 now
tries to fix a lot of practical problems, e.g. browser compatibility, a
problem PDF does not have.
Also: *both* PDF and HTML cannot be scraped well, and neither can be
addressed well.
Please look at Sören's, Jens's and my publication pages:
http://www.informatik.uni-leipzig.de/~auer/index.php?n=Main.Publications
http://jens-lehmann.org/publications
http://bis.informatik.uni-leipzig.de/SebastianHellmann#h520-8
Mine is not up to date, and I would rather invest more time in updating
the content than in layout or machine-readable information. So these pages
are pretty much the same as references in PDF.
Links pointing into HTML are terribly underdeveloped as well. There are
only anchors and XPointer/XPath [1], and the latter is not implemented by
browsers such as Firefox.
Please note that the XPointer xpointer() scheme is not a finished standard [2].
I think the advantages of HTML are overrated at the moment. It is
getting better, but there is still a long way to go.
Actually, I have already tried using HTML when sending out calls for papers.
First as an attachment [4], but attachments were stripped by some mailing lists.
Then I tried writing the call in HTML directly, but the layout was
terrible. So now I am back to Markdown [5], because I seem to suck at
producing well laid-out HTML.
I really would like to focus on content and have the rest handled by
machines. My job title is "researcher", not "layouter". Markdown, LaTeX
and PDF seem to get the job done.
Also, being a chair means that you write several hundred emails,
micro-manage peer reviewing, publish calls for papers, make a schedule,
etc. I am quite happy when everybody hands in decent LaTeX (and not
.doc) plus a signed license agreement. There is just no time for more.
So the real problem, in my opinion, is that we are not there yet,
technologically or research-wise.
Copying and pasting HTML only seems to work 2/3 of the time due to boundary
problems; recently I copied Google Docs content (also HTML) into the
WordPress TinyMCE editor and it looked terrible.
This discussion is going in circles because HTML fans are over-eager
and fail to judge HTML realistically. I think we should try to provide
content in a structured format and then research ways to transform it
effectively. This seemed to be the idea behind XML + XSLT as well as
HTML + CSS; maybe we can take it one step further.
@Sarven: If you are so interested in this, why don't you dig in
systematically and try to identify the current problems and barriers? This
would actually be a great research project, in my opinion.
all the best,
Sebastian
PS: By the way, content is easy enough to find in any format with a little
help from our friend [3].
[1] http://www.w3.org/TR/xpath20/
[2] http://www.w3.org/TR/xptr-xpointer/
[3] http://lmgtfy.com/?q=Linked-Data+Aware+URI+Schemes+for+Referencing+Text
[4] http://lists.w3.org/Archives/Public/public-lod/2012Nov/0001.html
[5] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0456.html
On 02.05.2013 19:38, Sarven Capadisli wrote:
On 05/02/2013 06:55 PM, Norman Gray wrote:
I'm now thoroughly confused by this conversation.
Allow me to summarize: "Linked Science is brought to you by PDF" [1]
Talking about LaTeX...
On 2013 May 2, at 17:02, [email protected] (Phillip Lord)
wrote:
Sebastian Hellmann <[email protected]> writes:
Plus it is widely used and quite good for PDF typesetting.
And sucks on the web, which is a shame. If I could get good HTML
out of it, I would be a happy man.
_What_ sucks on the web? Certainly not PDF.
HTML/Web, PDF/Desktop?
There are hassles with PDFs, yes. In particular, (i) embedding
metadata is underdeveloped (XMP is undertooled), and (ii)
deep-linking into PDFs could be better, as has been discussed. HTML
is naturally better at both of these, but neither is a real problem.
(i) between DOIs and metadata from journal webpages, most of the
important stuff is available without major difficulty, and various
organisations (eg ORCID) are labouring away at making a very messy
problem better. (ii) would be nice to solve (and perhaps Utopiadocs
is the way to do it), but doesn't, as far as I can see, offer major
advantages beyond 'See sect. xxx'. Most text is, after all, consumed
by humans, and articles tend not to be tens of pages long.
Thus HTML can do some unimportant things better than PDF,
Web pages. It will never take off.
but what it
can't do, which _is_ important, is make things readable. The visual
appearance -- that is, the typesetting -- of rendered HTML is almost
universally bad, from the point of view of reading extended pieces.
I haven't (I admit) yet experimented with reading extended text on a
tablet, but I'd be surprised if that made a major difference.
I think you are conflating the job of HTML with CSS. Also, I think you
are conflating readability with legibility as far as the typesetting
goes. Again, that's something CSS handles provided that suitable fonts
are in use. What you are probably viewing on an average webpage is the
common "works on most machines" fonts e.g., Arial. I don't know
whether the PDF reader for instance does magic behind the scenes to
smooth things out or crisp things up - whatever additional
instructions it may have. Needless to say, this is the job of the
reader AFAICT. If you put the effort into CSS, it might just give
something pretty.
I'll also admit that I have not experimented with the exact
differences in quality.
Also, HTML is not the same as linked data; there's no 'dog food' here
for us to eat.
That's quite a generalization there? So, I would argue that "HTML" is
more about eating dogfood in the Linked Data mailing list than
parading on PDF. We are trying to build things one step at a time;
HTML today, a URI that it can sit on tomorrow. Additional
machine-friendly stuff the day after.
So, if conferences want to promote PDF, perhaps they should jump over
to public-lod-pdf-print-industry-and-friends mailing list? :)
Is it possible that folk here are conflating 'LaTeX' with the quite
startlingly ugly ACM style? That's almost as unreadable as HTML.
Nothing to do with HTML unless you are thinking of loading the default
browser styles and using that as the measure for readability.
[1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0291.html
-Sarven
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org