Re: Final CFP: In-Use Track ISWC 2013

Sebastian Hellmann Thu, 02 May 2013 13:32:18 -0700

Hi Sarven,
PDF has several big advantages:
- easy to produce with LaTeX, thanks to good editor support
- I can be sure of how it looks in 99% of PDF viewers

- there aren't any incentives for me to switch (personal benefits seem marginal)

Let's be honest: HTML is not really perfect and it doesn't have all the advantages you would like it to have. As you might know, HTML 5 now tries to fix a lot of practical problems, e.g. browser compatibility, a problem PDF does not have.

Also: *neither* PDF nor HTML can be scraped well, and neither can be addressed well.


Please look at Sören's, Jens's and my citation pages:
http://www.informatik.uni-leipzig.de/~auer/index.php?n=Main.Publications
http://jens-lehmann.org/publications
http://bis.informatik.uni-leipzig.de/SebastianHellmann#h520-8

Mine is not up to date, and I would rather invest more time in updating the content than in layout or machine-readable information. So they are pretty much the same as references in PDF.

Links pointing into HTML are terribly under-developed as well. There are only anchors and xpointer/xpath [1]. The second one is not implemented by browsers like Firefox.

Please note that the XPointer xpointer() scheme is not a finished standard [2].

I think the advantages of HTML are over-rated at the moment. It is getting better, but there is still a long way to go.

Actually, I already tried using HTML when sending out calls for papers. First as an attachment [4], but attachments were removed by some mailing lists. Then I tried writing the call in HTML directly, but the layout was terrible. So now I am back to Markdown [5], because I seem to suck at producing well laid-out HTML.

I really would like to focus on content and have the rest handled by machines. My job title is "researcher", not "layouter". Markdown, LaTeX and PDF seem to get the job done.

Also, being a chair means that you write several hundred emails, micro-manage peer reviewing, publish calls for papers, make a schedule, etc. I am quite happy when everybody hands in decent LaTeX (and not .doc) plus a signed license agreement. There is just no time for more.

So the real problem, in my opinion, is that we are really not there yet, technologically as well as research-wise.

HTML copy and paste only seems to work two thirds of the time due to boundary problems. Recently I copied Google Docs content (also HTML) into WordPress TinyMCE and it looked terrible.

This discussion is going in circles because HTML fans are over-eager and fail to judge HTML realistically. I think we should try to provide content in a structured format and then research ways to transform it effectively. This seemed to be the idea behind XML + XSLT as well as HTML + CSS; maybe we can take it one step further...

@Sarven: If you are so interested in this, why don't you dig down systematically and try to find the current problems and barriers? This is actually a great research project in my opinion.


all the best,
Sebastian

PS: By the way, content can be found just fine in any format with a little help from our friend [3]



[1] http://www.w3.org/TR/xpath20/
[2] http://www.w3.org/TR/xptr-xpointer/
[3] http://lmgtfy.com/?q=Linked-Data+Aware+URI+Schemes+for+Referencing+Text
[4] http://lists.w3.org/Archives/Public/public-lod/2012Nov/0001.html
[5] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0456.html

On 02.05.2013 19:38, Sarven Capadisli wrote:

On 05/02/2013 06:55 PM, Norman Gray wrote:
I'm now thoroughly confused by this conversation.
Allow me to summarize: "Linked Science is brought to you by PDF" [1]
Talking about LaTeX...

On 2013 May 2, at 17:02, [email protected] (Phillip Lord)
wrote:
Sebastian Hellmann <[email protected]> writes:
Plus it is widely used and quite good for PDF typesetting.
And sucks on the web, which is a shame. If I could get good HTML
out of it, I would be a happy man.
_What_ sucks on the web?  Certainly not PDF.
HTML/Web, PDF/Desktop?
There are hassles with PDFs, yes.  In particular, (i) embedding
metadata is underdeveloped (XMP is undertooled), and (ii)
deep-linking into PDFs could be better, as has been discussed. HTML
is naturally better at both of these, but neither is a real problem.
(i) between DOIs and metadata from journal webpages, most of the
important stuff is available without major difficulty, and various
organisations (eg ORCID) are labouring away at making a very messy
problem better.  (ii) would be nice to solve (and perhaps Utopiadocs
is the way to do it), but doesn't, as far as I can see, offer major
advantages beyond 'See sect. xxx'.  Most text is, after all, consumed
by humans, and articles tend not to be tens of pages long.

Thus HTML can do some unimportant things better than PDF,
Web pages. It will never take off.

but what it
can't do, which _is_ important, is make things readable.  The visual
appearance -- that is, the typesetting -- of rendered HTML is almost
universally bad, from the point of view of reading extended pieces.
I haven't (I admit) yet experimented with reading extended text on a
tablet, but I'd be surprised if that made a major difference.
I think you are conflating the job of HTML with CSS. Also, I think you are conflating readability with legibility as far as the typesetting goes. Again, that's something CSS handles provided that suitable fonts are in use. What you are probably viewing on an average webpage is the common "works on most machines" fonts, e.g., Arial. I don't know whether the PDF reader for instance does magic behind the scenes to smooth things out or crisp things up - whatever additional instructions it may have. Needless to say, this is the job of the reader AFAICT. If you put the effort into CSS, it might just give something pretty.
I'll also admit that I have not experimented with the exact differences in quality.
Also, HTML is not the same as linked data; there's no 'dog food' here
for us to eat.
That's quite a generalization there? So, I would argue that "HTML" is more about eating dogfood in the Linked Data mailing list than parading on PDF. We are trying to build things one step at a time; HTML today, a URI that it can sit on tomorrow. Additional machine-friendly stuff the day after.
So, if conferences want to promote PDF, perhaps they should jump over to public-lod-pdf-print-industry-and-friends mailing list? :)
Is it possible that folk here are conflating 'LaTeX' with the quite
startlingly ugly ACM style?  That's almost as unreadable as HTML.
Nothing to do with HTML unless you are thinking of loading the default browser styles and using that as the measure for readability.
[1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0291.html

-Sarven



--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig

Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)

Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf

Projects: http://nlp2rdf.org , http://linguistics.okfn.org , http://dbpedia.org/Wiktionary , http://dbpedia.org

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
