Sarven and all, hello.

On 2013 May 2, at 18:38, Sarven Capadisli wrote:

>> _What_ sucks on the web?  Certainly not PDF.
> 
> HTML/Web, PDF/Desktop?

PDF/Web, HTML/Desktop?  I'm not sure what you're trying to say here.

>> Thus HTML can do some unimportant things better than PDF,
> 
> Web pages. It will never take off.

No no, the web is massively successful.  HTML is a really clever hypertext 
format which is successful because it lets a number of things go wrong (it 
doesn't guarantee link integrity, links are all one-way, there's minimal text 
metadata, and so on).  These deficiencies are seriously smart things to use to 
create a global hypertext.  Web pages have taken off in a big way.

It does not follow that HTML-based hypertext solves all text problems.  In 
particular, there is nothing in the above set of clever properties which makes 
HTML obviously ideal for communicating long-form textual arguments.

And what is this 'desktop' of which you speak?  PDF is for making posters, 
presentations, on-screen documents, and on-tablet documents -- lots of very 
distinct layout problems there.  In the last case, you can even transfer the 
things to paper and read them in the bath, if you want.


> but what it
>> can't do, which _is_ important, is make things readable.  The visual
>> appearance -- that is, the typesetting -- of rendered HTML is almost
>> universally bad, from the point of view of reading extended pieces.
>> I haven't (I admit) yet experimented with reading extended text on a
>> tablet, but I'd be surprised if that made a major difference.
> 
> I think you are conflating the job of HTML with CSS. Also, I think you are 
> conflating readability with legibility as far as the typesetting goes. Again, 
> that's something CSS handles provided that suitable fonts are in use.

CSS can help make HTML pages more readable.  Myself, I usually put quite a lot 
of effort into the CSS which accompanies web pages I write.  But it takes a lot 
of effort to produce good CSS, and the case you're aiming to optimise is the 
case of a normal-length web-page (under 1000 words, say), with relatively small 
investments on the part of the reader.

Distributing PDF, you have easy and precise control over fonts, layout, and
overall design (or rather, you in principle have access to a style which is
carefully designed).  This makes it straightforward to produce something
which is comfortable to read for thousands of words.

But this is to some extent irrelevant, because I think we're now talking about 
a non-problem:

>> Also, HTML is not the same as linked data; there's no 'dog food' here
>> for us to eat.
> 
> That's quite a generalization there? So, I would argue that "HTML" is more 
> about eating dogfood in the Linked Data mailing list than parading on PDF. We 
> are trying to build things one step at a time; HTML today, a URI that it can 
> sit on tomorrow. Additional machine-friendly stuff the day after.

What, seriously, is the connection between HTML and linked-data?  If there is a 
deep connection, then HTML articles represent the linked-data community's 
dog-food, and it should be eaten.

But there is no such deep connection.

Certainly, HTML is one of the representations which a LD system will offer, 
because a data provider needs to produce a readily and flexibly rendered 
human-readable representation of the item data being named/offered.   That's a 
completely different thing from an article.

In another message in this thread, Alexander Garcia Castro says:

> I am right now struggling with a task as simple as getting citation data
> from PDFs. I dont want to say that the PDF is all bad but... come on,
> it had a place in the time when desktop was king. now we need to make
> effective use of content, the reality is simply that content is locked
> up in PDFs.

Sure: there are weaknesses in the way that article metadata is currently 
incorporated in PDFs.  DOIs, ORCIDs, arXiv identifiers, all of the 'Beyond PDF' 
experiments, and so on are all attempts to join the various dots here, and they 
are rapidly getting better.

Until we really get AI that can read the paper for us, there's nothing 'locked 
up in PDFs' that's more than (I exaggerate only slightly) a regular expression 
away.
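
To make the 'regular expression' remark concrete, here is a minimal sketch in
Python.  It assumes the PDF's text has already been extracted to a plain
string by some other tool (pdftotext, say); the function names and both
patterns are illustrative assumptions rather than a complete treatment of
either identifier scheme.

```python
import re

# Assumed patterns, for illustration only:
# - DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix drawn from
#   a commonly seen character set (not every legal DOI fits this).
# - arXiv: only the post-2007 "arXiv:YYMM.NNNNN" form, optional version.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
ARXIV_RE = re.compile(r"\barXiv:(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def find_dois(text):
    """Return DOI-like strings found in text, in order of appearance."""
    # Trailing '.', ',' or ';' is almost always sentence punctuation.
    return [m.group(0).rstrip(".,;") for m in DOI_RE.finditer(text)]


def find_arxiv_ids(text):
    """Return new-style arXiv identifiers, keeping any version suffix."""
    return [m.group(1) + (m.group(2) or "") for m in ARXIV_RE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "See doi:10.1000/xyz123. Also arXiv:1304.7224v2 and "
        "https://doi.org/10.1234/abc-def_45."
    )
    print(find_dois(sample))
    print(find_arxiv_ids(sample))
```

This only finds identifiers that are printed in the text; resolving them to
full citation records is a separate step, which is where DOIs and the like
earn their keep.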

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

