Re: Petitioning ISWC to allow Web friendly formats

Stephen Williams Tue, 07 May 2013 01:42:14 -0700

Generally, I'd rather have semantically tagged reflowable CSS-enabled XHTML documents, epub like. However, PDFs serve a useful purpose too, and in some restricted cases it's hard to see how a particular goal can be achieved differently. An interesting option is to do something like a "Hybrid PDF": store the original editable document and/or alternate forms (XHTML+CSS+semantic markup) in the PDF, automatically and reliably sensing those alternates at any point. LibreOffice includes this feature now:


http://blogs.computerworlduk.com/simon-says/2012/03/the-magic-of-editable-pdfs/index.htm

It's possible, at least to a large extent, to associate particular segments of data to particular rendered elements. OCR programs make use of this to place resulting text in the same position as the graphic version of the text in a scanned page. This could allow copy and paste of semantically tagged data from a PDF just like an RDFa web page.


sdw

On 5/7/13 1:32 AM, RebholzSchuhmann wrote:

Hi,

I have seen similar discussions before.

I guess we are looking at two different use cases:
(1) PDF: layout oriented, but could (and will, hopefully) carry a lot more semantic information. The key achievement is and will be to have optimal layout, and on the other side the overhead for processing / exploitation / reuse goes up for everybody who is NOT PDF-savvy.
(2) the other open formats (Html, Xml, Pdf): allow easy-to-go exploitation, processing, and enrichment, and stand for the spirit of the open web and reuse of data.
Listening to publishers, certainly layout matters. I am not only talking about the big five or ten who would have the resources to go a different direction, I am talking about the 1,000 smaller publishers who have to serve their community. They would struggle more to comply with the other "standards" and still deliver an appealing product.
I guess some clever thinking and collaborative work is required to bring both
together.

Hope this helps.

    -drs-

On 07/05/2013 09:17, Steve Pettifer wrote:
I assume most authors don't actually format their documents by selecting a font 
size for every single heading and so on.
This is a tempting assumption to make, especially if you come from computer 
science / maths / physics and related disciplines (as I do). But my experience 
in the life sciences is that authors do 'paint' their manuscripts by hand, 
painstakingly selecting the font and format for every bit of their document. 
Even using the 'semantic' features of wordprocessors (such as 'Heading 1') is 
something that's not commonplace. So before we get too carried away with 
expecting people to write HTML / LaTeX or even markup, we'll need to take into 
account the working practices of the vast majority of academics outside of the 
more 'semantically aware' bits of science.
They work in a format that utilizes semantically meaningful information about 
the work: to identify a title, headings, math blocks, illustrations, plots, etc.
No, they really don't. I wish they did. But, outside of a certain area of 
science, they don't.

Steve
--
D. Rebholz-Schuhmann -mailto:[email protected]



--
Stephen D. Williams [email protected] [email protected] LinkedIn: 
http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer
