Re: scientific publishing process (was Re: Cost and access)

Kingsley Idehen Mon, 06 Oct 2014 08:21:10 -0700

On 10/6/14 10:25 AM, Paul Houle wrote:

Frankly, I don't see the reason for the hate on PDF files.
I do a lot of reading on a tablet these days because I can take it to the gym, on a walk, or in the car. Network reliability is not universal when I leave the house (even if I had a $10-a-GB LTE plan), so downloaded PDFs are my document format of choice.
There might be a lot of hypothetical problems with PDFs, and I am sure there is a better way to view files on a small screen, but in practice I have no trouble reading papers from arXiv.org or books from oreilly.com, whether they are produced by TeX-derived or Word-derived toolchains or, for that matter, by a toolchain that involves a real page layout tool.


Paul,

As I see it, the issue here has more to do with PDF being the only option than with having no PDFs at all. Put differently, we are not using our "horses for courses" technology (the Web that emerges from AWWW exploitation) to produce "horses for courses" conference artifacts. Instead, we continue to impose (overtly or covertly) specific options that are contradictory and of diminishing value.

Conferences (associated with themes like Semantic Web and Linked Open Data) should accept submissions that provide open access to relevant research data. In a sense, imagine if PDFs were submitted without bibliographic references. Basically, that's what's happening here with research data circa 2014, when we have a functioning Web of Linked (Open) Data based on AWWW.

Loosely coupling print-friendly documents (PDF, LaTeX, etc.), HTTP-browser-friendly documents (HTML), and actual raw data references (which take the form of 5-Star Linked Open Data) is a practical starting point. Adding experiment workflows (which are also becoming the norm in the bioinformatics realm) is a nice bonus, as already demonstrated by the examples Hugh Glaser provided (see this weekend's thread).
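A minimal sketch of that loose coupling follows. It assumes hypothetical example.org URIs and uses the Dublin Core terms `dcterms:hasFormat` and `dcterms:references` as one possible vocabulary choice (not one prescribed in this thread); it simply emits N-Triples linking a paper to its PDF and HTML renditions and to the raw data it draws on.

```python
# Minimal sketch: describe one paper and its loosely coupled representations
# as N-Triples. All URIs used below are hypothetical placeholders, and the
# choice of Dublin Core predicates is an assumption, not a standard practice
# mandated by any conference.

DCTERMS = "http://purl.org/dc/terms/"


def triple(s: str, p: str, o: str) -> str:
    """Format a single N-Triples statement whose subject and object are IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def describe_paper(paper: str, pdf: str, html: str, datasets: list[str]) -> list[str]:
    """Link a paper URI to its print-friendly and browser-friendly
    renditions and to the raw data it references."""
    lines = [
        triple(paper, DCTERMS + "hasFormat", pdf),   # print-friendly rendition
        triple(paper, DCTERMS + "hasFormat", html),  # browser-friendly rendition
    ]
    # Each dataset is referenced by its own dereferenceable URI.
    lines += [triple(paper, DCTERMS + "references", d) for d in datasets]
    return lines


if __name__ == "__main__":
    base = "http://example.org/papers/42"
    for line in describe_paper(
        base,
        base + ".pdf",
        base + ".html",
        ["http://example.org/data/experiment-1"],
    ):
        print(line)
```

Because each artifact keeps its own URI, a PDF-only reader, a browser, and a Linked Data client can each follow the link they care about without the others being bundled in.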


Kingsley

On Sun, Oct 5, 2014 at 5:43 PM, Mark Diggory wrote:



    On Sun, Oct 5, 2014 at 2:39 PM, Mark Diggory wrote:

        Hello Community,

        On Sun, Oct 5, 2014 at 1:19 PM, Luca Matteis wrote:

            On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman wrote:
            > The real problem is still the missing tooling. Authors,
            even if technically savvy like this community, want to do
            what they set out to do: write their papers as quickly as
            possible. They do not want to spend their time going
            through some esoteric CSS massaging, for example. Let us
            face it: we are not yet there. The tools for authoring are
            still very poor.

            But are they still very poor? I mean, I think there are
            more tools for rendering HTML than there are for rendering
            LaTeX. In fact, there are probably more tools for rendering
            HTML than for anything else out there, because HTML is used
            more than anything else. Because HTML powers the Web!


            You can write in Word and export to HTML. You can write in
            Markdown and export to HTML. You can probably write in
            LaTeX and export to HTML as well :)


            The tools are not the problem. The problem, to me, is the
            printing afterwards. Conferences/workshops need to print the
            publications. Printing consistent LaTeX/PDF templates is a
            lot easier than printing HTML pages with inconsistent
            layouts.


        There are tools. For example, there's already a bit of work to
        provide a plugin for semantic markup in Microsoft Word
        (https://ucsdbiolit.codeplex.com/) and similar efforts on the
        LaTeX side (https://trac.kwarc.info/sTeX/).

        But this is not a question of the technology available to
        authors; it is a question of the requirements defined by
        publishers. If authors are too busy for this effort, then
        publishers will provide that added value when it is in their
        best interest.

        For example, PLoS has published format guidelines for Word
        and LaTeX (http://www.plosone.org/static/guidelines), a
        workflow for semantically structuring the resulting output,
        and final output that is well structured and available as XML
        based on a known standard
        (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd),
        as PDF, and as published HTML on their website
        (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233).

        This results in semantically meaningful XML that is
        transformed to HTML:

        http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233&representation=XML
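As a rough illustration of that XML-to-HTML step, here is a minimal Python sketch. It assumes a heavily simplified, namespace-free subset of NLM/JATS-style markup (only `article-title`, `sec`, `title` and `p`); it is not PLoS's actual pipeline, and real articles carry far more structure (nested sections, inline markup, references, figures).

```python
# Minimal sketch: render a tiny, simplified subset of NLM/JATS-style article
# XML as HTML. Only <article-title>, top-level <sec>, <title> and <p> are
# handled; nested sections, references, and figures are ignored.
import xml.etree.ElementTree as ET
from html import escape

SAMPLE = """<article>
  <front><article-meta><title-group>
    <article-title>An Example Article</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p></sec>
  </body>
</article>"""


def to_html(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    # The article title may sit anywhere under <front>, so search recursively.
    title = root.find(".//article-title")
    if title is not None:
        parts.append(f"<h1>{escape(title.text or '')}</h1>")
    # Each top-level section becomes an <h2> followed by its paragraphs.
    for sec in root.iterfind("./body/sec"):
        heading = sec.find("title")
        if heading is not None:
            parts.append(f"<h2>{escape(heading.text or '')}</h2>")
        for p in sec.findall("p"):
            # itertext() flattens any inline markup into plain text.
            parts.append(f"<p>{escape(''.join(p.itertext()))}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_html(SAMPLE))
```

The point of the sketch is the direction of the transformation: once the structure lives in the XML, HTML (or any other rendering) is derived from it rather than authored separately.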

        Clearly, the publication process can support such solutions,
        and when it is in the publisher's best interest, they will
        adopt and drive their own markup processes to meet external
        demand.

        Providing tools that both the publisher and the author may use
        independently could simplify such an effort, but is not a main
        driver in achieving that final result you see in PLoS. This is
        especially the case given that both file formats and efforts
        to produce the "ideal solution" are inherently localized,
        competitive and diverse, not collaborative in nature. For
        PLoS, the solution that is currently successful is the one
        that worked to solve today's immediate local need with today's
        tools, not the one that was perfectly designed to meet all of
        tomorrow's hypothetical requirements.

        Cheers,
        Mark Diggory

        p.s. Finally, regarding moving repositories such as EPrints
        and DSpace towards supporting semantic markup of their
        contents: being somewhat of a participant in LoD on the DSpace
        side, I note that these efforts are inherently just
        "repository-centric", describing the structure of the
        repository (i.e., collections of files), not the semantic
        structure contained within those files (ideas, citations,
        formulas, data tables, figures). In both cases, these
        capabilities are in their infancy. Without any strict format-
        and content-driven publication workflow, and lacking any
        rendering other than offering the file for download, they
        ultimately suffer from the same need for a common Semantic
        Document format that can be leveraged for rendering,
        referencing and indexing.
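To make that distinction concrete, a minimal sketch follows. The class and field names are hypothetical and do not reflect the actual DSpace or EPrints data models; the point is only that a repository-centric record lists opaque files, while a semantic document record exposes what is inside them.

```python
# Minimal sketch contrasting two levels of description. All class and field
# names are hypothetical; they are NOT the DSpace or EPrints data models.
from dataclasses import dataclass, field


@dataclass
class RepositoryItem:
    """Repository-centric view: an item is a collection of opaque files."""
    handle: str
    files: list[str] = field(default_factory=list)


@dataclass
class SemanticDocument:
    """Document-centric view: the structure inside a file is addressable."""
    title: str
    citations: list[str] = field(default_factory=list)
    figures: list[str] = field(default_factory=list)
    data_tables: list[str] = field(default_factory=list)


def indexable_terms(doc: SemanticDocument) -> set[str]:
    """Only the semantic view exposes citations, figures and tables for
    referencing and indexing; the repository view offers just file names."""
    return {doc.title, *doc.citations, *doc.figures, *doc.data_tables}


if __name__ == "__main__":
    item = RepositoryItem("hdl:0000/0000", ["paper.pdf"])  # hypothetical handle
    doc = SemanticDocument("Example", citations=["doi:10.0000/example"])
    print(item.files, sorted(indexable_terms(doc)))
```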

-- @mire Inc.

                *Mark Diggory*
        /2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010/
        /Esperantolaan 4, Heverlee 3001, Belgium/
        http://www.atmire.com





--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF

(607) 539 6254    paul.houle on Skype

http://legalentityidentifier.info/lei/lookup



--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
