Re: scientific publishing process (was Re: Cost and access)

Sarven Capadisli Sat, 04 Oct 2014 02:24:08 -0700

On 2014-10-04 04:14, Daniel Schwabe wrote:

As is often the case on the Internet, this discussion gives me a terrible sense 
of déjà vu. We've had this discussion many times before.
Some years back the IW3C2 (the steering committee for the WWW conference 
series, of which I am part) first tried to require HTML for the WWW conference 
paper submissions, then was forced to make it optional because authors simply 
refused to write in HTML, and eventually dropped it because NO ONE (ok, very 
very few hardy souls) actually sent in HTML submissions.
Our conclusion at the time was that the tools simply were not there, and it was 
too much of a PITA for people to produce HTML instead of using the text editors 
they are used to. Things don't seem to have changed much since.

Hi Daniel, here is my long reply as usual and I hope you'll give it a shot :)

I've offered *a* solution that is compatible with the existing workflow and asks for no extra work from the OC/PCs, except that Web-native technologies for submissions are officially encouraged. They will still get their PDF in the end to cater to the existing pipeline. In the meantime, the community retains higher quality research documents.

And this is simply looking at formatting the pages, never mind the whole issue of 
actually producing hypertext (i.e., turning the article's text into linked hypertext), 
beyond the easily automated ones (e.g., links to authors, references to papers, etc.). 
Producing good hypertext, and consuming it, is much harder than writing plain text. And 
most authors are not trained in producing this kind of content. Making this actually 
"semantic" in some sense is still, in my view, a research topic, not a routine 
reality.
Until we have robust tools that make it as easy for authors to write papers 
with the advantages afforded by PDF, without its shortcomings, I do not see 
this changing.

I disagree that we lack sufficient or robust tools to author and publish "web pages". I find it ironic that we are still debating this issue as if we were in the early-to-mid 90s, or ignoring [2], or the possibility of using a service that offers [3] to publish (pardon me for saying so) a friggin' web page.

If it is about "coding", I find it unreasonable or unprofessional to think that a Computer/Web Scientist in 2014 who is publicly funded for their academic endeavors is incapable of grokking HTML. Yet somehow LaTeX is presumed to be okay for the new post-graduate who's coming in. Really? Or is the real reason that no one is asking them to do otherwise?

They can pick any WYSIWYG editor or an existing publishing service. No one is forcing anyone to hand-code anything, just as no one is forced to hand-code LaTeX.

We have the tools and even services to help us do all of that, both from within and outside of SW. We have had them for a long time. What was lacking was a continuous green light to use them. That light stopped flashing, as you've mentioned.


But again, our core problems are not technical in nature.

I would love to see experiments (e.g., certain workshops) to try it out before 
making this a requirement for whole conferences.

I disagree. The fact that workshops or tracks on "linked science" or "semantic publishing" didn't deliver is a clear sign that they have the wrong process at the root. When those workshops ask for submissions to be in PDF, that's the definition of irony. There are no "useful" machine-friendly research objects! Opportunity lost at every single CfP.

Yet, we eloquently describe hypothetical systems or tools that will "one day" do all the magic for us instead of taking a good look at what's right in front of us.

So, let's talk about putting the cart before the horse. A lot of time and energy (e.g., public funding) could have been better used simply by actually *having the data*, and then figuring out how to utilize it. There is no data, so what's there to analyze or learn from? Some research trying to figure out what to do with trivial and limited metadata, e.g., title, abstract, authors, subjects? Is data.semanticweb.org ("dog food") the best we can show for our "dogfooding" ability?

I can't search/query for research knowledge on topic T that used variables X and Y, implemented a workflow step S, is cited by or used those exact parameters, and happens to use the datasets that I'm planning to use in my research.


Reproducibility: 0
Comparability: 0
Discovery: 0
Reuse: 0
H-Index: +1?

Bernadette's suggestions are a good step in this direction, although I suspect 
it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).

Nothing is stopping us from doing things in parallel, and in fact we are. There are close-by efforts ranging from workshops to force11, public-dwbp-wg, public-digipub-ig, .. to recommendations, e.g., PROV-O, OPMW, SIO, SPAR, besides the whole SW/LD stack, all of which benefit scientific research communication and advancement.

The fundamental question is, if we have all of that going on, why are we not taking the *minimal* step to put it to use where it matters most? If the answer depends on making it comfortable and rewarding for the very few, then I disagree on our priorities.

So, it is *especially difficult* when conferences or journals about the (Semantic) "Web", with WWW, "Semantic Web", or "Linked Something" in their title, do not encourage the use of their own technologies for communicating research output.


Net result: we have continued to make it difficult to mine our own information.

When conferences and supervisors do not encourage the SW researcher to eat their own dogfood but to use something archaic for knowledge sharing on the Web, well, is it any surprise that we are not going as fast as we could, and that I can't do my silly queries to find papers or use previously declared/discussed variables in my research?

We can speed up Web Science, attract or create funding opportunities simply by having a better understanding of our own data. What good is it exactly that the print output can be in high quality and that it has an arbitrary length or fixed view? Oh right, that's where the publisher comes in.


No sensible data, no fun.

Again, I think something along the lines of:

http://csarven.ca/call-for-linked-research
https://github.com/csarven/linked-research

is "good enough" to proceed for those who wish to cover both cases (retaining semantics and complying with conference/publisher requirements). We don't have to wait and see whether the next best thing comes along (e.g., like I said, the workshops on SW/LD scientific publishing are not even doing it). We can figure that out as we go.


If you have read this far, thank you! :)

[1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0325.html
[2] http://en.wikipedia.org/wiki/Comparison_of_HTML_editors
[3] http://en.wikipedia.org/wiki/List_of_content_management_systems

-Sarven
http://csarven.ca/#i
