Scholarly paper in HTML+RDF through RASH

Silvio Peroni Fri, 22 May 2015 14:57:50 -0700

Dear all,

Considering the several posts about this topic, I would like to share with you 
my personal experience in using HTML(+RDF) as a format for 
preparing/submitting/processing papers in scientific events.


In the past months, I (together with several people in my research group at 
the University of Bologna plus other interested researchers from other 
institutions) have released a format for writing academic articles called RASH, 
i.e., Research Articles in Simplified HTML. RASH is a markup language that 
restricts HTML to only 25 elements for writing academic research articles. 
It is also possible to include RDFa annotations within any element of the 
language, and other RDF statements in Turtle and JSON-LD format by using the 
"script" element. The RASH documentation is available 
online at [1] and documents RASH version 0.3.5, defined as a RelaxNG grammar 
[2].
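
To illustrate the last point, here is a minimal Python sketch (not part of the 
RASH Framework) of how a consumer might pull JSON-LD statements out of a 
"script" element. It assumes the statements are embedded with the standard 
type "application/ld+json"; the sample document, the class JsonLdExtractor and 
the function extract_jsonld are hypothetical names used only for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Start buffering only for script elements that declare JSON-LD.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # On the closing tag, parse what was buffered as JSON.
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return a list with one parsed JSON object per JSON-LD script element."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.documents


# Hypothetical document, used only to exercise the sketch.
SAMPLE_JSONLD_HTML = """<html><head>
<script type="application/ld+json">
{"@context": {"dcterms": "http://purl.org/dc/terms/"},
 "@id": "http://example.org/paper",
 "dcterms:title": "Example paper"}
</script>
</head><body><p>Text.</p></body></html>"""

if __name__ == "__main__":
    for document in extract_jsonld(SAMPLE_JSONLD_HTML):
        print(document["@id"], document["dcterms:title"])
```

The sketch only parses the JSON syntax; turning the result into RDF triples 
would need a JSON-LD processor, which is left out here.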

RASH is the core component of a larger framework that includes a set of 
specifications and writing/conversion/extraction tools for academic articles. 
All the sources (released under Open Source and Creative Commons licences) are 
available on GitHub [3] and have been developed by several people so far. An 
internal note [4] provides a complete overview of the RASH Framework; for your 
convenience, the structured abstract of that note is included at the end of 
this email.

Currently, the RASH Framework includes the following tools:

- a script that lets RASH users check their documents simultaneously against 
both the specific requirements of the RASH RelaxNG grammar and the full set of 
HTML checks that the W3C Nu HTML Checker (a.k.a. the HTML5 validator) performs 
on all HTML documents (i.e., all the requirements given in the HTML 
specification);

- JavaScript scripts (based on Bootstrap and jQuery) and CSS stylesheets 
(partially based on the Linked Research [5] CSS) implementing the 
visualisation of RASH documents in the browser. These scripts also add to RASH 
papers a footer bar with statistics about the paper (i.e., number of words, 
figures, tables and formulas), a menu to change the layout of the page, the 
automatic reordering of footnotes and references, the visualisation of the 
metadata of the paper, etc.;

- XSLT 2.0 files for converting RASH documents into LaTeX according to the ACM 
ICPS [6] and Springer LNCS [7] styles (other styles to come soon);

- an XSLT 2.0 file to perform conversions from OpenOffice documents into RASH 
documents;

- a Java application called SPAR Xtractor suite that takes a RASH document as 
input and returns a new RASH document where all its markup elements have been 
annotated with their actual (structural) semantics according to the Document 
Components Ontology (DoCO) [8].
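
As an illustration of the kind of statistics mentioned in the second item 
above, the following Python sketch counts words, figures and tables in an HTML 
document. It is not the framework's code (which is JavaScript and runs in the 
browser), and it assumes that figures and tables are marked up with the plain 
HTML elements "figure" and "table", which may differ from the actual RASH 
markup. The names PaperStats and paper_stats are hypothetical.

```python
import re
from html.parser import HTMLParser


class PaperStats(HTMLParser):
    """Count words, figures and tables in an HTML document."""

    SKIPPED = {"script", "style"}  # content that is not readable text

    def __init__(self):
        super().__init__()
        self.words = 0
        self.figures = 0
        self.tables = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "figure":
            self.figures += 1
        elif tag == "table":
            self.tables += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count runs of word characters, ignoring script/style content.
        if not self._skip_depth:
            self.words += len(re.findall(r"\w+", data))


def paper_stats(html_text):
    """Return a dict with the word, figure and table counts."""
    parser = PaperStats()
    parser.feed(html_text)
    parser.close()
    return {"words": parser.words,
            "figures": parser.figures,
            "tables": parser.tables}


# Hypothetical document, used only to exercise the sketch.
SAMPLE_STATS_HTML = """<html><body>
<h1>A short title</h1>
<p>Four words of text.</p>
<figure><img src="a.png" alt=""/></figure>
<table><tr><td>one</td></tr></table>
</body></html>"""

if __name__ == "__main__":
    print(paper_stats(SAMPLE_STATS_HTML))
```

Formulas are left out of the sketch because how they are marked up is not 
described in this email.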

In order to experiment with the use of RASH in official venues, it has already 
been offered as one of the possible submission formats at three academic 
events, i.e., the Semantic Publishing Challenge 2015 [9] (that will be held 
during ESWC 2015), and the workshops SAVE-SD 2015 [10] (held during WWW 2015) 
and Linking in the Cloud 2015 [11] (that will be held during Hypertext 2015).

In particular, six papers were actually submitted in RASH to the SAVE-SD 2015 
Workshop [10] (which I co-organised); the sources of these papers are 
available on the workshop program webpage [12]. All the RASH papers also 
include RDF statements (for a total of about 1300 RDF triples) concerning 
article metadata, basic article structures (mainly based on DoCO [8]), citation 
functions (based on CiTO [13]), and even semantic descriptions of figures, as 
in the case of the SAVE-SD 2015 Best RASH Paper [14].

It is worth mentioning that the conversion of the RASH submissions into the ACM 
format requested by the publisher Sheridan (responsible for the publication of 
all WWW proceedings, including the workshop proceedings) was handled by us, the 
workshop organisers, through a semi-automatic process. In particular, we used 
the aforementioned XSLT files to convert RASH papers into LaTeX files compliant 
with the official ACM format requested [6], and then we only had to fix a few 
layout misalignments.

I hope that the RASH Framework (together with others, e.g., Linked Research [5] 
and Scholarly Markdown [15]) and the related initiatives and adoption in 
academic events can be considered a first concrete step towards the possible 
adoption of HTML(+RDF) for scientific publications in academic venues.

I'm looking forward to receiving your comments about RASH and its framework 
and, in case you are already an early adopter of it, please feel free to 
participate in a 10-minute survey about the use of RASH for writing academic 
papers, available at http://esurv.org/?u=rash-format.

Please don't hesitate to contact me (email: essepunt...@gmail.com) for 
comments, suggestions, and further questions.

Have a nice day :-)

S.



# References
1. http://cs.unibo.it/save-sd/rash/documentation/index.html
2. http://cs.unibo.it/save-sd/rash/grammar/rash.rng
3. http://github.com/essepuntato/rash
4. http://www.essepuntato.it/2015/sepublica/rash-sepublica2015.html
5. https://github.com/csarven/linked-research
6. http://www.acm.org/sigs/publications/proceedings-templates
7. http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
8. Constantin, A., Peroni, S., Pettifer, S., Shotton, D., Vitali, F. (in 
press). The Document Components Ontology (DoCO). To appear in Semantic Web – 
Interoperability, Usability, Applicability. OA available at 
http://www.semantic-web-journal.net/content/document-components-ontology-doco-0
9. https://github.com/ceurws/lod/wiki/SemPub2015
10. http://cs.unibo.it/save-sd/2015/index.html
11. http://lc2015.dibris.unige.it/
12. http://cs.unibo.it/save-sd/2015/program.html 
13. Peroni, S., Shotton, D. (2012). FaBiO and CiTO: ontologies for describing 
bibliographic resources and citations. In Journal of Web Semantics: Science, 
Services and Agents on the World Wide Web, 17 (December 2012): 33-43. 
Amsterdam, The Netherlands: Elsevier. 
http://dx.doi.org/10.1016/j.websem.2012.08.001 
14. Kuhn, T. (2015). Science Bots: A Model for the Future of Scientific 
Computation? http://cs.unibo.it/save-sd/2015/papers/html/kuhn-savesd2015.html
15. http://scholarlymarkdown.com


# Abstract of [4]
Purpose: this paper introduces the RASH Framework, i.e., a set of 
specifications and tools for writing academic articles in RASH (a simplified 
version of HTML).

Design: RASH has been developed in order to: be easy to learn and use; share 
scholarly documents (and embedded semantic annotations) through the Web; 
support its adoption within the publishing workflow.

Findings: RASH has been used for papers submitted to the SAVE-SD 2015 workshop. 
The authors of papers were able to self-learn it by simply referring to its 
documentation page without facing particular issues. The conversion of the RASH 
submissions into the format requested by the publisher was handled by the 
workshop organisers quickly through a semi-automatic process.

Research limitations: additional tools are needed, e.g., for extracting 
additional RDF statements from RASH documents and for enabling additional 
conversions from/to existing formats.

Practical implications: the RASH Framework is another step towards enabling the 
definition of formal representations of the meaning of the content of an 
article, facilitating its automatic discovery, enabling its linking to 
semantically related articles, providing access to data within the article in 
actionable form, and allowing integration of data between papers.

Social implications: RASH addresses the intrinsic needs related to the various 
users of a scholarly article: researchers (focussing on its content), readers 
(experiencing new ways for browsing it), citizen scientists (reusing available 
data formally defined within it through semantic annotations), publishers 
(using the advantages of new technologies as envisioned by the Semantic 
Publishing movement).

Value: RASH focuses strictly on writing the content of the paper (i.e., 
organisation of text + semantic annotations) and leaves all the issues about 
validation, visualisation, conversion, and semantic data extraction to the 
various tools developed within the framework.


----------------------------------------------------------------------------
Silvio Peroni, Ph.D.
Department of Computer Science and Engineering
University of Bologna, Bologna (Italy)
Tel: +39 051 2094871
E-mail: silvio.per...@unibo.it
Web: http://www.essepuntato.it
Blog: http://palindrom.es/phd
Twitter: essepuntato
