Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-07 15:44, Peter F. Patel-Schneider wrote:

Well, I remain totally unconvinced that any current HTML solution is as
good as the current PDF setup.  Certainly htlatex is not suitable.
There may be some way to get tex4ht to do better, but no one has
provided a solution. Sarven Capadisli sent me some HTML that looks much
better, but even on a math-light paper I could see a number of
glitches.  I haven't seen anything better than that.


Would you mind creating an issue for the glitches that you are experiencing?

https://github.com/csarven/linked-research/issues

Please mention your environment and the documents you've looked at. Also 
keep in mind the LNCS and ACM SIG authoring guidelines. The purpose of 
the LNCS and ACM CSS is to adhere to the authoring guidelines so that 
the generated PDF file or print output looks as expected (within 
reason).


Much appreciated!

-Sarven
http://csarven.ca/#i






Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Peter F. Patel-Schneider

Done.

The goal of a new paper-preparation and display system should, however, be to 
improve on what is currently available.  Most HTML-based solutions do not 
exploit the benefits of HTML, strangely enough.


Consider, for example, citation links.  They generally jump you to the 
references section.  They should instead pop up the reference, as is done in 
Wikipedia.
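
A rough sketch of what I have in mind, in plain HTML+JavaScript (the IDs 
and class names here are invented for illustration):

    <p>... as shown by Doe et al. <a href="#ref-12" class="cite">[12]</a> ...</p>

    <script>
    // Sketch: show the reference in a floating box on hover rather than
    // jumping to the references section.
    document.addEventListener('mouseover', function (e) {
      if (e.target.className !== 'cite') return;
      var ref = document.querySelector(e.target.getAttribute('href'));
      if (!ref) return;
      var box = document.createElement('div');
      box.innerHTML = ref.innerHTML;
      box.style.cssText = 'position:absolute; background:#fff; ' +
        'border:1px solid #999; padding:0.5em; max-width:30em; ' +
        'left:' + e.pageX + 'px; top:' + (e.pageY + 12) + 'px';
      document.body.appendChild(box);
      e.target.onmouseout = function () { document.body.removeChild(box); };
    });
    </script>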


Similarly for links to figures.  Instead of blindly jumping to the figure, 
they should do something better, perhaps popping up the figure or, if the 
figure is already visible, just highlighting it.
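
Again as a sketch (the fig-ref class is made up): highlight the figure 
when it is already on screen, scroll to it otherwise:

    <script>
    // Sketch: figure links highlight in-view figures instead of blindly jumping.
    document.addEventListener('click', function (e) {
      if (e.target.className !== 'fig-ref') return;
      var fig = document.querySelector(e.target.getAttribute('href'));
      if (!fig) return;
      e.preventDefault();
      var r = fig.getBoundingClientRect();
      if (r.top < 0 || r.bottom > window.innerHeight) fig.scrollIntoView();
      fig.style.outline = '3px solid gold';
      setTimeout(function () { fig.style.outline = ''; }, 1500);
    });
    </script>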


I have put in both of these as issues.

peter

On 10/08/2014 03:18 AM, Sarven Capadisli wrote:

[snip]






Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-08 14:10, Peter F. Patel-Schneider wrote:

[snip]


Thanks a lot for the issues! Really great to have this feedback.

I have resolved and commented on some of those already, and will look at 
the rest very shortly.


I am all for improving the interaction as well. I'd like to state again 
that development has so far focused on adhering to the LNCS/ACM 
guidelines and improving the final PDF/print product; that is, on 
getting to reasonable grounds with the state of the art.


Moving on: I plan to bring in the interaction layer and a framework to 
easily enrich the document semantically, as well as the overall UX. I 
have some preliminary code in my dev branch, will bring it forward, and 
would like feedback as well.


Thanks again and please continue to bring forward any issues or feature 
requests. Contributors are most welcome!


-Sarven
http://csarven.ca/#i






Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 PLOS is an interesting case.  The HTML for PLOS articles is relatively
 readable.  However, the HTML that the PLOS setup produces is failing at math,
 even for articles from August 2014.

 As well, sometimes when I zoom in or out (so that I can see the math better)
 Firefox stops displaying the paper, and I have to reload the whole page.

Interesting bug that. Worth reporting to PLoS.

 Strangely, PLOS accepts low-resolution figures, which in one paper I looked at
 are quite difficult to read.

Yep. Although, it often provides several links to download higher
res images, including in the original file format. Quite handy.

 However, maybe the PLOS method can be improved to the point where the HTML is
 competitive with PDF.

Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is
because scientists are used to viewing in print format, I suspect, but
partly not.

I'm hoping that, eventually, PLoS will stop using image-based maths. I'd
like to be able to zoom maths independently, and copy and paste it in
either MathML or TeX. MathJax does this already.
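
Something along these lines is enough to get that (the CDN URL and 
config name are the stock MathJax 2 ones; treat it as a sketch rather 
than a recommended setup):

    <script type="text/javascript"
      src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
    </script>
    <p>Zoomable, copyable maths: \(e^{i\pi} + 1 = 0\)
       (right-click, "Show Math As", for the TeX or MathML).</p>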

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Peter F. Patel-Schneider



On 10/08/2014 05:31 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


PLOS is an interesting case.  The HTML for PLOS articles is relatively
readable.  However, the HTML that the PLOS setup produces is failing at math,
even for articles from August 2014.

As well, sometimes when I zoom in or out (so that I can see the math better)
Firefox stops displaying the paper, and I have to reload the whole page.


Interesting bug that. Worth reporting to PLoS.


PLoS doesn't appear to have a bug reporting system in place.  Even their 
general assistance email is obfuscated.  I sent them a message anyway.



Strangely, PLOS accepts low-resolution figures, which in one paper I looked at
are quite difficult to read.


Yep. Although, it often provides several links to download higher
res images, including in the original file format. Quite handy.


In this case, even the original was low resolution.


However, maybe the PLOS method can be improved to the point where the HTML is
competitive with PDF.


Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is
because scientists are used to viewing in print format, I suspect, but
partly not.

I'm hoping that, eventually, PLoS will stop using image based maths. I'd
like to be able to zoom maths independently, and copy and paste it in
either mathml or tex. Mathjax does this now already.


I would suggest that this should have been one of their highest priorities.


Phil



peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 The goal of a new paper-preparation and display system should, however, be to
 be better than what is currently available.  Most HTML-based solutions do not
 exploit the benefits of HTML, strangely enough.

 Consider, for example, citation links.  They generally jump you to the
 references section.  They should instead pop up the reference, as is done in
 Wikipedia.

Yes, I agree. I do this on my blog or rather provide it as an option.
The reference list is also automatically generated here, so, for
example, there is no metadata associated with the two references in
this post:

http://www.russet.org.uk/blog/3015

In both cases, the reference list is formed from the metadata on the
other end of the link, gathered either from the HTML, or in the case of
arXiv from their XML-RPC interface.
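
A browser-side sketch of that sort of harvesting (assuming the cited 
page is same-origin or CORS-enabled and carries the common 
Highwire/Dublin Core meta tags; the function name is invented):

    function fetchCitation(url, done) {
      // Sketch: read citation metadata from the far end of a link.
      var xhr = new XMLHttpRequest();
      xhr.open('GET', url);
      xhr.onload = function () {
        var doc = new DOMParser().parseFromString(xhr.responseText, 'text/html');
        function meta(name) {
          var m = doc.querySelector('meta[name="' + name + '"]');
          return m ? m.getAttribute('content') : null;
        }
        done({ title:  meta('citation_title')  || meta('DC.title'),
               author: meta('citation_author') || meta('DC.creator'),
               date:   meta('citation_date')   || meta('DC.date') });
      };
      xhr.send();
    }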


 Similarly for links to figures.  Instead of blindly jumping to the figure,
 they should do something better, perhaps popping up the figure or, if the
 figure is already visible, just highlighting it.

Or better still, providing access to the code and data from which the
figure is derived.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Luca Matteis
Dear Sarven,

I really appreciate the work that you're doing with trying to style an
HTML page to look similar to the LaTeX templates. But there are so many
typesetting details that are not available in browsers, which means
you're going to do a lot of DOM hacking to be able to produce the same
quality typography that LaTeX is capable of. LaTeX will justify text,
automatically hyphenate, provide proper spacing, and offer other
typesetting features. Not to mention kerning. Kerning is a *huge* thing
in typography, and with HTML you're stuck with creating a DOM element
for every single letter - yup, you heard me right.

I think it would be super cool to create some sort of JavaScript
framework that would enable the same level of typography that LaTeX is
capable of, but you'll eventually hit some hard limitations and you'll
probably be stuck drawing on a canvas.

What are your ideas regarding these problems?

On Wed, Oct 8, 2014 at 2:26 PM, Sarven Capadisli i...@csarven.ca wrote:
 [snip]





Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Phillip Lord


I'm always at a bit of a loss when I read this sort of thing. Kerning,
seriously? We can't share scientific content in HTML because of kerning?

In practice, web browsers do a perfectly reasonable job of text layout,
in real time, and do it in a way that allows easy reflowing. The thing
about Sarven's LNCS style sheets, for instance, that I like the most is
that I can turn them off; I don't like the LNCS format.

Having said all of that, 5 minutes of googling suggests that kerning
support is in Candidate Recommendation form at the W3C, and that at
least three different JS libraries support it.
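
For the record, the CSS involved is one-liner territory; a sketch
(prefixed forms may still be needed in some engines, and hyphens:auto
wants a lang attribute on the document):

    article {
      text-align: justify;                 /* justified text */
      -webkit-hyphens: auto;
      -moz-hyphens: auto;
      hyphens: auto;                       /* automatic hyphenation */
      font-kerning: normal;                /* use the font's kern tables */
      text-rendering: optimizeLegibility;  /* kerning/ligatures in older engines */
    }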

Phil

Luca Matteis lmatt...@gmail.com writes:
 [snip]


-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Bernadette Hyland
Hi Sarven,
Congratulations for kicking off a thread that has received over 150 replies 
across two W3C lists in a week.  That is impressive!  This isn't the first time 
(nor the last) that this has been discussed.  The active discussion reaffirms the 
need to drive a closer dialog between Web technologists & publishers for 
scientific publishing.

One gets the sense that there is serious depth of expertise on the publishing 
workflow on these lists. People have taken considerable time to reply & be 
constructive with ideas to advance the effort.  Thanks.

Can anyone advise on whether the publishers in 2014 are in fact on the 'front 
lines' of defining the standards that affect their core business, i.e., Web 
standards that are the foundation for layout & typography?

Is this an opportunity for W3C members to take this up as a topic for 
discussion at the upcoming TPAC?  Perhaps this is already scheduled?  W3C 
staffers, any guidance on this?

I still contend there is a great business opportunity for an entrepreneurial, 
Web publishing-savvy team to build something really useful & immediately have 
1000+ researchers provide feedback & drive use.

Cheers,

Bernadette Hyland
CEO, 3 Round Stones, Inc.

http://3roundstones.com
http://about.me/bernadettehyland 

PS.  It's also clear that your PhD dissertation topic is of keen interest, Sarven!  
We'd like to read it when you're done (no pressure ;-)


On Oct 8, 2014, at 10:09 AM, Gray, Alasdair a.j.g.g...@hw.ac.uk wrote:

 
 On 8 Oct 2014, at 13:31, Phillip Lord phillip.l...@newcastle.ac.uk wrote:
 
 Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 
 [snip]
 However, maybe the PLOS method can be improved to the point where the HTML is
 competitive with PDF.
 
 Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is
 because scientists are used to viewing in print format, I suspect, but
 partly not.
 
 
 Or is that because they want to import it into their own reference management 
 system, e.g. Mendeley, which does not support the HTML version?
 
 Alasdair
 
 [snip]
 Phil
 
 
 Alasdair J G Gray
 Lecturer in Computer Science, Heriot-Watt University, UK.
 Email: a.j.g.g...@hw.ac.uk
 Web: http://www.alasdairjggray.co.uk
 ORCID: http://orcid.org/0000-0002-5711-4872
 Telephone: +44 131 451 3429
 Twitter: @gray_alasdair
 
 
 
 
 
 
 
 
 



Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-08 15:14, Luca Matteis wrote:

[snip]

What are your ideas regarding these problems?


We do not have to have everything pixel perfect and comprehensive all up 
front. That is a common pitfall. Applying the Pareto principle is 
preferable.


LaTeX is great for what it is intended for! This was never in question. 
We are however looking at a bigger picture for Web Science communication 
and access. There will be far more concerns than the presentation layer 
alone.


As for your technical questions: we need to create issues or feature 
requests, and more importantly, open discussions like these threads, to 
better understand the SW research community's needs. So, please create 
an issue, because what you raise is important and should be looked into 
further. I do not have all the technical answers, even though I am very 
close to the world of typeface, typography, and book design :)


In any case, if it was possible in LaTeX, I hope it is not naive of me 
to say that it can be achieved (if not already) in HTML+CSS+JavaScript.


-Sarven
http://csarven.ca/#i





Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Kingsley Idehen

On 10/8/14 10:18 AM, Sarven Capadisli wrote:

[snip]

Sarven,

Linked Open Data dogfooding, re. issue tracking: a 5-Star Linked Open 
Data URI that identifies a GitHub issue in the csarven/linked-research 
tracker:

[1] http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4 -- Linked Open Data URI (basic entity description page)
[2] http://linkeddata.uriburner.com/c/8FDBH7 -- deeper follow-your-nose over relations, facets-oriented entity description page
[3] http://bit.ly/vapor-report-on-linked-data-uri-that-identifies-a-github-issue-re-linked-research-data -- Vapor report (re. Linked Open Data principles adherence).


--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this





Reference management (was: Re: scientific publishing process (was Re: Cost and access))

2014-10-08 Thread Simon Spero
On Oct 8, 2014 10:15 AM, Gray, Alasdair a.j.g.g...@hw.ac.uk wrote:

 Or is that because they want to import it into their own reference
management system, e.g. Mendeley, which does not support the HTML version?

1. It is quite easy to embed metadata in HTML pages in forms designed
for accurate importing into reference managers (Hellman 2009). Mendeley has
been known to have problems with imports in cases where a proxy server is
involved.

COinS does have the slight problem of being kind of based on top of
OpenURL, which is made of lose (Hellman 2010), but it is the current least
bad solution (a minimal example is sketched at the end of this list).

2. There is ongoing work to create a decent ontology for better embedding.
The BibEx work for schema.org is going in the right direction (Bibex 2014).

The Library of Congress BIBFRAME effort (LC 2014) is going in the right
direction iff the right direction is defined as straight off a cliff - see
eg Spero (2013)

3. There is a good comparison of Docear, Mendeley, and Zotero available in
Beel (2014), which is remarkably balanced given that he is the PI for
Docear. He includes a link to an earlier post mocking several completely
unbalanced comparison charts prepared by different vendors (he finishes by
making a similar chart showing Docear is the only possible choice. Table
snark FTW.)

My personal favorite tool is BibDesk (2014), which is Mac and BibTeX
specific, but justifies this by using many Mac-specific capabilities. There
is some support for integration into Word. (Don't mention the Word. I
mentioned it once but I think I got away with it.)

4. All of these tools could benefit from even simple subsumption reasoning
(although vocabularies like the LCSH have errors that lead to amusing and
frustrating results - everything about doorbells is also about mammals,
eschatology, the soul, and psychotherapy (Spero 2008)).

It is important to recognize the difference between a knowledge
organization system, for describing intentional concepts, and knowledge
representation systems, for describing a view of reality. Leonard Cohen via
Elaine Svenonius authorizes laughing at people who confuse the two.

http://ibiblio.org/ses/anyqs.jpg

5. Extended rants on misunderstandings of plausible Ontologies and
ontologies of the Bibliographic Universe omitted (cough SKOS cough).
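
As promised above, a minimal COinS span of the kind Zotero and friends
scan for (the bibliographic values are made up):

    <span class="Z3988"
          title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.atitle=An%20Example%20Article&amp;rft.aulast=Doe&amp;rft.date=2014"></span>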

Simon

References

Beel, Joeran (2014). Comprehensive Comparison of Reference Managers:
Mendeley vs. Zotero vs. Docear. Available at
http://www.docear.org/2014/01/15/comprehensive-comparison-of-reference-managers-mendeley-vs-zotero-vs-docear/

BibDesk (2014). BibDesk wiki: Main Page. Available at
http://sourceforge.net/p/bibdesk/wiki/Main_Page/

BibEx (2014). Schema Bib Extend Community Group Wiki: Main Page.  Available
at http://www.w3.org/community/schemabibex/wiki/index.php?title=Main_Page

Hellman, Eric (2009). OpenURL COinS : A convention to embed bibliographic
metadata in HTML. Available at http://ocoins.info

Hellman, Eric (2010). It's cool to hate on OpenURL (was Re: Twitter
Annotations). Available at
https://listserv.nd.edu/cgi-bin/wa?A2=CODE4LIB;axd%2FoQ;201004291208400400

https://www.mail-archive.com/code4lib@listserv.nd.edu/msg07857.html

LC (2014). BIBFRAME : Bibliographic Framework Initiative. Available at
http://www.loc.gov/bibframe/

Spero, Simon (2008). LCSH is to Thesaurus as Doorbell is to Mammal:
visualizing structural problems in the Library of Congress subject
headings. In Proceedings of the 2008 International Conference on Dublin
Core and Metadata Applications. DCMI. Available at:
http://iBiblio.org/ses/poster.pdf

Spero, Simon (2013). Prolegomena to any future metadata. Available at
http://www.ibiblio.org/fred2.0/wordpress/?p=269


Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-08 18:38, Kingsley Idehen wrote:

[snip]


It's pretty cool that you can grab stuff out of GitHub issues, even 
comments!


Papers link to code and then to commits and issues. See also [1].

Even comments e.g., [2]. Or even in the direction of paper comments 
which can be integrated and picked right up from the page e.g., [3]. 
Just need to add +/-1 buttons and triplify the review ;) With WebID+ACL, 
we have the rest.


Do I have write access (via WebID?) to something like [4]? E.g., 
deleting an older label or triple :)


[1] http://git2prov.org/
[2] https://linkeddata.uriburner.com/about/html/http/csarven.ca/call-for-linked-research
[3] https://linkeddata.uriburner.com/about/html/http/csarven.ca/sense-of-lsd-analysis%01comment_20140808164434
[4] http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4


-Sarven





Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Kingsley Idehen

On 10/8/14 3:13 PM, Sarven Capadisli wrote:

[snip]

Do I have write access (via WebID?) to something like [4]? E.g., 
deleting an older label or triple :)

[1] http://git2prov.org/
[2] https://linkeddata.uriburner.com/about/html/http/csarven.ca/call-for-linked-research
[3] https://linkeddata.uriburner.com/about/html/http/csarven.ca/sense-of-lsd-analysis%01comment_20140808164434
[4] http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4

-Sarven


Yes, there are WebID+TLS and/or NetID+TLS based ACLs [1][2][3] in place. 
In addition, you can always make a full TURTLE doc in some data space, 
or embed your TURTLE in any text slot (e.g., comments or description 
fields) provided by a Web app/service using Nanotation, and you are set 
re. payload for upload into URIBurner. Basically, you have the following 
RWW options:


1. Append RDF statements to the existing RDF document (named graph) 
identified by the IRI http://csarven.ca/sense-of-lsd-analysis -- all you 
do is refresh the URIBurner URI as data changes in GitHub (adding 
?sponger:get=add to the end of a URIBurner URI has this effect).


2. Overwrite statements in the existing RDF document (named graph) -- 
simply add ?@Lookup@=refresh=clean to the end of the URIBurner URI for 
this effect.


Of course there's lots more, but I'll let this flow one step at a time :-)


Links:

[1] http://bit.ly/enterprise-identity-management-and-attribute-based-access-controls
[2] http://www.slideshare.net/kidehen/how-virtuoso-enables-attributed-based-access-controls/34 -- WebID-TLS (authenticates WebIDs)
[3] http://www.slideshare.net/kidehen/how-virtuoso-enables-attributed-based-access-controls/40 -- NetID-TLS (authenticates LinkedIn, Facebook, Twitter, G+, Amazon, Dropbox, and many other identities)
[4] http://bit.ly/blog-post-about-nanotation -- Nanotation (this SHOULD work wherever you're able to input plain text).


--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Eric Prud'hommeaux
* Luca Matteis lmatt...@gmail.com [2014-10-07 00:41+0200]
 Sorry to jump into this once again but when it comes to typesetting
 nothing really comes close to Latex/PDF:
 http://tex.stackexchange.com/questions/120271/alternatives-to-latex -
 not even HTML/CSS/JavaScript

Making a floating model look like LaTeX/PDF at all resolutions seems
impossible. Perhaps targeting a fixed (A4 or 8½×11 @300dpi) resolution
is quite doable. Doing so allows one to use fixed positioning for all CSS
directives.
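
A sketch of the idea, with invented class names: one absolutely
positioned "sheet" per page, so every element lands at a known
coordinate:

    .sheet {                        /* one per page of the paper */
      position: relative;
      width: 210mm; height: 297mm;  /* A4; swap for 8.5in x 11in */
      margin: 0 auto; overflow: hidden;
    }
    .sheet .fig1 { position: absolute; top: 40mm; left: 25mm; }
    @media print { .sheet { page-break-after: always; } }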

But Eric, that sucks!!

Well, sort of, because we can't conveniently read it on a phone and it
doesn't fill large displays, but that may be a small price to pay to
be able to use all of the rich markup that we wax poetic about on this
list. If it does work, then we can figure out ways to script it so it
has a simply-controlled, predictable behavior at a certain resolution
but is reasonable at arbitrary resolutions.


 On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray nor...@astro.gla.ac.uk wrote:
 
  Greetings.
 
  On 2014 Oct 6, at 19:19, Alexander Garcia Castro alexgarc...@gmail.com 
  wrote:
 
  querying PDFs is NOT simple and requires a lot of work -and usually
  produces lots of errors. just querying metadata is not enough. As I said
  before, I understand the PDF as something that gives me a uniform layout.
  that is ok and necessary, but not enough or sufficient within the context
  of the web of data and scientific publications. I would like to have the
  content readily available for mining purposes. if I pay for the publication
  I should get access to the publication in every format it is available. the
  content should be presented in a way so that it makes sense within the web
  of data.  if it is the full content of the paper represented in RDF or XML
  fine. also, I would like to have well annotated content, this is simple and
  something that could quite easily be part of existing publication
  workflows. it may also be part of the guidelines for authors -for instance,
  identify and annotate rhetorical structures.
 
 
  The following might add something to this conversation.
 
  It illustrates getting the metadata from a LaTeX file, putting it into an 
  XMP packet in a PDF, and getting it out of the PDF as RDF.  Pace Peter's 
  mention of /Author, /Title, etc, this just focuses on the XMP packet.
 
  This has the document metadata, the abstract, and an illustrative bit of 
  argumentation.  Adding details about the document structure, and (RDF) 
  pointers to any figures would be feasible, as would, I suspect, 
  incorporating CSV files directly into the PDF.  Incorporating 
  \begin{tabular} tables would be rather tricky, but not impossible.  I can't 
  help feeling that the XHTML+RDFa equivalent would be longer and need more 
  documentation to instruct the author where to put the RDFa magic.
 
  It's not very fancy, and still has rough edges, but it only took me 100 
  minutes, from a standing start.
 
  Generating and querying this PDF seems pretty simple to me.
 
  
 
  $ cat test-xmp.tex
  \documentclass{article}
 
  \usepackage{xmp-management}
 
  \title{This is a test file}
  \author{Norman Gray}
  \date{2014 October 6}
 
  \begin{document}
 
  \maketitle
 
  \abstract{It's easy to include metadata in \LaTeX\ files.
 
  That's because there's plenty of metadata in there already.}
 
  There is text and metatext within files.
 
  \section{Further details}
 
  In this section we could potentially discuss moving information
  around.  I think we can assert that \claim{it is easy to move
  information around}, and, further, that \claim{making metadata
  readily available is a Good Thing}.  I hope that clears that up.
  \end{document}
  $ cat xmp-management.sty
  \ProvidesPackage{xmp-management}[2014/10/06]
 
  \newwrite\xmp@ttlfile
  \def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl
    \let\xmp@open\relax}
  \long\def\xmp@stmt#1#2{%
    \xmp@open
    \write\xmp@ttlfile{<> #1 "#2" .}}
  \let\xmp@origtitle\title
  \def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
  \let\xmp@origauthor\author
  \def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
  \let\xmp@origdate\date
  \def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}

  \long\def\abstract#1{
    \xmp@stmt{dc:abstract}{#1}
    \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
  \def\claim#1{
    \xmp@stmt{xmpinfo:claim}{#1}
    \emph{#1}}

  \let\xmp@origsection\section
  \def\section#1{\xmp@stmt{xmpinfo:has_section}{#1}
    \xmp@origsection{#1}}
 
  \usepackage{xmpincl}
  \AtBeginDocument{\includexmp{info}}
  $ pdflatex test-xmp
  This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
   restricted \write18 enabled.
  entering extended mode
  (./test-xmp.tex
  LaTeX2e 2011/06/27
  [...BLAH...]
  Output written on test-xmp.pdf (1 page, 75667 bytes).
  Transcript written on test-xmp.log.
  $ cat test-xmp.ttl
  <> dc:title "This is a test file" .
  <> dc:creator "Norman Gray" .
  <> dc:created "2014 October 6" .
   

Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread José Manuel Gómez Pérez

+1

This is precisely one of the main ideas we pursued in Wf4Ever. The paper 
in whatever format is not enough; you also need to preserve the methods 
and their implementation, including the workflows and the datasets, not 
only for validation and reproducibility purposes at publication time, 
but ultimately for incremental reuse and scientific development.


Publications indeed shouldn't be seen as a static piece of paper but 
rather as a (linked) piece of knowledge which can be revised and evolve 
in time. So, tooling is required that supports the management of the 
lifecycle of such knowledge, from creation of specific research objects 
to reuse, including ways to deal with decay and exploration and 
inspection capabilities.


In this direction, we took incremental steps through actual deployments 
of project outcomes in the previously mentioned platforms. Furthermore, 
we also integrated almost the whole set of functionalities into the 
ROHub.org platform, which was demonstrated in the last Semantic 
Publishing Challenge in ESWC [1,2] as a step forward in the direction 
you mention.


To me it would make absolute sense to see further community pull of this 
kind of tooling, starting with their utilization in the conferences and 
journals of our own field (ESWC, ISWC, etc.) in order to incubate, gain 
traction, and draw conclusions that we could generalize to other domains.


If this sounds appealing to the folks in this list, please let me know.

Cheers,
Jose

[1] 
http://2014.eswc-conferences.org/sites/default/files/eswc2014-challenges_spc_submission_3.pdf

[2] http://2014.eswc-conferences.org/program/semwebeval

On 04/10/2014 13:14, Hugh Glaser wrote:

(c) Workflows and Datasets
I have mentioned http://www.myexperiment.org before, but can’t remember if I 
have mentioned http://www.wf4ever-project.org
Again, these are Linked Data platforms for publishing; in this case workflows 
and datasets etc.
They are seriously mature, certainly compared with what we might build - see, 
for example https://github.com/wf4ever/ro
And exactly the same as the Repositories.

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences, one for all, or for each or series?
…ditto…
Who know, maybe the Crawl, as well as the Challenge entries might be able to 
usefully describe what they did using these ontologies etc.?

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.


--

Dr. Jose Manuel Gomez-Perez
Director R&D
jmgo...@isoco.com
T +34913349797
M +34609077103
Avda. del Partenón 10, Planta 1, Oficina 1.3A
Campo de las Naciones
28042 Madrid, Spain

iSOCO
 enabling the networked economy
 www.isoco.com





Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Norman Gray

Kingsley and all, hello.

On 2014 Oct 7, at 02:18, Kingsley Idehen kide...@openlinksw.com wrote:

 On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:
 
 
 On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
 On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
 It's not hard to query PDFs with SPARQL.  All you have to do is extract the
 
 Huh?  Every single PDF reader that I use can extract the PDF metadata and 
 display it.
 
 Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I 
think the question _is_ all about metadata.  The original spark to the thread 
was a lament that SW and LD conferences don't mandate something XMLish for 
submissions because X(HT)ML is clearly better for... well ... dammit, it's 
Better.

_One_ thing it would be better for is supporting the sort of full-scale 
RDF-everything view that you've described so eloquently.  But if that's your 
goal, then lexing the source text is really going to be the least of your 
problems.

A more modest goal, which is still valuable and _much_ more achievable, is to 
get at least some RDF out of submitted articles.  That practically means 
metadata, plus perhaps some document structure, plus, if you're keen and can 
get the authors to invest their effort, some argumentation.  That's available 
for free (and right now) from LaTeX authors, and available from XHTML authors 
depending on how hard it would be to get them to put the @profile attribute in 
the right places.

So no, not just about 'metadata' in the narrow sense, but I think this thread 
is about what RDF you can in practice extract from the materials that authors 
can in practice be induced or obliged to submit to conference proceedings.

That original lament has overlapped with a parallel lament that PDF is a 
dead-end format -- it's not 'webby'.  I believe that the demo in my earlier 
message undermines that claim as far as RDF goes.

 1. The extractors are platform specific -- AWWW is about platform 
 agnosticism
 (I don't want to mandate an OS for experiencing the power of Linked Open 
 Data
 transformers / rdfizers)
 
 Well, the extractors would be specific to PDF, but that's hardly surprising, 
 I think.

[I've lost track of whose comment this is...]

The extractor I demoed wasn't PDF-specific.

 We want to leverage the productivity and simplicity that AWWW brings to data
 representation, access, interaction, and integration.
 
 Sure, but the additional costs, if any, on paper authors, reviewers, and 
 readers have to be considered.  If these costs are eliminated or at least 
 minimized then this good is much more likely to be realized.
 
 With some help from Adobe we can have the best of all worlds here. I am going 
 to take a look at their latest cloud offerings and associated APIs.

I forgot to attach the extractor I wrote -- done.  The demo didn't use any 
Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


extract-xmp.c
Description: Binary data


Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Norman Gray

Eric, hello.

This is a bit of a side-issue, but...

On 2014 Oct 7, at 07:13, Eric Prud'hommeaux e...@w3.org wrote:

 * Luca Matteis lmatt...@gmail.com [2014-10-07 00:41+0200]
 Sorry to jump into this once again but when it comes to typesetting
 nothing really comes close to Latex/PDF:
 http://tex.stackexchange.com/questions/120271/alternatives-to-latex -
 not even HTML/CSS/JavaScript
 
 Making a floating model look like Latex/PDF at all resolutions seems
 impossible. Perhaps targeting a fixed (A4 or 8½×11 @300dpi) resolution
 is quite doable.

This isn't as hard as you might think (if I'm understanding you correctly).

At http://purl.org/nxg/text/general-relativity I have some lecture notes.  
The downloads there include:

http://www.astro.gla.ac.uk/users/norman/lectures/GR/part2.pdf
http://www.astro.gla.ac.uk/users/norman/lectures/GR/part2-usletter.pdf
http://www.astro.gla.ac.uk/users/norman/lectures/GR/part2-screen.pdf

Those come from the _same_ source file with different \documentclass options (I 
keep meaning to do something about the marginal notes in the screen version, 
but have never got around to it).  There's no resolution/DPI problem, because 
these are all vector fonts, not bitmaps.  There should be no 'missing font' 
problem because the fonts are automatically embedded properly (the maths font 
in those documents is a commercial one, so it's unlikely to be on your 
computer).

This won't dynamically reflow, it's true (and that's a pity), but if I ever get 
a tablet computer, I doubt I'll be able to resist producing versions in a 
layout which is targeted at that size of screen.

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK




Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Kingsley Idehen

On 10/7/14 5:39 AM, Norman Gray wrote:

Kingsley and all, hello.

On 2014 Oct 7, at 02:18, Kingsley Idehen kide...@openlinksw.com wrote:


On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:


On 10/06/2014 11:03 AM, Kingsley Idehen wrote:

On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:

It's not hard to query PDFs with SPARQL.  All you have to do is extract the

Huh?  Every single PDF reader that I use can extract the PDF metadata and 
display it.

Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I 
think the question _is_ all about metadata.


It can't be. The metadata focus is a subtle misconception. We need 
access to all of the data in the document.



   The original spark to the thread was a lament that SW and LD conferences 
don't mandate something XMLish for submissions because X(HT)ML is clearly 
better for... well ... dammit, it's Better.


The initial gripe (as I've always seen it) is that we are trying to tell 
the world about Linked Open Data virtues while rarely putting them to 
use (instinctively) ourselves. It just so happens that conferences 
provide an example that most have experienced in some capacity.




_One_ thing it would be better for is supporting the sort of full-scale 
RDF-everything view that you've described so eloquently.  But if that's your 
goal, then lexing the source text is really going to be the least of your 
problems.

A more modest goal, which is still valuable and _much_ more achievable, is to 
get at least some RDF out of submitted articles.


Yes, or just make references to RDF sources relevant to the paper, but 
on the basis that those references (to the degree possible) resolve. 
This is also about the data represented in tabular form (as tables) and 
the data behind the tables, so to speak.



  That practically means metadata, plus perhaps some document structure, plus, 
if you're keen and can get the authors to invest their effort, some 
argumentation.  That's available for free (and right now) from LaTeX authors, 
and available from XHTML authors depending on how hard it would be to get them 
to put @profile attribute in the right places.

So no, not just about 'metadata' in the narrow sense, but I think this thread 
is about what RDF you can in practice extract from the materials that authors 
can in practice be induced or obliged to submit to conference proceedings.


For those conferences associated with themes such as Linked Open Data 
and the Semantic Web, RDF should be the norm for structured data 
representation. If that isn't possible then what are we saying to the 
world about RDF, in regards to structured data representation and data 
de-silo-fication?





That original lament has overlapped with a parallel lament that PDF is a 
dead-end format -- it's not 'webby'.


They are linked :-)


   I believe that the demo in my earlier message undermines that claim as far 
as RDF goes.


1. The extractors are platform specific -- AWWW is about platform agnosticism
(I don't want to mandate an OS for experiencing the power of Linked Open Data
transformers / rdfizers)

Well, the extractors would be specific to PDF, but that's hardly surprising, I 
think.

[I've lost track of whose comment this is...]

The extractor I demoed wasn't PDF-specific.


Platform in the context of my comments really relates to operating 
systems, i.e., most PDF extractors are operating system specific. That's 
why I mentioned the massive opportunity for Adobe (and 3rd parties too, 
as Mike Bergman added) in regards to providing Web Services for 
accessing and indexing PDF document content.





We want to leverage the productivity and simplicity that AWWW brings to data
representation, access, interaction, and integration.

Sure, but the additional costs, if any, on paper authors, reviewers, and 
readers have to be considered.  If these costs are eliminated or at least 
minimized then this good is much more likely to be realized.

With some help from Adobe we can have the best of all worlds here. I am going 
to take a look at their latest cloud offerings and associated APIs.

I forgot to attach the extractor I wrote -- done.  The demo didn't use any 
Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.


You forgot the extractor demo link :)



All the best,

Norman





--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Phillip Lord


The stack exchange discussion mostly talks about the user side of
things. Go back (quite) a few years and using PDF from TeX was a pain,
pretty much up until pdflatex became the norm.

For those who think that LaTeX is still the best, I do not see that an
HTML-centric publishing framework should be a barrier. If the majority
of papers were being produced from Word, then it might be more of an
issue.

Phil


Luca Matteis lmatt...@gmail.com writes:

 Sorry to jump into this once again but when it comes to typesetting
 nothing really comes close to Latex/PDF:
 http://tex.stackexchange.com/questions/120271/alternatives-to-latex -
 not even HTML/CSS/JavaScript




Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Phillip Lord
Norman Gray nor...@astro.gla.ac.uk writes:

 This won't dynamically reflow, it's true (and that's a pity), but if I ever
 get a tablet computer, I doubt I'll be able to resist producing versions in a
 layout which is targeted at that size of screen.


Sure, that's fine. But why not have a version which behaves reasonably
at all screen sizes? This should be achievable.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Sarven Capadisli

On 2014-10-07 11:39, Norman Gray wrote:

The original spark to the thread was a lament that SW and LD conferences don't 
mandate something XMLish for submissions because X(HT)ML is clearly better 
for... well ... dammit, it's Better.


Straw man argument. Please stop that now!

I will spell out the main proposal and purpose for you because it sounds 
like you are completely oblivious to them. Let me know if anything is 
unclear.


* Conferences on SW/LD research should encourage and allow submissions 
using the Web-native technology stack (e.g., starting from HTML and 
friends) alongside the existing requirements. As the required PDF 
submission can be generated via HTML+CSS (a sketch of such a print 
stylesheet follows this list), those that wish to arrive at the PDF by 
their own means can still do so, without asking or forcing the existing 
authorship or review process to change. It is backwards compatible. The 
underlying idea is to use our own technologies, not only for the sake of 
using them, but also to identify the pains as a precursor to raising the 
quality of the (Semantic) Web stack for scientific research publishing, 
discovery, and reuse. This is plain and simple dogfooding, and it is 
important.


* There is an opportunity for granular data discovery, reuse, and 
machines to aid in reproducibility of scientific research. This goes 
completely beyond off-the-shelf metadata, e.g., author, title, subject, 
or what you can stuff into LaTeX+Whatever, not to mention mangling 
what's primarily intended for desktop and print to squeeze some Web in 
there. We are talking about making reasonable strides towards having 
scientific knowledge that is universally accessible on the Web. PDF and 
friends do not fit into that equation that well; however, no one is 
blocked from doing what they already do. Some of us would like to do a 
bit more than that, to test things out, so that we can collectively have 
more wins.


* There is also an opportunity to attract more funding and interest 
groups if we can better assess the state of Web Science. This is simply 
because we would be able to mine more useful information from existing 
research. Moreover, we could better identify research areas of potential 
value. It is to elevate the support that we can get from machines, to 
excel and to do our work better. This is in contrast to what we can 
currently achieve with the existing workflow, i.e., the current process 
is only concerned with making it easy for the author, reviewer, and 
publisher, and not with gleaning high-fidelity information.
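
As promised above, a sketch of the kind of print stylesheet that lets a 
browser produce the required PDF (generic values for illustration, not 
the actual LNCS/ACM metrics):

    @media print {
      @page { size: A4; margin: 2cm; }   /* CSS Paged Media */
      body  { font: 10pt/1.2 serif; }
      nav, video, .interactive { display: none; }  /* screen-only bits */
    }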



A more modest goal, which is still valuable and _much_ more achievable, is to 
get at least some RDF out of submitted articles.  That practically means 
metadata, plus perhaps some document structure, plus, if you're keen and can 
get the authors to invest their effort, some argumentation.  That's available 
for free (and right now) from LaTeX authors, and available from XHTML authors 
depending on how hard it would be to get them to put the @profile attribute in 
the right places.

That original lament has overlapped with a parallel lament that PDF is a 
dead-end format -- it's not 'webby'.  I believe that the demo in my earlier 
message undermines that claim as far as RDF goes.


Let me get this right: you are advocating that LaTeX + RDF/XML + 
whatever processes one has to go through, is a more sensible approach 
than HTML? If so, we have a different view on what creates a good UX.


It may come as news to you, but the SW/LD community is not in favour of 
authors using RDF/XML unless it is completely within some tool-chain 
left for machines to deal with. There are alternative RDF notations 
which are preferable. You should look it up. The problem with your 
proposal is that the author has to boggle their mind with two completely 
different syntaxes (LaTeX and RDF/XML), whereas the original proposal 
was to deal with one, i.e., HTML. Styling is no more of an issue, as the 
templates in the case of LaTeX are provided, and for HTML I've made a 
modest PoC with:


https://github.com/csarven/linked-research
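
To give a flavour of authoring in that one syntax, here is a toy RDFa 
fragment (illustrative vocabulary, not necessarily what the PoC 
prescribes):

    <article prefix="schema: http://schema.org/"
             typeof="schema:ScholarlyArticle" resource="#this">
      <h1 property="schema:name">An Example Paper</h1>
      <p>By <span property="schema:creator">Jane Doe</span>,
         <time property="schema:datePublished"
               datetime="2014-10-07">7 October 2014</time>.</p>
      <p property="schema:abstract">One syntax for prose and data alike.</p>
    </article>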

However, you are somehow completely oblivious to that even though it has 
been mentioned several times now on this mailing list. No, it is not 
perfect, and yes, it can be better. There are alternative solutions to 
achieve something along those lines with the same vision in mind, which 
are all okay too.


If this is not about coding, but rather using WYSIWYG editors or 
authoring/publication tools, have a look and try a few here or from a 
service near you:


* http://en.wikipedia.org/wiki/Comparison_of_HTML_editors

* http://en.wikipedia.org/wiki/List_of_content_management_systems

Or you know, take 30 seconds to create a WordPress account and another 
30 seconds to publish. Let me know if you still think that's 
insufficient or completely unreasonable / difficult for Web Science 
people to handle.


So, *do as you like, but do not prevent me* from encouraging the SW/LD

Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 tex4ht takes the slightly strange approach of having a strange and
 incomprehensible command line, and then lots of scripts which do default
 options, of which xhmlatex is one. In my installation, they've only put
 the basic ones into the path, so I ran this with
 /usr/share/tex4ht/xhmlatex.


 Phil


 So someone has to package this up so that it can be easily used.  Before then,
 how can it be required for conferences?

http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex


 I have tex4ht installed, but there is no xhmlatex file to be found.  I managed
 to find what appears to be a good command line

I don't know why that would be. It is installed with the Debian package,
although, as I said, it is not in the system path. I found it with dpkg
-S. I'm afraid it's a long time since I used an RPM-based system, so I
can't remember how to do this on Fedora.


 htlatex schema-org-analysis.tex xhtml,mathml  -cunihtf -cvalidate

 This looks better when viewed, but the resultant HTML is unintelligible.

 There is definitely more work needed here before this can be considered as a
 potential solution.

Yes, I agree.

So, the question is how to enable this. One way would, for example, be
for ISWC and ESWC to accept HTML and have a prize for the best semantic
paper submitted. Then people with the inclination would do the work.

Again, I suspect it's not that much work, but we will not know until we try.

Phil





Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 On 10/06/2014 11:00 AM, Phillip Lord wrote:
 Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 On 10/06/2014 09:32 AM, Phillip Lord wrote:
 Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 Who cares what the authors intend? I mean, they are not reading the
 paper, are they?

 For reviewing, what the authors intend is extremely important.  Having
 different rendering of the paper interfere with the authors' message is
 something that should be avoided at all costs.

 Really? So, for example, you think that a reviewer with impaired vision
 should be forced to review a paper using the authors' rendering,
 regardless of whether they can read it or not?

 No, but this is not what I was talking about. I was talking about
 interfering with the authors' message via changes from the rendering
 that the authors set up.

 It *is* exactly what you are talking about.

 Well, maybe I was not being clear, but I thought that I was talking about
 rendering changes interfering with comprehension of the authors' intent.


And if only you had a definition of rendering changes that interfere
with the authors' intent, as opposed to just rendering changes.

I can guarantee that rendering a paper to speech WILL change at least
some of the authors' intent because, for example, figures will not
reproduce. You state that this should be avoided at all costs.

I think this is wrong. There are many reasons to change rendering. That
should be the reader's choice.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 So, you believe that there is an excellent set of tools for preparing,
 reviewing, and reading scientific publishing.

 Package them up and make them widely available.  If they are good, people will
 use them.

 Convince those who run conferences.  If these people are convinced, then they
 will allow their use in conferences or maybe even require their use.

Is that not the point of the discussion?

Unfortunately, we do not know why ISWC and ESWC insist on PDF.

 I'm not convinced by what I'm seeing right now, however.

Sure, but at least the discussion has meant that you have looked at some
of the tools again. That's no bad thing.

My question would be: are you more convinced than you were the last time
you looked, or less?

Phil




Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Robert Stevens



What I'd suggest for conference organisers is something like the following:

1. Keep the PDF as the main thing, as it's not going anywhere soon.
2. Also allow submission in some alternative form, including semantic 
content, and have the conference run a competition for alternative 
publishing forms -- including voting by delegates on what they like and 
what they want. This could promote such alternative forms and offer a 
migration route over time.


Robert.

On 07/10/2014 13:27, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

So, you believe that there is an excellent set of tools for preparing,
reviewing, and reading scientific publishing.

Package them up and make them widely available.  If they are good, people will
use them.

Convince those who run conferences.  If these people are convinced, then they
will allow their use in conferences or maybe even require their use.

Is that not the point of the discussion?

Unfortunately, we do not know why ISWC and ESWC insist on PDF.


I'm not convinced by what I'm seeing right now, however.

Sure, but at least the discussion has meant that you have looked at some
of the tools again. That's no bad thing.

My question would be: are you more convinced than you were the last time
you looked, or less?

Phil




--
Professor Robert Stevens
Bio-health Informatics Group
School of Computer Science
University of Manchester
Oxford Road
Manchester
United Kingdom
M13 9PL

robert.stev...@manchester.ac.uk
Tel: +44 (0) 161 275 6251
Blog: http://robertdavidstevens.wordpress.com
Web: http://staff.cs.manchester.ac.uk/~stevensr/

KBO




Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Peter F. Patel-Schneider
If you mean that published papers have to be in PDF, but that they can 
optionally have a second format, then I have no problem with this proposal.  I 
also have no problem with encouraging use of other formats.


However, this is an added burden on conference organizers.  Someone would have 
to volunteer to handle the extra work, particularly the work involved in 
checking that papers using the second format abide by the publishing requirements.


peter



On 10/07/2014 05:52 AM, Robert Stevens wrote:



What I'd suggest for conference organisers is something like the following:

1. Keep the PDF as the main thing, as it's not going anywhere soon.
2. Also allow submission in some alternative form, including semantic content,
and have the conference run a competition for alternative publishing forms --
including voting by delegates on what they like and what they want. This
could promote such alternative forms and offer a migration route over time.

Robert.

On 07/10/2014 13:27, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

So, you believe that there is an excellent set of tools for preparing,
reviewing, and reading scientific publishing.

Package them up and make them widely available.  If they are good, people will
use them.

Convince those who run conferences.  If these people are convinced, then they
will allow their use in conferences or maybe even require their use.

Is that not the point of the discussion?

Unfortunately, we do not know why ISWC and ESWC insist on PDF.


I'm not convinced by what I'm seeing right now, however.

Sure, but at least the discussion has meant that you have looked at some
of the tools again. That's no bad thing.

My question would be: are you more convinced than you were the last time
you looked, or less?

Phil








Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Peter F. Patel-Schneider

On 10/07/2014 05:27 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


So, you believe that there is an excellent set of tools for preparing,
reviewing, and reading scientific publishing.

Package them up and make them widely available.  If they are good, people will
use them.

Convince those who run conferences.  If these people are convinced, then they
will allow their use in conferences or maybe even require their use.


Is that not the point of the discussion?


Not at all.  Where was the proposal to put together something that met the 
requirements of preparing, reviewing, and publishing scientific papers?


To me, the initial discussion was about how much better HTML was for carrying 
data.  Other aspects of paper preparation, review, and publishing were not 
being considered.  Now, maybe, aspects of presentation and review and ease of 
use are part of the discussion.   A change in the paper submission process 
needs to take into account what the paper submission process is about, not 
just some aspect of what might be included in submitted papers.



Unfortunately, we do not know why ISWC and ESWC insist on PDF.


As far as I am concerned, ISWC and ESWC insist on PDF for submissions because 
the reviewing process is so much better with PDF than with anything else.



I'm not convinced by what I'm seeing right now, however.


Sure, but at least the discussion has meant that you have looked at some
of the tools again. That's no bad thing.

My question would be: are you more convinced than you were the last time
you looked, or less?


Well, I remain totally unconvinced that any current HTML solution is as good 
as the current PDF setup.  Certainly htlatex is not suitable.  There may be 
some way to get tex4ht to do better, but no one has provided a solution. 
Sarven Capadisli sent me some HTML that looks much better, but even on a 
math-light paper I could see a number of glitches.  I haven't seen anything 
better than that.


It's not as if the basics (MathML, CSS, etc.)  are unavailable to put together 
most, or maybe even all, of an HTML-based solution.  These basics have been 
around for some time now.  However, I haven't seen a setup that is as good as 
LaTeX and PDF for preparation, review, and publishing of scientific papers.


Yes, it took a lot of effort to get to the current state with respect to LaTeX 
and PDF.  In the past, I experienced quite a number of problems with using 
LaTeX and PDF for writing, reviewing, and publishing scientific papers, but 
most of these are in the past.  Yes, there are still some problems with using 
LaTeX and PDF.  Produce something better and people will use it, eventually.



Phil


peter



Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Peter F. Patel-Schneider



On 10/07/2014 05:23 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


On 10/06/2014 11:00 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


On 10/06/2014 09:32 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

Who cares what the authors intend? I mean, they are not reading the
paper, are they?


For reviewing, what the authors intend is extremely important.  Having
different rendering of the paper interfere with the authors' message is
something that should be avoided at all costs.


Really? So, for example, you think that a reviewer with impaired vision
should be forced to review a paper using the authors' rendering,
regardless of whether they can read it or not?


No, but this is not what I was talking about. I was talking about
interfering with the authors' message via changes from the rendering
that the authors set up.


It *is* exactly what you are talking about.


Well, maybe I was not being clear, but I thought that I was talking about
rendering changes interfering with comprehension of the authors' intent.



And if only you had a definition of rendering changes that interfere
with the authors' intent, as opposed to just rendering changes.

I can guarantee that rendering a paper to speech WILL change at least
some of the authors' intent because, for example, figures will not
reproduce. You state that this should be avoided at all costs.

I think this is wrong. There are many reasons to change rendering. That
should be the reader's choice.

Phil


I think that, for reviewing, the authors should be able to dictate how their 
submission looks, within the bounds of the submission requirements.  If the 
reviewer wants, or needs, to change the way a submission is presented, then it 
is up to the reviewer to ensure that their review is not coloured by this 
change.


When I review papers I routinely point out presentation problems.  Sometimes I 
take into account presentation problems when I evaluate papers.  However, I 
try very hard to evaluate the submission based on what the authors submitted, 
not on any changes that I made to the submission.  For example, I will point 
out problems with using colours in graphs, but I will evaluate the paper based 
on the coloured version of the graphs, not a black and white version. 
However, if the authors submitted low-resolution figures and something is 
missing because of this, then I feel free to take this into account in my 
evaluation.


In a situation where I do not know what presentation the authors wanted, for 
example if explicit line breaks and indentation are sometimes preserved, but 
not always, the evaluation of submissions can become very much harder.


peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Peter F. Patel-Schneider



On 10/07/2014 05:20 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


tex4ht takes the slightly strange approach of having a strange and
incomprehensible command line, and then lots of scripts which apply default
options, of which xhmlatex is one. In my installation, they've only put
the basic ones into the path, so I ran this with
/usr/share/tex4ht/xhmlatex.


Phil



So someone has to package this up so that it can be easily used.  Before then,
how can it be required for conferences?


http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex


Somehow this is not in my tex4ht package.

In any case, the HTML output it produces is dreadful.   Text characters, even 
outside math, are replaced by numeric XML character entity references.
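
For illustration, the failure mode looks roughly like this (a sketch, not 
actual tex4ht output):

    <!-- what one would hope for -->
    <p>Hello</p>
    <!-- what comes out: every character as a numeric reference -->
    <p>&#72;&#101;&#108;&#108;&#111;</p>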


peter



Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 tex4ht takes the slightly strange approach of having a strange and
 incomprehensible command line, and then lots of scripts which apply default
 options, of which xhmlatex is one. In my installation, they've only put
 the basic ones into the path, so I ran this with
 /usr/share/tex4ht/xhmlatex.


 Phil


 So someone has to package this up so that it can be easily used.  Before 
 then,
 how can it be required for conferences?

 http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex

 Somehow this is not in my tex4ht package.

 In any case, the HTML output it produces is dreadful.   Text characters, even
 outside math, are replaced by numeric XML character entity references.


So, I am willing to spend some time getting this to work. I would like
to plug some ESWC papers into tex4ht, to get some HTML which works plain
and also with Sarven's templates, so that it *looks* like a PDF.

Would you be willing to a) try it and b) provide short, worked test
cases for things that do not work?

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Mark Diggory
Hi John, Kingsley, et al,

On Mon, Oct 6, 2014 at 8:39 AM, John Erickson olyerick...@gmail.com wrote:

 This is an incredibly rich and interesting conversation. I think there
 are two separate themes:
 1. What is required and/or asked-for by the conference organizers...
 a. ...that is needed for the review process
 b. ...that is needed to implement value-added services for the conference
 c. ...that contributes to the body of work

 2. What is required and/or asked for by the publisher?

 All of (1) is about the meat of the contributions, including
 establishing a long-term legacy. (2) is about (presumably) prestigious
 output.

 What added services could EasyChair in particular provide that would go
 beyond 1.a. and contribute to 1.b. and 1.c., etc.? Are there any EasyChair
 committers watching this thread? ;)

 John

 --
 John S. Erickson, Ph.D.
 Deputy Director, Web Science Research Center
 Tetherless World Constellation (RPI)
 http://tw.rpi.edu olyerick...@gmail.com
 Twitter & Skype: olyerickson


This makes me think of PLoS. For example, PLoS has published format
guidelines for Word and LaTeX (http://www.plosone.org/static/guidelines),
a workflow for semantically structuring their resulting output, and their
final output is well structured and available in XML based on a known
standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd),
PDF, and the published HTML on their website (
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233).

This results in semantically meaningful XML that is transformed to HTML:

http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233representation=XML

Interestingly as well, they have provided this framework in an open source
form:
http://www.ambraproject.org/

Clearly the publication process can support a semantic solution when it is
in the best interest of the publisher. Publishers will adopt and drive their
own markup processes to meet external demand.

Providing tools that both the publisher and the author may use
independently could simplify such an effort, but that is not the main driver
in achieving the final result you see in PLoS. This is especially the case
given even the debate concerning file formats here. For PLoS, the solution
that is currently successful is the one that worked to solve today's
immediate local need with today's tools.

Cheers,
Mark

p.s. Finally, on the reference to moving repositories such as EPrints and
DSpace towards supporting semantic markup of their contents: being somewhat
of a participant in LoD on the DSpace side, I note that these efforts are
inherently just repository-centric, describing the structure of the
repository (i.e., collections of items), not the semantic structure contained
within the item contents (articles, citations, formulas, data tables,
figures, ideas). In both platforms, these capabilities are in their
infancy; lacking any rendering other than offering the original file for
download, they ultimately suffer from the absence of semantic structure in
the content going into them.

-- 
Mark R. Diggory


Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Peter F. Patel-Schneider
Sure, I have lots of papers (none for ESWC, though) that could serve as test 
cases.


peter


On 10/07/2014 07:49 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

tex4ht takes the slight strange approach of having an strange and
incomprehensible command line, and then lots of scripts which do default
options, of which xhmlatex is one. In my installation, they've only put
the basic ones into the path, so I ran this with
/usr/share/tex4ht/xhmlatex.


Phil



So someone has to package this up so that it can be easily used.  Before then,
how can it be required for conferences?


http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex


Somehow this is not in my tex4ht package.

In any case, the HTML output it produces is dreadful.   Text characters, even
outside math, are replaced by numeric XML character entity references.



So, I am willing to spend some time getting this to work. I would like
to plug some ESWC papers into tex4ht, to get some HTML which works plain
and also with Sarven's templates so that it *looks* like a PDF.

Would you be willing to a) try it and b) give worked and short test
cases for things that do not work?

Phil





Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Simon Spero
BLUF: This is where information science comes in. Technology must meet the
needs of real users.

It may be better to generate better Tagged PDFs, and to experiment, using
some existing methodology annotation ontologies, with generating auxiliary
files of triples. This might require new or changed LaTeX packages, new
div/span classes, etc.
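
For instance (a sketch; the class name is made up for illustration), such an
annotation might ride along as a span class in the generated HTML:

    <p>All p-values were adjusted using the
      <!-- hypothetical class marking the correction method so that a
           meta-statistical checker can find it -->
      <span class="method-fdr-correction">Benjamini-Hochberg
      procedure</span>.</p>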

But what is really needed is actually working with SMEs to discover the
cultural practices within the field and subfield, and developing systems
that support their work styles. This is why Information Science is
important.

If there are changes in practices that would be beneficial, and these
benefits can be demonstrated to the appropriate audiences, then these can
be suggested.

If existing programs, libraries, and  operating systems can be modified to
provide these wins transparently, then it is easier to get the changes
adopted.

If the benefits require additional work, then the additional work must give
proportionate benefits to those doing the work, or be both of great benefit
to funding agencies or other gatekeepers, *and* be easily verifiable.

An example might be a proof (or justified belief) that a paper and its
supplemental materials do, or do not, contain everything required to attempt
to replicate the results.
This might be feasible in many fields through a combination of annotation
with a sufficiently powerful KR language and reasoning system.

Similarly, relatively simple meta-statistical analysis can note common
errors (like multiple comparisons that do not correct for False Discovery
Rate). This can be easy if the analysis code is embedded in the paper (e.g.,
Sweave), or if the adjustment method is part of the annotation, and the
decision process need not be total.

This kind of validation can be useful to researchers (less embarrassment),
and useful to gatekeepers (less to manually review).

Convincing communities working with large datasets to use RDF as a native
data format is unlikely to work.

The primary problem is that it isn't a very good one. It's great for
combining data from multiple sources -- as long as every datum is true.
If you want to be less credulous, KMAC YOYO.

Convincing people to add metadata describing values in structures as
OWL/RDFS datatypes or classes is much easier -- for example, as HDF5
attributes.

If the benefits require major changes to the cultural practices within a
given knowledge community, then they must be extremely important *to that
community*, and will still be resisted, especially by those most
acculturated into that knowledge community.

An example of this kind of change might be inclusion in supplemental
materials of analyses and data that did not give positive results. This
reduces the file-drawer effect, and may improve the justified level of
belief in the significance of published results (p < 1.0).

This level of change may require a blood upgrade ( 
https://www.goodreads.com/quotes/4079-a-new-scientific-truth-does-not-triumph-by-convincing-its).


It might also be imposable from above by extreme measures (if more than 10%
of your claimed significant results can't be replicated, and you can't
provide a reasonable explanation in a court of law, you may be held liable
for consequential damages incurred by others reasonably relying on your
work, and reasonable costs & possible punitive damages for costs incurred
attempting to replicate.

Repeat offenders will be fed to a ravenous mob of psychology
undergraduates,  or forced to teach introductory creative writing ).

Simon
P. S.

[dvips was much easier if you had access to Distiller]

It is possible to add mathematical content to html pages, but it is not
easy.

MathML is not something that browser developers want, which means that the
only viable approach is MathJax (http://mathjax.org).

MathJax is impressive, and supports a nice subset of LaTeX (including some
AMS).
However, it adds a noticeable delay to page rendering, as it is heavy-duty
ECMAScript computing layout on the fly.

It does not require server-side support, so it is usable from static sites
like GitHub Pages (see, e.g., the tests at the bottom of
http://who-wg.github.io).

However, the common deployment pattern, using their CDN, adds archival
dependencies.
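
For reference, the then-common CDN pattern is roughly this (a sketch; the
config string is one of several that MathJax ships):

    <!DOCTYPE html>
    <html>
      <head>
        <!-- the external CDN script is precisely the archival
             dependency noted above -->
        <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
      </head>
      <body>
        <p>Inline TeX such as \(e^{i\pi} + 1 = 0\) is typeset client-side.</p>
      </body>
    </html>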

From a processing perspective, this does not make semantic processing of
the text much easier, as it may require ECMAScript code to be executed.
 On Oct 7, 2014 8:14 AM, Phillip Lord phillip.l...@newcastle.ac.uk
wrote:



 On 10/07/2014 05:20 AM, Phillip Lord wrote:

 Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

  tex4ht takes the slightly strange approach of having a strange and
 incomprehensible command line, and then lots of scripts which apply default
 options, of which xhmlatex is one. In my installation, they've only put
 the basic ones into the path, so I ran this with
 /usr/share/tex4ht/xhmlatex.


 Phil


 So someone has to package this up so that it can be easily used.  Before
 then,
 how can it be required for 

Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Kingsley Idehen

On 10/7/14 1:14 PM, Norman Gray wrote:

Sarven, hello.

On 2014 Oct 7, at 13:13, Sarven Capadisli i...@csarven.ca wrote:


On 2014-10-07 11:39, Norman Gray wrote:

The original spark to the thread was a lament that SW and LD conferences don't 
mandate something XMLish for submissions because X(HT)ML is clearly better 
for... well ... dammit, it's Better.

Straw man argument. Please stop that now!

I will spell out the main proposal and purpose for you because it sounds like 
you are completely oblivious to them. Let me know if anything is unclear.

My remark was intended as facetious rather than fractious, but if you feel I 
misjudged the balance, I apologise.

I want to clarify what I meant, because on reflection it explains (at least to 
me) why I'm participating in this thread at such length.  My intention was to 
indicate that I don't feel that HTML is as central as you, amongst others, seem 
to assert it is.

I characterise the web as:

   1. URIs for addressing things,
   2. HTTP for retrieving things (other protocols exist, but...),
   3. a downloadable format which clients can parse to obtain more URIs, with a 
'follow this' semantic.


How about:

1. HTTP URIs for naming (or identifying) things -- basically, the 
combined effects of denotation (signification) and connotation 
(perceptible description)
2. The RDF abstract language for describing things -- systematic use of 
signs, syntax, and role semantics for communication
3. Notations for inscribing RDF-language-based descriptions in documents 
-- where notations serve the medium-specific purpose of representing the 
words of a language.


Once you have the base RDF document in place, using a preferred 
notation, you transform the RDF document into other document types 
(HTML, PDF, etc.), in line with viewer preferences.




Now, the obvious candidate for (3) is of course HTML; but on the web, and 
_especially_ on the Semantic Web, it can be anything: RDF in one or other 
format, XML+GRDDL, some discipline-specific format which has a link semantic 
in it, or even a PDF file with a standardised lump of RDF/XMP inside it.


The trouble with the paragraph above is that RDF isn't a format. That 
presumption is the root of mass confusion.



That RDF may be immediately present, or it may require some sort of heuristic 
or deterministic extraction (as Kingsley has discussed).

All of these are web-native technologies, and I'd go as far as to say that the 
_least_ interesting thing you can find at the end of a URI is an HTML file.


For sure!



The big deal, for me, in the idea of the Semantic Web, and the RDF world, is 
the realisation that the RDF model is sufficiently general that you can turn 
almost any structured data into RDF, put it into a big bucket, and start 
inferencing, querying, linking, and so on.  That generation/extraction of RDF 
is probably easier if the stuff is already pointy-bracketed for you, but that's 
only a detail.


Yes, which is why we have to think of RDF (accurately) as a language, 
and never a format. The format issue is something that should have been 
attended to years ago in W3C literature, i.e., the notion of abstract and 
concrete syntaxes leads to the misconception that RDF is about document 
content formats. The loose coupling of language (signs, syntax, and 
semantics) and notations (representations of the words of a language) 
isn't visible, and as a result is lost or overlooked (on a good day).


JSON-LD and Turtle are both accurately pitched (across all related 
collateral) as notations. Funnily enough, each is also associated with 
significant RDF uptake initiatives: Turtle re. the LOD Cloud, and 
JSON-LD re. Google, Bing, Yandex, and possibly Yahoo!, as major RDF 
supporters and adopters that are driving mass production of HTML 
documents that include RDF-language-based structured data (inline or via 
structured data islands using <script>).
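
For example, a structured data island of that kind (a sketch using schema.org 
terms; the values are illustrative):

    <!-- JSON-LD notation inscribing an RDF description inside HTML -->
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "ScholarlyArticle",
      "name": "An Example Article",
      "author": "A. N. Author"
    }
    </script>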




The interesting thing, for me, is just how the web as a whole can go about 
collectively managing or facilitating this generation/extraction in a way which 
balances faithfulness to the original with interoperable meaning (Dublin Core 
and FOAF are truly wonderful things).  That is why I do feel that -- especially 
in this SW/LD community --

 HTML is a bit of a sideshow.


Yes, it is, but I think Sarven uses it as a simple starting point, i.e., 
a point of least distraction, so to speak.


HTML is a splendid thing for all the reasons that you know and I know, but if 
it's seen as central, if all questions turn into "what does that look like in 
HTML?", if it's so in-our-face that we can't see round it, then we miss the 
interesting questions.


Yes!


So it's not that I've a particular downer on HTML, or a particular enthusiasm 
for PDF, but I think that "what does that look like in PDF?" and "what does 
that look like in FITS?" (the format of choice in my area) are more 
interesting.


Yes.



(or put another way, I don't think that 

Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Peter F. Patel-Schneider
PLOS is an interesting case.  The HTML for PLOS articles is relatively 
readable.  However, the HTML that the PLOS setup produces is failing at math, 
even for articles from August 2014.


As well, sometimes when I zoom in or out (so that I can see the math better) 
Firefox stops displaying the paper, and I have to reload the whole page.


Strangely, PLOS accepts low-resolution figures, which in one paper I looked at 
are quite difficult to read.


However, maybe the PLOS method can be improved to the point where the HTML is 
competitive with PDF.


peter






This makes me think of PLoS. For example, PLoS has published format
guidelines for Word and LaTeX (http://www.plosone.org/static/guidelines), a
workflow for semantically structuring their resulting output, and their final
output is well structured and available in XML based on a known standard
(http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the
published HTML on their website
(http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233).

This results in semantically meaningful XML that is transformed to HTML:

http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233representation=XML

Interestingly as well, they have provided this framework in an open source form:
http://www.ambraproject.org/

Clearly the publication process can support a semantic solution when it is
in the best interest of the publisher. Publishers will adopt and drive their
own markup processes to meet external demand.

Providing tools that both the publisher and the author may use independently
could simplify such an effort, but that is not the main driver in achieving
the final result you see in PLoS. This is especially the case given even the
debate concerning file formats here. For PLoS, the solution that is currently
successful is the one that worked to solve today's immediate local need with
today's tools.

Cheers,
Mark

p.s. Finally, on the reference to moving repositories such as EPrints and
DSpace towards supporting semantic markup of their contents: being somewhat of
a participant in LoD on the DSpace side, I note that these efforts are
inherently just repository-centric, describing the structure of the
repository (i.e., collections of items), not the semantic structure contained
within the item contents (articles, citations, formulas, data tables, figures,
ideas). In both platforms, these capabilities are in their infancy; lacking
any rendering other than offering the original file for download, they
ultimately suffer from the absence of semantic structure in the content going
into them.

--
Mark R. Diggory




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Sarven Capadisli

On 2014-10-06 06:59, Ivan Herman wrote:
 Of course, I could expect a Web-technology-related crowd to use HTML 
source editing directly, but the experience by Daniel and myself with the 
World Wide Web conference(!) is that people do not want to do that. 
(Researchers in, say, Web Search have proven to be unable or unwilling 
to edit HTML source. It was a real surprise...). I.e., the authoring tool 
offers are still limited.


Can you please elaborate on that? When was that, and what tools were 
available or used? Do you have any documentation on the landscape from 
that time that we can use or learn from?


My understanding is that you experienced some issues about a decade 
ago and your reasoning is clouded by that. Do you think that it would be 
fair to revisit the situation based on today's landscape and see how it 
plays out?


From my perspective, we should have a bit more faith in the SW 
community because then we might actually strive to deliver, as opposed 
to walking away from the problem.


Like I said in my previous emails (which I'm sure you've read), the 
current workshops on SW/LD research publishing did not deliver. Why do 
you have so much faith in waiting it out and hoping that they will 
deliver? They might, and I hope they do. But I'm not putting all my 
chips on that option alone. I would rather see grass-roots efforts in 
parallel, e.g., http://csarven.ca/call-for-linked-research


What's the number of human hours spent on CfPs on Linked Science + 
Semantic Publishing so far? How has the delivery of machine- and 
human-friendly research changed or evolved? What's visible or countable? 
On that front, what can we do right now that wasn't possible 5-10 years ago?


In the meantime, if the conferences and workshops can get back on track 
and motivate people (at least), we would not only see more value drawn 
out of SW research, but also growing funding opportunities and faster 
progress across the field.


I am disappointed by the fact that instead of addressing the core issue 
-- can the conferences allow or encourage the Web stack? -- we are 
discussing distractions, e.g., perfection in authoring tools. Every user 
has their own preferences, i.e., some will code, some will use tool X. 
What you are suggesting is that we wait it out because the developments 
may reveal the perfect authoring tooling. If that were ever the case, 
we'd see it in the general market, not something that might one day 
emerge out of SW/LD workshops.


I will bet that if the requirements evolve towards Webby submissions, 
within 3-5 years' time we'd see a notable change in how we collect, 
document and mine scientific research in SW. This is not just being 
hopeful. I believe that if all of the newcomers to the (academic) 
research scene started from HTML (and friends) instead of LaTeX/Word (and 
friends), we wouldn't be having this discussion. If the newcomers are 
told to deal with LaTeX/Word (regardless of hand-coding or using a 
WYSIWYG editor) today, they are going to do exactly that. That basically 
pushes the date for a complete switch-over to Webby tools further back, 
because the majority of those researchers would have to be flushed out 
of the system before the next wave of Webby users can have their chance.


Even if we had all of the perfect or appropriate tooling (which I think 
is the wrong thing to aim for) right now, it would still take a few years 
for the current LaTeX/Word users to be flushed out or to evolve. I would 
rather see the smallest change happen right now than nothing at all.


*AGAIN*, technology is not the problem. #DIY

-Sarven
http://csarven.ca/#i





Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 One problem with allowing HTML submission is ensuring that reviewers can
 correctly view the submission as the authors intended it to be viewed.  How
 would you feel if your paper was rejected because one of the reviewers could
 not view portions of it?  At least with PDF there is a reasonably good chance
 that every paper can be correctly viewed by all its reviewers, even if they
 have to print it out.  I don't think that the same claim can be made for
 HTML-based systems.


I don't think this is a valid point. It is certainly possible to write
HTML that will not look good on every machine, but these days it is
easier to write HTML that does.

The same is true with PDF. Font problems used to be routine. And, as
other people have said, it's very hard to write a PDF that looks good on
anything other than paper.


 Further, why should there be any technical preference for HTML at all?  (Yes,
 HTML is an open standard and PDF is a closed one, but is there anything else
 besides that?)  Web conferences vitally use the web in their reviewing and
 publishing processes.  Doesn't that show their allegiance to the web?  Would
 the use of HTML make a conference more webby?

PDF is, I think, open these days. But, yes, I do think that conferences
should dogfood. I mean, what would you think if W3C produced all of
their documents in PDF? Would that make sense?

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Luca Matteis lmatt...@gmail.com writes:

 On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote:
 The real problem is still the missing tooling. Authors, even if technically
 savvy like this community, want to do what they set out to do: write their
 papers as quickly as possible. They do not want to spend their time going
 through some esoteric CSS massaging, for example. Let us face it: we are not
 yet there. The tools for authoring are still very poor.

 But are they still very poor? I mean, I think there are more tools for
 rendering HTML than there are for rendering Latex. In fact there are
 probably more tools for rendering HTML than anything else out there,
 because HTML is used more than anything else. Because HTML powers the
 Web!

 You can write in Word, and export in HTML. You can write in Markdown
 and export in HTML. You can probably write in Latex and export in HTML
 as well :)


Yes, you can. Most of the publishers use XML at some point in their
process, and LaTeX gets exported to that.

I am quite happy to keep LaTeX as a user interface, because it's very
nice, and the tools for it are mature for academic documents
(in practice, this means cross-referencing and bibliographies).

So, as well as providing an LNCS stylesheet, we'd need an htlatex cf.cfg
and one CSS, and it's done. It would be good to have another CSS for
on-screen viewing; LNCS's back-of-a-postage-stamp layout is very poor for
that.
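
Concretely (a sketch; the file names are hypothetical), the generated HTML
could link both stylesheets and let the medium choose:

    <!-- print gets the LNCS look; screens get something readable -->
    <link rel="stylesheet" media="print" href="lncs.css" />
    <link rel="stylesheet" media="screen" href="screen.css" />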

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Sarven Capadisli i...@csarven.ca writes:

 I will bet that if the requirements evolve towards Webby submissions, within
 3-5 years time, we'd see a notable change in how we collect, document and mine
 scientific research in SW. This is not just being hopeful. I believe that if
 all of the newcomers into the (academic) research scene start from HTML (and
 friends) instead of LaTeX/Word (and friends), we wouldn't be having this
 discussion. If the newcomes are told to deal with LaTeX/Word (regardless of
 hand coding or using a WYSIWYG editor) today, they are going to do exactly
 that.


I would look at an environment which has less external force. The free
software engineering community produces its documents in a very
wide range of formats. If you peruse GitHub, the key characteristics
are, I think: that they are text formats, because they are easy to version
with source and are hackable; and that mostly they dump to HTML. PDFs are
very rare these days.

It would be fun to see which are the most used. Markdown is a big
contender, as are language-specific formats (Python and
reStructuredText, for example).

I don't believe that HTML is a good authoring format any more than PDF
is. I don't see this as a huge problem. HTML needs to be part of the
tool-chain, not all of it.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Kingsley Idehen

On 10/6/14 7:43 AM, Phillip Lord wrote:

I don't believe that HTML is a good authoring format any more than PDF
is. I don't see this as a huge problem. HTML needs to be part of the
tool-chain, not all of it.


+1

--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Mark Diggory
Hello,

My apologies if this is a repost (errors were encountered and my last post
bounced from the listserv)...

On Sun, Oct 5, 2014 at 1:19 PM, Luca Matteis lmatt...@gmail.com wrote:

 On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote:
  The real problem is still the missing tooling. Authors, even if
 technically savvy like this community, want to do what they set out to do:
 write their papers as quickly as possible. They do not want to spend their
 time going through some esoteric CSS massaging, for example. Let us face
 it: we are not yet there. The tools for authoring are still very poor.

 But are they still very poor? I mean, I think there are more tools for
 rendering HTML than there are for rendering Latex. In fact there are
 probably more tools for rendering HTML than anything else out there,
 because HTML is used more than anything else. Because HTML powers the
 Web!

 You can write in Word, and export in HTML. You can write in Markdown
 and export in HTML. You can probably write in Latex and export in HTML
 as well :)

 The tools are not the problem. The problem to me is the printing
 afterwards. Conferences/workshops need to print the publications.
 Printing consistent LaTeX/PDF templates is a lot easier than printing
 inconsistent (layout-wise) HTML pages.

 Best,
 Luca


There are tools; for example, there's already a bit of work to provide a
plugin for semantic markup in Microsoft Word (
https://ucsdbiolit.codeplex.com/) and similar efforts on the LaTeX side (
https://trac.kwarc.info/sTeX/)

But this is not a question of technology available to authors, but of
requirements defined by publishers. If authors are too busy for this
effort, then publishers facilitate that added value when it is in their
best interest.

For example, PLoS has published format guidelines for Word and LaTeX (
http://www.plosone.org/static/guidelines), a workflow for semantically
structuring their resulting output, and their final output is well
structured and available in XML based on a known standard (
http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the
published HTML on their website (
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233).

This results in semantically meaningful XML that is transformed to HTML:

http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233representation=XML

Clearly the publication process can support solutions when it is in the
best interest of the publisher. Publishers will adopt and drive their own
markup processes to meet external demand.

Providing tools that both the publisher and the author may use
independently could simplify such an effort, but that is not the main driver
in achieving the final result you see in PLoS. This is especially the case
given that both file formats and efforts to produce the "ideal solution"
are inherently localized, competitive and diverse, not collaborative in
nature. For PLoS, the solution that is currently successful is the one that
worked to solve today's immediate local need with today's tools, not the one
that was perfectly designed to meet all of tomorrow's hypothetical
requirements.

Cheers,
Mark Diggory

p.s. Finally, on the reference to moving repositories such as EPrints and
DSpace towards supporting semantic markup of their contents: being somewhat
of a participant in LoD on the DSpace side, I note that these efforts are
inherently just repository-centric, describing the structure of the
repository (i.e., collections of files), not the semantic structure contained
within those files (ideas, citations, formulas, data tables, figures). In
both cases, these capabilities are in their infancy, without any strict
format- and content-driven publication workflow; lacking any rendering
other than offering the file for download, they ultimately suffer from the
same need for a common Semantic Document format that can be leveraged for
rendering, referencing and indexing.

-- 
[image: @mire Inc.]
*Mark Diggory*
*2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010*
*Esperantolaan 4, Heverlee 3001, Belgium*
http://www.atmire.com



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Paul Houle
Frankly, I don't see the reason for the hate on PDF files.

I do a lot of reading on a tablet these days because I can take it to the
gym, or on a walk, or in the car. Network reliability is not universal when
I leave the house (even if I had a $10-a-GB LTE plan), so downloaded PDFs
are my document format of choice.

There might be a lot of hypothetical problems with PDFs, and I am sure
there is a better way to view files on a small screen, but practically I
have no trouble reading papers from arXiv.org or books from oreilly.com,
be they produced by TeX-derived or Word-derived toolchains or a toolchain
that involves a real page-layout tool for that matter.




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider

On 10/06/2014 04:15 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


One problem with allowing HTML submission is ensuring that reviewers can
correctly view the submission as the authors intended it to be viewed.  How
would you feel if your paper was rejected because one of the reviewers could
not view portions of it?  At least with PDF there is a reasonably good chance
that every paper can be correctly viewed by all its reviewers, even if they
have to print it out.  I don't think that the same claim can be made for
HTML-based systems.



I don't think this is a valid point. It is certainly possible to write
HTML that will not look good on every machine, but these days, it is
easier to write HTML that does.

The same is true with PDF. Font problems used to be routine. And, as
other people have said, it's very hard to write a PDF that looks good on
anything other than paper.


My aesthetics are different.  I routinely view PDFs on my laptop, and find 
that they indeed look great.  As I said before, I prefer PDF to HTML for 
viewing of just about any technical material on my computers.  Yes, on limited 
displays two-column PDF may not be viewable at all.  Single-column PDF should 
look good on displays with resolution of HD or better.


When I view HTML documents, even the ones I have written, I have to do a lot 
of adjusting to get something that looks even half-decent on the screen.  And 
when I print HTML documents, the result is invariably bad, and often very bad.


However, my point was not about looking good.  It was about being able to see 
the paper in the way that the author intended.  My experience is that this is 
generally possible with PDF, but generally not possible with HTML.  I do write 
papers with considerable math in them, so my experience may not be typical, 
but whenever I have tried to produce HTML versions of my papers, I have ended 
up quite frustrated because even I cannot get them to display the way I want 
them to.


It may be that there are now good tools for producing HTML that carries the 
intent of the author.  htlatex has been mentioned in this thread.  A solution 
that uses htlatex would have the benefit of building on much of the work that 
has been done to make latex a reasonable technology for producing papers.  If 
someone wants to create the necessary infrastructure to make htlatex work as 
well as pdflatex does, then feel free.




Further, why should there be any technical preference for HTML at all?  (Yes,
HTML is an open standard and PDF is a closed one, but is there anything else
besides that?)  Web conferences vitally use the web in their reviewing and
publishing processes.  Doesn't that show their allegiance to the web?  Would
the use of HTML make a conference more webby?


PDF is, I think, open these days. But, yes, I do think that conferences
should dogfood. I mean, what would you think if W3C produced all of
their documents in PDF? Would that make sense?


Actually, I would have been very happy if W3C had produced all its technical 
documents in PDF.  It would have made my life much easier.



Phil



peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider

On 10/06/2014 04:27 AM, Phillip Lord wrote:

[On using htlatex for conferences.]


So, as well as providing an LNCS stylesheet, we'd need an htlatex cf.cfg
and one CSS, and it's done. It would be good to have another CSS for
on-screen viewing; LNCS's back-of-a-postage-stamp layout is very poor for that.

Phil


I would be totally astonished if using htlatex as the main way to produce 
conference papers were as simple as this.


I just tried htlatex on my ISWC paper, and the result was, to put it mildly, 
horrible.  (One of my AAAI papers was about the same, the other one caused an 
undefined control sequence and only produced one page of output.)   Several 
parts of the paper were rendered in fixed-width fonts.  There was no attempt 
to limit line length.  Footnotes were in separate files.  Many non-scalable 
images were included, even for simple math.  My carefully designed layout for 
examples was modified in ways that made the examples harder to understand. 
The footnotes did not show up at all in the printed version.


That said, the result was better than I expected.  If someone upgrades htlatex 
to work well I'm quite willing to use it, but I expect that a lot of work is 
going to be needed.


peter



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Kingsley Idehen

On 10/6/14 10:25 AM, Paul Houle wrote:

Frankly I don't see the reason for the hate on PDF files.

I do a lot of reading on a tablet these days because I can take it to 
the gym or on a walk or in the car.  Network reliability is not 
universal when I leave the house (even if I had a $10 a GB LTE plan) 
so downloaded PDFs are my document format of choice.


There might be a lot of hypothetical problems with PDFs, and I am 
sure there is a better way to view files on a small screen, but 
practically I have no trouble reading papers from arXiv.org or books 
from oreilly.com, be these produced by TeX-derived or Word-derived 
toolchains, or by a toolchain that involves a real page layout tool 
for that matter.


Paul,

As I see it, the issue here is more to do with PDF being the only 
option, rather than no PDFs at all. Put differently, we are not using 
our horses-for-courses technology (the Web that emerges from AWWW 
exploitation) to produce horses-for-courses conference artifacts. 
Instead, we continue to impose (overtly or covertly) specific options 
that are contradictory, and of diminishing value.


Conferences (associated with themes like Semantic Web and Linked Open 
Data) should accept submissions that provide open access to relevant 
research data. In a sense, imagine if PDFs were submitted without 
bibliographic references. Basically, that's what's happening here with 
research data circa 2014, where we have a functioning Web of Linked 
(Open) Data, which is based on AWWW.


Loosely coupling the print-friendly documents (PDFs, Latex etc.), 
http-browser friendly documents (HTML), and actual raw data references 
(which take the form of 5-Star Linked Open Data) is a practical starting 
point. Adding experiment workflow (which is also becoming the norm in 
the bioinformatics realm) is a nice bonus, as already demonstrated by 
examples provided by Hugh Glaser (see: this weekend's thread).
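
As a rough sketch of that loose coupling (rdflib here, with dcterms as 
one possible vocabulary choice, and every URI below a placeholder, so 
treat it as illustrative rather than prescriptive):

    # Sketch: one paper resource, loosely coupled to its print-friendly,
    # browser-friendly, and raw-data renditions.
    # Assumes rdflib; dcterms and all URIs are illustrative placeholders.
    from rdflib import Graph, Literal, Namespace, URIRef

    DCT = Namespace("http://purl.org/dc/terms/")
    paper = URIRef("http://example.org/paper/42")

    g = Graph()
    g.bind("dct", DCT)
    g.add((paper, DCT.title, Literal("An Example Paper")))
    # Print-friendly and browser-friendly renditions of the same work
    g.add((paper, DCT.hasFormat, URIRef("http://example.org/paper/42.pdf")))
    g.add((paper, DCT.hasFormat, URIRef("http://example.org/paper/42.html")))
    # Dereferenceable reference to the raw research data behind the paper
    g.add((paper, DCT.references, URIRef("http://example.org/dataset/42")))

    print(g.serialize(format="turtle"))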


Kingsley










Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 However, my point was not about looking good.  It was about being able to see
 the paper in the way that the author intended. 

Yes, I understand this. It's not something that I consider at all
important, which perhaps represents our different viewpoints. Readers
have different preferences. I prefer reading in inverse video; I like to
be able to change font size to zoom in and out. I quite like fixed-width
fonts. Other people like the two-column thing. Other people want things
read to them.

Who cares what the authors intend? I mean, they are not reading the
paper, are they?


 I do write papers with considerable math in them, so my experience may
 not be typical, but whenever I have tried to produce HTML versions of
 my papers, I have ended up quite frustrated because even I cannot get
 them to display the way I want them to.

I've been using mathjax on my website for a long time and it seems to
work well, although I am not maths heavy.


 It may be that there are now good tools for producing HTML that carries the
 intent of the author.  htlatex has been mentioned in this thread.  A solution
 that uses htlatex would have the benefit of building on much of the work that
 has been done to make latex a reasonable technology for producing papers.  If
 someone wants to create the necessary infrastructure to make htlatex work as
 well as pdflatex does, then feel free.

It's more to make htlatex work as well as lncs.sty works. htlatex
produces reasonable, if dull, HTML off the bat.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Luca Matteis
On Mon, Oct 6, 2014 at 5:29 PM, Phillip Lord
phillip.l...@newcastle.ac.uk wrote:
 Who cares what the authors intend? I mean, they are not reading the
 paper, are they?

Authors might have adjusted things that way specifically to deliver
their message. I think being able to have consistent layouts *as the
authors intend it* is a very important thing. It's also important on
the Web: people want their site to look & feel in a very specific and
consistent way.



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread John Erickson
This is an incredibly rich and interesting conversation. I think there
are two separate themes:
1. What is required and/or asked-for by the conference organizers...
a. ...that is needed for the review process
b. ...that is needed to implement value-added services for the conference
c. ...that contributes to the body of work

2. What is required and/or asked for by the publisher?

All of (1) is about the meat of the contributions, including establishing
a long-term legacy. (2) is about (presumably) prestigious output.

What added services could EasyChair in particular provide that would go beyond 1.a.
and contribute to 1.b. and 1.c., etc.? Are there any EasyChair committers
watching this thread? ;)

John


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 I would be totally astonished if using htlatex as the main way to produce
 conference papers were as simple as this.

 I just tried htlatex on my ISWC paper, and the result was, to put it mildly,
 horrible.  (One of my AAAI papers was about the same, the other one caused an
 undefined control sequence and only produced one page of output.)   Several
 parts of the paper were rendered in fixed-width fonts.  There was no attempt
 to limit line length.  Footnotes were in separate files.


The footnote thing is pretty strange, I have to agree. Although
footnotes are a fairly alien concept wrt the web. Probably hover-overs
would be a reasonable presentation for this.


 Many non-scalable images were included, even for simple math.

It does MathML I think, which is then rendered client side. Or you could
drop math-mode straight through and render client side with mathjax.


 My carefully designed layout for examples was modified in ways that
 made the examples harder to understand. 

Perhaps this is a key difference between us. I don't care about the
layout, and want someone to do it for me; it's one of the reasons I use
latex as well.


 That said, the result was better than I expected.  If someone upgrades htlatex
 to work well I'm quite willing to use it, but I expect that a lot of work is
 going to be needed.

Which gets us back to the chicken and egg situation. I would probably do
this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll
end up with the PDF output anyway.

This is why it is important that web conferences allow HTML, which is
where the argument started. If you want something that prints just
right, PDF is the thing for you. If you want to read your papers in
the bath, likewise, PDF is the thing for you. And that's fine by me (so
long as you don't mind me reading your papers in the bath!). But it
needs to not be the only option.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider



On 10/06/2014 08:38 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

I would be totally astonished if using htlatex as the main way to produce
conference papers were as simple as this.

I just tried htlatex on my ISWC paper, and the result was, to put it mildly,
horrible.  (One of my AAAI papers was about the same, the other one caused an
undefined control sequence and only produced one page of output.)   Several
parts of the paper were rendered in fixed-width fonts.  There was no attempt
to limit line length.  Footnotes were in separate files.



The footnote thing is pretty strange, I have to agree. Although
footnotes are a fairly alien concept wrt to the web. Probably hover
overs would be a reasonable presentation for this.



Many non-scalable images were included, even for simple math.


It does MathML I think, which is then rendered client side. Or you could
drop math-mode straight through and render client side with mathjax.


Well, somehow png files are being produced for some math, which is a failure. 
 I don't know what the way to do this right would be, I just know that the 
version of htlatex for Fedora 20 fails to reasonably handle the math in this 
paper.



My carefully designed layout for examples was modified in ways that
made the examples harder to understand.


Perhaps this is a key difference between us. I don't care about the
layout, and want someone to do it for me; it's one of the reasons I use
latex as well.


There are many cases where line breaks and indentation are important for 
understanding.  Getting this sort of presentation right in latex is a pain for 
starters, but when it has been done, having the htlatex toolchain mess it up 
is a failure.



That said, the result was better than I expected.  If someone upgrades htlatex
to work well I'm quite willing to use it, but I expect that a lot of work is
going to be needed.


Which gets us back to the chicken and egg situation. I would probably do
this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll
end up with the PDF output anyway.


Well, I'm with ESWC and ISWC here.  The review process should be designed to 
make reviewing easy for reviewers.  Until viewing HTML output is as 
trouble-free as viewing PDF output, PDF should be the required format.



This is why it is important that web conferences allow HTML, which is
where the argument started. If you want something that prints just
right, PDF is the thing for you. If you want to read your papers in
the bath, likewise, PDF is the thing for you. And that's fine by me (so
long as you don't mind me reading your papers in the bath!). But it
needs to not be the only option.


Why?  What are the benefits of HTML reviewing, right now?  What are the 
benefits of HTML publishing, right now?  If there were HTML-based tools that 
worked well for preparing, reviewing, and reading scientific papers, then 
maybe conferences would use them.  However, conference organizers and 
reviewers have limited time, and are thus going for the simplest solution that 
works well.


If some group thinks that a good HTML-based solution is possible, then let 
them produce this solution.  If the group can get pre-approval of some 
conference, then more power to them.  However, I'm not going to vote for any 
pre-approval of some future solution when the current situation is satisficing.



Phil


peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider



On 10/06/2014 08:29 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

However, my point was not about looking good.  It was about being able to see
the paper in the way that the author intended.


Yes, I understand this. It's not something that I consider at all
important, which perhaps represents our different viewpoints. Readers
have different preferences. I prefer reading in inverse video; I like to
be able to change font size to zoom in and out. I quite like fixed-width
fonts. Other people like the two-column thing. Other people want things
read to them.

Who cares what the authors intend? I mean, they are not reading the
paper, are they?


For reviewing, what the authors intend is extremely important.  Having 
different rendering of the paper interfere with the authors' message is 
something that should be avoided at all costs.  Similarly for reading papers, 
if the rendering of the paper interferes with the authors' message, that is a 
failure of the process.



I do write papers with considerable math in them, so my experience may
not be typical, but whenever I have tried to produce HTML versions of
my papers, I have ended up quite frustrated because even I cannot get
them to display the way I want them to.


I've been using mathjax on my website for a long time and it seems to
work well, although I am not maths heavy.



It may be that there are now good tools for producing HTML that carries the
intent of the author.  htlatex has been mentioned in this thread.  A solution
that uses htlatex would have the benefit of building on much of the work that
has been done to make latex a reasonable technology for producing papers.  If
someone wants to create the necessary infrastructure to make htlatex work as
well as pdflatex does, then feel free.


It's more to make htlatex work as well as lncs.sty works. htlatex
produces reasonable, if dull, HTML off the bat.


My experience is that htlatex produces very bad output.


Phil


peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Martynas Jusevičius
Dear Peter,

please show me how to query PDFs with SPARQL. Then I'll believe there
are no benefits of XHTML+RDFa over PDF.
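
For concreteness, a minimal sketch of the XHTML+RDFa side (assuming an
rdflib build with RDFa parsing support, e.g. the html5lib-backed
structured-data plugin in rdflib 4.x; the URL and the properties queried
are placeholders):

    # Sketch: SPARQL over an XHTML+RDFa paper.
    # Assumes rdflib with RDFa support (format="rdfa") plus html5lib;
    # the URL and the dct: properties are illustrative placeholders.
    from rdflib import Graph

    g = Graph()
    g.parse("http://example.org/paper/42.html", format="rdfa")

    q = """
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?title ?creator WHERE {
            ?paper dct:title ?title ;
                   dct:creator ?creator .
        }
    """
    for row in g.query(q):
        print(row.title, row.creator)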

Addressing the issue from the reviewer perspective only is too narrow,
don't you think?


Martynas






Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 It does MathML I think, which is then rendered client side. Or you could
 drop math-mode straight through and render client side with mathjax.

 Well, somehow png files are being produced for some math, which is a failure.

Yeah, you have to tell it to do mathml. The problem is that older
versions of the browsers don't render mathml, and image rendering was
the only option.
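
If I remember correctly, the switch goes in the option list, something
like

    htlatex paper "xhtml,mathml"

though treat the exact incantation as approximate; the option names have
varied between tex4ht versions.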


 There are many cases where line breaks and indentation are important for
 understanding.  Getting this sort of presentation right in latex is a pain for
 starters, but when it has been done, having the htlatex toolchain mess it up
 is a failure.

Indeed. I believe that there are plans in future versions of HTML to
introduce a pre tag which preserves indentation and line breaks.


 Which gets us back to the chicken and egg situation. I would probably do
 this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll
 end up with the PDF output anyway.

 Well, I'm with ESWC and ISWC here.  The review process should be designed to
 make reviewing easy for reviewers.

I *only* use PDF when reviewing. I never use it for viewing anything
else. I only use it for reviewing since I am forced to. 

Experiences differ, so I find this a far from compelling argument.


 This is why it is important that web conferences allow HTML, which is
 where the argument started. 

 Why?  What are the benefits of HTML reviewing, right now?  What are the
 benefits of HTML publishing, right now?

Well, we've been through this before, so I'll not repeat myself.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 Who cares what the authors intend? I mean, they are not reading the
 paper, are they?

 For reviewing, what the authors intend is extremely important.  Having
 different rendering of the paper interfere with the authors' message is
 something that should be avoided at all costs.

Really? So, for example, you think that a reviewer with impaired vision
should be forced to review a paper using the authors' rendering,
regardless of whether they can read it or not?

Of course, this is an extreme example, although not an unrealistic one.
Is it fundamentally any different from my desire as I get older to be
able to change font size and refill paragraphs with ease? I see a
difference of scale, that is all.


 Similarly for reading papers, if the rendering of the paper interferes
 with the authors' message, that is a failure of the process.

Yes, I agree. Which is why, I believe, that the rendering of a paper
should be up to the reader.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider
It's not hard to query PDFs with SPARQL.  All you have to do is extract the 
metadata from the document and turn it into RDF, if needed.  Lots of programs 
extract and display this metadata already.
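
As a sketch only (pypdf and rdflib standing in for those programs, with 
paper.pdf and the dct: properties as placeholders):

    # Sketch: extract PDF document-info metadata and query it with SPARQL.
    # Assumes pypdf and rdflib are installed; paper.pdf is a placeholder.
    from pypdf import PdfReader
    from rdflib import Graph, Literal, Namespace, URIRef

    DCT = Namespace("http://purl.org/dc/terms/")
    doc = URIRef("http://example.org/paper.pdf")

    info = PdfReader("paper.pdf").metadata
    g = Graph()
    if info is not None and info.title:
        g.add((doc, DCT.title, Literal(info.title)))
    if info is not None and info.author:
        g.add((doc, DCT.creator, Literal(info.author)))

    q = """
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?title WHERE { ?d dct:title ?title }
    """
    for row in g.query(q):
        print(row.title)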


No, I don't think that viewing this issue from the reviewer perspective is too 
narrow.  Reviewers form a vital part of the scientific publishing process. 
Anything that makes their jobs harder, or the results that they produce worse, 
is going to have to offer very large benefits over the current setup.  In any 
case, I haven't been looking at the reviewer perspective only, even in the 
message quoted below.


peter

PS:  This is *not* to say that I think that the reviewing process is anywhere 
near ideal.  On the contrary, I think that the reviewing process has many 
problems, particularly as it is performed in CS conferences.



On 10/06/2014 09:19 AM, Martynas Jusevičius wrote:

Dear Peter,

please show me how to query PDFs with SPARQL. Then I'll believe there
are no benefits of XHTML+RDFa over PDF.

Addressing the issue from the reviewer perspective only is too narrow,
don't you think?


Martynas







Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider

On 10/06/2014 09:28 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

It does MathML I think, which is then rendered client side. Or you could
drop math-mode straight through and render client side with mathjax.


Well, somehow png files are being produced for some math, which is a failure.


Yeah, you have to tell it to do mathml. The problem is that older
versions of the browsers don't render mathml, and image rendering was
the only option.


Well, then someone is going to have to tell people how to do this.  What I saw 
claimed for htlatex was that it just did the right thing.




There are many cases where line breaks and indentation are important for
understanding.  Getting this sort of presentation right in latex is a pain for
starters, but when it has been done, having the htlatex toolchain mess it up
is a failure.


Indeed. I believe that there are plans in future versions of HTML to
introduce a pre tag which preserves indentation and line breaks.



Which gets us back to the chicken and egg situation. I would probably do
this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll
end up with the PDF output anyway.


Well, I'm with ESWC and ISWC here.  The review process should be designed to
make reviewing easy for reviewers.


I *only* use PDF when reviewing. I never use it for viewing anything
else. I only use it for reviewing since I am forced to.

Experiences differ, so I find this a far from compelling argument.


It may not be a compelling argument when choosing between two new 
alternatives, but it is a much more compelling argument against change.



This is why it is important that web conferences allow HTML, which is
where the argument started.



Why?  What are the benefits of HTML reviewing, right now?  What are the
benefits of HTML publishing, right now?


Well, we've been through this before, so I'll not repeat myself.

Phil



Yes, and I haven't seen any benefits over the current setup.

peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Martynas Jusevičius
By the same logic, we could still be using paper submissions: all you
have to do is scan them to turn them into PDFs.

It's been a while since I was at university, but wasn't
dissemination an important part of science? What about dogfooding
after all?


Martynas


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider

On 10/06/2014 09:32 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

Who cares what the authors intend? I mean, they are not reading the
paper, are they?


For reviewing, what the authors intend is extremely important.  Having
different rendering of the paper interfere with the authors' message is
something that should be avoided at all costs.


Really? So, for example, you think that a reviewer with impaired vision
should be forced to review a paper using the authors' rendering,
regardless of whether they can read it or not?


No, but this is not what I was talking about.  I was talking about interfering 
with the authors' message via changes from the rendering that the authors set up.



Of course, this is an extreme example, although not an unrealistic one.
Is it fundamentally any different from my desire as I get older to be
able to change font size and refill paragraphs with ease? I see a
difference of scale, that is all.


I see these as completely different.  There are some aspects of rendering that 
generally do not interfere with intent.  There are other aspects of rendering 
that can easily interfere with intent.



Similarly for reading papers, if the rendering of the paper interferes
with the authors' message, that is a failure of the process.


Yes, I agree. Which is why, I believe, that the rendering of a paper
should be up to the reader.


And this is why I believe that the authors should be able to specify the 
rendering of their paper to the extent that they feel is needed to convey the 
intent of the paper.


Phil


peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Alexander Garcia Castro
I would be much more generic here,

show me how to query a bunch of PDFs with anything... of course, the answer
will go like you can extract the text and do A and then B and then get a
relatively decent text depending on A, B and C. Then someone else will
chime in and say this is just because people don't know how to generate
PDFs, and if one generates a PDF using Adobe tools like A, B and C then the
PDF will be perfect for text mining and bla bla bla.

PDF is OK for a consistent layout; HTML is great for what it was created
for. But neither of those formats, AFAIK, was conceived or engineered for
scientific papers that are executable, self-describing, embedded within
the web of data, etc.





-- 
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Alexander Garcia Castro
It's not hard to query PDFs with SPARQL.  All you have to do is extract
the metadata from the document and turn it into RDF, if needed.  Lots of
programs extract and display this metadata already.

In the age of the web of data, why should I restrict my search just to
metadata? I want the full content; open access or not, once I have the
document I should be able to mine its content. I don't want to limit my
search to simple metadata.
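
To make the contrast concrete, a sketch (assuming pypdf, with paper.pdf
as a placeholder): full-text extraction works, but everything comes back
as one flat string, with none of the paper's structure marked up:

    # Sketch: mining the full content of a PDF rather than its metadata.
    # Assumes pypdf is installed; paper.pdf is a placeholder file.
    # Sections, citations and tables all arrive as undifferentiated text.
    from pypdf import PdfReader

    reader = PdfReader("paper.pdf")
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    print(text[:500])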


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 On 10/06/2014 09:28 AM, Phillip Lord wrote:
 Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 It does MathML I think, which is then rendered client side. Or you could
 drop math-mode straight through and render client side with mathjax.

 Well, somehow png files are being produced for some math, which is a 
 failure.

 Yeah, you have to tell it to do mathml. The problem is that older
 versions of the browsers don't render mathml, and image rendering was
 the only option.

 Well, then someone is going to have to tell people how to do this.  What I saw
 for htlatex was that it just did the right thing.


So, htlatex is part of TeX4Ht which does HTML. 

If you do xhmlatex then you get XHTML with, indeed, math mode in MathML.
So, for example, this output comes with the default xhmlatex.

<math xmlns="http://www.w3.org/1998/Math/MathML" display="inline">
  <mi>e</mi>
  <mo class="MathClass-rel">=</mo>
  <mi>m</mi>
  <msup>
    <mrow><mi>c</mi></mrow>
    <mrow><mn>2</mn></mrow>
  </msup>
</math>
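
For reference, all of the above presumably encodes nothing more than an
inline formula in the LaTeX source (an inference from the output; the
source itself was not posted):

$e = mc^{2}$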

tex4ht takes the slightly strange approach of having a strange and
incomprehensible command line, and then lots of scripts which do default
options, of which xhmlatex is one. In my installation, they've only put
the basic ones into the path, so I ran this with
/usr/share/tex4ht/xhmlatex.


Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider
I don't think that scanning a printout retains any metadata that was in the 
electronic source so, no, this would not follow using the same logic.


I do agree that dissemination of results is one of the most important parts of 
the scientific process.  The argument here is, I think, what is the best way 
to support dissemination.


Eating your own dog food is a separate matter, I think.  Eating your own dog 
food may help with uptake, but on the other hand it may interfere with 
dissemination, by making preparation of papers harder or making them harder to 
review or read.


peter




On 10/06/2014 10:09 AM, Martynas Jusevičius wrote:

Following the same logic, we still could have been using paper
submissions? All you have to do is to scan them to turn them into
PDFs.

It's been a while since I was in the university, but wasn't
dissemination an important part of science? What about dogfooding
after all?


Martynas

On Mon, Oct 6, 2014 at 6:48 PM, Peter F. Patel-Schneider
pfpschnei...@gmail.com wrote:

It's not hard to query PDFs with SPARQL.  All you have to do is extract the
metadata from the document and turn it into RDF, if needed.  Lots of
programs extract and display this metadata already.

No, I don't think that viewing this issue from the reviewer perspective is
too narrow.  Reviewers form  a vital part of the scientific publishing
process. Anything that makes their jobs harder or the results that they
produce worse is going to have to have very large benefits over the current
setup.  In any case, I haven't been looking at the reviewer perspective
only, even in the message quoted below.

peter

PS:  This is *not* to say that I think that the reviewing process is
anywhere near ideal.  On the contrary, I think that the reviewing process
has many problems, particularly as it is performed in CS conferences.



On 10/06/2014 09:19 AM, Martynas Jusevičius wrote:


Dear Peter,

please show me how to query PDFs with SPARQL. Then I'll believe there
are no benefits of XHTML+RDFa over PDF.

Addressing the issue from the reviewer perspective only is too narrow,
don't you think?


Martynas




[...]



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Phillip Lord
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

 On 10/06/2014 09:32 AM, Phillip Lord wrote:
 Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
 Who cares what the authors intend? I mean, they are not reading the
 paper, are they?

 For reviewing, what the authors intend is extremely important.  Having
 different rendering of the paper interfere with the authors' message is
 something that should be avoided at all costs.

 Really? So, for example, you think that a reviewer with impaired vision
 should be forced to review a paper using the authors'
 rendering, regardless of whether they can read it or not?

 No, but this is not what I was talking about. I was talking about
 interfering with the authors' message via changes from the rendering
 that the authors' set up.

It *is* exactly what you are talking about. If I want to render your
document to speech, then why should I not? What I am saying is that,
you, the author, should not wish to constrain the rendering, only really
the content. Effectively, if you are using latex, you are already doing
this, since latex defines the layout and not you.

But, I think we are talking in too abstract terms here. Should you be
able to constrain indentation for code blocks? Yes, of course, you
should. But, a quick look at the web shows that people do this all the
time.


 Similarly for reading papers, if the rendering of the paper interferes
 with the authors' message, that is a failure of the process.

 Yes, I agree. Which is why, I believe, that the rendering of a paper
 should be up to the reader

 And this is why I believe that the authors should be able to specify the
 rendering of their paper to the extent that they feel is needed to convey the
 intent of the paper.

For scientific papers, I think this really is not very far. I mean, a
scientific paper is not a fashion store; it's a story designed to
persuade with data. 

I would like to see papers which are in the hands of the reader as much
as possible. Citation format should be for the reader. Math
presentation. Graphs should be interactive and zoomable, with the data
underneath as CSV. 

All of these are possible and routine with HTML now. I want to be free
to choose the organisation of my papers so that I can convey what I
want. At the moment, I cannot. The PDF is not reasonable for all, maybe
not even most of this. But some.

Phil



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider
Sure.  So extract the text from the PDF and query that.  It also would be nice 
to have access to the LaTeX sources.


What HTML publishing *might* have that is better than the above is to more 
easily embed some extra information into papers that can be queried.  Is this 
just metadata that could also be easily injected into PDFs?  If given this 
capability will a significant number of authors use it?  Is it instead better 
to have a separate document that has the information and not use HTML for 
publishing?


peter




On 10/06/2014 10:42 AM, Alexander Garcia Castro wrote:

It's not hard to query PDFs with SPARQL.  All you have to do is extract the
metadata from the document and turn it into RDF, if needed.  Lots of programs
extract and display this metadata already.

in the age of the web of data why should I restrict my search just to
metadata? I want the full content, open access or not once I have the document
I should be able to mine the content of the document. I dont want to limit my
search just to simple metadata.

On Mon, Oct 6, 2014 at 9:48 AM, Peter F. Patel-Schneider
pfpschnei...@gmail.com wrote:

[...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Kingsley Idehen

On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
It's not hard to query PDFs with SPARQL.  All you have to do is 
extract the metadata from the document and turn it into RDF, if 
needed. Lots of programs extract and display this metadata already. 


Peter,

Having had 200+ {some-non-rdf-doc} to RDF document transformers built 
under my direct guidance, there are issues with your claim above:


1. The extractors are platform specific -- AWWW is about platform 
agnosticism (I don't want to mandate an OS for experiencing the power of 
Linked Open Data transformers / rdfizers)


2. It isn't solely about metadata -- we also have raw data inside these 
documents, confined to tables and paragraphs of sentences


3. If querying a PDF was marginally simple, I would be demonstrating 
that using a SPARQL results URL in response to this post :-)


Possible != Simple and Productive.

We want to leverage the productivity and simplicity that AWWW brings to 
data representation, access, interaction, and integration.


--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this




smime.p7s
Description: S/MIME Cryptographic Signature


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider



On 10/06/2014 10:44 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


On 10/06/2014 09:28 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

It does MathML I think, which is then rendered client side. Or you could
drop math-mode straight through and render client side with mathjax.


Well, somehow png files are being produced for some math, which is a failure.


Yeah, you have to tell it to do mathml. The problem is that older
versions of the browsers don't render mathml, and image rendering was
the only option.


Well, then someone is going to have to tell people how to do this.  What I saw
for htlatex was that it just did the right thing.



So, htlatex is part of TeX4Ht which does HTML.

If you do xhmlatex then you get XHTML with, indeed, math mode in MathML.
So, for example, this output comes with the default xhmlatex.

<math xmlns="http://www.w3.org/1998/Math/MathML" display="inline">
  <mi>e</mi>
  <mo class="MathClass-rel">=</mo>
  <mi>m</mi>
  <msup>
    <mrow><mi>c</mi></mrow>
    <mrow><mn>2</mn></mrow>
  </msup>
</math>


tex4ht takes the slightly strange approach of having a strange and
incomprehensible command line, and then lots of scripts which do default
options, of which xhmlatex is one. In my installation, they've only put
the basic ones into the path, so I ran this with
/usr/share/tex4ht/xhmlatex.


Phil



So someone has to package this up so that it can be easily used.  Before then, 
how can it be required for conferences?


I have tex4ht installed, but there is no xhmlatex file to be found.  I managed 
to find what appears to be a good command line


htlatex schema-org-analysis.tex xhtml,mathml  -cunihtf -cvalidate

This looks better when viewed, but the resultant HTML is unintelligible.

There is definitely more work needed here before this can be considered as a 
potential solution.


peter



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider



On 10/06/2014 11:00 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


On 10/06/2014 09:32 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

Who cares what the authors intend? I mean, they are not reading the
paper, are they?


For reviewing, what the authors intend is extremely important.  Having
different rendering of the paper interfere with the authors' message is
something that should be avoided at all costs.


Really? So, for example, you think that a reviewer with impaired vision
should be forced to review a paper using the authors'
rendering, regardless of whether they can read it or not?


No, but this is not what I was talking about. I was talking about
interfering with the authors' message via changes from the rendering
that the authors' set up.


It *is* exactly what you are talking about.


Well, maybe I was not being clear, but I thought that I was talking about 
rendering  changes interfering with comprehension of the authors' intent.


peter

[...]







Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider



On 10/06/2014 11:00 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:


On 10/06/2014 09:32 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

Who cares what the authors intend? I mean, they are not reading the
paper, are they?


For reviewing, what the authors intend is extremely important.  Having
different rendering of the paper interfere with the authors' message is
something that should be avoided at all costs.


Really? So, for example, you think that a reviewer with impaired vision
should be forced to review a paper using the authors'
rendering, regardless of whether they can read it or not?


No, but this is not what I was talking about. I was talking about
interfering with the authors' message via changes from the rendering
that the authors' set up.


It *is* exactly what you are talking about. If I want to render your
document to speech, then why should I not? What I am saying is that,
you, the author, should not wish to constrain the rendering, only really
the content. Effectively, if you are using latex, you are already doing
this, since latex defines the layout and not you.

But, I think we are talking in too abstract terms here. Should you be
able to constrain indentation for code blocks? Yes, of course, you
should. But, a quick look at the web shows that people do this all the
time.


Sure, and htlatex appears to interfere with this indentation. At least it does 
in my ISWC paper.



Similarly for reading papers, if the rendering of the paper interferes
with the authors' message, that is a failure of the process.


Yes, I agree. Which is why, I believe, that the rendering of a paper
should be up to the reader


And this is why I believe that the authors should be able to specify the
rendering of their paper to the extent that they feel is needed to convey the
intent of the paper.


For scientific papers, I think this really is not very far. I mean, a
scientific paper is not a fashion store; it's a story designed to
persuade with data.

I would like to see papers which are in the hands of the reader as much
as possible. Citation format should be for the reader. Math
presentation. Graphs should be interactive and zoomable, with the data
underneath as CSV.

All of these are possible and routine with HTML now. I want to be free
to choose the organisation of my papers so that I can convey what I
want. At the moment, I cannot. The PDF is not reasonable for all, maybe
not even most of this. But some.

Phil


So, you believe that there is an excellent set of tools for preparing, 
reviewing, and reading scientific publications.


Package them up and make them widely available.  If they are good, people will 
use them.


Convince those who run conferences.  If these people are convinced, then they 
will allow their use in conferences or maybe even require their use.


I'm not convinced by what I'm seeing right now, however.

peter




Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider



On 10/06/2014 11:03 AM, Kingsley Idehen wrote:

On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:

It's not hard to query PDFs with SPARQL.  All you have to do is extract the
metadata from the document and turn it into RDF, if needed. Lots of programs
extract and display this metadata already.


Peter,

Having had 200+ {some-non-rdf-doc} to RDF document transformers built under my
direct guidance, there are issues with your claim above:


Huh?  Every single PDF reader that I use can extract the PDF metadata and 
display it.  The metadata that I see in PDF documents uses a core set of 
properties that are easy to transform into RDF.  Of course, this core set is 
very small (title, author, and a few other things) so you don't get all that 
much out of the core set.
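
For concreteness, a minimal sketch of that extract-and-query pipeline in 
Python, assuming the pypdf and rdflib packages (the file name and the 
choice of Dublin Core properties are illustrative assumptions):

from pypdf import PdfReader
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")

def pdf_info_to_graph(path):
    # Read the PDF document-information dictionary (/Title, /Author, ...)
    # and restate it as RDF triples about the document.
    info = PdfReader(path).metadata
    doc = URIRef("file:" + path)
    g = Graph()
    if info is not None:
        if info.title:
            g.add((doc, DC.title, Literal(info.title)))
        if info.author:
            g.add((doc, DC.creator, Literal(info.author)))
    return g

g = pdf_info_to_graph("paper.pdf")  # hypothetical input file
for row in g.query("SELECT ?t WHERE { ?doc dc:title ?t }", initNs={"dc": DC}):
    print(row[0])

Of course, this only surfaces the small core set; anything in the body text 
stays out of reach of the query.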




1. The extractors are platform specific -- AWWW is about platform agnosticism
(I don't want to mandate an OS for experiencing the power of Linked Open Data
transformers / rdfizers)


Well, the extractors would be specific to PDF, but that's hardly surprising, I 
think.



2. It isn't solely about metadata  -- we also have raw data inside these
documents confined to Tables, paragraphs of sentences


Well, sure, but is extracting information directly from the figures or tables 
or text being considered here?  I sure would like this to be possible.  How 
would it work in an HTML context?



3. If querying a PDF was marginally simple, I would be demonstrating that
using a SPARQL results URL in response to this post :-)


I'm not saying that it is so simple.  You do have to find the metadata block 
in the PDF and then look for the /Title, /Author, ... stuff.



Possible != Simple and Productive.


Yes, but there are lots of tools that display PDF metadata, so there are some 
who believe that the benefit is greater than the cost.



We want to leverage the productivity and simplicity that AWWW brings to data
representation, access, interaction, and integration.


Sure, but the additional costs, if any, on paper authors, reviewers, and 
readers have to be considered.  If these costs are eliminated or at least 
minimized, then this benefit is much more likely to be realized.


peter






Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Ivan Shmakov
 Luca Matteis lmatt...@gmail.com writes:
 On Mon, Oct 6, 2014 at 5:29 PM, Phillip Lord wrote:

  Who cares what the authors intend?  I mean, they are not reading the
  paper, are they?

  Authors might have adjusted things that way specifically to deliver
  their message.  I think being able to have consistent layouts *as the
  authors intend it* is a very important thing.  It's also important on
  the Web: people want their site to look  feel in a very specific and
  consistent way.

Well, it’s also why we now have the things like the Stylish and
Greasemonkey add-ons for Firefox, and the http://userstyles.org/
resource on the Web (not to mention the whole world of “unusual”
Web browsers, such as Lynx.)  That is: the /readers/ too want to
tailor that “look and feel” to /their/ tastes, to get rid of the
poor design choices of the Web publishers, – and to thus improve
their “Web reading experience.”

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A



Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Kingsley Idehen

On 10/6/14 2:19 PM, Alexander Garcia Castro wrote:
querying PDFs is NOT simple and requires a lot of work -and usually 
produces lots of errors.


Yes, I believe I indicated that in my response to Peter i.e., it isn't 
simple or productive.



just querying metadata is not enough.


Yes, I said that too i.e., we want access to raw data.

As I said before, I understand the PDF as something that gives me a 
uniform layout. that is ok and necessary, but not enough or sufficient 
within the context of the web of data and scientific publications. I 
would like to have the content readily available for mining purposes. 
if I pay for the publication I should get access to the publication in 
every format it is available. the content should be presented in a way 
so that it makes sense within the web of data.  if it is the full 
content of the paper represented in RDF or XML fine. also, I would 
like to have well annotated content, this is simple and something that 
could quite easily be part of existing publication workflows. it may 
also be part of the guidelines for authors -for instance, identify and 
annotate rhetorical structures.


Modulo any confusing typos in my earlier posts, I don't see where we are 
disagreeing :-)



Kingsley


On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen 
kide...@openlinksw.com wrote:


On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:

It's not hard to query PDFs with SPARQL.  All you have to do
is extract the metadata from the document and turn it into
RDF, if needed. Lots of programs extract and display this
metadata already.


Peter,

Having had 200+ {some-non-rdf-doc} to RDF document transformers
built under my direct guidance, there are issues with your claim
above:

1. The extractors are platform specific -- AWWW is about platform
agnosticism (I don't want to mandate an OS for experiencing the
power of Linked Open Data transformers / rdfizers)

2. It isn't solely about metadata  -- we also have raw data inside
these documents confined to Tables, paragraphs of sentences

3. If querying a PDF was marginally simple, I would be
demonstrating that using a SPARQL results URL in response to this
post :-)

Possible != Simple and Productive.

We want to leverage the productivity and simplicity that AWWW
brings to data representation, access, interaction, and integration.


-- 
Regards,


Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID:
http://kingsley.idehen.net/dataspace/person/kidehen#this





--
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac




--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this



smime.p7s
Description: S/MIME Cryptographic Signature


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Norman Gray

Greetings.

On 2014 Oct 6, at 19:19, Alexander Garcia Castro alexgarc...@gmail.com wrote:

 querying PDFs is NOT simple and requires a lot of work -and usually
 produces lots of errors. just querying metadata is not enough. As I said
 before, I understand the PDF as something that gives me a uniform layout.
 that is ok and necessary, but not enough or sufficient within the context
 of the web of data and scientific publications. I would like to have the
 content readily available for mining purposes. if I pay for the publication
 I should get access to the publication in every format it is available. the
 content should be presented in a way so that it makes sense within the web
 of data.  if it is the full content of the paper represented in RDF or XML
 fine. also, I would like to have well annotated content, this is simple and
 something that could quite easily be part of existing publication
 workflows. it may also be part of the guidelines for authors -for instance,
 identify and annotate rhetorical structures.


The following might add something to this conversation.

It illustrates getting the metadata from a LaTeX file, putting it into an XMP 
packet in a PDF, and getting it out of the PDF as RDF.  Pace Peter's mention of 
/Author, /Title, etc, this just focuses on the XMP packet.

This has the document metadata, the abstract, and an illustrative bit of 
argumentation.  Adding details about the document structure, and (RDF) pointers 
to any figures would be feasible, as would, I suspect, incorporating CSV files 
directly into the PDF.  Incorporating \begin{tabular} tables would be rather 
tricky, but not impossible.  I can't help feeling that the XHTML+RDFa 
equivalent would be longer and need more documentation to instruct the author 
where to put the RDFa magic.

It's not very fancy, and still has rough edges, but it only took me 100 
minutes, from a standing start.

Generating and querying this PDF seems pretty simple to me.



$ cat test-xmp.tex
\documentclass{article}

\usepackage{xmp-management}

\title{This is a test file}
\author{Norman Gray}
\date{2014 October 6}

\begin{document}

\maketitle

\abstract{It's easy to include metadata in \LaTeX\ files.

That's because there's plenty of metadata in there already.}

There is text and metatext within files.

\section{Further details}

In this section we could potentially discuss moving information
around.  I think we can assert that \claim{it is easy to move
  information around}, and, further, that \claim{making metadata
  readily available is a Good Thing}.  I hope that clears that up.
\end{document}
$ cat xmp-management.sty 
\ProvidesPackage{xmp-management}[2014/10/06]

\newwrite\xmp@ttlfile
\def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl
  \let\xmp@open\relax}
\long\def\xmp@stmt#1#2{%
  \xmp@open
  \write\xmp@ttlfile{<> #1 "#2".}}
\let\xmp@origtitle\title
\def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
\let\xmp@origauthor\author
\def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
\let\xmp@origdate\date
\def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}

\long\def\abstract#1{
  \xmp@stmt{dc:abstract}{#1}
  \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
\def\claim#1{
  \xmp@stmt{xmpinfo:claim}{#1}
  \emph{#1}}

\let\xmp@origsection\section
\def\section#1{\xmp@stmt{xmpinfo:has_section}{#1}
  \xmp@origsection{#1}}

\usepackage{xmpincl}
\AtBeginDocument{\includexmp{info}}
$ pdflatex test-xmp 
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
 restricted \write18 enabled.
entering extended mode
(./test-xmp.tex
LaTeX2e 2011/06/27
[...BLAH...]
Output written on test-xmp.pdf (1 page, 75667 bytes).
Transcript written on test-xmp.log.
$ cat test-xmp.ttl
<> dc:title "This is a test file".
<> dc:creator "Norman Gray".
<> dc:created "2014 October 6".
<> dc:abstract "It's easy to include metadata in \LaTeX \ files. \par That's 
because there's plenty of metadata in there already.".
<> xmpinfo:has_section "Further details".
<> xmpinfo:claim "it is easy to move information around".
<> xmpinfo:claim "making metadata readily available is a Good Thing".
$ make info.xmp
sed 's/\\//g' test-xmp.ttl | \
  cat prefix.ttl - | \
  rapper -iturtle -ordfxml-xmp -q - file:test-xmp.pdf | \
  sed '/<\?xpacket/d' > info.xmp.tmp && mv info.xmp.tmp info.xmp
$ pdflatex test-xmp 
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
 restricted \write18 enabled.
entering extended mode
(./test-xmp.tex
LaTeX2e 2011/06/27
[...BLAH...]
Output written on test-xmp.pdf (1 page, 77069 bytes).
Transcript written on test-xmp.log.
$ make extract-xmp   
cc -Wall -o extract-xmp extract-xmp.c
$ ./extract-xmp test-xmp.pdf
<rdf:RDF xmlns:cc="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/"
    xmlns:xmpinfo="http://example.org/xmpinfo"
    xml:base="file:test-xmp.pdf">
  <rdf:Description rdf:about="">
    <cc:license

Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Luca Matteis
Sorry to jump into this once again but when it comes to typesetting
nothing really comes close to Latex/PDF:
http://tex.stackexchange.com/questions/120271/alternatives-to-latex -
not even HTML/CSS/JavaScript

On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray nor...@astro.gla.ac.uk wrote:

[...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Peter F. Patel-Schneider
Neat.  This could be extended to putting a full table of contents into the 
metadata, and in lots of other ways.  The other nice thing about it is that it 
would be possible to push the same data through a LaTeX to HTML toolchain for 
those who want HTML output.


peter

On 10/06/2014 03:18 PM, Norman Gray wrote:




[...]






Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Kingsley Idehen

On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:



On 10/06/2014 11:03 AM, Kingsley Idehen wrote:

On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
It's not hard to query PDFs with SPARQL.  All you have to do is 
extract the
metadata from the document and turn it into RDF, if needed. Lots of 
programs

extract and display this metadata already.


Peter,

Having had 200+ {some-non-rdf-doc} to RDF document transformers built 
under my

direct guidance, there are issues with your claim above:


Huh?  Every single PDF reader that I use can extract the PDF metadata 
and display it.


Again, this isn't about metadata.

The metadata that I see in PDF documents uses a core set of properties 
that are easy to transform into RDF.


Metadata isn't the issue at hand.

Of course, this core set is very small (title, author, and a few other 
things) so you don't get all that much out of the core set.


See my comments above :)






1. The extractors are platform specific -- AWWW is about platform 
agnosticism
(I don't want to mandate an OS for experiencing the power of Linked 
Open Data

transformers / rdfizers)


Well, the extractors would be specific to PDF, but that's hardly 
surprising, I think.



2. It isn't solely about metadata  -- we also have raw data inside these
documents confined to Tables, paragraphs of sentences


Well, sure, but is extracting information directly from the figures or 
tables or text being considered here?  I sure would like this to be 
possible.  How would it work in an HTML context?


Each table is a Class.
Each table record is an instance of the Class represented by the table.
Each table field is a property of the Class represented by the table.
Each table field value's data type can be used to discern the range of 
each Class property.


Depending on what the sentences and paragraphs are about you can make an 
RDF statement per sentence.
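
A minimal sketch of that mapping, in Python with rdflib; the 
http://example.org/ vocabulary and the sample record are assumptions made 
purely for illustration:

from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")  # illustrative vocabulary

def table_to_rdf(table_name, records):
    # The table is a class, each record an instance of that class,
    # and each field a property of it.
    g = Graph()
    table_class = EX[table_name]
    for i, record in enumerate(records):
        instance = EX[table_name + "/row" + str(i)]
        g.add((instance, RDF.type, table_class))
        for field, value in record.items():
            # Literal() infers an XSD datatype from the Python value,
            # which is one way to discern the range of each property.
            g.add((instance, EX[field], Literal(value)))
    return g

g = table_to_rdf("Measurement", [{"analyte": "glucose", "value": 5.4}])
print(g.serialize(format="turtle"))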


3. If querying a PDF was marginally simple, I would be demonstrating 
that

using a SPARQL results URL in response to this post :-)


I'm not saying that it is so simple.  You do have to find the metadata 
block in the PDF and then look for the /Title, /Author, ... stuff.


But it could be simple if PDF didn't have the issues I outlined in 
regards to extraction technology. Funnily enough, there's a massive 
opportunity for Adobe to solve this problem, especially as they've now 
ventured heavily into cloud-enabling their technologies. If they provide 
APIs from the cloud, this problem could become much simpler to address 
in regards to productive solutions where PDFs become less of the data 
silos that they are today.



Possible != Simple and Productive.


Yes, but there are lots of tools that display PDF metadata, so there 
are some who believe that the benefit is greater than the cost.


Metadata isn't the fundamental quest here.



We want to leverage the productivity and simplicity that AWWW brings 
to data

representation, access, interaction, and integration.


Sure, but the additional costs, if any, on paper authors, reviewers, 
and readers have to be considered.  If these costs are eliminated or 
at least minimized then this good is much more likely to be realized.


With some help from Adobe we can have the best of all worlds here. I am 
going to take a look at their latest cloud offerings and associated APIs.






peter








--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this




smime.p7s
Description: S/MIME Cryptographic Signature


Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Mike Bergman

Hi Adobe lurkers,

Kingsley has just handed you a valuable means to keep users tied to your 
technologies:


On 10/6/2014 8:18 PM, Kingsley Idehen wrote:

On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:



On 10/06/2014 11:03 AM, Kingsley Idehen wrote:

On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:



I'm not saying that it is so simple.  You do have to find the metadata
block in the PDF and then look for the /Title, /Author, ... stuff.


But it could be simple if PDF didn't have the issues I outlined in
regards to extraction technology. Funnily enough, there's a massive
opportunity for Adobe to solve this problem, especially as they've now
ventured heavily into cloud-enabling their technologies. If they provide
APIs from the cloud, this problem could become much simpler to address
in regards to productive solutions where PDFs become less of the data
silos that they are today.




Of course, it probably makes sense for Adobe to do the work, but there 
is also enough known in open source about PDFs for a third party to do 
this as well.


Good idea, K!

Mike



Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Michael Brunnbauer

Hello Paul,

On Sat, Oct 04, 2014 at 06:47:19PM -0500, Paul Tyson wrote:
 I certainly was not suggesting this. It would indeed be silly to publish
 large collections of empirical quantitative propositions in RDF.

Yes. And describing such collections with RDF on a level above basic metadata
is not so silly but very difficult in many cases - as I tried to show with my
example.

 Connecting those propositions to significant conclusions through sound
 arguments is the more important problem. They will attempt to do so,
 presumably, by creating monographs in an electronic source format that
 has more or less structure to it. The structure will support many useful
 operations, including formatting the content for different media,
 hyperlinking to other resources, indexing, and metadata gleaning. The
 structure will most likely *not* support any programmatic operations to
 expose the logical form of the arguments in such a way that another
 person could extract them and put them into his own logic machine to
 confirm, deny, strengthen, or weaken the arguments.
 
 Take for example a research paper whose argument proceeded along the
 lines of All men are mortal; Socrates is a man; therefore Socrates is
 mortal. Along comes a skeptic who purports to have evidence that
 Socrates is not a man. He publishes the evidence in such a way that
 other users can if they wish insert the conclusion from such evidence in
 place of the minor premise in the original researcher's argument. Then
 the conclusion cannot be affirmed. The original researcher must either
 find a different form of argument to prove his conclusion, overturn the
 skeptic's evidence (by further argument, also machine-processable), or
 withdraw his conclusion.
 
 This simple model illustrates how human knowledge has progressed for
 millennia, mediated solely by oral, written, and visual and diagrammatic
 communication. I am suggesting we enlist computers to do something more
 for us in this realm than just speeding up the millennia-old mechanisms.

Can you express this argument with triples? I would not be able to do that.
Maybe if I devoted my life to it - starting with the famous "the cat sat on a
mat" example. The end result would be incomprehensible to others and
absolutely useless.

I even doubt that science works the way you describe it. Mathematics works 
this way and there are good reasons that formal proofs are absolute exceptions
in this field ca. 2014.

Basic metadata is good. Publishing datasets with the paper is good. Having
typed links in the paper is good. But I would not demand to go further.

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel


pgpInHXgZpTbr.pgp
Description: PGP signature


Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Hugh Glaser

 On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote:
 
...
 Basic metadata is good. Publishing datasets with the paper is good. Having
 typed links in the paper is good. But I would not demand to go further.
 
+1
++1 - the dataset publishing can include the workflow, tools etc, and metadata 
about that.


-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652





Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Dominic Oldman
Further to Hugh's comment about the non-techy world, I found this interesting 
quote on the Web.


The web is more a social creation than a technical one. I designed it for a 
social effect — to help people work together — and not as a technical toy. The 
ultimate goal of the Web is to support and improve our weblike existence in the 
world.


I like this blue-sky thinking, and it suggests (to me) that constantly moving 
technical engineering is not always productive or collaborative. 

Dominic 

(I will have a look at e-prints :-))



 From: Hugh Glaser h...@glasers.org
To: Daniel Schwabe dschw...@inf.puc-rio.br 
Cc: SW-forum Web semantic-...@w3.org; Linking Open Data public-lod@w3.org; 
Phillip Lord phillip.l...@newcastle.ac.uk; Eric Prud'hommeaux e...@w3.org; 
Peter F. Patel-Schneider pfpschnei...@gmail.com; Bernadette Hyland 
bhyl...@3roundstones.com 
Sent: Saturday, October 4, 2014 12:14 PM
Subject: Re: scientific publishing process (was Re: Cost and access)
 

Executive summary:
1) Bring up an ePrints repository for “our” conferences, and a myExperiment 
instance, or equivalents;
2) Start to contribute to the Open Source community.

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

Longer version.
I too have a deep sense of deja vu all over yet again :-)

But I have learned something - no-one seems to collaborate with people outside 
the tecchy world.
Most documents for me start as a (set of) collaborative Google Doc 
(unmentioned) or a Word or OpenOffice document (not mentioned much) on Dropbox.
And the collaborators couldn’t possibly help me build a Latex document or even 
any interesting HTML.

Anyway…
I see quite a few different things in this discussion, and all of them deeply 
important for the research publishing world at the moment.
a) Document format;
b) Metadata about the publication, both superficial and deep;
c) Data, systems and workflow about the research.

But starting almost everything from scratch (the existing standards and a few 
tools) is rarely the way to go in this webby world.

There is real stuff out there (as I have said more than once before), that 
could really benefit from the sort of activity that Bernadette describes.
I know about a number of things, but there will be others.

(a) and (b) Repositories (because that is what we are talking about)
http://eprints.org is an Open Source Linked Data publishing platform for 
publications that handles the document (in any format) and the shallow 
metadata, but could easily have deep metadata as well if people generated it.
Eg http://eprints.soton.ac.uk/id/eprint/271458
I even have an existing endpoint with all the ePrints RDF in it - 
http://foreign.rkbexplorer.com, with currently 24G & 182854666 triples, so such 
software can be used.

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences, one for all, or for each or series?
And require the authors to enter their data into the site - it’s not hard, and 
there is existing documentation of what to do.
It is mature technology with 100s of person-years invested.
And perhaps most importantly, it has the buy in of the library and similar 
communities, and has been field tested with users.
It would certainly be more maintainable than the DogFood site - and it would be 
a trivialish task to move the great DogFood efforts over to it. DogFood really 
is something of a silo - exactly what Linked Data is meant to avoid.
And “we” might actually contribute to the wider community by enhancing the Open 
Source Project with Linked Data enhancements that were useful out there!
Or a more challenging thing would be to make http://www.dspace.org do what we 
want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)!

(c) Workflows and Datasets
I have mentioned http://www.myexperiment.org before, but can’t remember if I 
have mentioned http://www.wf4ever-project.org
Again, these are Linked Data platforms for publishing; in this case workflows 
and datasets etc.
They are seriously mature, certainly compared with what we might build - see, 
for example https://github.com/wf4ever/ro
And exactly the same as the Repositories.

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences, one for all, or for each or series?
…ditto…
Who knows, maybe the Crawl, as well as the Challenge entries, might be able to 
usefully describe what they did using these ontologies etc.?

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

Hugh

 On 4 Oct 2014, at 03:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote:
 
 As is often the case on the Internet, this discussion gives me a terrible 
 sense of déjà vu. We've had this discussion many times before.
 Some years back the IW3C2

Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Ivan Herman
This is not a direct answer to Daniel, but rather expanding on what he said. 
Actually, he and I were (and still are) in the same IW3C2 committee, ie, we 
share the experience; and I was one of those (although the credit really goes 
to Bob Hopgood, actually, who was pushing that the most) who tried to come up 
with a proper XHTML template.

The real problem is still the missing tooling. Authors, even if technically 
savy like this community, want to do what they set up to do: write their papers 
as quickly as possible. They do not want to spend their time going through some 
esoteric CSS massaging, for example. Let us face it: we are not yet there. The 
tools for authoring are still very poor. This in spite of the fact that many 
realize that PDF is really not the format for our age; we need much more than a 
reproduction of a printed page digitally (as someone referred to in the thread 
I really suffer when I have to read, let alone review, an article in PDF on my 
iPad...).

But I do see an evolution that might change this in the coming years. Laura 
dropped the magic word in the early phases of this thread: ePub. ePub is a packaged 
(zip archived) HTML site, with some additional information. It is the format 
that most of the ebook readers understand (hey, it can even be converted into a 
Kindle format:-). Both Firefox and Chrome have ePub reader extensions available 
and Mac OS comes with a free ebook reader (iBook) that is based on it. I expect 
(hope) that the convergence between ePub and browsers will bring these even 
closer in the coming years. Because ePub is a packaged web site, with the core 
content in HTML5 (or SVG), metadata can be added to the content in RDFa, 
microdata, embedded JSON-LD; in fact, metadata can also be added to the archive 
as a separate file so if you are crazy enough you can even add RDF data in 
RDF/XML (no, please, don't do it:-). And, of course, it can be as much as a 
hypertext as you can just master:-)
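
To make the "packaged web site" point concrete, a short Python sketch that 
opens an ePub as the zip archive it is and reads the Dublin Core metadata 
from its OPF package file (the internal file names follow the EPUB spec; 
paper.epub is a made-up input):

import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
    "opf": "http://www.idpf.org/2007/opf",
}

def epub_metadata(path):
    # An ePub is a zip; META-INF/container.xml points at the OPF package
    # file, whose metadata element carries Dublin Core terms.
    with zipfile.ZipFile(path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        meta = ET.fromstring(z.read(opf_path)).find("opf:metadata", NS)
        return {tag: meta.findtext("dc:" + tag, namespaces=NS)
                for tag in ("title", "creator", "language")}

print(epub_metadata("paper.epub"))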

Tooling? No, not yet:-( Well, not yet for the average user. But there, too, there 
is an evolution. The fact is that publishers are working on XML-first (or 
HTML-first) workflows. O'Reilly's Atlas tool[1] means that authors prepare 
their documents in, essentially, HTML (well, a restricted profile thereof), and 
the output is then produced in EPUB, PDF, or pure HTML at the end. Companies 
are created that do similar things and where small(er) publishers can develop 
full projects (Metrodigi, Inkling, Hachette, ...; but I do not think it is 
possible to use these for a big conference, although, who knows?). Importantly 
to this community, these tools also include annotation facilities, akin to MS 
Word's commenting tools.

Where does it take us _now_? Much against my instinct and with a bleeding heart 
I have to accept that conferences of the size of WWW, but even ISWC or ESWC, 
cannot reasonably ask their submitters to submit in ePub (or HTML). Yet. Not 
today. It is a chicken and egg problem, and change may come only with events, 
as well as more progressive scholarly publishers, experimenting with this. Just 
like Daniel (and Bernadette) I would love to see that happening for smaller 
workshops (if budget allows, I could imagine a workshop teaming up with, say, 
Metrodigi to produce the workshop's proceedings). But I am optimistic that the 
change will happen within a foreseeable time and our community (as any 
scholarly community, I believe) will have to prepare itself for a change in 
this area. 

Adding my 2¢ to Daniel's:-)

Ivan

P.S. For LaTeX users: I guess the main advantage of LaTeX is the math part. And 
this is the saddest story of all: MathML has been around for a long time, and 
it is, actually, part of ePUB as well, but authoring proper mathematics is the 
toughest with the tools out there. Sigh...

P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that space...


[1] https://atlas.oreilly.com
[2] http://metrodigi.com
[3] https://www.inkling.com



On 04 Oct 2014, at 04:14 , Daniel Schwabe dschw...@inf.puc-rio.br wrote:

 As is often the case on the Internet, this discussion gives me a terrible 
 sense of déjà vu. We've had this discussion many times before.
 Some years back the IW3C2 (the steering committee for the WWW conference 
 series, of which I am part) first tried to require HTML for the WWW 
 conference paper submissions, then was forced to make it optional because 
 authors simply refused to write in HTML, and eventually dropped it because NO 
 ONE (ok, very very few hardy souls) actually sent in HTML submissions.
 Our conclusion at the time was that the tools simply were not there, and it 
 was too much of a PITA for people to produce HTML instead of using the text 
 editors they are used to. Things don't seem to have changed much since.
 And this is simply looking at formatting the pages, never mind the whole 
 issue of actually producing hypertext (ie., turning the article's text into 
 linked hypertext), beyond the easily 

Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Laura Dawson
I think I mentioned previously, Ivan, but perhaps not on this thread -
Hugh McGuire has developed a WordPress tool called PressBooks which allows
you to write a book in HTML and export it as an EPUB file. He even
supports schema.org markup in a separate plugin.
(http://www.pressbooks.com)

On 10/5/14, 10:34 AM, Ivan Herman i...@w3.org wrote:

[...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Alexander Garcia Castro
Metadata, sure, it is a must. But it should be good and designed for the Web of
data, not for paper-based collections. From my experience it is not so much
about representing everything from the paper as triples: there will be
statements that won't be representable, and such an approach may not be
efficient.

Why don't we just go a little bit further up from the lowest-hanging fruit
and start talking about self-describing documents? Well-annotated documents
with well-structured metadata that are interoperable. This is easy,
achievable, requires little tooling, does not put any burden on the author,
delivers interoperability beyond just simple hyperlinks, and it is much more
elegant than adhering to HTML, etc.

On Sun, Oct 5, 2014 at 3:19 AM, Hugh Glaser h...@glasers.org wrote:


  On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote:
 
 ...
  Basic metadata is good. Publishing datasets with the paper is good.
 Having
  typed links in the paper is good. But I would not demand to go further.
 
 +1
 ++1 - the dataset publishing can include the workflow, tools etc, and
 metadata about that.


 --
 Hugh Glaser
20 Portchester Rise
Eastleigh
SO50 4QS
 Mobile: +44 75 9533 4155, Home: +44 23 8061 5652






-- 
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac


Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Breslin, John
+1

John
http://Bresl.in

 On 5 Oct 2014, at 15:39, Ivan Herman i...@w3.org wrote:
 
 [...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Ivan Herman

On 05 Oct 2014, at 16:47 , Laura Dawson laura.daw...@bowker.com wrote:

 I think I mentioned previously, Ivan, but perhaps not on this thread -
 Hugh McGuire has developed a Wordpress tool called PressBooks which allows
 you to write a book in HTML and export it as an EPUB file. He even
 supports schema.org markup in a separate plugin.
 (http://www.pressbooks.com)

Indeed, I forgot!

The problem with this service (but also with the others, I guess) is that, at 
least through the standard offers on the sites, it may not be appropriate for 
a workshop, which would require giving access to a large(r) number of 
submitters in the submission phase, followed by a selection process that ends 
with a small number of the submissions in the final book. This does not really 
fit their business models. It should be up to the scholarly publishers to pick 
this up...

(But I guess we digress greatly from the main topic of this mailing list, ie, 
semantic web...)

Ivan

 
 On 10/5/14, 10:34 AM, Ivan Herman i...@w3.org wrote:
 
 [...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Diogo FC Patrao
Hi Peter

Yes, these tags are semantic in the context of a document. One can declare a
document section instead of just saying that there's a container. This way one
can easily build a table of contents across several documents.
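
For instance, a paper skeleton along these lines (a minimal sketch; the id 
values are made up) lets a tool walk the section elements and assemble a table 
of contents across documents:

  <article>
    <header>
      <h1>Paper Title</h1>
      <p>A. N. Author</p>
    </header>
    <section id="introduction">
      <h2>1. Introduction</h2>
      <p>...</p>
    </section>
    <section id="results">
      <h2>2. Results</h2>
      <p>...</p>
    </section>
    <footer>
      <p>References, acknowledgements, ...</p>
    </footer>
  </article>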

Not semantic in the sense they describe the knowledge in that document -
that's what RDF, OWL are for.

cheers



--
diogo patrão



On Fri, Oct 3, 2014 at 7:04 PM, Peter F. Patel-Schneider 
pfpschnei...@gmail.com wrote:

 Hmm.  Are these semantic?  All these seem to do is to signal parts of a
 document.

 What I would consider to be semantic would be a way of extracting the
 mathematical content of a document.

 peter


 On 10/03/2014 02:32 PM, Diogo FC Patrao wrote:

 html5 has so-called semantic tags, like header, section.



 --
 diogo patrão



 On Fri, Oct 3, 2014 at 6:01 PM, john.nj.dav...@bt.com wrote:

  Yes, but what makes HTML better for being webby than PDF?
 Because it is a mark-up language (albeit largely syntactic) which
 makes it
 much more amenable to machine processing?

 -Original Message-
 From: Peter F. Patel-Schneider [mailto:pfpschnei...@gmail.com]
 Sent: 03 October 2014 21:15
 To: Diogo FC Patrao
 Cc: Phillip Lord; semantic-...@w3.org; public-lod@w3.org
 Subject: Re: scientific publishing process (was Re: Cost and access)



 On 10/03/2014 10:25 AM, Diogo FC Patrao wrote:
  
  
   On Fri, Oct 3, 2014 at 1:38 PM, Peter F. Patel-Schneider
   pfpschnei...@gmail.com wrote:
  
   One problem with allowing HTML submission is ensuring that reviewers
   can correctly view the submission as the authors intended it to be
   viewed. How would you feel if your paper was rejected because one of
   the reviewers could not view portions of it?  At least with PDF there
   is a reasonably good chance that every paper can be correctly viewed
   by all its reviewers, even if they have to print it out.  I don't
   think that the same claim can be made for HTML-based systems.
  
  
  
   The majority of journals I'm familiar with mandates a certain format
   for submission: font size, figure format, etc. So, in a HTML format
   submission, there should be rules as well, a standard CSS and the
   right elements and classes. Not different from getting a word(c) or
   latex template.

 This might help.  However, someone has to do this, and ensure that the
 result is generally viewable.
  
  
   Web conferences vitally use the web in their reviewing and publishing
   processes.  Doesn't that show their allegiance to the web?  Would the
   use of HTML make a conference more webby?
  
  
   As someone said, this is leading by example.

 Yes, but what makes HTML better for being webby than PDF?

  
   dfcp
  
  
  
   peter
  





Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Hugh Glaser
Hi Alexander,
 On 5 Oct 2014, at 15:57, Alexander Garcia Castro alexgarc...@gmail.com 
 wrote:
 
 Metadata, sure, it is a must. But it should be good and designed for the Web 
 of data, not for paper-based collections. From my experience it is not so 
 much about representing everything from the paper as triples: there will be 
 statements that won't be representable, and such an approach may not be 
 efficient. 
 
 Why don't we just go a little bit further up from the lowest-hanging fruit 
 and start talking about self-describing documents? Well-annotated documents 
 with well-structured metadata that are interoperable. This is easy, 
 achievable, requires little tooling, does not put any burden on the author, 
 delivers interoperability beyond just simple hyperlinks, and it is much more 
 elegant than adhering to HTML, etc.
You lost me here.
Who or what produces the “well annotated documents” and “well structured 
metadata”, if it isn’t any burden for the authors?
Easy and little tooling - I wonder what methods and tools you have in mind?

These have proved to be hard problems - otherwise we wouldn’t be having this 
painful discussion.

Best
Hugh
 
 On Sun, Oct 5, 2014 at 3:19 AM, Hugh Glaser h...@glasers.org wrote:
 
  [...]

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652





Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Hugh Glaser
Hi Ivan,
 On 5 Oct 2014, at 16:42, Ivan Herman i...@w3.org wrote:
 
 
 On 05 Oct 2014, at 16:47 , Laura Dawson laura.daw...@bowker.com wrote:
 
  [...]
 
 Indeed, I forgot!
 
 The problem with this service (but also with the others, I guess) is that, at 
 least through the standard offers on the sites, it may not be appropriate for 
 a workshop, which would require giving access to a large(r) number of 
 submitters in the submission phase, followed by a selection process that ends 
 with a small number of the submissions in the final book. This does not 
 really fit their business models. It should be up to the scholarly publishers 
 to pick this up…
Yes, we must keep remembering that the documents are simply one bit of a social 
machine, long before they get anywhere near (the unlikely event of them) being 
published.
 
 (But I guess we digress greatly from the main topic of this mailing list, ie, 
 semantic web…)
We did that quite a while ago, I think :-)
But in the end you just gotta go with the flow, man.

Best
Hugh
 
 Ivan
 
 
 On 10/5/14, 10:34 AM, Ivan Herman i...@w3.org wrote:
 
  [...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Luca Matteis
On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote:
 The real problem is still the missing tooling. Authors, even if technically 
 savy like this community, want to do what they set up to do: write their 
 papers as quickly as possible. They do not want to spend their time going 
 through some esoteric CSS massaging, for example. Let us face it: we are not 
 yet there. The tools for authoring are still very poor.

But are they still very poor? I mean, I think there are more tools for
rendering HTML than there are for rendering Latex. In fact there are
probably more tools for rendering HTML than anything else out there,
because HTML is used more than anything else. Because HTML powers the
Web!

You can write in Word, and export in HTML. You can write in Markdown
and export in HTML. You can probably write in Latex and export in HTML
as well :)

The tools are not the problem. The problem to me is the printing
afterwards. Conferences/workshops need to print the publications.
Printing consistent LaTeX/PDF templates is a lot easier than printing
inconsistent (layout-wise) HTML pages.

Best,
Luca



Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Kingsley Idehen

On 10/5/14 6:19 AM, Hugh Glaser wrote:

On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote:


...

Basic metadata is good. Publishing datasets with the paper is good. Having
typed links in the paper is good. But I would not demand to go further.


+1
++1 - the dataset publishing can include the workflow, tools etc, and metadata 
about that.


+1

For context: hence my +1 for Hugh's detailed example, which also veers 
towards building on a variety of existing efforts rather than ripping 
and replacing, etc.


The data behind these papers doesn't need to be locked in tables, in 
PDFs. Neither do the descriptions of the data in question (the so-called 
metadata), or the workflows involved.


--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Kingsley Idehen

On 10/5/14 9:55 AM, Dominic Oldman wrote:
Further to Hugh's comment about the non-techy world, I found this 
interesting quote on the Web.


The web is more a social creation than a technical one. I designed it 
for a social effect — to help people work together — and not as a 
technical toy. The ultimate goal of the Web is to support and improve 
our weblike existence in the world.


I like this blue sky thinking and it seems to suggest (to me) that 
sometimes constantly moving technical engineering is not always 
productive or collaborative.


Dominic

(I will have a look at e-prints :-))



Yes! The Web is fundamentally about collaboration (which is social) and 
data flow (even when this data is subject to data access policies and 
access control lists etc..).


--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this





Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Laura Dawson
Word adds all sorts of horrible tags to things and makes the HTML
virtually unrenderable.

On 10/5/14, 4:19 PM, Luca Matteis lmatt...@gmail.com wrote:

[...]





Re: scientific publishing process (was Re: Cost and access)

2014-10-05 Thread Ivan Herman

On 05 Oct 2014, at 22:19 , Luca Matteis lmatt...@gmail.com wrote:

 On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote:
 [...]
 
 But are they still very poor? I mean, I think there are more tools for
 rendering HTML than there are for rendering Latex. In fact there are
 probably more tools for rendering HTML than anything else out there,
 because HTML is used more than anything else. Because HTML powers the
 Web!
 
 You can write in Word, and export in HTML. You can write in Markdown
 and export in HTML. You can probably write in Latex and export in HTML
 as well :)
 
 The tools are not the problem. The problem to me is the printing
 afterwards. Conferences/workshops need to print the publications.
 Printing consistent LaTeX/PDF templates is a lot easier than printing
 inconsistent (layout-wise) HTML pages.

Interestingly, my experience is just about the opposite. Sorry:-)

Yes, tools to _render_ HTML are around. But the issue is the _production_ of 
those pages (and, to take one step further alongside my original mail, to 
produce an ePub once the HTML pages are around). Word (as Laura remarked) 
produces nearly useless HTML; OpenOffice/LibreOffice is not much better, I am 
afraid. Markdown is fine indeed, and markdown editors like Mou produce proper 
HTML, but the markup (sic!) facilities of markdown are limited. It is all right 
for simple books, but I suspect it would be more of a problem for scientific 
articles. (But yes, that is an avenue to explore.) WYSIWYG HTML editors exist 
by now, but I am not sure they are satisfactory either (I use BlueGriffon 
often, but I still have to switch back and forth between source mode and 
WYSIWYG mode, which defeats the purpose). Of course, I could expect a Web 
technology related crowd to use HTML source editing directly, but the 
experience of Daniel and myself with the World Wide Web conference(!) is that 
people do not want to do that. (Researchers in, say, Web Search have proven to 
be unable or unwilling to edit HTML source. It was a real surprise...) I.e., 
the authoring tool offerings are still limited.

On the other hand... how long do we want to care about printing? The WWW 
conference (to stay with that example) has given up on printed proceedings for 
a while. The proceedings are published by the ACM and offered through their 
digital library, and the individual papers are available on-line on the 
conference site. I know that ISWC and (I believe) ESWC still produce printed 
Springer proceedings, but I wonder for how long; who needs those in print? I 
must admit that I have not picked up a printed proceedings volume or journal 
article for many years; I look for the online versions instead. Of course, I 
may print a single paper because I want to read it while, for example, on the 
train, but then I do not really care about the way it looks. And, with 
tablets, even this usage is becoming less significant. That being said, 
producing a proper PDF from HTML is again not a problem: CSS has a number of 
page/print-specific features and is being actively worked on in this respect.
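
For instance, a few rules along these lines (a rough sketch; the margin-box 
syntax comes from the CSS Paged Media drafts and currently needs a print 
formatter rather than a browser) already cover most of what a proceedings 
template asks for:

  @page {
    size: A4;
    margin: 2.5cm;
    @bottom-center { content: counter(page); }  /* running page numbers */
  }
  h1, h2 { page-break-after: avoid; }          /* keep headings with their text */
  figure, table { page-break-inside: avoid; }  /* do not split floats across pages */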

Cheers

Ivan 

 
 Best,
 Luca



Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me









Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Sarven Capadisli

On 2014-10-04 04:14, Daniel Schwabe wrote:

As is often the case on the Internet, this discussion gives me a terrible sense 
of déjà vu. We've had this discussion many times before.
Some years back the IW3C2 (the steering committee for the WWW conference 
series, of which I am part) first tried to require HTML for the WWW conference 
paper submissions, then was forced to make it optional because authors simply 
refused to write in HTML, and eventually dropped it because NO ONE (ok, very 
very few hardy souls) actually sent in HTML submissions.
Our conclusion at the time was that the tools simply were not there, and it was 
too much of a PITA for people to produce HTML instead of using the text editors 
they are used to. Things don't seem to have changed much since.


Hi Daniel, here is my long reply as usual and I hope you'll give it a 
shot :)


I've offered *a* solution that is compatible with the existing workflow 
without asking for any extra work from the OC/PCs, with the exception 
that Web-native technologies for the submissions are officially 
encouraged. They will get their PDF in the end to cater to the existing 
pipeline. In the meantime, the community retains higher-quality research 
documents.



And this is simply looking at formatting the pages, never mind the whole issue of 
actually producing hypertext (ie., turning the article's text into linked hypertext), 
beyond the easily automated ones (e.g., links to authors, references to papers, etc..). 
Producing good hypertext, and consuming it, is much harder than writing plain text. And 
most authors are not trained in producing this kind of content. Making this actually 
semantic in some sense is still, in my view, a research topic, not a routine 
reality.
Until we have robust tools that make it as easy for authors to write papers 
with the advantages afforded by PDF, without its shortcomings, I do not see 
this changing.


I disagree that we don't have sufficient or robust tools to author and 
publish web pages. I find it ironic that we are still debating this 
issue as if we were in the early-to-mid 90s. Or ignoring [2], or the 
possibility of using a service which offers [3] to publish (pardon me 
for saying it) a friggin' web page.


If it is about coding, I find it unreasonable or unprofessional to 
think that a Computer/Web Scientist in 2014 who is publicly funded for 
their academic endeavors is incapable of grokking HTML. But, somehow 
LaTeX is presumed to be okay for the new post-graduate that's coming in. 
Really? Or is the real reason that no one is asking them to do otherwise?


They can randomly pick a WYSIWYG editor tool or an existing publishing 
service. No one is forcing anyone to hand-code anything. Just as no one 
is forced to hand code LaTeX.


We have the tools and even the services to help us do all of that, both 
from within and outside of SW. We have had them for a long time. What 
was lacking was a continuous green light to use them. That light stopped 
flashing, as you've mentioned.


But again, our core problems are not technical in nature.


I would love to see experiments (e.g., certain workshops) to try it out before 
making this a requirement for whole conferences.


I disagree. The fact that workshops or tracks on linked science or 
semantic publishing didn't deliver is a clear sign that they have the 
wrong process at the root. When those workshops ask for submissions to 
be in PDF, that's the definition of irony. There are no useful 
machine-friendly research objects! An opportunity is lost at every single CfP.


Yet, we eloquently describe hypothetical systems or tools that will one 
day do all the magic for us instead of taking a good look at what's 
right in front of us.


So, let's talk about putting the cart before the horse. A lot of time and 
energy (e.g., public funding) could have been better used simply by 
actually *having the data* and then figuring out how to utilize it. 
There is no data, so what's there to analyze or learn from? Some 
research trying to figure out what to do with trivial and limited 
metadata, e.g., title, abstract, authors, subjects? Is 
data.semanticweb.org ("dog food") the best we can show for our 
dogfooding ability?


I can't search/query for research knowledge on topic T that used 
variables X and Y, that implemented a workflow step S, that's cited by 
work which used those exact parameters, or that happens to use the 
datasets I'm planning to use in my research.
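
If that data existed, the query itself would be the easy part - something of 
roughly this shape (a sketch only; PROV-O is real, the ex: vocabulary is made 
up for illustration):

  PREFIX prov: <http://www.w3.org/ns/prov#>
  PREFIX ex:   <http://example.org/vocab#>

  SELECT ?paper WHERE {
    ?paper    ex:topic            ex:TopicT ;
              ex:usesVariable     ex:X, ex:Y ;
              prov:wasGeneratedBy ?activity .
    ?activity ex:workflowStep     ex:StepS ;
              prov:used           ex:DatasetD .
  }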


Reproducibility: 0
Comparability: 0
Discovery: 0
Reuse: 0
H-Index: +1?


Bernadette's suggestions are a good step in this direction, although I suspect 
it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).


Nothing is stopping us from doing things in parallel, and we are in fact. 
There are close-by efforts, from workshops to force11, public-dwbp-wg, and 
public-digipub-ig, to recommendations e.g., PROV-O, OPMW, SIO, SPAR, 
besides the whole SW/LD stack, which benefit scientific research 
communication and 

Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Michael Brunnbauer

Hello Paul,

On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote:
 Yes. We are setting the bar too low. The field of knowledge computing
 will only reach maturity when authors can publish their theses in such a
 manner that one can programmatically extract the concepts, propositions,
 and arguments;

I thought Kingsley is the only one seriously suggesting that we communicate in
triples. Let's take one step back to the proposal of making research datasets
machine readable with RDF.

Please go to http://crcns.org/NWB

Have a look at an example dataset:

 http://crcns.org/data-sets/hc/hc-3/about-hc-3

The total size of the data is about 433 GB compressed

Even if you do not use triples for all of that (which would be insane),
specifying a structured data container is a very difficult task.

So instead of talking about setting the bar higher, why not just help the 
people over there with their problem?
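
To be clear, the dataset-level description is the feasible part. A rough 
sketch in Turtle using DCAT - the URIs and values below are invented for 
illustration:

  @prefix dcat: <http://www.w3.org/ns/dcat#> .
  @prefix dct:  <http://purl.org/dc/terms/> .

  <http://example.org/dataset/hc-3> a dcat:Dataset ;
      dct:title "hc-3" ;
      dct:publisher <http://example.org/org/crcns> ;
      dcat:distribution [
          a dcat:Distribution ;
          dcat:byteSize 433000000000    # about 433 GB, compressed
      ] .

The hard part is everything below that level: the structure of the container 
holding the 433 GB.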

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel




Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Hugh Glaser
Executive summary:
1) Bring up an ePrints repository for “our” conferences, and a myExperiment 
instance, or equivalents;
2) Start to contribute to the Open Source community.

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

Longer version.
I too have a deep sense of déjà vu all over yet again :-)

But I have learned something - no-one seems to collaborate with people outside 
the techy world.
Most documents for me start as a (set of) collaborative Google Doc 
(unmentioned) or a Word or OpenOffice document (not mentioned much) on Dropbox.
And the collaborators couldn’t possibly help me build a Latex document or even 
any interesting HTML.

Anyway…
I see quite a few different things in this discussion, and all of them deeply 
important for the research publishing world at the moment.
a) Document format;
b) Metadata about the publication, both superficial and deep;
c) Data, systems and workflow about the research.

But starting almost everything from scratch (the existing standards and a few 
tools) is rarely the way to go in this webby world.

There is real stuff out there (as I have said more than once before), that 
could really benefit from the sort of activity that Bernadette describes.
I know about a number of things, but there will be others.

(a) and (b) Repositories (because that is what we are talking about)
http://eprints.org is an Open Source Linked Data publishing platform for 
publications that handles the document (in any format) and the shallow 
metadata, but could easily have deep as well if people generated it.
E.g. http://eprints.soton.ac.uk/id/eprint/271458
I even have an existing endpoint with all the ePrints RDF in it - 
http://foreign.rkbexplorer.com, currently with 24G & 182,854,666 triples, so 
such software can be used.

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences - one for all, or one for each conference or series?
And require the authors to enter their data into the site - it’s not hard, and 
there is existing documentation of what to do.
It is mature technology with 100s of person-years invested.
And perhaps most importantly, it has the buy in of the library and similar 
communities, and has been field tested with users.
It would certainly be more maintainable than the DogFood site - and it would be 
a trivialish task to move the great DogFood efforts over to it. DogFood really 
is something of a silo - exactly what Linked Data is meant to avoid.
And “we” might actually contribute to the wider community by enhancing the Open 
Source Project with Linked Data enhancements that were useful out there!
Or a more challenging thing would be to make http://www.dspace.org do what we 
want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)!

(c) Workflows and Datasets
I have mentioned http://www.myexperiment.org before, but can’t remember if I 
have mentioned http://www.wf4ever-project.org
Again, these are Linked Data platforms for publishing; in this case workflows 
and datasets etc.
They are seriously mature, certainly compared with what we might build - see, 
for example https://github.com/wf4ever/ro
And exactly the same as the Repositories.

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences - one for all, or one for each conference or series?
…ditto…
Who knows, maybe the Crawl, as well as the Challenge entries, might be able to 
usefully describe what they did using these ontologies etc.?

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

Hugh

 On 4 Oct 2014, at 03:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote:
 
 [...]

Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Jürgen Jakobitsch
PDFs are surprisingly flexible and open containers for transporting around
Stuff

hi, i'm feeling tempted to add something provocative ;-)

PDFs are surprisingly mature in disguising all the 'bla bla' and making it
look nice...

=> http://tractatus-online.appspot.com/Tractatus/jonathan/index.html

wkr turnguard




| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web   : http://www.semantic-web.at/
| foaf  : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web   : http://www.turnguard.com
| foaf  : http://www.turnguard.com/turnguard
| g+: https://plus.google.com/111233759991616358206/posts
| skype : jakobitsch-punkt
| xmlns:tg  = http://www.turnguard.com/turnguard#;

2014-10-04 14:47 GMT+02:00 Norman Gray nor...@astro.gla.ac.uk:


 Bernadette, hello.

 On 2014 Oct 4, at 00:36, Bernadette Hyland bhyl...@3roundstones.com
 wrote:

 ... a really useful message which pulls several of these threads
 together.  The following is a rather fragmentary response.

  As a reference point, I tend to think publication = LaTeX -> PDF.  To
  pre-dispel a misconception here, I'm not being a cheerleader for PDF
  below, but a fair fraction of the antagonism directed towards PDF in this
  thread is, I think, misplaced -- PDF is not the problem.

  We'd do ourselves a huge favor if we showed (STM) publishing executives
 why this Linked Data stuff matters anyway.

 They know.  A surprisingly large fraction of the Article Processing Charge
 we pay to them goes on extracting, managing and sharing metadata.  That
 includes DOIs, Crossref feeds, science direct, and so on and so on, and so
 (it seems) on.  It also includes conversion to XML: if you submit a LaTeX
 file to a big publisher, the first thing they'll do is convert it to
 XML+MathML (using workflows based on for example LaTeXML or TeX4ht) and
 preserve that; several of them then re-generate LaTeX for final production.

 To a large extent, I suspect publishers now regard metadata management as
 their Job -- in the sense of their contribution to the scholarly endeavour
 -- and they could do without the dead trees.  If you can offer them a way
 of making metadata _insertion_ easier, which is cost effective, can be
 scaled up, and which a _broad_ range of authors will accept (the hard bit),
 they'll rip your arm off.

  1) PDF works well for (STM) publishers who require fixed page display;

 Yes, and for authors.  Given an alternative between an HTML version of a
 paper and a PDF version, I will _always_ choose the PDF, because it's
 zero-hassle, more reliably faithful to the author's original, more
 readable, and I can read it in the bath.

  2) PDF doesn't take advantage of the advances we've made in machine
 readability;

 If by this you mean RDF, then yes, the naive ways of generating PDFs are
 not RDF-aware.  So we shouldn't be naive...

 XMP is an ISO standard (as PDF is, and like it originating from Adobe) and
 is a type of RDF (well, an irritatingly 90% profile of RDF, but let that
 pass).  Though it's not trivial, it's not hard to generate an XMP packet
 and get it into a PDF, and once there, the metadata job is mostly done.
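 
  For the curious, a packet is just RDF/XML between two processing
  instructions, roughly like this (a hand-trimmed sketch using Dublin Core
  properties; real packets carry rather more housekeeping):
 
    <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>
            <rdf:Alt><rdf:li xml:lang="x-default">An Example Paper</rdf:li></rdf:Alt>
          </dc:title>
          <dc:creator>
            <rdf:Seq><rdf:li>A. N. Author</rdf:li></rdf:Seq>
          </dc:creator>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    <?xpacket end="w"?>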

  3) In fact, PDFs suck on eBook readers which are all about flexible page
 layout; and

 Sure, but they're not intended for e-book readers, so of course they're
 poor at that.

  4) We already have the necessary Web Standards to address the problem,
 so no need to recreate the wheel.

 If, again, you mean RDF, then I agree completely.

  -- Produce a Web-based tool that allows researchers to share their
 [privately | publicly ] funded knowledge and produces a variety of outputs:
 LaTeX, PDF and carries with it a machine readable representation.

 Well, not web-based: I'd want something I can run on my own machine.

  Do people agree with the following SOLUTION approach?
 
  The international standards to solve this exist. Standards from W3C and
 the International Digital Publishing Forum (IDPF).[2]  Use (X)HTML for
 generalized document creation/rendering. Use CSS for styling. Use MathML
 for formulas. Use JS for action. Use RDF to model the metadata within HTML.

 PDF and XMP are both ISO standards, too.  LaTeX isn't a Standard standard,
 but it's pretty damn stable.

 MathML one would _not_ want to type.  The only ways of generating MathML,
 that I'm slightly familiar with, start with TeX syntax.  There are
 presumably GUI-based ones, too *shudder*.

  I propose a 'walk before we run' approach but do better than basic
 metadata (i.e., title, author name, institution, abstract).  Link to other
 scholarly communities/projects such as Vivo.[3]

 I generate Atom feeds for my PDF lecture notes.  The feed content is
 extracted from the XMP and from the /Author, /Title, etc, metadata within
 the PDF.  That metadata gets there 

Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Kingsley Idehen

On 10/4/14 7:14 AM, Hugh Glaser wrote:

[...]


+1


--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






  1   2   >