Re: scientific publishing process (was Re: Cost and access)
On 2014-10-07 15:44, Peter F. Patel-Schneider wrote:

Well, I remain totally unconvinced that any current HTML solution is as good as the current PDF setup. Certainly htlatex is not suitable. There may be some way to get tex4ht to do better, but no one has provided a solution. Sarven Capadisli sent me some HTML that looks much better, but even on a math-light paper I could see a number of glitches. I haven't seen anything better than that.

Would you mind creating an issue for the glitches that you are experiencing? https://github.com/csarven/linked-research/issues Please mention your environment and the documents you've looked at. Also keep in mind the LNCS and ACM SIG authoring guidelines. The purpose of the LNCS and ACM CSS is to adhere to the authoring guidelines so that the generated PDF file or print output looks as expected (within reason). Much appreciated!

-Sarven http://csarven.ca/#i
Re: scientific publishing process (was Re: Cost and access)
Done.

The goal of a new paper-preparation and display system should, however, be to be better than what is currently available. Most HTML-based solutions do not exploit the benefits of HTML, strangely enough. Consider, for example, citation links. They generally jump you to the references section. They should instead pop up the reference, as is done in Wikipedia. Similarly for links to figures. Instead of blindly jumping to the figure, they should do something better, perhaps popping up the figure or, if the figure is already visible, just highlighting it. I have put in both of these as issues.

peter

On 10/08/2014 03:18 AM, Sarven Capadisli wrote:

On 2014-10-07 15:44, Peter F. Patel-Schneider wrote: Well, I remain totally unconvinced that any current HTML solution is as good as the current PDF setup. Certainly htlatex is not suitable. There may be some way to get tex4ht to do better, but no one has provided a solution. Sarven Capadisli sent me some HTML that looks much better, but even on a math-light paper I could see a number of glitches. I haven't seen anything better than that.

Would you mind creating an issue for the glitches that you are experiencing? https://github.com/csarven/linked-research/issues Please mention your environment and the documents you've looked at. Also keep in mind the LNCS and ACM SIG authoring guidelines. The purpose of the LNCS and ACM CSS is to adhere to the authoring guidelines so that the generated PDF file or print output looks as expected (within reason). Much appreciated!

-Sarven http://csarven.ca/#i
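[For readers following along: a minimal sketch of the pop-up behaviour Peter describes, assuming references sit in an ordinary list with ids and citations are plain fragment links. The class names (cite, cite-popup) are invented for illustration; clicking still jumps, so it degrades gracefully without JavaScript.]

<p>As argued by Smith <a class="cite" href="#ref-smith">[1]</a>.</p>

<ol id="references">
  <li id="ref-smith">Smith, J. (2013). An Example Reference.</li>
</ol>

<style>
  a.cite { position: relative; }
  a.cite .cite-popup { display: block; position: absolute; left: 0; top: 1.5em;
    background: #fff; border: 1px solid #999; padding: 0.5em;
    width: 20em; z-index: 1; }
</style>

<script>
// On hover, copy the referenced entry's text into a floating tip
// attached to the citation link; remove it again on mouse-out.
Array.prototype.forEach.call(document.querySelectorAll('a.cite'), function (link) {
  link.addEventListener('mouseover', function () {
    var ref = document.querySelector(link.getAttribute('href'));
    if (!ref || link.querySelector('.cite-popup')) { return; }
    var tip = document.createElement('span');
    tip.className = 'cite-popup';
    tip.textContent = ref.textContent;
    link.appendChild(tip);
  });
  link.addEventListener('mouseout', function () {
    var tip = link.querySelector('.cite-popup');
    if (tip) { link.removeChild(tip); }
  });
});
</script>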
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-08 14:10, Peter F. Patel-Schneider wrote:

Done. The goal of a new paper-preparation and display system should, however, be to be better than what is currently available. Most HTML-based solutions do not exploit the benefits of HTML, strangely enough. Consider, for example, citation links. They generally jump you to the references section. They should instead pop up the reference, as is done in Wikipedia. Similarly for links to figures. Instead of blindly jumping to the figure, they should do something better, perhaps popping up the figure or, if the figure is already visible, just highlighting it. I have put in both of these as issues.

Thanks a lot for the issues! Really great to have this feedback. I have resolved and commented on some of those already, and will look at the rest very shortly.

I am all for improving the interaction as well. I'd like to state again that the development has so far focused on adhering to the LNCS/ACM guidelines, and improving the final PDF/print product. That is to get on reasonable grounds with the state of the art. Moving on: I plan to bring in the interaction and framework to easily semantically enrich the document, as well as the overall UX. I have some preliminary code in my dev branch, and will bring it forward, and would like feedback as well.

Thanks again, and please continue to bring forward any issues or feature requests. Contributors are most welcome!

-Sarven http://csarven.ca/#i
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

PLOS is an interesting case. The HTML for PLOS articles is relatively readable. However, the HTML that the PLOS setup produces is failing at math, even for articles from August 2014. As well, sometimes when I zoom in or out (so that I can see the math better) Firefox stops displaying the paper, and I have to reload the whole page.

Interesting bug, that. Worth reporting to PLoS.

Strangely, PLOS accepts low-resolution figures, which in one paper I looked at are quite difficult to read.

Yep. Although it often provides several links to download higher-res images, including in the original file format. Quite handy.

However, maybe the PLOS method can be improved to the point where the HTML is competitive with PDF.

Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is because scientists are used to viewing in print format, I suspect, but partly not. I'm hoping that, eventually, PLoS will stop using image-based maths. I'd like to be able to zoom maths independently, and copy and paste it in either MathML or TeX. MathJax does this now already.

Phil
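[To make the MathJax point concrete, a minimal page along these lines renders TeX input as scalable text that zooms with the page and can be copied back out as TeX or MathML via MathJax's context menu. The CDN URL below is the one MathJax documented at the time and is an assumption on my part.]

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>MathJax instead of image-based maths</title>
  <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
  <p>Inline maths such as \(e^{i\pi} + 1 = 0\) reflows with the text,
  and display maths scales when the reader zooms:</p>
  \[ \hat{H}\psi = E\psi \]
</body>
</html>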
Re: scientific publishing process (was Re: Cost and access)
On 10/08/2014 05:31 AM, Phillip Lord wrote:

Peter F. Patel-Schneider pfpschnei...@gmail.com writes: PLOS is an interesting case. The HTML for PLOS articles is relatively readable. However, the HTML that the PLOS setup produces is failing at math, even for articles from August 2014. As well, sometimes when I zoom in or out (so that I can see the math better) Firefox stops displaying the paper, and I have to reload the whole page.

Interesting bug, that. Worth reporting to PLoS.

PLoS doesn't appear to have a bug reporting system in place. Even their general assistance email is obfuscated. I sent them a message anyway.

Strangely, PLOS accepts low-resolution figures, which in one paper I looked at are quite difficult to read.

Yep. Although it often provides several links to download higher-res images, including in the original file format. Quite handy.

In this case, even the original was low resolution.

However, maybe the PLOS method can be improved to the point where the HTML is competitive with PDF.

Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is because scientists are used to viewing in print format, I suspect, but partly not. I'm hoping that, eventually, PLoS will stop using image-based maths. I'd like to be able to zoom maths independently, and copy and paste it in either MathML or TeX. MathJax does this now already.

I would suggest that this should have been one of their highest priorities.

Phil

peter
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

The goal of a new paper-preparation and display system should, however, be to be better than what is currently available. Most HTML-based solutions do not exploit the benefits of HTML, strangely enough. Consider, for example, citation links. They generally jump you to the references section. They should instead pop up the reference, as is done in Wikipedia.

Yes, I agree. I do this on my blog, or rather provide it as an option. The reference list is also automatically generated here, so, for example, there is no metadata associated with the two references in this post: http://www.russet.org.uk/blog/3015 In both cases, the reference list is formed from the metadata on the other end of the link, gathered either from the HTML, or in the case of arXiv from their XML-RPC interface.

Similarly for links to figures. Instead of blindly jumping to the figure, they should do something better, perhaps popping up the figure or, if the figure is already visible, just highlighting it.

Or better still, providing access to the code and data from which the figure is derived.

Phil
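[A sketch of the metadata-gathering step Phil describes, under two assumptions: the cited page emits the common citation_* meta tags (as many publishers do), and it is fetchable from script (same origin, CORS, or a proxy). The function name is hypothetical.]

<script>
// Build a reference entry from the <meta> tags on the landing page
// at the other end of a citation link.
function referenceFromUrl(url, done) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function () {
    // Parse the fetched markup without rendering it.
    var doc = new DOMParser().parseFromString(xhr.responseText, 'text/html');
    function meta(name) {
      var el = doc.querySelector('meta[name="' + name + '"]');
      return el ? el.getAttribute('content') : '';
    }
    done({ title:  meta('citation_title'),
           author: meta('citation_author'),
           date:   meta('citation_date') });
  };
  xhr.send();
}

// e.g. referenceFromUrl('/papers/example.html', function (ref) {
//   console.log(ref.author + ' (' + ref.date + '). ' + ref.title);
// });
</script>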
Re: scientific publishing process (was Re: Cost and access)
Dear Sarven,

I really appreciate the work that you're doing with trying to style an HTML page to look similar to the Latex templates. But there are so many typesetting details that are not available in browsers, which means you're going to do a lot of DOM hacking to be able to produce the same quality typography that Latex is capable of. Latex will justify text, automatically hyphenate, provide proper spacing, and other typesetting features. Not to mention kerning. Kerning is a *huge* thing in typography, and with HTML you're stuck with creating a DOM element for every single letter - yup, you heard me right.

I think it would be super cool to create some sort of JavaScript framework that would enable the same level of typography that Latex is capable of, but you'll eventually hit some hard limitations and you'll probably be stuck drawing on a canvas. What are your ideas regarding these problems?

On Wed, Oct 8, 2014 at 2:26 PM, Sarven Capadisli i...@csarven.ca wrote:

On 2014-10-08 14:10, Peter F. Patel-Schneider wrote: Done. The goal of a new paper-preparation and display system should, however, be to be better than what is currently available. Most HTML-based solutions do not exploit the benefits of HTML, strangely enough. Consider, for example, citation links. They generally jump you to the references section. They should instead pop up the reference, as is done in Wikipedia. Similarly for links to figures. Instead of blindly jumping to the figure, they should do something better, perhaps popping up the figure or, if the figure is already visible, just highlighting it. I have put in both of these as issues.

Thanks a lot for the issues! Really great to have this feedback. I have resolved and commented on some of those already, and will look at the rest very shortly. I am all for improving the interaction as well. I'd like to state again that the development has so far focused on adhering to the LNCS/ACM guidelines, and improving the final PDF/print product. That is to get on reasonable grounds with the state of the art. Moving on: I plan to bring in the interaction and framework to easily semantically enrich the document, as well as the overall UX. I have some preliminary code in my dev branch, and will bring it forward, and would like feedback as well. Thanks again, and please continue to bring forward any issues or feature requests. Contributors are most welcome!

-Sarven http://csarven.ca/#i
Re: scientific publishing process (was Re: Cost and access)
I'm always at a bit of a loss when I read this sort of thing. Kerning, seriously? We can't share scientific content in HTML because of kerning? In practice, web browsers do a perfectly reasonable job of text layout, in real time, and do it in a way that allows easy reflowing. The thing I like most about Sarven's LNCS style sheets, for instance, is that I can turn them off; I don't like the LNCS format.

Having said all of that, 5 minutes of googling suggests that kerning support is in Candidate Recommendation form from W3C, and that there are at least three different JS libraries that support it.

Phil

Luca Matteis lmatt...@gmail.com writes:

I really appreciate the work that you're doing with trying to style an HTML page to look similar to the Latex templates. But there are so many typesetting details that are not available in browsers, which means you're going to do a lot of DOM hacking to be able to produce the same quality typography that Latex is capable of. Latex will justify text, automatically hyphenate, provide proper spacing, and other typesetting features. Not to mention kerning. Kerning is a *huge* thing in typography, and with HTML you're stuck with creating a DOM element for every single letter - yup, you heard me right. I think it would be super cool to create some sort of JavaScript framework that would enable the same level of typography that Latex is capable of, but you'll eventually hit some hard limitations and you'll probably be stuck drawing on a canvas. What are your ideas regarding these problems?

On Wed, Oct 8, 2014 at 2:26 PM, Sarven Capadisli i...@csarven.ca wrote: On 2014-10-08 14:10, Peter F. Patel-Schneider wrote: Done. The goal of a new paper-preparation and display system should, however, be to be better than what is currently available. Most HTML-based solutions do not exploit the benefits of HTML, strangely enough. Consider, for example, citation links. They generally jump you to the references section. They should instead pop up the reference, as is done in Wikipedia. Similarly for links to figures. Instead of blindly jumping to the figure, they should do something better, perhaps popping up the figure or, if the figure is already visible, just highlighting it. I have put in both of these as issues.

Thanks a lot for the issues! Really great to have this feedback. I have resolved and commented on some of those already, and will look at the rest very shortly. I am all for improving the interaction as well. I'd like to state again that the development has so far focused on adhering to the LNCS/ACM guidelines, and improving the final PDF/print product. That is to get on reasonable grounds with the state of the art. Moving on: I plan to bring in the interaction and framework to easily semantically enrich the document, as well as the overall UX. I have some preliminary code in my dev branch, and will bring it forward, and would like feedback as well. Thanks again, and please continue to bring forward any issues or feature requests. Contributors are most welcome!

-Sarven http://csarven.ca/#i

--
Phillip Lord,                           Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics,             Email: phillip.l...@newcastle.ac.uk
School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,               skype: russet_apples
Newcastle University,                   twitter: phillord
NE1 7RU
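[For what it's worth, much of what Luca lists already maps onto CSS properties, with no per-letter DOM hacking. A sketch; support and vendor prefixes varied across 2014 engines, so treat it as indicative rather than guaranteed:]

article {
  text-align: justify;
  -webkit-hyphens: auto;
  -moz-hyphens: auto;
  hyphens: auto;                    /* automatic hyphenation; needs a lang attribute */
  font-kerning: normal;             /* kerning control, CSS Fonts Module Level 3 */
  font-feature-settings: "kern" 1, "liga" 1;  /* OpenType kerning and ligatures */
  text-rendering: optimizeLegibility;
}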
Re: scientific publishing process (was Re: Cost and access)
Hi Sarven,

Congratulations on kicking off a thread that has received over 150 replies across two W3 lists in a week. That is impressive! This isn't the first time (nor the last) that it has been discussed. The active discussion reaffirms the need to drive a closer dialog between Web technologists and publishers for scientific publishing. One gets the sense that there is serious depth of expertise on the publishing workflow on these lists. People have taken considerable time to reply and be constructive with ideas to advance the effort. Thanks.

Can anyone advise on whether the publishers in 2014 are in fact on the 'front lines' of defining these standards that affect their core business, i.e., Web standards that are the foundation for layout and typography?

Is this an opportunity for W3C members to take this up as a topic for discussion at the upcoming TPAC? Perhaps this is already scheduled? W3C staffers, any guidance on this?

I still contend there is a great business opportunity for an entrepreneurial, Web publishing-savvy team to build something really useful, immediately have 1000+ researchers provide feedback, and drive use.

Cheers,
Bernadette Hyland
CEO, 3 Round Stones, Inc.
http://3roundstones.com
http://about.me/bernadettehyland

PS. It's also clear your PhD dissertation topic is of keen interest, Sarven!! We'd like to read it when you're done (no pressure ;-)

On Oct 8, 2014, at 10:09 AM, Gray, Alasdair a.j.g.g...@hw.ac.uk wrote:

On 8 Oct 2014, at 13:31, Phillip Lord phillip.l...@newcastle.ac.uk wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: [snip] However, maybe the PLOS method can be improved to the point where the HTML is competitive with PDF. Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is because scientists are used to viewing in print format, I suspect, but partly not.

Or is that because they want to import it into their own reference management system, e.g. Mendeley, which does not support the HTML version?

Alasdair

[snip]

Phil

Alasdair J G Gray
Lecturer in Computer Science, Heriot-Watt University, UK.
Email: a.j.g.g...@hw.ac.uk
Web: http://www.alasdairjggray.co.uk
ORCID: http://orcid.org/0000-0002-5711-4872
Telephone: +44 131 451 3429
Twitter: @gray_alasdair
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-08 15:14, Luca Matteis wrote:

Dear Sarven, I really appreciate the work that you're doing with trying to style an HTML page to look similar to the Latex templates. But there are so many typesetting details that are not available in browsers, which means you're going to do a lot of DOM hacking to be able to produce the same quality typography that Latex is capable of. Latex will justify text, automatically hyphenate, provide proper spacing, and other typesetting features. Not to mention kerning. Kerning is a *huge* thing in typography, and with HTML you're stuck with creating a DOM element for every single letter - yup, you heard me right. I think it would be super cool to create some sort of JavaScript framework that would enable the same level of typography that Latex is capable of, but you'll eventually hit some hard limitations and you'll probably be stuck drawing on a canvas. What are your ideas regarding these problems?

We do not have to have everything pixel perfect and comprehensive all up front. That is a common pitfall. Applying the Pareto principle is preferable.

LaTeX is great for what it is intended for! This was never in question. We are however looking at a bigger picture for Web Science communication and access. There will be far more concerns than the presentation layer alone.

As for your technical questions: we need to create issues or features, and more importantly, open discussions like in these threads, to better understand what the SW research community's needs are. So, please create an issue, because what you raise is important to be looked into further. I do not have all the technical answers, even though I am very close to the world of typeface, typography, and book design :) In any case, if it was possible in LaTeX, I hope it is not naive of me to say that it can be achieved (if not already) in HTML+CSS+JavaScript.

-Sarven http://csarven.ca/#i
Re: scientific publishing process (was Re: Cost and access)
On 10/8/14 10:18 AM, Sarven Capadisli wrote:

On 2014-10-08 15:14, Luca Matteis wrote: Dear Sarven, I really appreciate the work that you're doing with trying to style an HTML page to look similar to the Latex templates. But there are so many typesetting details that are not available in browsers, which means you're going to do a lot of DOM hacking to be able to produce the same quality typography that Latex is capable of. Latex will justify text, automatically hyphenate, provide proper spacing, and other typesetting features. Not to mention kerning. Kerning is a *huge* thing in typography, and with HTML you're stuck with creating a DOM element for every single letter - yup, you heard me right. I think it would be super cool to create some sort of JavaScript framework that would enable the same level of typography that Latex is capable of, but you'll eventually hit some hard limitations and you'll probably be stuck drawing on a canvas. What are your ideas regarding these problems?

We do not have to have everything pixel perfect and comprehensive all up front. That is a common pitfall. Applying the Pareto principle is preferable. LaTeX is great for what it is intended for! This was never in question. We are however looking at a bigger picture for Web Science communication and access. There will be far more concerns than the presentation layer alone. As for your technical questions: we need to create issues or features, and more importantly, open discussions like in these threads, to better understand what the SW research community's needs are. So, please create an issue, because what you raise is important to be looked into further. I do not have all the technical answers, even though I am very close to the world of typeface, typography, and book design :) In any case, if it was possible in LaTeX, I hope it is not naive of me to say that it can be achieved (if not already) in HTML+CSS+JavaScript.

-Sarven http://csarven.ca/#i

Sarven,

Linked Open Data dogfooding, re. issue tracking: i.e., a 5-Star Linked Open Data URI that identifies a Github issue tracker entry for Linked Research:

[1] http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4 -- Linked Open Data URI (basic entity description page)
[2] http://linkeddata.uriburner.com/c/8FDBH7 -- deeper follow-your-nose over relations and facets oriented entity description page
[3] http://bit.ly/vapor-report-on-linked-data-uri-that-identifies-a-github-issue-re-linked-research-data -- Vapor Report (re. Linked Open Data principles adherence).

--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Reference management (was: Re: scientific publishing process (was Re: Cost and access))
On Oct 8, 2014 10:15 AM, Gray, Alasdair a.j.g.g...@hw.ac.uk wrote:

Or is that because they want to import it into their own reference management system, e.g. Mendeley, which does not support the HTML version?

1. It is quite easy to embed metadata in HTML pages in forms designed for accurate importing into reference managers (Hellman 2009). Mendeley has been known to have problems with imports in cases where a proxy server is involved. COinS does have the slight problem of being kind of based on top of OpenURL, which is made of lose (Hellman 2010), but is the current least bad solution.

2. There is ongoing work to create a decent ontology for better embedding. The BibEx work for schema.org is going in the right direction (BibEx 2014). The Library of Congress BIBFRAME effort (LC 2014) is going in the right direction iff the right direction is defined as straight off a cliff - see e.g. Spero (2013).

3. There is a good comparison of Docear, Mendeley, and Zotero available in Beel (2014), which is remarkably balanced given that he is the PI for Docear. He includes a link to an earlier post mocking several completely unbalanced comparison charts prepared by different vendors (he finishes by making a similar chart showing Docear is the only possible choice. Table snark FTW.) My personal favorite tool is BibDesk (2014), which is Mac and bibtex specific, but justifies this by using many Mac-specific capabilities. There is some support for integration into Word (Don't mention the Word. I mentioned it once but I think I got away with it.)

4. All of these tools could benefit from even simple subsumption reasoning (although vocabularies like the LCSH have errors that lead to amusing and frustrating results - everything about doorbells is also about mammals, eschatology, the soul, and psychotherapy (Spero 2008)). It is important to recognize the difference between a knowledge organization system, for describing intentional concepts, and a knowledge representation system, for describing a view of reality. Leonard Cohen via Elaine Svenonius authorizes laughing at people who confuse the two. http://ibiblio.org/ses/anyqs.jpg

5. Extended rants on misunderstandings of plausible Ontologies and ontologies of the Bibliographic Universe omitted (cough SKOS cough).

Simon

References

Beel, Joeran (2014). Comprehensive Comparison of Reference Managers: Mendeley vs. Zotero vs. Docear. Available at http://www.docear.org/2014/01/15/comprehensive-comparison-of-reference-managers-mendeley-vs-zotero-vs-docear/
BibDesk (2014). BibDesk wiki: Main Page. Available at http://sourceforge.net/p/bibdesk/wiki/Main_Page/
BibEx (2014). Schema Bib Extend Community Group Wiki: Main Page. Available at http://www.w3.org/community/schemabibex/wiki/index.php?title=Main_Page
Hellman, Eric (2009). OpenURL COinS: A convention to embed bibliographic metadata in HTML. Available at http://ocoins.info
Hellman, Eric (2010). It's cool to hate on OpenURL (was Re: Twitter Annotations). Available at https://listserv.nd.edu/cgi-bin/wa?A2=CODE4LIB;axd%2FoQ;201004291208400400 or https://www.mail-archive.com/code4lib@listserv.nd.edu/msg07857.html
LC (2014). BIBFRAME: Bibliographic Framework Initiative. Available at http://www.loc.gov/bibframe/
Spero, Simon (2008). LCSH is to Thesaurus as Doorbell is to Mammal: visualizing structural problems in the Library of Congress subject headings. In Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications. DCMI. Available at http://iBiblio.org/ses/poster.pdf
Spero, Simon (2013). Prolegomena to any future metadata. Available at http://www.ibiblio.org/fred2.0/wordpress/?p=269
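[To make Simon's point 1 concrete: a COinS object is just an empty span whose title attribute carries an OpenURL ContextObject, which reference managers such as Zotero scan a page for. The bibliographic values below are invented for illustration.]

<span class="Z3988"
      title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=An%20Example%20Article&amp;rft.jtitle=Example%20Journal&amp;rft.au=Smith%2C%20Jane&amp;rft.date=2014"></span>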
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-08 18:38, Kingsley Idehen wrote:

Sarven, Linked Open Data dogfooding, re. issue tracking: i.e., a 5-Star Linked Open Data URI that identifies a Github issue tracker entry for Linked Research: [1] http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4 -- Linked Open Data URI (basic entity description page) [2] http://linkeddata.uriburner.com/c/8FDBH7 -- deeper follow-your-nose over relations and facets oriented entity description page [3] http://bit.ly/vapor-report-on-linked-data-uri-that-identifies-a-github-issue-re-linked-research-data -- Vapor Report (re. Linked Open Data principles adherence).

It's pretty cool that you can grab stuff out of GitHub issues, even comments! Papers link to code and then to commits and issues. See also [1]. Even comments, e.g., [2]. Or even in the direction of paper comments, which can be integrated and picked right up from the page, e.g., [3]. Just need to add +/-1 buttons and triplify the review ;) With WebID+ACL, we have the rest.

Do I have write access (via WebID?) to something like [4]? E.g., deleting an older label or triple :)

[1] http://git2prov.org/
[2] https://linkeddata.uriburner.com/about/html/http/csarven.ca/call-for-linked-research
[3] https://linkeddata.uriburner.com/about/html/http/csarven.ca/sense-of-lsd-analysis%01comment_20140808164434
[4] http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4

-Sarven
Re: scientific publishing process (was Re: Cost and access)
On 10/8/14 3:13 PM, Sarven Capadisli wrote:

On 2014-10-08 18:38, Kingsley Idehen wrote: Sarven, Linked Open Data dogfooding, re. issue tracking: i.e., a 5-Star Linked Open Data URI that identifies a Github issue tracker entry for Linked Research: [1] http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4 -- Linked Open Data URI (basic entity description page) [2] http://linkeddata.uriburner.com/c/8FDBH7 -- deeper follow-your-nose over relations and facets oriented entity description page [3] http://bit.ly/vapor-report-on-linked-data-uri-that-identifies-a-github-issue-re-linked-research-data -- Vapor Report (re. Linked Open Data principles adherence).

It's pretty cool that you can grab stuff out of GitHub issues, even comments! Papers link to code and then to commits and issues. See also [1]. Even comments, e.g., [2]. Or even in the direction of paper comments, which can be integrated and picked right up from the page, e.g., [3]. Just need to add +/-1 buttons and triplify the review ;) With WebID+ACL, we have the rest. Do I have write access (via WebID?) to something like [4]? E.g., deleting an older label or triple :)

[1] http://git2prov.org/
[2] https://linkeddata.uriburner.com/about/html/http/csarven.ca/call-for-linked-research
[3] https://linkeddata.uriburner.com/about/html/http/csarven.ca/sense-of-lsd-analysis%01comment_20140808164434
[4] http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4

-Sarven

Yes, there are WebID+TLS and/or NetID+TLS based ACLs [1][2][3] in place. In addition, you can always make a full TURTLE doc in some data space, or embed your TURTLE in any text slot (e.g., comments or description fields) provided by a Web app/service using Nanotation [4], and you are set re. payload for upload into URIBurner.

Basically, you have the following RWW options:

1. Append RDF statements to the existing RDF document (named graph) identified by IRI http://csarven.ca/sense-of-lsd-analysis -- all you do is refresh the URIBurner URI as data changes in github (?sponger:get=add at the end of a URIBurner URI has this effect).

2. Overwrite statements in the existing RDF document (named graph) -- simply add ?@Lookup@=refresh=clean to the end of the URIBurner URI, for this effect.

Of course there's lots more, but I'll let this flow one step at a time :-)

Links:

[1] http://bit.ly/enterprise-identity-management-and-attribute-based-access-controls
[2] http://www.slideshare.net/kidehen/how-virtuoso-enables-attributed-based-access-controls/34 -- WebID-TLS (authenticates WebIDs)
[3] http://www.slideshare.net/kidehen/how-virtuoso-enables-attributed-based-access-controls/40 -- NetID-TLS (authenticates LinkedIn, Facebook, Twitter, G+, Amazon, Dropbox, and many other identities)
[4] http://bit.ly/blog-post-about-nanotation -- Nanotation (this SHOULD work wherever you're able to input plain text).

--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
* Luca Matteis lmatt...@gmail.com [2014-10-07 00:41+0200]

Sorry to jump into this once again, but when it comes to typesetting nothing really comes close to Latex/PDF: http://tex.stackexchange.com/questions/120271/alternatives-to-latex - not even HTML/CSS/JavaScript

Making a floating model look like Latex/PDF at all resolutions seems impossible. Perhaps targeting a fixed (A4 or 8½×11 @300dpi) resolution is quite doable. Doing so allows one to use fixed position for all CSS directives. But Eric, that sucks!! Well, sort of, because we can't conveniently read it on a phone and it doesn't fill large displays, but that may be a small price to pay to be able to use all of the rich markup that we wax poetic about on this list. If it does work, then we can figure out ways to script it so it has a simply-controlled, predictable behavior at a certain resolution but is reasonable at arbitrary resolutions.

On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray nor...@astro.gla.ac.uk wrote:

Greetings.

On 2014 Oct 6, at 19:19, Alexander Garcia Castro alexgarc...@gmail.com wrote:

querying PDFs is NOT simple and requires a lot of work -and usually produces lots of errors. just querying metadata is not enough. As I said before, I understand the PDF as something that gives me a uniform layout. that is ok and necessary, but not enough or sufficient within the context of the web of data and scientific publications. I would like to have the content readily available for mining purposes. if I pay for the publication I should get access to the publication in every format it is available. the content should be presented in a way so that it makes sense within the web of data. if it is the full content of the paper represented in RDF or XML fine. also, I would like to have well annotated content, this is simple and something that could quite easily be part of existing publication workflows. it may also be part of the guidelines for authors -for instance, identify and annotate rhetorical structures.

The following might add something to this conversation. It illustrates getting the metadata from a LaTeX file, putting it into an XMP packet in a PDF, and getting it out of the PDF as RDF. Pace Peter's mention of /Author, /Title, etc, this just focuses on the XMP packet. This has the document metadata, the abstract, and an illustrative bit of argumentation. Adding details about the document structure, and (RDF) pointers to any figures, would be feasible, as would, I suspect, incorporating CSV files directly into the PDF. Incorporating \begin{tabular} tables would be rather tricky, but not impossible. I can't help feeling that the XHTML+RDFa equivalent would be longer and need more documentation to instruct the author where to put the RDFa magic.

It's not very fancy, and still has rough edges, but it only took me 100 minutes, from a standing start. Generating and querying this PDF seems pretty simple to me.

$ cat test-xmp.tex
\documentclass{article}
\usepackage{xmp-management}

\title{This is a test file}
\author{Norman Gray}
\date{2014 October 6}

\begin{document}
\maketitle

\abstract{It's easy to include metadata in \LaTeX\ files.
That's because there's plenty of metadata in there already.}

There is text and metatext within files.

\section{Further details}

In this section we could potentially discuss moving information around.
I think we can assert that \claim{it is easy to move information
around}, and, further, that \claim{making metadata readily available is
a Good Thing}.

I hope that clears that up.
\end{document}

$ cat xmp-management.sty
\ProvidesPackage{xmp-management}[2014/10/06]
\newwrite\xmp@ttlfile
\def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl \let\xmp@open\relax}
\long\def\xmp@stmt#1#2{%
  \xmp@open
  \write\xmp@ttlfile{ #1 #2.}}
\let\xmp@origtitle\title
\def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
\let\xmp@origauthor\author
\def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
\let\xmp@origdate\date
\def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}
\long\def\abstract#1{
  \xmp@stmt{dc:abstract}{#1}
  \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
\def\claim#1{
  \xmp@stmt{xmpinfo:claim}{#1}
  \emph{#1}}
\let\xmp@origsection\section
\def\section#1{\xmp@stmt{xmpinfo:has_section}{#1} \xmp@origsection{#1}}
\usepackage{xmpincl}
\AtBeginDocument{\includexmp{info}}

$ pdflatex test-xmp
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
 restricted \write18 enabled.
entering extended mode
(./test-xmp.tex
LaTeX2e <2011/06/27>
[...BLAH...]
Output written on test-xmp.pdf (1 page, 75667 bytes).
Transcript written on test-xmp.log.

$ cat test-xmp.ttl
 dc:title This is a test file.
 dc:creator Norman Gray.
 dc:created 2014 October 6.
Re: scientific publishing process (was Re: Cost and access)
+1. This is precisely one of the main ideas we pursued in Wf4Ever. The paper in whatever format is not enough; you also need to preserve the methods and their implementation, including the workflows and the datasets, not only for validation and reproducibility purposes in the face of publication but ultimately for incremental reuse and scientific development. Publications indeed shouldn't be seen as a static piece of paper but rather as a (linked) piece of knowledge which can be revised and evolve in time. So, tooling is required that supports the management of the lifecycle of such knowledge, from creation of specific research objects to reuse, including ways to deal with decay, and exploration and inspection capabilities.

In this direction, we took incremental steps through actual deployments of project outcomes in the previously mentioned platforms. Furthermore, we also integrated almost the whole set of functionalities into the ROHub.org platform, which was demonstrated in the last Semantic Publishing Challenge in ESWC [1,2] as a step forward in the direction you mention. To me it would make absolute sense to see further community pull of this kind of tooling, starting with their utilization in the conferences and journals of our own field (ESWC, ISWC, etc.) in order to incubate, gain traction, and draw conclusions that we could generalize to other domains. If this sounds appealing to the folks in this list, please let me know.

Cheers,
Jose

[1] http://2014.eswc-conferences.org/sites/default/files/eswc2014-challenges_spc_submission_3.pdf
[2] http://2014.eswc-conferences.org/program/semwebeval

On 04/10/2014 13:14, Hugh Glaser wrote:

(c) Workflows and Datasets. I have mentioned http://www.myexperiment.org before, but can't remember if I have mentioned http://www.wf4ever-project.org Again, these are Linked Data platforms for publishing; in this case workflows and datasets etc. They are seriously mature, certainly compared with what we might build - see, for example https://github.com/wf4ever/ro And exactly the same as the Repositories. What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or for each or series? …ditto… Who knows, maybe the Crawl, as well as the Challenge entries, might be able to usefully describe what they did using these ontologies etc.? Please, please, let's not build anything ourselves - if we are to do anything, then let's choose and join suitable existing activity and make it better for everyone.

--
Dr. Jose Manuel Gomez-Perez
Director R&D
jmgo...@isoco.com
T +34913349797 M +34609077103
Avda. del Partenón 10, Planta 1, Oficina 1.3A, Campo de las Naciones, 28042 Madrid, Spain
iSOCO enabling the networked economy
www.isoco.com
Re: scientific publishing process (was Re: Cost and access)
Kingsley and all, hello.

On 2014 Oct 7, at 02:18, Kingsley Idehen kide...@openlinksw.com wrote:

On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote: On 10/06/2014 11:03 AM, Kingsley Idehen wrote: On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the Huh? Every single PDF reader that I use can extract the PDF metadata and display it. Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I think the question _is_ all about metadata.

The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions, because X(HT)ML is clearly better for... well... dammit, it's Better. _One_ thing it would be better for is supporting the sort of full-scale RDF-everything view that you've described so eloquently. But if that's your goal, then lexing the source text is really going to be the least of your problems.

A more modest goal, which is still valuable and _much_ more achievable, is to get at least some RDF out of submitted articles. That practically means metadata, plus perhaps some document structure, plus, if you're keen and can get the authors to invest their effort, some argumentation. That's available for free (and right now) from LaTeX authors, and available from XHTML authors depending on how hard it would be to get them to put the @profile attribute in the right places.

So no, not just about 'metadata' in the narrow sense, but I think this thread is about what RDF you can in practice extract from the materials that authors can in practice be induced or obliged to submit to conference proceedings.

That original lament has overlapped with a parallel lament that PDF is a dead-end format -- it's not 'webby'. I believe that the demo in my earlier message undermines that claim as far as RDF goes.

1. The extractors are platform specific -- AWWW is about platform agnosticism (I don't want to mandate an OS for experiencing the power of Linked Open Data transformers / rdfizers)

Well, the extractors would be specific to PDF, but that's hardly surprising, I think. [I've lost track of whose comment this is...] The extractor I demoed wasn't PDF-specific.

We want to leverage the productivity and simplicity that AWWW brings to data representation, access, interaction, and integration.

Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered. If these costs are eliminated or at least minimized then this good is much more likely to be realized. With some help from Adobe we can have the best of all worlds here. I am going to take a look at their latest cloud offerings and associated APIs.

I forgot to attach the extractor I wrote -- done. The demo didn't use any Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.

All the best,

Norman

--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
Re: scientific publishing process (was Re: Cost and access)
Eric, hello.

This is a bit of a side-issue, but...

On 2014 Oct 7, at 07:13, Eric Prud'hommeaux e...@w3.org wrote:

* Luca Matteis lmatt...@gmail.com [2014-10-07 00:41+0200] Sorry to jump into this once again, but when it comes to typesetting nothing really comes close to Latex/PDF: http://tex.stackexchange.com/questions/120271/alternatives-to-latex - not even HTML/CSS/JavaScript

Making a floating model look like Latex/PDF at all resolutions seems impossible. Perhaps targeting a fixed (A4 or 8½×11 @300dpi) resolution is quite doable.

This isn't as hard as you might think (if I'm understanding you correctly). At http://purl.org/nxg/text/general-relativity I have some lecture notes. The downloads there include:

http://www.astro.gla.ac.uk/users/norman/lectures/GR/part2.pdf
http://www.astro.gla.ac.uk/users/norman/lectures/GR/part2-usletter.pdf
http://www.astro.gla.ac.uk/users/norman/lectures/GR/part2-screen.pdf

Those come from the _same_ source file with different \documentclass options (I keep meaning to do something about the marginal notes in the screen version, but have never got around to it). There's no resolution/DPI problem, because these are all vector fonts, not bitmaps. There should be no 'missing font' problem because the fonts are automatically embedded properly (the maths font in those documents is a commercial one, so it's unlikely to be on your computer).

This won't dynamically reflow, it's true (and that's a pity), but if I ever get a tablet computer, I doubt I'll be able to resist producing versions in a layout which is targeted at that size of screen.

All the best,

Norman

--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
Re: scientific publishing process (was Re: Cost and access)
On 10/7/14 5:39 AM, Norman Gray wrote:

Kingsley and all, hello. On 2014 Oct 7, at 02:18, Kingsley Idehen kide...@openlinksw.com wrote: On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote: On 10/06/2014 11:03 AM, Kingsley Idehen wrote: On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the Huh? Every single PDF reader that I use can extract the PDF metadata and display it. Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I think the question _is_ all about metadata.

It can't be. The metadata focus is a subtle misconception. We need access to all of the data in the document.

The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well... dammit, it's Better.

The initial gripe (as I've always seen it) is that we are trying to tell the world about Linked Open Data virtues while rarely putting them to use (instinctively) ourselves. It just so happens that conferences provide an example that most have experienced in some capacity.

_One_ thing it would be better for is supporting the sort of full-scale RDF-everything view that you've described so eloquently. But if that's your goal, then lexing the source text is really going to be the least of your problems. A more modest goal, which is still valuable and _much_ more achievable, is to get at least some RDF out of submitted articles.

Yes, or just make references to RDF sources relevant to the paper, but on the basis that those references (to the degree possible) resolve. This is also about the data represented in tabular form (as tables) and the data behind the tables, so to speak.

That practically means metadata, plus perhaps some document structure, plus, if you're keen and can get the authors to invest their effort, some argumentation. That's available for free (and right now) from LaTeX authors, and available from XHTML authors depending on how hard it would be to get them to put the @profile attribute in the right places. So no, not just about 'metadata' in the narrow sense, but I think this thread is about what RDF you can in practice extract from the materials that authors can in practice be induced or obliged to submit to conference proceedings.

For those conferences associated with themes such as Linked Open Data and the Semantic Web, RDF should be the norm for structured data representation. If that isn't possible, then what are we saying to the world about RDF, in regards to structured data representation and data de-silo-fication?

That original lament has overlapped with a parallel lament that PDF is a dead-end format -- it's not 'webby'.

They are linked :-)

I believe that the demo in my earlier message undermines that claim as far as RDF goes. 1. The extractors are platform specific -- AWWW is about platform agnosticism (I don't want to mandate an OS for experiencing the power of Linked Open Data transformers / rdfizers) Well, the extractors would be specific to PDF, but that's hardly surprising, I think. [I've lost track of whose comment this is...] The extractor I demoed wasn't PDF-specific.

Platform in the context of my comments really relates to operating systems, i.e., most PDF extractors are operating system specific. That's why I mentioned the massive opportunity for Adobe (and 3rd parties too, as Mike Bergman added) in regards to providing Web Services for accessing and indexing PDF document content.

We want to leverage the productivity and simplicity that AWWW brings to data representation, access, interaction, and integration. Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered. If these costs are eliminated or at least minimized then this good is much more likely to be realized. With some help from Adobe we can have the best of all worlds here. I am going to take a look at their latest cloud offerings and associated APIs. I forgot to attach the extractor I wrote -- done. The demo didn't use any Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.

You forgot the extractor demo link :)

All the best, Norman

--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
The stack exchange discussion mostly talks about the user side of things. Go back (quite) a few years, and using PDF from tex was a pain, pretty much up until pdflatex became the norm.

For those who think that latex is still the best, I do not see that an HTML-centric publishing framework should be a barrier. If the majority of papers were being produced from Word, then it might be more of an issue.

Phil

Luca Matteis lmatt...@gmail.com writes:

Sorry to jump into this once again, but when it comes to typesetting nothing really comes close to Latex/PDF: http://tex.stackexchange.com/questions/120271/alternatives-to-latex - not even HTML/CSS/JavaScript
Re: scientific publishing process (was Re: Cost and access)
Norman Gray nor...@astro.gla.ac.uk writes:

This won't dynamically reflow, it's true (and that's a pity), but if I ever get a tablet computer, I doubt I'll be able to resist producing versions in a layout which is targeted at that size of screen.

Sure, that's fine. But why not have a version which behaves reasonably at all screen sizes? This should be achievable.

Phil
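[A sketch of what 'reasonable at all screen sizes' can look like in practice, assuming a single-column article; the breakpoint and sizes are arbitrary illustrations, not a recommendation:]

body { max-width: 40em; margin: 0 auto; padding: 0 1em; }  /* readable measure on large screens */
img  { max-width: 100%; height: auto; }                    /* figures shrink with the viewport */

@media (max-width: 30em) {
  body { font-size: 95%; }                                 /* slightly tighter on phones */
}

@media print {
  body { max-width: none; font-size: 10pt; }               /* hand layout back to the printer */
}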
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-07 11:39, Norman Gray wrote:

The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well... dammit, it's Better.

Straw man argument. Please stop that now! I will spell out the main proposal and purpose for you, because it sounds like you are completely oblivious to them. Let me know if anything is unclear.

* Conferences on SW/LD research should encourage and allow submissions using the Web-native technology stack (e.g., starting from HTML and friends, for instance) alongside the existing requirements. As the required submission in PDF can be generated via HTML+CSS, those that wish to arrive at the PDF by their own means can still do so, meanwhile without asking or forcing the existing authorship or review process to change. It is backwards compatible. The underlying idea is to use our own technologies, not only for the sake of using them, but also to identify the pains as a precursor to raising the quality of the (Semantic) Web stack for scientific research publishing, discovery, and reuse. This is plain and simple dogfooding, and it is important.

* There is an opportunity for granular data discovery, reuse, and machines to aid in reproducibility of scientific research. This goes completely beyond off-the-shelf metadata, e.g., author, title, subject, or what you can stuff into LaTeX+Whatever, not to mention mangling around what's primarily intended for desktop and print to squeeze in some Web in there. We are talking about making reasonable strides towards having scientific knowledge that is universally accessible on the Web. PDF and friends do not fit into that equation that well; however, no one is blocked from doing what they already do. Some of us would like to do a bit more than that, to test things out so that we can collectively have more wins.

* There is also an opportunity to attract more funding and interest groups, if we can better assess the state of Web Science. This is simply due to the fact that we would be able to mine more useful information from existing research. Moreover, we can identify research areas of potential value better. It is to elevate the support that we can get from machines to excel and to do our work better. This is in contrast to what we can currently achieve with the existing workflow, i.e., the current process is only concerned about making it easy for the author, reviewer, and publisher, and not about gleaning high-fidelity information.

A more modest goal, which is still valuable and _much_ more achievable, is to get at least some RDF out of submitted articles. That practically means metadata, plus perhaps some document structure, plus, if you're keen and can get the authors to invest their effort, some argumentation. That's available for free (and right now) from LaTeX authors, and available from XHTML authors depending on how hard it would be to get them to put the @profile attribute in the right places. That original lament has overlapped with a parallel lament that PDF is a dead-end format -- it's not 'webby'. I believe that the demo in my earlier message undermines that claim as far as RDF goes.

Let me get this right: you are advocating that LaTeX + RDF/XML + whatever processes one has to go through is a more sensible approach than HTML? If so, we have a different view on what creates a good UX.

It may come as news to you, but the SW/LD community is not in favour of authors using RDF/XML unless it is completely within some tool-chain left for machines to deal with. There are alternative RDF notations which are preferable. You should look it up. The problem with your proposal is that the author has to boggle their mind with two completely different syntaxes (LaTeX and RDF/XML), whereas the original proposal was to deal with one, i.e., HTML.

Styling is no more of an issue: the templates, as in the case of LaTeX, are provided, and for HTML, I've made a modest PoC with: https://github.com/csarven/linked-research However, you are somehow completely oblivious to that, even though it was mentioned several times now on this mailing list. No, it is not perfect, and yes, it can be better. There are alternative solutions to achieve something along those lines with the same vision in mind, which are all okay too.

If this is not about coding, but rather using WYSIWYG editors or authoring/publication tools, have a look and try a few here or from a service near you:

* http://en.wikipedia.org/wiki/Comparison_of_HTML_editors
* http://en.wikipedia.org/wiki/List_of_content_management_systems

Or, you know, take 30 seconds to create a WordPress account and another 30 seconds to publish. Let me know if you still think that's insufficient or completely unreasonable / difficult for Web Science people to handle.

So, *do as you like, but do not prevent me* from encouraging the SW/LD
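[To make the 'one syntax' point concrete: the same HTML the author writes anyway can carry the RDF, with no separate RDF/XML step. A hedged sketch using RDFa Lite attributes and Dublin Core terms, reusing the content of Norman's earlier demo; the vocabulary choices are illustrative, not a prescription:]

<article vocab="http://purl.org/dc/terms/">
  <h1 property="title">This is a test file</h1>
  <p>By <span property="creator">Norman Gray</span>,
     <time property="created">2014-10-06</time></p>
  <section property="abstract">
    <p>It's easy to include metadata in HTML files. That's because
    there's plenty of structure in there already.</p>
  </section>
</article>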
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which do default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil

So someone has to package this up so that it can be easily used. Before then, how can it be required for conferences?

http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex

I have tex4ht installed, but there is no xhmlatex file to be found. I managed to find what appears to be a good command line

I don't know why that would be. It is installed with the Debian package, although, as I said, it is not in the system path. I found it with dpkg -S. Am afraid it's a long time since I used an RPM-based system, so I can't remember how to do this on Fedora.

htlatex schema-org-analysis.tex xhtml,mathml -cunihtf -cvalidate

This looks better when viewed, but the resultant HTML is unintelligible. There is definitely more work needed here before this can be considered as a potential solution.

Yes, I agree. So, the question is how to enable this. One way would, for example, be for ISWC and ESWC to accept HTML and have a prize for the best semantic paper submitted. Then people with the inclination would do the work. Again, I suspect it's not that much, but we will not know until we try.

Phil
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes:

On 10/06/2014 11:00 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they?

For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs.

Really? So, for example, you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not?

No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up.

It *is* exactly what you are talking about.

Well, maybe I was not being clear, but I thought that I was talking about rendering changes interfering with comprehension of the authors' intent.

And if only you had a definition of rendering changes that interfere with authors' intent, as opposed to just rendering changes. I can guarantee that rendering a paper to speech WILL change at least some of the authors' intent because, for example, figures will not reproduce. You state that this should be avoided at all costs. I think this is wrong. There are many reasons to change rendering. That should be the reader's choice.

Phil
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: So, you believe that there is an excellent set of tools for preparing, reviewing, and reading scientific publishing. Package them up and make them widely available. If they are good, people will use them. Convince those who run conferences. If these people are convinced, then they will allow their use in conferences or maybe even require their use. Is that not the point of the discussion? Unfortunately, we do not know why ISWC and ESWC insist on PDF. I'm not convinced by what I'm seeing right now, however. Sure, but at least the discussion has meant that you have looked at some of the tools again. That's no bad thing. My question would be, are you more convinced than you were the last time you looked, or less? Phil
Re: scientific publishing process (was Re: Cost and access)
What I'd suggest for conference organisers is something like the following: 1. Keep the PDF as the main thing, as it's not going anywhere soon. 2. Also allow submission in some alternative form, including semantic content, and have the conference run a competition for alternative publishing forms - including voting by delegates on what they like and what they want. This could promote such alternative forms and offer a migration route over time. Robert. On 07/10/2014 13:27, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: So, you believe that there is an excellent set of tools for preparing, reviewing, and reading scientific publishing. Package them up and make them widely available. If they are good, people will use them. Convince those who run conferences. If these people are convinced, then they will allow their use in conferences or maybe even require their use. Is that not the point of the discussion? Unfortunately, we do not know why ISWC and ESWC insist on PDF. I'm not convinced by what I'm seeing right now, however. Sure, but at least the discussion has meant that you have looked at some of the tools again. That's no bad thing. My question would be, are you more convinced than you were the last time you looked, or less? Phil -- Professor Robert Stevens Bio-health Informatics Group School of Computer Science University of Manchester Oxford Road Manchester United Kingdom M13 9PL robert.stev...@manchester.ac.uk Tel: +44 (0) 161 275 6251 Blog: http://robertdavidstevens.wordpress.com Web: http://staff.cs.manchester.ac.uk/~stevensr/ KBO
Re: scientific publishing process (was Re: Cost and access)
If you mean that published papers have to be in PDF, but that they can optionally have a second format, then I have no problem with this proposal. I also have no problem with encouraging use of other formats. However, this is an added burden on conference organizers. Someone would have to volunteer to handle the extra work, particularly the work involved in checking that papers using the second format abide by the publishing requirements. peter On 10/07/2014 05:52 AM, Robert Stevens wrote: What I'd suggest for conference organisers is something like the following: 1. Keep the PDF as the main thing, as it's not going anywhere soon. 2. Also allow submission in some alternative form, including semantic content, and have the conference run a competition for alternative publishing forms - including voting by delegates on what they like and what they want. This could promote such alternative forms and offer a migration route over time. Robert. On 07/10/2014 13:27, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: So, you believe that there is an excellent set of tools for preparing, reviewing, and reading scientific publishing. Package them up and make them widely available. If they are good, people will use them. Convince those who run conferences. If these people are convinced, then they will allow their use in conferences or maybe even require their use. Is that not the point of the discussion? Unfortunately, we do not know why ISWC and ESWC insist on PDF. I'm not convinced by what I'm seeing right now, however. Sure, but at least the discussion has meant that you have looked at some of the tools again. That's no bad thing. My question would be, are you more convinced than you were the last time you looked, or less? Phil
Re: scientific publishing process (was Re: Cost and access)
On 10/07/2014 05:27 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: So, you believe that there is an excellent set of tools for preparing, reviewing, and reading scientific publishing. Package them up and make them widely available. If they are good, people will use them. Convince those who run conferences. If these people are convinced, then they will allow their use in conferences or maybe even require their use. Is that not the point of the discussion? Not at all. Where was the proposal to put together something that met the requirements of preparing, reviewing, and publishing scientific papers? To me, the initial discussion was about how much better HTML was for carrying data. Other aspects of paper preparation, review, and publishing were not being considered. Now, maybe, aspects of presentation and review and ease of use are part of the discussion. A change in the paper submission process needs to take into account what the paper submission process is about, not just some aspect of what might be included in submitted papers. Unfortunately, we do not know why ISWC and ESWC insist on PDF. As far as I am concerned, ISWC and ESWC insist on PDF for submissions because the reviewing process is so much better with PDF than with anything else. I'm not convinced by what I'm seeing right now, however. Sure, but at least the discussion has meant that you have looked at some of the tools again. That's no bad thing. My question would be, are you more convinced than you were the last time you looked, or less? Well, I remain totally unconvinced that any current HTML solution is as good as the current PDF setup. Certainly htlatex is not suitable. There may be some way to get tex4ht to do better, but no one has provided a solution. Sarven Capadisli sent me some HTML that looks much better, but even on a math-light paper I could see a number of glitches. I haven't seen anything better than that. It's not as if the basics (MathML, CSS, etc.) are unavailable to put together most, or maybe even all, of an HTML-based solution. These basics have been around for some time now. However, I haven't seen a setup that is as good as LaTeX and PDF for preparation, review, and publishing of scientific papers. Yes, it took a lot of effort to get to the current state with respect to LaTeX and PDF. In the past, I experienced quite a number of problems with using LaTeX and PDF for writing, reviewing, and publishing scientific papers, but most of these are in the past. Yes, there are still some problems with using LaTeX and PDF. Produce something better and people will use it, eventually. Phil peter
Re: scientific publishing process (was Re: Cost and access)
On 10/07/2014 05:23 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 11:00 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having a different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. It *is* exactly what you are talking about. Well, maybe I was not being clear, but I thought that I was talking about rendering changes interfering with comprehension of the authors' intent. And if only you had a definition of rendering changes that interfere with the authors' intent, as opposed to just rendering changes. I can guarantee that rendering a paper to speech WILL change at least some of the authors' intent because, for example, figures will not reproduce. You state that this should be avoided at all costs. I think this is wrong. There are many reasons to change rendering. That should be the reader's choice. Phil I think that for reviewing the authors should be able to dictate how their submission looks, within the bounds of the submission requirements. If the reviewer wants, or needs, to change the way a submission is presented then it is up to the reviewer to ensure that their review is not coloured by this change. When I review papers I routinely point out presentation problems. Sometimes I take into account presentation problems when I evaluate papers. However, I try very hard to evaluate the submission based on what the authors submitted, not on any changes that I made to the submission. For example, I will point out problems with using colours in graphs, but I will evaluate the paper based on the coloured version of the graphs, not a black and white version. However, if the authors submitted low-resolution figures and something is missing because of this, then I feel free to take this into account in my evaluation. In a situation where I do not know what presentation the authors wanted, for example if explicit line breaks and indentation are sometimes preserved, but not always, the evaluation of submissions can become very much harder. peter
Re: scientific publishing process (was Re: Cost and access)
On 10/07/2014 05:20 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which apply default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil So someone has to package this up so that it can be easily used. Before then, how can it be required for conferences? http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex Somehow this is not in my tex4ht package. In any case, the HTML output it produces is dreadful. Text characters, even outside math, are replaced by numeric XML character entity references. peter
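Illustratively (this is a mock-up, not captured tex4ht output), "numeric XML character entity references" means source of this shape, where even ordinary text arrives encoded:

  <p>&#x54;&#x68;&#x65; &#x71;&#x75;&#x69;&#x63;&#x6B; brown fox</p>

A browser renders that as "The quick brown fox", but the source is needlessly hostile to anyone reading, diffing, or post-processing it.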
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which apply default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil So someone has to package this up so that it can be easily used. Before then, how can it be required for conferences? http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex Somehow this is not in my tex4ht package. In any case, the HTML output it produces is dreadful. Text characters, even outside math, are replaced by numeric XML character entity references. So, I am willing to spend some time getting this to work. I would like to plug some ESWC papers into tex4ht, to get some HTML which works plain and also with Sarven's templates so that it *looks* like a PDF. Would you be willing to a) try it and b) give worked and short test cases for things that do not work? Phil
Re: scientific publishing process (was Re: Cost and access)
Hi John, Kingsley, et al, On Mon, Oct 6, 2014 at 8:39 AM, John Erickson olyerick...@gmail.com wrote: This is an incredibly rich and interesting conversation. I think there are two separate themes: 1. What is required and/or asked-for by the conference organizers... a. ...that is needed for the review process b. ...that is needed to implement value-added services for the conference c. ...that contributes to the body of work 2. What is required and/or asked for by the publisher? All of (1) is about the meat of the contributions, including establishing a long-term legacy. (2) is about (presumably) prestigious output. What added services could esp. Easychair provide that would go beyond 1.a. and contribute to 1.b. and 1.c., etc.? Are there any Easychair committers watching this thread? ;) John -- John S. Erickson, Ph.D. Deputy Director, Web Science Research Center Tetherless World Constellation (RPI) http://tw.rpi.edu olyerick...@gmail.com Twitter / Skype: olyerickson This makes me think of PLoS. For example, PLoS has published format guidelines using Word and LaTeX (http://www.plosone.org/static/guidelines), a workflow for semantically structuring the resulting output, and their final output is well structured and available in XML based on a known standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the published HTML on their website (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233). This results in semantically meaningful XML that is transformed to HTML: http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233&representation=XML Interestingly as well, they have provided this framework in an open source form: http://www.ambraproject.org/ Clearly the publication process can support a semantic solution when it's in the best interest of the publisher: they will adopt and drive their own markup processes to meet external demand. Providing tools that both the publisher and the author may use independently could simplify such an effort, but is not a main driver in achieving that final result you see in PLoS. This is especially the case given even the debate concerning file formats here. For PLoS, the solution that is currently successful is the one that worked to solve today's immediate local need with today's tools. Cheers, Mark p.s. Finally, on the reference of moving repositories such as EPrints and DSpace towards supporting semantic markup of their contents. Being somewhat of a participant in LoD on the DSpace side, I note that these efforts are inherently just Repository Centric, describing the structure of the repository (i.e., Collections of Items), not the semantic structure contained within the Item contents (articles, citations, formulas, data tables, figures, ideas). In both platforms, these capabilities are in their infancy; lacking any rendering other than offering the original file for download, they ultimately suffer from the absence of semantic structure in the content going into them. -- Mark R. Diggory
Re: scientific publishing process (was Re: Cost and access)
Sure, I have lots of papers (none for ESWC, though) that could serve as test cases. peter On 10/07/2014 07:49 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which apply default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil So someone has to package this up so that it can be easily used. Before then, how can it be required for conferences? http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex Somehow this is not in my tex4ht package. In any case, the HTML output it produces is dreadful. Text characters, even outside math, are replaced by numeric XML character entity references. So, I am willing to spend some time getting this to work. I would like to plug some ESWC papers into tex4ht, to get some HTML which works plain and also with Sarven's templates so that it *looks* like a PDF. Would you be willing to a) try it and b) give worked and short test cases for things that do not work? Phil
Re: scientific publishing process (was Re: Cost and access)
BLUF: This is where information science comes in. Technology must meet the needs of real users. It may be better to generate better Tagged PDFs, and to experiment, using some existing methodology annotation ontologies, with generating auxiliary files of triples. This might require new/changed latex packages, new div/span classes, etc. \huge But what is really needed is actually working with SMEs to discover the cultural practices within the field and subfield, and developing systems that support their work styles. This is why Information Science is important. If there are changes in practices that would be beneficial, and these benefits can be demonstrated to the appropriate audiences, then these can be suggested. If existing programs, libraries, and operating systems can be modified to provide these wins transparently, then it is easier to get the changes adopted. If the benefits require additional work, then the additional work must give proportionate benefits to those doing the work, or be both of great benefit to funding agencies or other gatekeepers, *and* be easily verifiable. An example might be a proof (or justified belief) that a paper and its supplemental materials do, or do not, contain everything required to attempt to replicate the results. This might be feasible in many fields through a combination of annotation with a sufficiently powerful KR language and reasoning system. Similarly, relatively simple meta-statistical analysis can note common errors (like multiple comparisons that do not correct for False Discovery Rate). This can be easy if the analysis code is embedded in the paper (e.g. Sweave), or if the adjustment method is part of the annotation, and the decision process need not be total. This kind of validation can be useful to researchers (less embarrassment), and useful to gatekeepers (less to manually review). Convincing communities working with large datasets to use RDF as a native data format is unlikely to work. The primary problem is that it isn't a very good one. It's great for combining data from multiple sources - as long as every datum is true. If you want to be less credulous, KMAC YOYO. Convincing people to add metadata describing values in structures as owl/rdfs datatypes or classes is much easier - for example, as HDF5 attributes. If the benefits require major changes to the cultural practices within a given knowledge community, then they must be extremely important *to that community*, and will still be resisted, especially by those most acculturated into that knowledge community. An example of this kind of change might be inclusion in supplemental materials of analyses and data that did not give positive results. This reduces the file drawer effect, and may improve the justified level of belief in the significance of published results (p < 1.0). This level of change may require a blood upgrade (https://www.goodreads.com/quotes/4079-a-new-scientific-truth-does-not-triumph-by-convincing-its). It might also be imposable from above by extreme measures (if more than 10% of your claimed significant results can't be replicated, and you can't provide a reasonable explanation in a court of law, you may be held liable for consequential damages incurred by others reasonably relying on your work, and reasonable costs and possible punitive damages for costs incurred attempting to replicate. Repeat offenders will be fed to a ravenous mob of psychology undergraduates, or forced to teach introductory creative writing). Simon P. S.
[dvips was much easier if you had access to Distiller] It is possible to add mathematical content to html pages, but it is not easy. MathML is not something that browser developers want, which means that the only viable approach is MathJax (http://mathjax.org). MathJax is impressive, and supports a nice subset of LaTeX (including some AMS). However, it adds a noticeable delay to page rendering, as it is heavy-duty ECMAScript, and is computing layout on the fly. It does not require server side support, so is usable from static sites like github pages (see e.g. the tests at the bottom of http://who-wg.github.io). However the common deployment pattern, using their CDN, adds archival dependencies. From a processing perspective, this does not make semantic processing of the text much easier, as it may require ECMAScript code to be executed. On Oct 7, 2014 8:14 AM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: On 10/07/2014 05:20 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which apply default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil So someone has to package this up so that it can be easily used. Before then, how can it be required for
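For reference, the CDN deployment pattern described above is a one-line include plus ordinary TeX in the page (a minimal sketch; TeX-AMS-MML_HTMLorMML is one of MathJax's stock combined configurations):

  <script type="text/javascript"
    src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
  </script>
  <p>When \(a \ne 0\), \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\).</p>

Self-hosting the MathJax tree instead of pointing at cdn.mathjax.org removes the archival dependency, at the cost of a larger artifact.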
Re: scientific publishing process (was Re: Cost and access)
On 10/7/14 1:14 PM, Norman Gray wrote: Sarven, hello. On 2014 Oct 7, at 13:13, Sarven Capadisli i...@csarven.ca wrote: On 2014-10-07 11:39, Norman Gray wrote: The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well ... dammit, it's Better. Straw man argument. Please stop that now! I will spell out the main proposal and purpose for you because it sounds like you are completely oblivious to them. Let me know if anything is unclear. My remark was intended as facetious rather than fractious, but if you feel I misjudged the balance, I apologise. I want to clarify what I meant, because on reflection it explains (at least to me) why I'm participating in this thread at such length. My intention was to indicate that I don't feel that HTML is as central as you, amongst others, seem to assert it is. I characterise the web as: 1. URIs for addressing things, 2. HTTP for retrieving things (other protocols exist, but...), 3. a downloadable format which clients can parse to obtain more URIs, with a 'follow this' semantic. How about: 1. HTTP URIs for naming (or identifying) things -- basically, the combined effects of denotation (signification) and connotation (perceptible description) 2. RDF abstract language for describing things -- systematic use of signs, syntax, and role semantics for communication 3. Notations for inscribing RDF language based descriptions to documents -- where notations serve the medium-specific purpose of representing the words of a language. Once you have the base RDF Document in place, using a preferred notation, and subject to viewer preferences, you transform the RDF document into other document types (HTML, PDF, etc.), in line with viewer preferences. Now, the obvious candidate for (3) is of course HTML; but on the web, and _especially_ on the Semantic Web, it can be anything: RDF in one or other format, XML+GRDDL, some discipline-specific format which has a link semantic in it, or even a PDF file with a standardised lump of RDF/XMP inside it. The trouble with the paragraph above is that RDF isn't a format. That presumption is the root of mass confusion. That RDF may be immediately present, or it may require some sort of heuristic or deterministic extraction (as Kingsley has discussed). All of these are web-native technologies, and I'd go as far as to say that the _least_ interesting thing you can find at the end of a URI is an HTML file. For sure! The big deal, for me, in the idea of the Semantic Web, and the RDF world, is the realisation that the RDF model is sufficiently general that you can turn almost any structured data into RDF, put it into a big bucket, and start inferencing, querying, linking, and so on. That generation/extraction of RDF is probably easier if the stuff is already pointy-bracketed for you, but that's only a detail. Yes, which is why we have to think of RDF (accurately) as a Language, and never a format. The format issue is something that should have been attended to years ago in W3C literature, i.e., the notion of abstract and concrete syntaxes leads to the misconception that RDF is about document content formats. The loose coupling of language (signs, syntax, and semantics) and notations (representation of the words of a language) isn't visible, and as a result is lost or overlooked (on a good day). JSON-LD and TURTLE are both accurately pitched (across all related collateral) as Notations.
Funnily enough, each is also associated with significant RDF uptake initiatives: TURTLE re. the LOD Cloud, and JSON-LD re. Google, Bing, Yandex, and possibly Yahoo!, as major RDF supporters and adopters that are driving mass production of HTML documents that include RDF-language based structured data (inline, or via structured data islands using <script> elements). The interesting thing, for me, is just how the web as a whole can go about collectively managing or facilitating this generation/extraction in a way which balances faithfulness to the original with interoperable meaning (Dublin Core and FOAF are truly wonderful things). That is why I do feel that -- especially in this SW/LD community -- HTML is a bit of a sideshow. Yes, it is, but I think Sarven uses it as a simple starting point, i.e., a point of least distraction, so to speak. HTML is a splendid thing for all the reasons that you know and I know, but if it's seen as central, if all questions turn into "what does that look like in HTML?", if it's so in-our-face that we can't see round it, then we miss the interesting questions. Yes! So it's not that I've a particular downer on HTML, or a particular enthusiasm for PDF, but I think that "what does that look like in PDF?" and "what does that look like in FITS?" (the format of choice in my area) are more interesting. Yes. (or put another way, I don't think that
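To make the structured-data-island point concrete, here is a minimal sketch (the values are illustrative): the same RDF language, inscribed in JSON-LD notation, carried inline in an HTML script element:

  <script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "ScholarlyArticle",
    "name": "An Example Paper",
    "author": { "@type": "Person", "name": "Jane Doe" }
  }
  </script>

The surrounding HTML is then just one possible rendering of the description, which fits the language-versus-notation distinction above.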
Re: scientific publishing process (was Re: Cost and access)
PLOS is an interesting case. The HTML for PLOS articles is relatively readable. However, the HTML that the PLOS setup produces is failing at math, even for articles from August 2014. As well, sometimes when I zoom in or out (so that I can see the math better) Firefox stops displaying the paper, and I have to reload the whole page. Strangely, PLOS accepts low-resolution figures, which in one paper I looked at are quite difficult to read. However, maybe the PLOS method can be improved to the point where the HTML is competitive with PDF. peter This makes me think of PLoS. For example, PLoS has published format guidelines using Word and LaTeX (http://www.plosone.org/static/guidelines), a workflow for semantically structuring the resulting output, and their final output is well structured and available in XML based on a known standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the published HTML on their website (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233). This results in semantically meaningful XML that is transformed to HTML: http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233&representation=XML Interestingly as well, they have provided this framework in an open source form: http://www.ambraproject.org/ Clearly the publication process can support a semantic solution when it's in the best interest of the publisher: they will adopt and drive their own markup processes to meet external demand. Providing tools that both the publisher and the author may use independently could simplify such an effort, but is not a main driver in achieving that final result you see in PLoS. This is especially the case given even the debate concerning file formats here. For PLoS, the solution that is currently successful is the one that worked to solve today's immediate local need with today's tools. Cheers, Mark p.s. Finally, on the reference of moving repositories such as EPrints and DSpace towards supporting semantic markup of their contents. Being somewhat of a participant in LoD on the DSpace side, I note that these efforts are inherently just Repository Centric, describing the structure of the repository (i.e., Collections of Items), not the semantic structure contained within the Item contents (articles, citations, formulas, data tables, figures, ideas). In both platforms, these capabilities are in their infancy; lacking any rendering other than offering the original file for download, they ultimately suffer from the absence of semantic structure in the content going into them. -- Mark R. Diggory
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-06 06:59, Ivan Herman wrote: Of course, I could expect a Web-technology-related crowd to use HTML source editing directly, but the experience by Daniel and myself with the World Wide Web conference(!) is that people do not want to do that. (Researchers in, say, Web Search have proven to be unable or unwilling to edit HTML source. It was a real surprise...). I.e., the authoring tool offerings are still limited. Can you please elaborate on that? When was that, and what tools were available or used? Do you have any documentation on the landscape from that time that we can use or learn from? My understanding is that you've experienced some issues about a decade ago and your reasoning is clouded by that. Do you think that it would be fair to revisit the situation based on today's landscape and see how it will play out? From my perspective, we should have a bit more faith in the SW community, because then we might actually strive to deliver, as opposed to walking away from the problem. Like I said in my previous emails (which I'm sure you've read), the current workshops on SW/LD research publishing did not deliver. Why do you have so much faith in waiting it out, hoping that they will deliver? They might, and I hope they do. But I'm not putting all my chips on that option alone. I would rather see grass-roots efforts in parallel, e.g., http://csarven.ca/call-for-linked-research What's the number of human hours spent on CfPs on Linked Science + Semantic Publishing so far? How has the delivery of machine- and human-friendly research changed or evolved? What's visible or countable? On that front, what can we do right now that wasn't possible 5-10 years ago? In the meantime, if the conferences and workshops can get back on track and motivate people (at least), we would not only see more value drawn out of the SW research, but also growing funding opportunities, and faster progress across the field. I am disappointed by the fact that instead of addressing the core issue (can the conferences allow or encourage the Web stack?) we are discussing distractions, e.g., perfection in authoring tools. Every user has their own preferences, i.e., some will code, some will use tool X. What you are suggesting is that we wait it out because the developments may reveal the perfect authoring tooling. If that were ever the case, we'd see it in the general market, not something that might one day emerge out of SW/LD workshops. I will bet that if the requirements evolve towards Webby submissions, within 3-5 years' time, we'd see a notable change in how we collect, document and mine scientific research in SW. This is not just being hopeful. I believe that if all of the newcomers into the (academic) research scene start from HTML (and friends) instead of LaTeX/Word (and friends), we wouldn't be having this discussion. If the newcomers are told to deal with LaTeX/Word (regardless of hand coding or using a WYSIWYG editor) today, they are going to do exactly that. That basically pushes the date further for a complete switch-over to Webby tools, because the majority of those researchers would have to be flushed out of the system before the next wave of Webby users can have their chance. Even if we have all of the perfect or appropriate tooling (which I think is the wrong thing to aim for) right now, it will still take a few years to flush out the current LaTeX/Word users or have them evolve. I would rather see the smallest change happen right now than nothing at all. *AGAIN*, technology is not the problem.
#DIY -Sarven http://csarven.ca/#i smime.p7s Description: S/MIME Cryptographic Signature
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. I don't think this is a valid point. It is certainly possible to write HTML that will not look good on every machine, but these days it is easier to write HTML that does. The same is true with PDF. Font problems used to be routine. And, as other people have said, it's very hard to write a PDF that looks good on anything other than paper. Further, why should there be any technical preference for HTML at all? (Yes, HTML is an open standard and PDF is a closed one, but is there anything else besides that?) Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? PDF is, I think, open these days. But, yes, I do think that conferences should dogfood. I mean, what would you think if W3C produced all of their documents in PDF? Would that make sense? Phil
Re: scientific publishing process (was Re: Cost and access)
Luca Matteis lmatt...@gmail.com writes: On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote: The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. But are they still very poor? I mean, I think there are more tools for rendering HTML than there are for rendering Latex. In fact there are probably more tools for rendering HTML than anything else out there, because HTML is used more than anything else. Because HTML powers the Web! You can write in Word, and export in HTML. You can write in Markdown and export in HTML. You can probably write in Latex and export in HTML as well :) Yes, you can. Most of the publishers use XML at some point in their process, and latex gets exported to that. I am quite happy to keep LaTeX as a user interface, because it's very nice, and the tools for it are mature for academic documents (in practice, this means cross-referencing and bibliographies). So, as well as providing an LNCS stylesheet, we'd need an htlatex cf.cfg, and one CSS, and it's done. Be good to have another CSS for on-screen viewing; LNCS's back-of-a-postage-stamp layout is very poor for that. Phil
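For the on-screen CSS, even something this small would already help (a minimal sketch; the values are illustrative, not derived from any LNCS specification):

  <style>
    /* Constrain the measure for on-screen reading. */
    body { max-width: 40em; margin: 0 auto; line-height: 1.5; }
    /* Let print fall back to the publisher's layout. */
    @media print { body { max-width: none; line-height: normal; } }
  </style>

The point is only that screen and print need not share one layout; two stylesheets against the same HTML are cheap.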
Re: scientific publishing process (was Re: Cost and access)
Sarven Capadisli i...@csarven.ca writes: I will bet that if the requirements evolve towards Webby submissions, within 3-5 years' time, we'd see a notable change in how we collect, document and mine scientific research in SW. This is not just being hopeful. I believe that if all of the newcomers into the (academic) research scene start from HTML (and friends) instead of LaTeX/Word (and friends), we wouldn't be having this discussion. If the newcomers are told to deal with LaTeX/Word (regardless of hand coding or using a WYSIWYG editor) today, they are going to do exactly that. I would look at an environment which has less external force. The free software engineering community produces its documents in a very wide range of formats. If you peruse github, the key characteristics are, I think: that they are text formats, because they are easy to version with source and are hackable; and mostly they dump to HTML. PDFs are very rare these days. It would be fun to see what the most used are. Markdown is a big contender, as well as language-specific formats (Python and reStructuredText, for example). I don't believe that HTML is a good authoring format any more than PDF is. I don't see this as a huge problem. HTML needs to be part of the tool-chain, not all of it. Phil
Re: scientific publishing process (was Re: Cost and access)
On 10/6/14 7:43 AM, Phillip Lord wrote: I don't believe that HTML is a good authoring format any more than PDF is. I don't see this as a huge problem. HTML needs to be part of the tool-chain, not all of it. +1 -- Regards, Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this smime.p7s Description: S/MIME Cryptographic Signature
Re: scientific publishing process (was Re: Cost and access)
Hello, My apologies if this is a repost (errors were encountered and my last post bounced from the listserv)... On Sun, Oct 5, 2014 at 1:19 PM, Luca Matteis lmatt...@gmail.com wrote: On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote: The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. But are they still very poor? I mean, I think there are more tools for rendering HTML than there are for rendering Latex. In fact there are probably more tools for rendering HTML than anything else out there, because HTML is used more than anything else. Because HTML powers the Web! You can write in Word, and export in HTML. You can write in Markdown and export in HTML. You can probably write in Latex and export in HTML as well :) The tools are not the problem. The problem to me is the printing afterwards. Conferences/workshops need to print the publications. Printing consistent Latex/PDF templates is a lot easier than printing inconsistent (layout wise) HTML pages. Best, Luca There are tools; for example, there's already a bit of work to provide a plugin for semantic markup in Microsoft Word (https://ucsdbiolit.codeplex.com/) and similar efforts on the Latex side (https://trac.kwarc.info/sTeX/). But this is not a question of technology available to authors, but of requirements defined by publishers. If authors are too busy for this effort, then publishers facilitate that added value when it is in their best interest. For example, PLoS has published format guidelines using Word and LaTeX (http://www.plosone.org/static/guidelines), a workflow for semantically structuring the resulting output, and their final output is well structured and available in XML based on a known standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the published HTML on their website (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233). This results in semantically meaningful XML that is transformed to HTML: http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233&representation=XML Clearly the publication process can support solutions when it's in the best interest of the publisher. They will adopt and drive their own markup processes to meet external demand. Providing tools that both the publisher and the author may use independently could simplify such an effort, but is not a main driver in achieving that final result you see in PLoS. This is especially the case given that both file formats and efforts to produce the ideal solution are inherently localized, competitive and diverse, not collaborative in nature. For PLoS, the solution that is currently successful is the one that worked to solve today's immediate local need with today's tools, not the one that was perfectly designed to meet all tomorrow's hypothetical requirements. Cheers, Mark Diggory p.s. Finally, on the reference of moving repositories such as EPrints and DSpace towards supporting semantic markup of their contents.
Being somewhat of a participant in LoD on the DSpace side, I note that these efforts are inherently just Repository Centric, describing the structure of the repository (i.e., collections of files), not the semantic structure contained within those files (ideas, citations, formulas, data tables, figures). In both cases, these capabilities are in their infancy; without any strict format- and content-driven publication workflow, and lacking any rendering other than to offer the file for download, they ultimately suffer from the same need for a common Semantic Document format that can be leveraged for rendering, referencing and indexing. -- *Mark Diggory* *2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010* *Esperantolaan 4, Heverlee 3001, Belgium* http://www.atmire.com
Re: scientific publishing process (was Re: Cost and access)
Hello Community, On Sun, Oct 5, 2014 at 1:19 PM, Luca Matteis lmatt...@gmail.com wrote: On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote: The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. But are they still very poor? I mean, I think there are more tools for rendering HTML than there are for rendering Latex. In fact there are probably more tools for rendering HTML than anything else out there, because HTML is used more than anything else. Because HTML powers the Web! You can write in Word, and export in HTML. You can write in Markdown and export in HTML. You can probably write in Latex and export in HTML as well :) The tools are not the problem. The problem to me is the printing afterwards. Conferences/workshops need to print the publications. Printing consistent Latex/PDF templates is a lot easier than printing inconsistent (layout wise) HTML pages. There are tools; for example, there's already a bit of work to provide a plugin for semantic markup in Microsoft Word (https://ucsdbiolit.codeplex.com/) and similar efforts on the Latex side (https://trac.kwarc.info/sTeX/). But this is not a question of technology available to authors, but of requirements defined by publishers. If authors are too busy for this effort, then publishers facilitate that added value when it is in their best interest. For example, PLoS has published format guidelines using Word and LaTeX (http://www.plosone.org/static/guidelines), a workflow for semantically structuring the resulting output, and their final output is well structured and available in XML based on a known standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the published HTML on their website (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233). This results in semantically meaningful XML that is transformed to HTML: http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233&representation=XML Clearly the publication process can support solutions when it's in the best interest of the publisher. They will adopt and drive their own markup processes to meet external demand. Providing tools that both the publisher and the author may use independently could simplify such an effort, but is not a main driver in achieving that final result you see in PLoS. This is especially the case given that both file formats and efforts to produce the ideal solution are inherently localized, competitive and diverse, not collaborative in nature. For PLoS, the solution that is currently successful is the one that worked to solve today's immediate local need with today's tools, not the one that was perfectly designed to meet all tomorrow's hypothetical requirements. Cheers, Mark Diggory p.s. Finally, on the reference of moving repositories such as EPrints and DSpace towards supporting semantic markup of their contents. Being somewhat of a participant in LoD on the DSpace side, I note that these efforts are inherently just Repository Centric, describing the structure of the repository (i.e., collections of files), not the semantic structure contained within those files (ideas, citations, formulas, data tables, figures).
In both cases, these capabilities are in their infancy; without any strict format- and content-driven publication workflow, and lacking any rendering other than to offer the file for download, they ultimately suffer from the same need for a common Semantic Document format that can be leveraged for rendering, referencing and indexing. -- *Mark Diggory* *2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010* *Esperantolaan 4, Heverlee 3001, Belgium* http://www.atmire.com
Re: scientific publishing process (was Re: Cost and access)
Frankly I don't see the reason for the hate on PDF files. I do a lot of reading on a tablet these days because I can take it to the gym or on a walk or in the car. Network reliability is not universal when I leave the house (even if I had a $10 a GB LTE plan) so downloaded PDFs are my document format of choice. There might be a lot of hypothetical problems with PDFs, and I am sure there is a better way to view files on a small screen, but practically I have no trouble reading papers from arXiv.org, books from oreilly.com, be these produced by TeX-derived or Word-derived toolchains, or a toolchain that involves a real page layout tool for that matter. On Sun, Oct 5, 2014 at 5:43 PM, Mark Diggory mdigg...@atmire.com wrote: On Sun, Oct 5, 2014 at 2:39 PM, Mark Diggory mdigg...@atmire.com wrote: Hello Community, On Sun, Oct 5, 2014 at 1:19 PM, Luca Matteis lmatt...@gmail.com wrote: On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote: The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. But are they still very poor? I mean, I think there are more tools for rendering HTML than there are for rendering Latex. In fact there are probably more tools for rendering HTML than anything else out there, because HTML is used more than anything else. Because HTML powers the Web! You can write in Word, and export in HTML. You can write in Markdown and export in HTML. You can probably write in Latex and export in HTML as well :) The tools are not the problem. The problem to me is the printing afterwards. Conferences/workshops need to print the publications. Printing consistent Latex/PDF templates is a lot easier than printing inconsistent (layout wise) HTML pages. There are tools; for example, there's already a bit of work to provide a plugin for semantic markup in Microsoft Word (https://ucsdbiolit.codeplex.com/) and similar efforts on the Latex side (https://trac.kwarc.info/sTeX/). But this is not a question of technology available to authors, but of requirements defined by publishers. If authors are too busy for this effort, then publishers facilitate that added value when it is in their best interest. For example, PLoS has published format guidelines using Word and LaTeX (http://www.plosone.org/static/guidelines), a workflow for semantically structuring the resulting output, and their final output is well structured and available in XML based on a known standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the published HTML on their website (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233). This results in semantically meaningful XML that is transformed to HTML: http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233&representation=XML Clearly the publication process can support solutions when it's in the best interest of the publisher. They will adopt and drive their own markup processes to meet external demand. Providing tools that both the publisher and the author may use independently could simplify such an effort, but is not a main driver in achieving that final result you see in PLoS.
This is especially the case given that both file formats and efforts to produce the ideal solution are inherently localized, competitive and diverse, not collaborative in nature. For PLoS, the solution that is currently successful is the one that worked to solve today's immediate local need with today's tools, not the one that was perfectly designed to meet all tomorrow's hypothetical requirements. Cheers, Mark Diggory p.s. Finally, on the reference of moving repositories such as EPrints and DSpace towards supporting semantic markup of their contents. Being somewhat of a participant in LoD on the DSpace side, I note that these efforts are inherently just Repository Centric, describing the structure of the repository (i.e., collections of files), not the semantic structure contained within those files (ideas, citations, formulas, data tables, figures). In both cases, these capabilities are in their infancy; without any strict format- and content-driven publication workflow, and lacking any rendering other than to offer the file for download, they ultimately suffer from the same need for a common Semantic Document format that can be leveraged for rendering, referencing and indexing. -- *Mark Diggory* *2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010* *Esperantolaan 4, Heverlee 3001, Belgium* http://www.atmire.com -- *Mark Diggory* *2888 Loker Avenue East, Suite
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 04:15 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. I don't think this is a valid point. It is certainly possible to write HTML that will not look good on every machine, but these days it is easier to write HTML that does. The same is true with PDF. Font problems used to be routine. And, as other people have said, it's very hard to write a PDF that looks good on anything other than paper. My aesthetics are different. I routinely view PDFs on my laptop, and find that they indeed look great. As I said before, I prefer PDF to HTML for viewing just about any technical material on my computers. Yes, on limited displays two-column PDF may not be viewable at all. Single-column PDF should look good on displays with resolution of HD or better. When I view HTML documents, even the ones I have written, I have to do a lot of adjusting to get something that looks even half-decent on the screen. And when I print HTML documents, the result is invariably bad, and often very bad. However, my point was not about looking good. It was about being able to see the paper in the way that the author intended. My experience is that this is generally possible with PDF, but generally not possible with HTML. I do write papers with considerable math in them, so my experience may not be typical, but whenever I have tried to produce HTML versions of my papers, I have ended up quite frustrated because even I cannot get them to display the way I want them to. It may be that there are now good tools for producing HTML that carries the intent of the author. htlatex has been mentioned in this thread. A solution that uses htlatex would have the benefit of building on much of the work that has been done to make latex a reasonable technology for producing papers. If someone wants to create the necessary infrastructure to make htlatex work as well as pdflatex does, then feel free. Further, why should there be any technical preference for HTML at all? (Yes, HTML is an open standard and PDF is a closed one, but is there anything else besides that?) Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? PDF is, I think, open these days. But, yes, I do think that conferences should dogfood. I mean, what would you think if W3C produced all of their documents in PDF? Would that make sense? Actually, I would have been very happy if W3C had produced all its technical documents in PDF. It would have made my life much easier. Phil peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 04:27 AM, Phillip Lord wrote: [On using htlatex for conferences.] So, as well as providing an LNCS stylesheet, we'd need an htlatex cf.cfg and one CSS, and it's done. It'd be good to have another CSS for on-screen viewing; LNCS's back-of-a-postage-stamp layout is very poor for that. Phil I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same; the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. Many non-scalable images were included, even for simple math. My carefully designed layout for examples was modified in ways that made the examples harder to understand. The footnotes did not show up at all in the printed version. That said, the result was better than I expected. If someone upgrades htlatex to work well I'm quite willing to use it, but I expect that a lot of work is going to be needed. peter
Re: scientific publishing process (was Re: Cost and access)
On 10/6/14 10:25 AM, Paul Houle wrote: Frankly I don't see the reason for the hate on PDF files. I do a lot of reading on a tablet these days because I can take it to the gym or on a walk or in the car. Network reliability is not universal when I leave the house (even if I had a $10 a GB LTE plan) so downloaded PDFs are my document format of choice. There might be a lot of hypothetical problems with PDFs, and I am sure there is a better way to view files on a small screen, but practically I have no trouble reading papers from arXiv.org or books from oreilly.com, be they produced by TeX-derived or Word-derived toolchains, or a toolchain that involves a real page layout tool for that matter. Paul, As I see it, the issue here is more to do with PDF being the only option, rather than no PDFs at all. Put differently, we are not using our horses-for-courses technology (the Web that emerges from AWWW exploitation) to produce horses-for-courses conference artifacts. Instead, we continue to impose (overtly or covertly) specific options that are contradictory, and of diminishing value. Conferences (associated with themes like Semantic Web and Linked Open Data) should accept submissions that provide open access to relevant research data. In a sense, imagine if PDFs were submitted without bibliographic references. Basically, that's what's happening here with research data circa 2014, where we have a functioning Web of Linked (Open) Data, which is based on AWWW. Loosely coupling the print-friendly documents (PDF, LaTeX etc.), browser-friendly documents (HTML), and actual raw data references (which take the form of 5-Star Linked Open Data) is a practical starting point. Adding experiment workflow (which is also becoming the norm in the bioinformatics realm) is a nice bonus, as already demonstrated by examples provided by Hugh Glaser (see: this weekend's thread). Kingsley On Sun, Oct 5, 2014 at 5:43 PM, Mark Diggory mdigg...@atmire.com wrote: On Sun, Oct 5, 2014 at 2:39 PM, Mark Diggory mdigg...@atmire.com wrote: Hello Community, On Sun, Oct 5, 2014 at 1:19 PM, Luca Matteis lmatt...@gmail.com wrote: On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote: The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. But are they still very poor? I mean, I think there are more tools for rendering HTML than there are for rendering LaTeX. In fact there are probably more tools for rendering HTML than anything else out there, because HTML is used more than anything else. Because HTML powers the Web! You can write in Word, and export in HTML. You can write in Markdown and export in HTML. You can probably write in LaTeX and export in HTML as well :) The tools are not the problem. The problem to me is the printing afterwards. Conferences/workshops need to print the publications. Printing consistent LaTeX/PDF templates is a lot easier than printing inconsistent (layout-wise) HTML pages.
There are tools. For example, there's already a bit of work to provide a plugin for semantic markup in Microsoft Word (https://ucsdbiolit.codeplex.com/) and similar efforts on the LaTeX side (https://trac.kwarc.info/sTeX/). But this is not a question of technology available to authors, but of requirements defined by publishers. If authors are too busy for this effort, then publishers facilitate that added value when it is in their best interest. For example, PLoS has published format guidelines for Word and LaTeX (http://www.plosone.org/static/guidelines), a workflow for semantically structuring their resulting output, and their final output is well structured and available in XML based on a known standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), PDF, and the published HTML on their website (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233). This results in semantically meaningful XML that is transformed to HTML.
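As a minimal sketch of what that structured XML enables, here is how one might pull the title and reference count out of a JATS/NLM-style article file in Python (the filename is hypothetical; the element paths follow the journal publishing DTD linked above):

# Sketch: read basic structure out of an NLM/JATS journal article XML.
# "journal.pone.0011233.xml" is a hypothetical local filename.
import xml.etree.ElementTree as ET

tree = ET.parse("journal.pone.0011233.xml")
root = tree.getroot()

# Element paths follow the NLM journal publishing DTD; findtext returns
# only the direct text, ignoring any inline markup inside the title.
title = root.findtext("front/article-meta/title-group/article-title")
refs = root.findall("back/ref-list/ref")

print("Title:", title)
print("References:", len(refs))

Once the structure is explicit like this, rendering, referencing and indexing can all be driven from the same source.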
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: However, my point was not about looking good. It was about being able to see the paper in the way that the author intended. Yes, I understand this. It's not something that I consider at all important, which perhaps represents our different viewpoints. Readers have different preferences. I prefer reading in inverse video; I like to be able to change font size to zoom in and out. I quite like fixed-width fonts. Other people like the two-column thing. Other people want things read to them. Who cares what the authors intend? I mean, they are not reading the paper, are they? I do write papers with considerable math in them, so my experience may not be typical, but whenever I have tried to produce HTML versions of my papers, I have ended up quite frustrated because even I cannot get them to display the way I want them to. I've been using mathjax on my website for a long time and it seems to work well, although I am not maths heavy. It may be that there are now good tools for producing HTML that carries the intent of the author. htlatex has been mentioned in this thread. A solution that uses htlatex would have the benefit of building on much of the work that has been done to make latex a reasonable technology for producing papers. If someone wants to create the necessary infrastructure to make htlatex work as well as pdflatex does, then feel free. It's more to make htlatex work as well as lncs.sty works. htlatex produces reasonable, if dull, HTML off the bat. Phil
Re: scientific publishing process (was Re: Cost and access)
On Mon, Oct 6, 2014 at 5:29 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: Who cares what the authors intend? I mean, they are not reading the paper, are they? Authors might have adjusted things that way specifically to deliver their message. I think being able to have consistent layouts *as the authors intend it* is a very important thing. It's also important on the Web: people want their site's look and feel to be very specific and consistent.
Re: scientific publishing process (was Re: Cost and access)
This is an incredibly rich and interesting conversation. I think there are two separate themes:
1. What is required and/or asked for by the conference organizers...
a. ...that is needed for the review process
b. ...that is needed to implement value-added services for the conference
c. ...that contributes to the body of work
2. What is required and/or asked for by the publisher?
All of (1) is about the meat of the contributions, including establishing a long-term legacy. (2) is about (presumably) prestigious output. What added services could EasyChair in particular provide that would go beyond 1.a and contribute to 1.b, 1.c, etc.? Are there any EasyChair committers watching this thread? ;) John On Mon, Oct 6, 2014 at 11:17 AM, Kingsley Idehen kide...@openlinksw.com wrote: [...]
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same, the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. The footnote thing is pretty strange, I have to agree. Although footnotes are a fairly alien concept wrt the web. Probably hover-overs would be a reasonable presentation for this. Many non-scalable images were included, even for simple math. It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. My carefully designed layout for examples was modified in ways that made the examples harder to understand. Perhaps this is a key difference between us. I don't care about the layout, and want someone to do it for me; it's one of the reasons I use latex as well. That said, the result was better than I expected. If someone upgrades htlatex to work well I'm quite willing to use it, but I expect that a lot of work is going to be needed. Which gets us back to the chicken and egg situation. I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. This is why it is important that web conferences allow HTML, which is where the argument started. If you want something that prints just right, PDF is the thing for you. If you want to read your papers in the bath, likewise, PDF is the thing for you. And that's fine by me (so long as you don't mind me reading your papers in the bath!). But it needs to not be the only option. Phil
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 08:38 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same, the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. The footnote thing is pretty strange, I have to agree. Although footnotes are a fairly alien concept wrt the web. Probably hover-overs would be a reasonable presentation for this. Many non-scalable images were included, even for simple math. It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. I don't know what the way to do this right would be, I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. My carefully designed layout for examples was modified in ways that made the examples harder to understand. Perhaps this is a key difference between us. I don't care about the layout, and want someone to do it for me; it's one of the reasons I use latex as well. There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in latex is a pain for starters, but when it has been done, having the htlatex toolchain mess it up is a failure. That said, the result was better than I expected. If someone upgrades htlatex to work well I'm quite willing to use it, but I expect that a lot of work is going to be needed. Which gets us back to the chicken and egg situation. I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. Well, I'm with ESWC and ISWC here. The review process should be designed to make reviewing easy for reviewers. Until viewing HTML output is as trouble-free as viewing PDF output, PDF should be the required format. This is why it is important that web conferences allow HTML, which is where the argument started. If you want something that prints just right, PDF is the thing for you. If you want to read your papers in the bath, likewise, PDF is the thing for you. And that's fine by me (so long as you don't mind me reading your papers in the bath!). But it needs to not be the only option. Why? What are the benefits of HTML reviewing, right now? What are the benefits of HTML publishing, right now? If there were HTML-based tools that worked well for preparing, reviewing, and reading scientific papers, then maybe conferences would use them. However, conference organizers and reviewers have limited time, and are thus going for the simplest solution that works well. If some group thinks that a good HTML-based solution is possible, then let them produce this solution. If the group can get pre-approval of some conference, then more power to them. However, I'm not going to vote for any pre-approval of some future solution when the current situation is satisficing. Phil peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 08:29 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: However, my point was not about looking good. It was about being able to see the paper in the way that the author intended. Yes, I understand this. It's not something that I consider at all important, which perhaps represents our different viewpoints. Readers have different preferences. I prefer reading in inverse video; I like to be able to change font size to zoom in and out. I quite like fixed-width fonts. Other people like the two-column thing. Other people want things read to them. Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. I do write papers with considerable math in them, so my experience may not be typical, but whenever I have tried to produce HTML versions of my papers, I have ended up quite frustrated because even I cannot get them to display the way I want them to. I've been using mathjax on my website for a long time and it seems to work well, although I am not maths heavy. It may be that there are now good tools for producing HTML that carries the intent of the author. htlatex has been mentioned in this thread. A solution that uses htlatex would have the benefit of building on much of the work that has been done to make latex a reasonable technology for producing papers. If someone wants to create the necessary infrastructure to make htlatex work as well as pdflatex does, then feel free. It's more to make htlatex work as well as lncs.sty works. htlatex produces reasonable, if dull, HTML off the bat. My experience is that htlatex produces very bad output. Phil peter
Re: scientific publishing process (was Re: Cost and access)
Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF. Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas On Mon, Oct 6, 2014 at 6:08 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: [...]
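To make the comparison concrete, here is a minimal sketch of the kind of query Martynas has in mind, in Python with rdflib, assuming the paper's RDFa has already been extracted to Turtle (the document URI, properties and triples below are illustrative, not from any real paper):

# Sketch: SPARQL over paper metadata, assuming the XHTML+RDFa has
# already been distilled into Turtle. All data here is illustrative.
from rdflib import Graph

ttl = """
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/paper> dc:title "An Example Paper" ;
    dc:creator "A. Author" ;
    dc:references <http://example.org/cited-work> .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

q = """
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?title ?cited WHERE {
    ?paper dc:title ?title ;
           dc:references ?cited .
}
"""
for row in g.query(q):
    print(row.title, row.cited)

The point is that the query runs over structure the document itself asserts (title, citations), rather than over text heuristically recovered from a layout.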
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. Yeah, you have to tell it to do mathml. The problem is that older versions of the browsers don't render mathml, and image rendering was the only option. I don't know what the way to do this right would be, I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in latex is a pain for starters, but when it has been done, having the htlatex toolchain mess it up is a failure. Indeed. I believe that there are plans in future versions of HTML to introduce a pre tag which preserves indentation and line breaks. Which gets us back to the chicken and egg situation. I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. Well, I'm with ESWC and ISWC here. The review process should be designed to make reviewing easy for reviewers. I *only* use PDF when reviewing. I never use it for viewing anything else. I only use it for reviewing since I am forced to. Experiences differ, so I find this a far from compelling argument. This is why it is important that web conferences allow HTML, which is where the argument started. Why? What are the benefits of HTML reviewing, right now? What are the benefits of HTML publishing, right now? Well, we've been through this before, so I'll not repeat myself. Phil
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? Of course, this is an extreme example, although not an unrealistic one. Is it fundamentally any different from my desire, as I get older, to be able to change font size and refill paragraphs with ease? I see a difference of scale, that is all. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. Yes, I agree. Which is why, I believe, the rendering of a paper should be up to the reader. Phil
Re: scientific publishing process (was Re: Cost and access)
It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. No, I don't think that viewing this issue from the reviewer perspective is too narrow. Reviewers form a vital part of the scientific publishing process. Anything that makes their jobs harder or the results that they produce worse is going to have to have very large benefits over the current setup. In any case, I haven't been looking at the reviewer perspective only, even in the message quoted below. peter PS: This is *not* to say that I think that the reviewing process is anywhere near ideal. On the contrary, I think that the reviewing process has many problems, particularly as it is performed in CS conferences. On 10/06/2014 09:19 AM, Martynas Jusevičius wrote: Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF. Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas [...]
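A minimal sketch of the extract-and-convert step Peter describes, using pypdf for the document information dictionary and rdflib for the RDF (the filename, the document URI and the choice of Dublin Core predicates are assumptions for illustration, not a fixed workflow):

# Sketch: extract PDF document metadata and turn it into RDF.
# "paper.pdf" is a hypothetical filename; predicates are Dublin Core.
from pypdf import PdfReader
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/terms/")

reader = PdfReader("paper.pdf")
meta = reader.metadata  # the PDF's document information dictionary

g = Graph()
doc = URIRef("http://example.org/paper")
if meta is not None and meta.title:
    g.add((doc, DC.title, Literal(meta.title)))
if meta is not None and meta.author:
    g.add((doc, DC.creator, Literal(meta.author)))

print(g.serialize(format="turtle"))

Note that this captures only what the producing toolchain bothered to record; a PDF whose metadata was never filled in yields an almost empty graph.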
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 09:28 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. Yeah, you have to tell it to do mathml. The problem is that older versions of the browsers don't render mathml, and image rendering was the only option. Well, then someone is going to have to tell people how to do this. What I saw for htlatex was that it just did the right thing. I don't know what the way to do this right would be, I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in latex is a pain for starters, but when it has been done, having the htlatex toolchain mess it up is a failure. Indeed. I believe that there are plans in future versions of HTML to introduce a pre tag which preserves indentation and line breaks. Which gets us back to the chicken and egg situation. I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. Well, I'm with ESWC and ISWC here. The review process should be designed to make reviewing easy for reviewers. I *only* use PDF when reviewing. I never use it for viewing anything else. I only use it for reviewing since I am forced to. Experiences differ, so I find this a far from compelling argument. It may not be a compelling argument when choosing between two new alternatives, but it is a much more compelling argument against change. This is why it is important that web conferences allow HTML, which is where the argument started. Why? What are the benefits of HTML reviewing, right now? What are the benefits of HTML publishing, right now? Well, we've been through this before, so I'll not repeat myself. Phil Yes, and I haven't seen any benefits over the current setup. peter
Re: scientific publishing process (was Re: Cost and access)
Following the same logic, we could still be using paper submissions? All you have to do is to scan them to turn them into PDFs. It's been a while since I was in the university, but wasn't dissemination an important part of science? What about dogfooding after all? Martynas On Mon, Oct 6, 2014 at 6:48 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: [...]
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. Of course, this is an extreme example, although not an unrealistic one. Is it fundamentally any different from my desire, as I get older, to be able to change font size and refill paragraphs with ease? I see a difference of scale, that is all. I see these as completely different. There are some aspects of rendering that generally do not interfere with intent. There are other aspects of rendering that can easily interfere with intent. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. Yes, I agree. Which is why, I believe, the rendering of a paper should be up to the reader. Phil And this is why I believe that the authors should be able to specify the rendering of their paper to the extent that they feel is needed to convey the intent of the paper. peter
Re: scientific publishing process (was Re: Cost and access)
I would be much more generic here: show me how to query a bunch of PDFs with anything... Of course, the answer will go like: you can extract the text and do A and then B and then get a relatively decent text depending on A, B and C. Then someone else will chime in and say this is just because people don't know how to generate PDFs; if one generates a PDF using Adobe tools like A, B and C then the PDF will be perfect for text mining and bla bla bla. PDF is OK for a consistent layout; HTML is great for what it was created for. But neither of those formats, AFAIK, was conceived or engineered for scientific papers: executable, self-describing, embedded within the web of data, etc. On Mon, Oct 6, 2014 at 9:19 AM, Martynas Jusevičius marty...@graphity.org wrote: Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF. Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas [...] -- Alexander Garcia http://www.alexandergarcia.name/ http://www.usefilm.com/photographer/75943.html http://www.linkedin.com/in/alexgarciac
Re: scientific publishing process (was Re: Cost and access)
It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. In the age of the web of data, why should I restrict my search just to metadata? I want the full content, open access or not. Once I have the document I should be able to mine its content. I don't want to limit my search to simple metadata. On Mon, Oct 6, 2014 at 9:48 AM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: [...]
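As a sketch of the full-content mining Alexander is asking for, text extraction is the usual starting point, here with pypdf (the filename and search term are hypothetical, and extraction quality depends heavily on how the PDF was generated, which is exactly his complaint):

# Sketch: mine the full text of a PDF, not just its metadata.
# "paper.pdf" and the search term are hypothetical.
from pypdf import PdfReader

reader = PdfReader("paper.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Naive content query: sentences mentioning a term of interest.
hits = [s.strip() for s in text.split(".") if "ontology" in s.lower()]
print(len(hits), "matching sentences")

Everything past this point (sentence splitting, table recovery, formula recognition) is heuristic reconstruction of structure the PDF never stored, which is the asymmetry with a semantic document format.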
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:28 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. Yeah, you have to tell it to do mathml. The problem is that older versions of the browsers don't render mathml, and image rendering was the only option. Well, then someone is going to have to tell people how to do this. What I saw for htlatex was that it just did the right thing. So, htlatex is part of TeX4ht, which does HTML. If you do xhmlatex then you get XHTML with, indeed, math mode in MathML. So, for example, this output comes with the default xhmlatex: <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>e</mi><mo class="MathClass-rel">=</mo><mi>m</mi><msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup></math> tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which supply default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil
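For reference, the LaTeX input that xhmlatex turns into the MathML above is just ordinary inline math, e.g.:

% Inline math in the LaTeX source; xhmlatex emits the MathML shown above.
$e = mc^{2}$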
Re: scientific publishing process (was Re: Cost and access)
I don't think that scanning a printout retains any metadata that was in the electronic source so, no, this would not follow using the same logic. I do agree that dissemination of results is one of the most important parts of the scientific process. The argument here is, I think, about what is the best way to support dissemination. Eating your own dog food is a separate matter, I think. Eating your own dog food may help with uptake, but on the other hand it may interfere with dissemination, by making preparation of papers harder or making them harder to review or read. peter On 10/06/2014 10:09 AM, Martynas Jusevičius wrote: Following the same logic, we could still be using paper submissions? All you have to do is to scan them to turn them into PDFs. It's been a while since I was in the university, but wasn't dissemination an important part of science? What about dogfooding after all? Martynas On Mon, Oct 6, 2014 at 6:48 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: [...]
Re: scientific publishing process (was Re: Cost and access)
Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. It *is* exactly what you are talking about. If I want to render your document to speech, then why should I not? What I am saying is that you, the author, should not wish to constrain the rendering, only really the content. Effectively, if you are using latex, you are already doing this, since latex defines the layout and not you. But I think we are talking in too abstract terms here. Should you be able to constrain indentation for code blocks? Yes, of course you should. But a quick look at the web shows that people do this all the time. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. Yes, I agree. Which is why, I believe, the rendering of a paper should be up to the reader. And this is why I believe that the authors should be able to specify the rendering of their paper to the extent that they feel is needed to convey the intent of the paper. For scientific papers, I think this really is not very far. I mean, a scientific paper is not a fashion store; it's a story designed to persuade with data. I would like to see papers which are in the hands of the reader as much as possible. Citation format should be for the reader. Math presentation too. Graphs should be interactive and zoomable, with the data underneath as CSV. All of these are possible and routine with HTML now. I want to be free to choose the organisation of my papers so that I can convey what I want. At the moment, I cannot. PDF is not reasonable for all of this, maybe not even most of it. But some. Phil
Re: scientific publishing process (was Re: Cost and access)
Sure. So extract the text from the PDF and query that. It also would be nice to have access to the LaTeX sources. What HTML publishing *might* have that is better than the above is to more easily embed some extra information into papers that can be queried. Is this just metadata that could also be easily injected into PDFs? If given this capability will a significant number of authors use it? Is it instead better to have a separate document that has the information and not use HTML for publishing? peter On 10/06/2014 10:42 AM, Alexander Garcia Castro wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. in the age of the web of data why should I restrict my search just to metadata? I want the full content, open access or not once I have the document I should be able to mine the content of the document. I dont want to limit my search just to simple metadata. On Mon, Oct 6, 2014 at 9:48 AM, Peter F. Patel-Schneider pfpschnei...@gmail.com mailto:pfpschnei...@gmail.com wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. No, I don't think that viewing this issue from the reviewer perspective is too narrow. Reviewers form a vital part of the scientific publishing process. Anything that makes their jobs harder or the results that they produce worse is going to have to have very large benefits over the current setup. In any case, I haven't been looking at the reviewer perspective only, even in the message quoted below. peter PS: This is *not* to say that I think that the reviewing process is anywhere near ideal. On the contrary, I think that the reviewing process has many problems, particularly as it is performed in CS conferences. On 10/06/2014 09:19 AM, Martynas Jusevičius wrote: Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF. Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas On Mon, Oct 6, 2014 at 6:08 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com mailto:pfpschnei...@gmail.com wrote: On 10/06/2014 08:38 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com mailto:pfpschnei...@gmail.com writes: I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same, the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. The footnote thing is pretty strange, I have to agree. Although footnotes are a fairly alien concept wrt to the web. Probably hover overs would be a reasonable presentation for this. Many non-scalable images were included, even for simple math. It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. 
I don't know what the right way to do this would be; I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. My carefully designed layout for examples was modified in ways that made the examples harder to understand. Perhaps this is a key difference between us. I don't care about the layout, and want someone to do it for me; it's one of the reasons I use LaTeX as well. There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in LaTeX is a pain for starters, but when it has been done,
Re: scientific publishing process (was Re: Cost and access)
On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. Peter, Having had 200+ {some-non-rdf-doc} to RDF document transformers built under my direct guidance, there are issues with your claim above: 1. The extractors are platform specific -- AWWW is about platform agnosticism (I don't want to mandate an OS for experiencing the power of Linked Open Data transformers / rdfizers) 2. It isn't solely about metadata -- we also have raw data inside these documents confined to tables and paragraphs of sentences 3. If querying a PDF was marginally simple, I would be demonstrating that using a SPARQL results URL in response to this post :-) Possible != Simple and Productive. We want to leverage the productivity and simplicity that AWWW brings to data representation, access, interaction, and integration. -- Regards, Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 10:44 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:28 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow PNG files are being produced for some math, which is a failure. Yeah, you have to tell it to do MathML. The problem is that older versions of the browsers don't render MathML, and image rendering was the only option. Well, then someone is going to have to tell people how to do this. What I saw for htlatex was that it just did the right thing. So, htlatex is part of TeX4Ht which does HTML. If you do xhmlatex then you get XHTML with, indeed, math mode in MathML. So, for example, this output comes with the default xhmlatex: <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>e</mi> <mo class="MathClass-rel">=</mo> <mi>m</mi><msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup></math> tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which do default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil So someone has to package this up so that it can be easily used. Before then, how can it be required for conferences? I have tex4ht installed, but there is no xhmlatex file to be found. I managed to find what appears to be a good command line: htlatex schema-org-analysis.tex xhtml,mathml -cunihtf -cvalidate This looks better when viewed, but the resultant HTML is unintelligible. There is definitely more work needed here before this can be considered as a potential solution. peter
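If all one wants is the LaTeX-math-to-MathML step on its own, outside the tex4ht machinery, a small script can do it. A minimal sketch, assuming the third-party latex2mathml Python package (not something the thread itself uses); the input string is the e = mc^2 example above:

import latex2mathml.converter

# convert a LaTeX math fragment into a MathML element suitable for
# client-side rendering, instead of baking it into a PNG
mathml = latex2mathml.converter.convert(r"e = mc^2")
print(mathml)  # <math ...><mi>e</mi><mo>=</mo><mi>m</mi><msup>...</msup></math>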
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 11:00 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. It *is* exactly what you are talking about. Well, maybe I was not being clear, but I thought that I was talking about rendering changes interfering with comprehension of the authors' intent. peter [...]
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 11:00 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. It *is* exactly what you are talking about. If I want to render your document to speech, then why should I not? What I am saying is that you, the author, should not wish to constrain the rendering, only really the content. Effectively, if you are using LaTeX, you are already doing this, since LaTeX defines the layout and not you. But I think we are talking in too abstract terms here. Should you be able to constrain indentation for code blocks? Yes, of course, you should. But, a quick look at the web shows that people do this all the time. Sure, and htlatex appears to interfere with this indentation. At least it does in my ISWC paper. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. Yes, I agree. Which is why, I believe, the rendering of a paper should be up to the reader. And this is why I believe that the authors should be able to specify the rendering of their paper to the extent that they feel is needed to convey the intent of the paper. For scientific papers, I think this really is not very far. I mean, a scientific paper is not a fashion store; it's a story designed to persuade with data. I would like to see papers which are in the hands of the reader as much as possible. Citation format should be for the reader. Math presentation too. Graphs should be interactive and zoomable, with the data underneath as CSV. All of these are possible and routine with HTML now. I want to be free to choose the organisation of my papers so that I can convey what I want. At the moment, I cannot. The PDF is not reasonable for all, maybe not even most of this. But some. Phil So, you believe that there is an excellent set of tools for preparing, reviewing, and reading scientific publications. Package them up and make them widely available. If they are good, people will use them. Convince those who run conferences. If these people are convinced, then they will allow their use in conferences or maybe even require their use. I'm not convinced by what I'm seeing right now, however. peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 11:03 AM, Kingsley Idehen wrote: On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. Peter, Having had 200+ {some-non-rdf-doc} to RDF document transformers built under my direct guidance, there are issues with your claim above: Huh? Every single PDF reader that I use can extract the PDF metadata and display it. The metadata that I see in PDF documents uses a core set of properties that are easy to transform into RDF. Of course, this core set is very small (title, author, and a few other things) so you don't get all that much out of the core set. 1. The extractors are platform specific -- AWWW is about platform agnosticism (I don't want to mandate an OS for experiencing the power of Linked Open Data transformers / rdfizers) Well, the extractors would be specific to PDF, but that's hardly surprising, I think. 2. It isn't solely about metadata -- we also have raw data inside these documents confined to tables and paragraphs of sentences Well, sure, but is extracting information directly from the figures or tables or text being considered here? I sure would like this to be possible. How would it work in an HTML context? 3. If querying a PDF was marginally simple, I would be demonstrating that using a SPARQL results URL in response to this post :-) I'm not saying that it is so simple. You do have to find the metadata block in the PDF and then look for the /Title, /Author, ... stuff. Possible != Simple and Productive. Yes, but there are lots of tools that display PDF metadata, so there are some who believe that the benefit is greater than the cost. We want to leverage the productivity and simplicity that AWWW brings to data representation, access, interaction, and integration. Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered. If these costs are eliminated or at least minimized then this good is much more likely to be realized. peter
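To make the extract-then-query recipe concrete, here is a minimal sketch, assuming the pypdf and rdflib Python packages; the file name paper.pdf is hypothetical. It covers only the small /Title, /Author-style core set Peter describes, not the tables and running text Kingsley is after:

from pypdf import PdfReader
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

# pull the core document-information metadata out of the PDF
reader = PdfReader("paper.pdf")
meta = reader.metadata

# turn it into RDF...
g = Graph()
doc = URIRef("file:paper.pdf")
if meta.title:
    g.add((doc, DC.title, Literal(meta.title)))
if meta.author:
    g.add((doc, DC.creator, Literal(meta.author)))

# ...and query it with SPARQL
for row in g.query("SELECT ?title WHERE { ?doc dc:title ?title }", initNs={"dc": DC}):
    print(row.title)

This is possible, as Peter says; whether it is simple and productive enough is the point under dispute.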
Re: scientific publishing process (was Re: Cost and access)
Luca Matteis lmatt...@gmail.com writes: On Mon, Oct 6, 2014 at 5:29 PM, Phillip Lord wrote: Who cares what the authors intend? I mean, they are not reading the paper, are they? Authors might have adjusted things that way specifically to deliver their message. I think being able to have consistent layouts *as the authors intend it* is a very important thing. It's also important on the Web: people want their site to look and feel in a very specific and consistent way. Well, it's also why we now have things like the Stylish and Greasemonkey add-ons for Firefox, and the http://userstyles.org/ resource on the Web (not to mention the whole world of “unusual” Web browsers, such as Lynx.) That is: the /readers/ too want to tailor that “look and feel” to /their/ tastes, to get rid of the poor design choices of the Web publishers, and to thus improve their “Web reading experience.” -- FSF associate member #7257 http://boycottsystemd.org/ … 3013 B6A0 230E 334A
Re: scientific publishing process (was Re: Cost and access)
On 10/6/14 2:19 PM, Alexander Garcia Castro wrote: querying PDFs is NOT simple and requires a lot of work -and usually produces lots of errors. Yes, I believe I indicated that in my response to Peter i.e., it isn't simple or productive. just querying metadata is not enough. Yes, I said that too i.e., we want access to raw data. As I said before, I understand the PDF as something that gives me a uniform layout. that is ok and necessary, but not enough or sufficient within the context of the web of data and scientific publications. I would like to have the content readily available for mining purposes. if I pay for the publication I should get access to the publication in every format it is available. the content should be presented in a way so that it makes sense within the web of data. if it is the full content of the paper represented in RDF or XML fine. also, I would like to have well annotated content, this is simple and something that could quite easily be part of existing publication workflows. it may also be part of the guidelines for authors -for instance, identify and annotate rhetorical structures. Modulo any confusing typos in my earlier posts, I don't see where we are disagreeing :-) Kingsley On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen kide...@openlinksw.com wrote: [...] -- Regards, Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
Greetings. On 2014 Oct 6, at 19:19, Alexander Garcia Castro alexgarc...@gmail.com wrote: querying PDFs is NOT simple and requires a lot of work -and usually produces lots of errors. just querying metadata is not enough. As I said before, I understand the PDF as something that gives me a uniform layout. that is ok and necessary, but not enough or sufficient within the context of the web of data and scientific publications. I would like to have the content readily available for mining purposes. if I pay for the publication I should get access to the publication in every format it is available. the content should be presented in a way so that it makes sense within the web of data. if it is the full content of the paper represented in RDF or XML fine. also, I would like to have well annotated content, this is simple and something that could quite easily be part of existing publication workflows. it may also be part of the guidelines for authors -for instance, identify and annotate rhetorical structures. The following might add something to this conversation. It illustrates getting the metadata from a LaTeX file, putting it into an XMP packet in a PDF, and getting it out of the PDF as RDF. Pace Peter's mention of /Author, /Title, etc, this just focuses on the XMP packet. This has the document metadata, the abstract, and an illustrative bit of argumentation. Adding details about the document structure, and (RDF) pointers to any figures would be feasible, as would, I suspect, incorporating CSV files directly into the PDF. Incorporating \begin{tabular} tables would be rather tricky, but not impossible. I can't help feeling that the XHTML+RDFa equivalent would be longer and need more documentation to instruct the author where to put the RDFa magic. It's not very fancy, and still has rough edges, but it only took me 100 minutes, from a standing start. Generating and querying this PDF seems pretty simple to me.

$ cat test-xmp.tex
\documentclass{article}
\usepackage{xmp-management}
\title{This is a test file}
\author{Norman Gray}
\date{2014 October 6}
\begin{document}
\maketitle
\abstract{It's easy to include metadata in \LaTeX\ files. That's because there's plenty of metadata in there already.}
There is text and metatext within files.
\section{Further details}
In this section we could potentially discuss moving information around. I think we can assert that \claim{it is easy to move information around}, and, further, that \claim{making metadata readily available is a Good Thing}. I hope that clears that up.
\end{document}

$ cat xmp-management.sty
\ProvidesPackage{xmp-management}[2014/10/06]
\newwrite\xmp@ttlfile
\def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl \let\xmp@open\relax}
\long\def\xmp@stmt#1#2{%
  \xmp@open
  \write\xmp@ttlfile{ #1 #2.}}
\let\xmp@origtitle\title
\def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
\let\xmp@origauthor\author
\def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
\let\xmp@origdate\date
\def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}
\long\def\abstract#1{
  \xmp@stmt{dc:abstract}{#1}
  \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
\def\claim#1{
  \xmp@stmt{xmpinfo:claim}{#1}
  \emph{#1}}
\let\xmp@origsection\section
\def\section#1{\xmp@stmt{xmpinfo:has_section}{#1} \xmp@origsection{#1}}
\usepackage{xmpincl}
\AtBeginDocument{\includexmp{info}}

$ pdflatex test-xmp
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012) restricted \write18 enabled.
entering extended mode
(./test-xmp.tex LaTeX2e 2011/06/27 [...BLAH...]
Output written on test-xmp.pdf (1 page, 75667 bytes).
Transcript written on test-xmp.log.

$ cat test-xmp.ttl
 dc:title This is a test file.
 dc:creator Norman Gray.
 dc:created 2014 October 6.
 dc:abstract It's easy to include metadata in \LaTeX \ files. \par That's because there's plenty of metadata in there already..
 xmpinfo:has_section Further details.
 xmpinfo:claim it is easy to move information around.
 xmpinfo:claim making metadata readily available is a Good Thing.

$ make info.xmp
sed 's/\\//g' test-xmp.ttl | \
  cat prefix.ttl - | \
  rapper -iturtle -ordfxml-xmp -q - file:test-xmp.pdf | \
  sed '/<?xpacket/d' > info.xmp.tmp
mv info.xmp.tmp info.xmp

$ pdflatex test-xmp
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012) restricted \write18 enabled.
entering extended mode
(./test-xmp.tex LaTeX2e 2011/06/27 [...BLAH...]
Output written on test-xmp.pdf (1 page, 77069 bytes).
Transcript written on test-xmp.log.

$ make extract-xmp
cc -Wall -o extract-xmp extract-xmp.c

$ ./extract-xmp test-xmp.pdf
<rdf:RDF xmlns:cc="http://creativecommons.org/ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/"
         xmlns:xmpinfo="http://example.org/xmpinfo"
         xml:base="file:test-xmp.pdf">
  <rdf:Description rdf:about="">
    <cc:license
Re: scientific publishing process (was Re: Cost and access)
Sorry to jump into this once again but when it comes to typesetting nothing really comes close to LaTeX/PDF: http://tex.stackexchange.com/questions/120271/alternatives-to-latex - not even HTML/CSS/JavaScript On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray nor...@astro.gla.ac.uk wrote: [...]
Re: scientific publishing process (was Re: Cost and access)
Neat. This could be extended to putting a full table of contents into the metadata, and in lots of other ways. The other nice thing about it is that it would be possible to push the same data through a LaTeX to HTML toolchain for those who want HTML output. peter On 10/06/2014 03:18 PM, Norman Gray wrote: [...]
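Peter's table-of-contents idea is already within reach of Norman's example, since the package records xmpinfo:has_section statements. A sketch, assuming rdflib; the file name is hypothetical (the saved output of ./extract-xmp), and the xmpinfo namespace is the one shown in Norman's output:

from rdflib import Graph, Namespace

XMPINFO = Namespace("http://example.org/xmpinfo")

# parse the RDF/XML packet extracted from the PDF
g = Graph()
g.parse("test-xmp.xmp", format="xml")

# list the recorded sections as a crude table of contents
for _, _, section in g.triples((None, XMPINFO.has_section, None)):
    print(section)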
Re: scientific publishing process (was Re: Cost and access)
On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote: On 10/06/2014 11:03 AM, Kingsley Idehen wrote: On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. Peter, Having had 200+ {some-non-rdf-doc} to RDF document transformers built under my direct guidance, there are issues with your claim above: Huh? Every single PDF reader that I use can extract the PDF metadata and display it. Again, this isn't about metadata. The metadata that I see in PDF documents uses a core set of properties that are easy to transform into RDF. Metadata isn't the issue at hand. Of course, this core set is very small (title, author, and a few other things) so you don't get all that much out of the core set. See my comments above :) 1. The extractors are platform specific -- AWWW is about platform agnosticism (I don't want to mandate an OS for experiencing the power of Linked Open Data transformers / rdfizers) Well, the extractors would be specific to PDF, but that's hardly surprising, I think. 2. It isn't solely about metadata -- we also have raw data inside these documents confined to tables and paragraphs of sentences Well, sure, but is extracting information directly from the figures or tables or text being considered here? I sure would like this to be possible. How would it work in an HTML context? Each table is a Class. Each table record is an instance of the Class represented by the table. Each table field is a property of the Class represented by the table. Each table field value's data type can be used to discern the range of each Class property. Depending on what the sentences and paragraphs are about, you can make an RDF statement per sentence. 3. If querying a PDF was marginally simple, I would be demonstrating that using a SPARQL results URL in response to this post :-) I'm not saying that it is so simple. You do have to find the metadata block in the PDF and then look for the /Title, /Author, ... stuff. But it could be simple if PDF didn't have the issues I outlined in regards to extraction technology. Funnily enough, there's a massive opportunity for Adobe to solve this problem, especially as they've now ventured heavily into cloud-enabling their technologies. If they provide APIs from the cloud, this problem could become much simpler to address in regards to productive solutions, where PDFs become less of the data silos that they are today. Possible != Simple and Productive. Yes, but there are lots of tools that display PDF metadata, so there are some who believe that the benefit is greater than the cost. Metadata isn't the fundamental quest here. We want to leverage the productivity and simplicity that AWWW brings to data representation, access, interaction, and integration. Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered. If these costs are eliminated or at least minimized then this good is much more likely to be realized. With some help from Adobe we can have the best of all worlds here. I am going to take a look at their latest cloud offerings and associated APIs.
peter -- Regards, Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
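For what Kingsley's table mapping would look like in practice, here is a minimal sketch, assuming rdflib; the table name, fields, and values are all hypothetical:

from rdflib import RDF, RDFS, Graph, Literal, Namespace

EX = Namespace("http://example.org/paper/")

g = Graph()
g.add((EX.Measurement, RDF.type, RDFS.Class))  # the table is a class

header = ["sample", "temperature"]
rows = [("s1", 273.15), ("s2", 300.0)]

for i, row in enumerate(rows):
    record = EX[f"measurement/{i}"]  # each record is an instance
    g.add((record, RDF.type, EX.Measurement))
    for field, value in zip(header, row):
        # each field is a property; the value's datatype hints at
        # the range of that property
        g.add((record, EX[field], Literal(value)))

print(g.serialize(format="turtle"))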
Re: scientific publishing process (was Re: Cost and access)
Hi Adobe lurkers, Kingsley has just handed you a valuable means to keep users tied to your technologies: On 10/6/2014 8:18 PM, Kingsley Idehen wrote: On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote: On 10/06/2014 11:03 AM, Kingsley Idehen wrote: On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: I'm not saying that it is so simple. You do have to find the metadata block in the PDF and then look for the /Title, /Author, ... stuff. But it could be simple if PDF didn't have the issues I outlined in regards to extraction technology. Funnily enough, there's a massive opportunity for Adobe to solve this problem, especially as they've now ventured heavily into cloud-enabling their technologies. If they provide APIs from the cloud, this problem could become much simpler to address in regards to productive solutions, where PDFs become less of the data silos that they are today. Of course, it probably makes sense for Adobe to do the work, but there is also enough known in open source about PDFs for a third party to do this as well. Good idea, K! Mike
Re: scientific publishing process (was Re: Cost and access)
Hello Paul, On Sat, Oct 04, 2014 at 06:47:19PM -0500, Paul Tyson wrote: I certainly was not suggesting this. It would indeed be silly to publish large collections of empirical quantitative propositions in RDF. Yes. And describing such collections with RDF on a level above basic metadata is not so silly but very difficult in many cases - as I tried to show with my example. Connecting those propositions to significant conclusions through sound arguments is the more important problem. They will attempt to do so, presumably, by creating monographs in an electronic source format that has more or less structure to it. The structure will support many useful operations, including formatting the content for different media, hyperlinking to other resources, indexing, and metadata gleaning. The structure will most likely *not* support any programmatic operations to expose the logical form of the arguments in such a way that another person could extract them and put them into his own logic machine to confirm, deny, strengthen, or weaken the arguments. Take for example a research paper whose argument proceeded along the lines of: All men are mortal; Socrates is a man; therefore Socrates is mortal. Along comes a skeptic who purports to have evidence that Socrates is not a man. He publishes the evidence in such a way that other users can, if they wish, insert the conclusion from such evidence in place of the minor premise in the original researcher's argument. Then the conclusion cannot be affirmed. The original researcher must either find a different form of argument to prove his conclusion, overturn the skeptic's evidence (by further argument, also machine-processable), or withdraw his conclusion. This simple model illustrates how human knowledge has progressed for millennia, mediated solely by oral, written, and visual and diagrammatic communication. I am suggesting we enlist computers to do something more for us in this realm than just speeding up the millennia-old mechanisms. Can you express this argument with triples? I would not be able to do that. Maybe if I devoted my life to it - starting with the famous "the cat sat on a mat" example. The end result would be incomprehensible to others and absolutely useless. I even doubt that science works the way you describe it. Mathematics works this way and there are good reasons that formal proofs are absolute exceptions in this field ca. 2014. Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
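For concreteness: the toy syllogism itself does fit in RDFS. A minimal sketch, assuming rdflib and a hypothetical namespace; whether a real paper's argument fits is exactly what is in dispute here:

from rdflib import RDF, RDFS, Graph, Namespace

EX = Namespace("http://example.org/argument#")

g = Graph()
g.add((EX.Man, RDFS.subClassOf, EX.Mortal))  # all men are mortal
g.add((EX.Socrates, RDF.type, EX.Man))       # Socrates is a man

# Under RDFS entailment, (EX.Socrates, RDF.type, EX.Mortal) follows.
# The skeptic's move is to assert a competing triple against the
# minor premise; plain RDF offers no machinery to arbitrate that.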
Re: scientific publishing process (was Re: Cost and access)
On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
Further to Hugh's comment about the non-techie world I found this interesting quote on the Web. The web is more a social creation than a technical one. I designed it for a social effect — to help people work together — and not as a technical toy. The ultimate goal of the Web is to support and improve our weblike existence in the world. I like this blue sky thinking and it seems to suggest (to me) that sometimes constantly moving technical engineering is not always productive or collaborative. Dominic (I will have a look at e-prints :-)) From: Hugh Glaser h...@glasers.org To: Daniel Schwabe dschw...@inf.puc-rio.br Cc: SW-forum Web semantic-...@w3.org; Linking Open Data public-lod@w3.org; Phillip Lord phillip.l...@newcastle.ac.uk; Eric Prud'hommeaux e...@w3.org; Peter F. Patel-Schneider pfpschnei...@gmail.com; Bernadette Hyland bhyl...@3roundstones.com Sent: Saturday, October 4, 2014 12:14 PM Subject: Re: scientific publishing process (was Re: Cost and access) Executive summary: 1) Bring up an ePrints repository for “our” conferences, and a myExperiment instance, or equivalents; 2) Start to contribute to the Open Source community. Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. Longer version. I too have a deep sense of deja vu all over yet again :-) But I have learned something - no-one seems to collaborate with people outside the techie world. Most documents for me start as a (set of) collaborative Google Doc (unmentioned) or a Word or OpenOffice document (not mentioned much) on Dropbox. And the collaborators couldn’t possibly help me build a LaTeX document or even any interesting HTML. Anyway… I see quite a few different things in this discussion, and all of them deeply important for the research publishing world at the moment. a) Document format; b) Metadata about the publication, both superficial and deep; c) Data, systems and workflow about the research. But starting almost everything from scratch (the existing standards and a few tools) is rarely the way to go in this webby world. There is real stuff out there (as I have said more than once before), that could really benefit from the sort of activity that Bernadette describes. I know about a number of things, but there will be others. (a) and (b) Repositories (because that is what we are talking about) http://eprints.org is an Open Source Linked Data publishing platform for publications that handles the document (in any format) and the shallow metadata, but could easily have deep as well if people generated it. Eg http://eprints.soton.ac.uk/id/eprint/271458 I even have an existing endpoint with all the ePrints RDF in it - http://foreign.rkbexplorer.com, with currently 24G / 182854666 triples, so such software can be used. What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or one for each series? And require the authors to enter their data into the site - it’s not hard, and there is existing documentation of what to do. It is mature technology with 100s of person-years invested. And perhaps most importantly, it has the buy-in of the library and similar communities, and has been field tested with users. It would certainly be more maintainable than the DogFood site - and it would be a trivialish task to move the great DogFood efforts over to it. DogFood really is something of a silo - exactly what Linked Data is meant to avoid.
And “we” might actually contribute to the wider community by enhancing the Open Source Project with Linked Data enhancements that were useful out there! Or a more challenging thing would be to make http://www.dspace.org do what we want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)! (c) Workflows and Datasets I have mentioned http://www.myexperiment.org before, but can’t remember if I have mentioned http://www.wf4ever-project.org Again, these are Linked Data platforms for publishing; in this case workflows and datasets etc. They are seriously mature, certainly compared with what we might build - see, for example https://github.com/wf4ever/ro And exactly the same as the Repositories. What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or one for each series? …ditto… Who knows, maybe the Crawl, as well as the Challenge entries, might be able to usefully describe what they did using these ontologies etc.? Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. Hugh On 4 Oct 2014, at 03:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote: As is often the case on the Internet, this discussion gives me a terrible sense of dejá vu. We've had this discussion many times before. Some years back the IW3C2
Re: scientific publishing process (was Re: Cost and access)
This is not a direct answer to Daniel, but rather expanding on what he said. Actually, he and I were (and still are) in the same IW3C2 committee, ie, we share the experience; and I was one of those (although the credit really goes to Bob Hopgood, actually, who was pushing that the most) who tried to come up with a proper XHTML template. The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. This in spite of the fact that many realize that PDF is really not the format for our age; we need much more than a reproduction of a printed page digitally (as someone referred to in the thread, I really suffer when I have to read, let alone review, an article in PDF on my iPad...). But I do see an evolution that might change things in the coming years. Laura dropped the magic word in the early phases of this thread: ePub. ePub is a packaged (zip archived) HTML site, with some additional information. It is the format that most of the ebook readers understand (hey, it can even be converted into a Kindle format:-). Both Firefox and Chrome have ePub reader extensions available and Mac OS comes with a free ebook reader (iBook) that is based on it. I expect (hope) that the convergence between ePub and browsers will bring these even closer in the coming years. Because ePub is a packaged web site, with the core content in HTML5 (or SVG), metadata can be added to the content in RDFa, microdata, embedded JSON-LD; in fact, metadata can also be added to the archive as a separate file so if you are crazy enough you can even add RDF data in RDF/XML (no, please, don't do it:-). And, of course, it can be as much of a hypertext as you can just master:-) Tooling? No, not yet:-( Well, not yet for lambda users. But there, too, there is an evolution. The fact is that publishers are working on XML first (or HTML first) workflows. O'Reilly's Atlas tool[1] means that authors prepare their documents in, essentially, HTML (well, a restricted profile thereof), and the output is then produced in EPUB, PDF, or pure HTML at the end. Companies are created that do similar things and where small(er) publishers can develop full projects (Metrodigi, Inkling, Hachette, ...; but I do not think it is possible to use these for a big conference, although, who knows?). Importantly to this community, these tools also include annotation facilities, akin to MS Word's commenting tools. Where does it take us _now_? Much against my instinct and with a bleeding heart I have to accept that conferences of the size of WWW, but even ISWC or ESWC, cannot reasonably ask their submitters to submit in ePub (or HTML). Yet. Not today. It is a chicken and egg problem, and change may come only with events, as well as more progressive scholarly publishers, experimenting with this. Just like Daniel (and Bernadette) I would love to see that happening for smaller workshops (if budget allows, I could imagine a workshop teaming up with, say, Metrodigi to produce the workshop's proceedings). But I am optimistic that the change will happen within a foreseeable time and our community (as any scholarly community, I believe) will have to prepare itself for a change in this area. Adding my 2¢ to Daniel's:-) Ivan P.S.
For LaTeX users: I guess the main advantage of LaTeX is the math part. And this is the saddest story of all: MathML has been around for a long time, and it is, actually, part of ePUB as well, but authoring proper mathematics is the toughest with the tools out there. Sigh... P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that space... [1] https://atlas.oreilly.com [2] http://metrodigi.com [3] https://www.inkling.com On 04 Oct 2014, at 04:14 , Daniel Schwabe dschw...@inf.puc-rio.br wrote: As is often the case on the Internet, this discussion gives me a terrible sense of dejá vu. We've had this discussion many times before. Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions. Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since. And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (ie., turning the article's text into linked hypertext), beyond the easily
Re: scientific publishing process (was Re: Cost and access)
I think I mentioned previously, Ivan, but perhaps not on this thread - Hugh McGuire has developed a Wordpress tool called PressBooks which allows you to write a book in HTML and export it as an EPUB file. He even supports schema.org markup in a separate plugin. (http://www.pressbooks.com) On 10/5/14, 10:34 AM, Ivan Herman i...@w3.org wrote: [...]
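Ivan's description of ePub as a packaged web site is easy to verify, because an EPUB is just a zip archive. A sketch using only the Python standard library; the file name is hypothetical, and META-INF/container.xml is where an EPUB points at its package (metadata) document:

import zipfile

with zipfile.ZipFile("proceedings.epub") as epub:
    # an EPUB is a zipped web site: list its HTML, CSS, and images
    for name in epub.namelist():
        print(name)
    # the container file locates the package document with the metadata
    print(epub.read("META-INF/container.xml").decode("utf-8"))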
Re: scientific publishing process (was Re: Cost and access)
metadata, sure, it is a must. BUT it should be designed and thought out for the web of data, not for paper-based collections. From my experience it is not so much about representing everything from the paper as triples. there will be statements that won't be representable; also, such an approach may not be efficient. why don't we just go a little bit further up from the lowest-hanging fruit and start talking about self-describing documents? well annotated documents with well structured metadata that are interoperable. this is easy, achievable, requires little tooling, does not put any burden on the author, delivers interoperability beyond just simple hyperlinks, and it is much more elegant than adhering to HTML, etc. On Sun, Oct 5, 2014 at 3:19 AM, Hugh Glaser h...@glasers.org wrote: On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Alexander Garcia http://www.alexandergarcia.name/ http://www.usefilm.com/photographer/75943.html http://www.linkedin.com/in/alexgarciac
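One way to read "self-describing" in practice: the paper's HTML carries its own machine-readable description, e.g. as embedded JSON-LD, which any consumer can pull back out. A sketch assuming the bs4 (BeautifulSoup) package and a hypothetical paper.html:

import json
from bs4 import BeautifulSoup

with open("paper.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# collect every embedded JSON-LD block and print a couple of fields
for block in soup.find_all("script", type="application/ld+json"):
    metadata = json.loads(block.string)
    print(metadata.get("name"), metadata.get("author"))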
Re: scientific publishing process (was Re: Cost and access)
+1 John http://Bresl.in On 5 Oct 2014, at 15:39, Ivan Herman i...@w3.org wrote: [...]
Re: scientific publishing process (was Re: Cost and access)
On 05 Oct 2014, at 16:47 , Laura Dawson laura.daw...@bowker.com wrote: I think I mentioned previously, Ivan, but perhaps not on this thread - Hugh McGuire has developed a Wordpress tool called PressBooks which allows you to write a book in HTML and export it as an EPUB file. He even supports schema.org markup in a separate plugin. (http://www.pressbooks.com) Indeed, I forgot! The problem with this service (but also for the others I guess) is that, at least through the standard offers on the sites), they may not be appropriate for a workshop, that would require leaving access to a large(r) numbers of submitters in the submission phase, followed by a selection process to end up in a small number of the submissions in the final book. This does not really fit in the business models. It should be up to the scholarly publishers to pick this up... (But I guess we digress greatly from the main topic of this mailing list, ie, semantic web...) Ivan On 10/5/14, 10:34 AM, Ivan Herman i...@w3.org wrote: This is not a direct answer to Daniel, but rather expanding on what he said. Actually, he and I were (and still are) in the same IW3C2 committee, ie, we share the experience; and I was one of those (although the credit really goes to Bob Hopgood, actually, who was pushing that the most) who tried to come up with a proper XHTML template. The real problem is still the missing tooling. Authors, even if technically savy like this community, want to do what they set up to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. This in spite of the fact that many realize that PDF is really not the format for our age; we need much more than a reproduction of a printed page digitally (as someone referred to in the thread I really suffer when I have to read, let alone review, an article in PDF on my iPad...). But I do see an evolution that might change in the coming years. Laura dropped the magic word on the early phases if this thread: ePub. ePub is a packaged (zip archived) HTML site, with some additional information. It is the format that most of the ebook readers understand (hey, it can even be converted into a Kindle format:-). Both Firefox and Chrome have ePub reader extensions available and Mac OS comes with a free ebook reader (iBook) that is based on it. I expect (hope) that the convergence between ePub and browsers will bring these even closer in the coming years. Because ePub is a packaged web site, with the core content in HTML5 (or SVG), metadata can be added to the content in RDFa, microdata, embedded JSON-LD; in fact, metadata can also be added to the archive as a separate file so if you are crazy enough you can even add RDF data in RDF/XML (no, please, don't do it:-). And, of course, it can be as much as a hypertext as you can just master:-) Tooling? No, not yet:-( Well, not yet for lambda users. But there, too, there is an evolution. The fact is that publishers are working on XML first (or HTML first) workflows. O'Reilly's Atlas tool[1] means that authors prepare their documents in, essentially, HTML (well, a restricted profile thereof), and the output is then produced in EPUB, PDF, or pure HTML at the end. 
Companies have been created that do similar things, where small(er) publishers can develop full projects (Metrodigi, Inkling, Hachette, ...; but I do not think it is possible to use these for a big conference, although, who knows?). Importantly for this community, these tools also include annotation facilities, akin to MS Word's commenting tools. Where does it take us _now_? Much against my instinct and with a bleeding heart, I have to accept that conferences of the size of WWW, but even ISWC or ESWC, cannot reasonably ask their submitters to submit in ePub (or HTML). Yet. Not today. It is a chicken-and-egg problem, and change may come only with events, as well as more progressive scholarly publishers, experimenting with this. Just like Daniel (and Bernadette) I would love to see that happening for smaller workshops (if budget allows, I could imagine a workshop teaming up with, say, Metrodigi to produce the workshop's proceedings). But I am optimistic that the change will happen within a foreseeable time, and our community (as any scholarly community, I believe) will have to prepare itself for a change in this area. Adding my 2¢ to Daniel's:-) Ivan P.S. For LaTeX users: I guess the main advantage of LaTeX is the math part. And this is the saddest story of all: MathML has been around for a long time, and it is, actually, part of ePUB as well, but authoring proper mathematics is the toughest with the tools out there. Sigh... P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that space... [1] https://atlas.oreilly.com [2] http://metrodigi.com [3] https://www.inkling.com
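Since the RDFa/microdata/JSON-LD remark above is the crux for this list, here is a minimal sketch of what it looks like in practice; the vocabulary and all names/values are illustrative, not prescribed by EPUB or by any tool mentioned here:

    <article vocab="http://schema.org/" typeof="ScholarlyArticle" resource="#article">
      <h1 property="name">An Example Article</h1>
      <p>By <span property="author" typeof="Person">
           <span property="name">A. N. Author</span></span>,
         <time property="datePublished" datetime="2014-10-05">5 October 2014</time>.</p>
      <!-- The same triples could instead live in a JSON-LD script block,
           or in a separate metadata file inside the EPUB zip archive. -->
    </article>

The point is that the statements travel inside the very HTML the reader renders, so the paper and its machine-readable description can never drift apart.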
Re: scientific publishing process (was Re: Cost and access)
Hi Peter Yes, these tags are semantic, in the context of a document. One could declare a document section instead of saying that there's a container. This way one can easily make a table of contents of several documents. Not semantic in the sense that they describe the knowledge in that document - that's what RDF and OWL are for. cheers -- diogo patrão On Fri, Oct 3, 2014 at 7:04 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: Hmm. Are these semantic? All these seem to do is to signal parts of a document. What I would consider to be semantic would be a way of extracting the mathematical content of a document. peter On 10/03/2014 02:32 PM, Diogo FC Patrao wrote: html5 has so-called semantic tags, like header, section. -- diogo patrão On Fri, Oct 3, 2014 at 6:01 PM, john.nj.dav...@bt.com wrote: Yes, but what makes HTML better for being webby than PDF? Because it is a mark-up language (albeit largely syntactic), which makes it much more amenable to machine processing? -Original Message- From: Peter F. Patel-Schneider [mailto:pfpschnei...@gmail.com] Sent: 03 October 2014 21:15 To: Diogo FC Patrao Cc: Phillip Lord; semantic-...@w3.org; public-lod@w3.org Subject: Re: scientific publishing process (was Re: Cost and access) On 10/03/2014 10:25 AM, Diogo FC Patrao wrote: On Fri, Oct 3, 2014 at 1:38 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. The majority of journals I'm familiar with mandate a certain format for submission: font size, figure format, etc. So, in an HTML-format submission, there should be rules as well: a standard CSS and the right elements and classes. No different from getting a Word or LaTeX template. This might help. However, someone has to do this, and ensure that the result is generally viewable. Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? As someone said, this is leading by example. Yes, but what makes HTML better for being webby than PDF? dfcp peter
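A small sketch of the distinction Diogo is drawing (element content and the id are made up for illustration): the HTML5 tags say what part of the document something is, which is enough to assemble a table of contents across documents mechanically, but they say nothing about the knowledge inside:

    <article>
      <header><h1>On Webby Publishing</h1></header>
      <section id="results">
        <h2>Results</h2>
        <p>A ToC builder can find this section by its heading,
           but nothing in the markup exposes the claim the section makes;
           for that you would layer RDF/OWL annotations on top.</p>
      </section>
    </article>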
Re: scientific publishing process (was Re: Cost and access)
Hi Alexander, On 5 Oct 2014, at 15:57, Alexander Garcia Castro alexgarc...@gmail.com wrote: Metadata, sure, it is a must. BUT good metadata, thought out for the web of data, not designed for paper-based collections. From my experience it is not so much about representing everything from the paper as triples; there will be statements that won't be representable, and such an approach may not be efficient. Why don't we just go a little bit further up from the lowest-hanging fruit and start talking about self-describing documents? Well-annotated documents with well-structured metadata that are interoperable. This is easy, achievable, requires little tooling, does not put any burden on the author, delivers interoperability beyond just simple hyperlinks, and is much more elegant than adhering to HTML, etc. You lost me here. Who or what does the “well annotated documents” and “well structured metadata”, if it isn’t any burden for the authors? Easy and little tooling - I wonder what methods and tools you have in mind? These have proved to be hard problems - otherwise we wouldn’t be having this painful discussion. Best Hugh On Sun, Oct 5, 2014 at 3:19 AM, Hugh Glaser h...@glasers.org wrote: On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652 -- Alexander Garcia http://www.alexandergarcia.name/ http://www.usefilm.com/photographer/75943.html http://www.linkedin.com/in/alexgarciac -- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Re: scientific publishing process (was Re: Cost and access)
Hi Ivan, On 5 Oct 2014, at 16:42, Ivan Herman i...@w3.org wrote: On 05 Oct 2014, at 16:47 , Laura Dawson laura.daw...@bowker.com wrote: I think I mentioned previously, Ivan, but perhaps not on this thread - Hugh McGuire has developed a WordPress tool called PressBooks which allows you to write a book in HTML and export it as an EPUB file. He even supports schema.org markup in a separate plugin. (http://www.pressbooks.com) Indeed, I forgot! The problem with this service (but also for the others, I guess) is that, at least through the standard offers on their sites, they may not be appropriate for a workshop, which would require leaving access open to a large(r) number of submitters in the submission phase, followed by a selection process to end up with a small number of the submissions in the final book. This does not really fit their business models. It should be up to the scholarly publishers to pick this up… Yes, we must keep remembering that the documents are simply one bit of a social machine, long before they get anywhere near (the unlikely event of them) being published. (But I guess we digress greatly from the main topic of this mailing list, i.e., the semantic web…) We did that quite a while ago, I think :-) But in the end you just gotta go with the flow, man. Best Hugh Ivan On 10/5/14, 10:34 AM, Ivan Herman i...@w3.org wrote: [Ivan's message quoted in full; trimmed here - see above]
Re: scientific publishing process (was Re: Cost and access)
On Sun, Oct 5, 2014 at 4:34 PM, Ivan Herman i...@w3.org wrote: The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not yet there. The tools for authoring are still very poor. But are they still very poor? I mean, I think there are more tools for rendering HTML than there are for rendering LaTeX. In fact there are probably more tools for rendering HTML than anything else out there, because HTML is used more than anything else. Because HTML powers the Web! You can write in Word, and export to HTML. You can write in Markdown and export to HTML. You can probably write in LaTeX and export to HTML as well :) The tools are not the problem. The problem to me is the printing afterwards. Conferences/workshops need to print the publications. Printing consistent LaTeX/PDF templates is a lot easier than printing inconsistent (layout-wise) HTML pages. Best, Luca
Re: scientific publishing process (was Re: Cost and access)
On 10/5/14 6:19 AM, Hugh Glaser wrote: On 5 Oct 2014, at 11:07, Michael Brunnbauer bru...@netestate.de wrote: ... Basic metadata is good. Publishing datasets with the paper is good. Having typed links in the paper is good. But I would not demand to go further. +1 ++1 - the dataset publishing can include the workflow, tools etc, and metadata about that. +1. For context: hence my +1 for Hugh's detailed example, which also veers towards building on a variety of existing efforts rather than ripping and replacing, etc. The data behind these papers doesn't need to be locked in tables, in PDFs. Neither do the descriptions of the data in question (the so-called metadata), or the workflows involved. -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
On 10/5/14 9:55 AM, Dominic Oldman wrote: Further to Hugh's comment about the non-techy world, I found this interesting quote on the Web: The web is more a social creation than a technical one. I designed it for a social effect — to help people work together — and not as a technical toy. The ultimate goal of the Web is to support and improve our weblike existence in the world. I like this blue-sky thinking, and it seems to suggest (to me) that constantly moving technical engineering is not always productive or collaborative. Dominic (I will have a look at e-prints :-)) Yes! The Web is fundamentally about collaboration (which is social) and data flow (even when this data is subject to data access policies and access control lists etc.). -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
Word adds all sorts of horrible tags to things and makes the HTML virtually unrenderable. On 10/5/14, 4:19 PM, Luca Matteis lmatt...@gmail.com wrote: [Luca's message quoted in full; trimmed here - see above]
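For anyone who has not looked inside a Word export lately, the flavour is roughly this; an illustrative fragment written from memory, not taken from any particular document:

    <!-- What a word processor's "Save as HTML" tends to emit... -->
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
      <span style="font-family:&quot;Calibri&quot;,sans-serif">Hello, world.</span>
    </p>

    <!-- ...versus the markup the author actually meant: -->
    <p>Hello, world.</p>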
Re: scientific publishing process (was Re: Cost and access)
On 05 Oct 2014, at 22:19 , Luca Matteis lmatt...@gmail.com wrote: [Luca's message quoted in full; see above] Interestingly, my experience is just about the opposite. Sorry:-) Yes, tools to _render_ HTML are around. But the issue is the _production_ of those pages (and, to take one step further alongside my original mail, to produce an ePub once the HTML pages are around). Word (as Laura remarked) produces nearly useless HTML; OpenOffice/LibreOffice is not much better, I am afraid. Markdown is fine indeed, and Markdown editors like Mou produce proper HTML, but the markup (sic!) facilities of Markdown are limited. It is all right for simple books, but I suspect it would be more of a problem for scientific articles. (But yes, that is an avenue to explore.) WYSIWYG HTML editors exist by now, but I am not sure they are satisfactory either (I use BlueGriffon often, but I still have to switch back and forth between source mode and WYSIWYG mode, which defeats the purpose). Of course, I could expect a Web-technology-related crowd to use HTML source editing directly, but the experience of Daniel and myself with the World Wide Web conference(!) is that people do not want to do that. (Researchers in, say, Web Search have proven to be unable or unwilling to edit HTML source. It was a real surprise...) I.e., the authoring tool offerings are still limited. On the other hand... how long do we want to care about printing? The WWW conference (to stay with that example) has given up on printed proceedings for a while. The proceedings are published by the ACM and offered through their digital library, and the individual papers are available on-line on the conference site. I know that ISWC and (I believe) ESWC still produce printed Springer proceedings, but I wonder for how long; who needs those in print? I must admit that I have not picked up a printed proceedings or journal article for many years; I look for the online versions instead. Of course, I may print a single paper because I want to read it while, for example, on the train, but then I do not really care about the way it looks. And, with tablets, even this usage is becoming less significant. That being said, producing a proper PDF from HTML is again not a problem; CSS has a number of page/print-specific features and is being actively worked on in this respect. 
Cheers Ivan Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me
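To make Ivan's last point concrete, the kind of page/print-specific CSS he alludes to looks like this; the page size and margins are illustrative guesses, not the actual LNCS or ACM values:

    <style>
      @page {
        size: A4;           /* paged-media page size for the PDF run */
        margin: 25mm 20mm;  /* illustrative margins, not any template's */
      }
      @media print {
        h1, h2 { page-break-after: avoid; }  /* keep headings with their text */
        a[href]::after { content: " (" attr(href) ")"; }  /* expose link targets on paper */
      }
    </style>

Feed HTML plus a sheet like this to a paged-media formatter (or a browser's print-to-PDF), and a fixed-layout PDF falls out of the same source that serves the web version.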
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-04 04:14, Daniel Schwabe wrote: As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before. Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions. Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since. Hi Daniel, here is my long reply as usual, and I hope you'll give it a shot :) I've offered *a* solution that is compatible with the existing workflow without asking for any extra work from the OC/PCs, with the exception that Web-native technologies for the submissions are officially encouraged. They will get their PDF in the end to cater to the existing pipeline. In the meantime, the community retains higher-quality research documents. And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated ones (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually semantic in some sense is still, in my view, a research topic, not a routine reality. Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing. I disagree that we don't have sufficient or robust tools to author and publish web pages. I find it ironic that we are still debating this issue as if we were in the early-to-mid 90s. Or ignoring [2], or the possibility of using a service which offers [3] to publish (pardon me for saying it) a friggin' web page. If it is about coding, I find it unreasonable or unprofessional to think that a Computer/Web Scientist in 2014 who is publicly funded for their academic endeavors is incapable of grokking HTML. But, somehow, LaTeX is presumed to be okay for the new post-graduate that's coming in. Really? Or is the real reason that no one is asking them to do otherwise? They can randomly pick a WYSIWYG editor tool or an existing publishing service. No one is forcing anyone to hand-code anything. Just as no one is forced to hand-code LaTeX. We have the tools and even services to help us do all of that. Both from and outside of SW. We had them for a long time. What was lacking was a continuous green light to use them. That light stopped flashing, as you've mentioned. But again, our core problems are not technical in nature. I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences. I disagree. The fact that workshops or tracks on linked science or semantic publishing didn't deliver is a clear sign that they have the wrong process at the root. When those workshops ask for submissions to be in PDF, that's the definition of irony. There are no useful machine-friendly research objects! Opportunity lost at every single CfP. 
Yet, we eloquently describe hypothetical systems or tools that will one day do all the magic for us, instead of taking a good look at what's right in front of us. So, let's talk about putting the cart before the horse. A lot of time and energy (e.g., public funding) could have been better used simply by actually *having the data*, and then figuring out how to utilize it. There is no data, so what's there to analyze or learn from? Some research trying to figure out what to do with trivial and limited metadata, e.g., title, abstract, authors, subjects? Is data.semanticweb.org (dog food) the best we can show for our dogfooding ability? I can't search/query for research knowledge on topic T, that used variables X, Y, which implemented a workflow step S, that's cited by or used those exact parameters, that happens to use the datasets that I'm planning to use in my research. Reproducibility: 0. Comparability: 0. Discovery: 0. Reuse: 0. H-Index: +1? Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)). Nothing is stopping us from doing things in parallel, and we are in fact. Close-by efforts range from workshops to Force11, public-dwbp-wg, public-digipub-ig, … to recommendations, e.g., PROV-O, OPMW, SIO, SPAR, besides the whole SW/LD stack, which benefits scientific research communication and […]
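As a gloss on what "machine-friendly research objects" can mean at the cheap end, here is a sketch of a JSON-LD block a submission could carry; the vocabulary (schema.org) and all names/URLs are illustrative, not what Sarven's actual templates emit:

    <script type="application/ld+json">
    {
      "@context": "http://schema.org/",
      "@type": "ScholarlyArticle",
      "name": "An Example Submission",
      "author": { "@type": "Person", "name": "A. N. Author" },
      "about": "topic T",
      "isBasedOn": {
        "@type": "Dataset",
        "name": "dataset D",
        "url": "http://example.org/datasets/d"
      },
      "citation": "http://example.org/papers/prior-work"
    }
    </script>

With even this much in every accepted paper, queries like "which papers on topic T used dataset D?" stop being hypothetical.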
Re: scientific publishing process (was Re: Cost and access)
Hello Paul, On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote: Yes. We are setting the bar too low. The field of knowledge computing will only reach maturity when authors can publish their theses in such a manner that one can programmatically extract the concepts, propositions, and arguments; I thought Kingsley was the only one seriously suggesting that we communicate in triples. Let's take one step back to the proposal of making research datasets machine-readable with RDF. Please go to http://crcns.org/NWB Have a look at an example dataset: http://crcns.org/data-sets/hc/hc-3/about-hc-3 The total size of the data is about 433 GB compressed. Even if you do not use triples for all of that (which would be insane), specifying a structured data container is a very difficult task. So instead of talking about setting the bar higher, why not just help the people over there with their problem? Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Re: scientific publishing process (was Re: Cost and access)
Executive summary: 1) Bring up an ePrints repository for “our” conferences, and a myExperiment instance, or equivalents; 2) Start to contribute to the Open Source community. Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. Longer version. I too have a deep sense of déjà vu all over yet again :-) But I have learned something - no-one seems to collaborate with people outside the techie world. Most documents for me start as a (set of) collaborative Google Docs (unmentioned) or a Word or OpenOffice document (not mentioned much) on Dropbox. And the collaborators couldn’t possibly help me build a LaTeX document or even any interesting HTML. Anyway… I see quite a few different things in this discussion, all of them deeply important for the research publishing world at the moment: a) Document format; b) Metadata about the publication, both superficial and deep; c) Data, systems and workflow about the research. But starting almost everything from scratch (the existing standards and a few tools) is rarely the way to go in this webby world. There is real stuff out there (as I have said more than once before) that could really benefit from the sort of activity that Bernadette describes. I know about a number of things, but there will be others. (a) and (b) Repositories (because that is what we are talking about). http://eprints.org is an Open Source Linked Data publishing platform for publications that handles the document (in any format) and the shallow metadata, but could easily have deep metadata as well if people generated it. E.g. http://eprints.soton.ac.uk/id/eprint/271458 I even have an existing endpoint with all the ePrints RDF in it - http://foreign.rkbexplorer.com, currently with 24G / 182,854,666 triples, so such software can be used. What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or one for each series? And require the authors to enter their data into the site - it’s not hard, and there is existing documentation of what to do. It is mature technology with 100s of person-years invested. And perhaps most importantly, it has the buy-in of the library and similar communities, and has been field-tested with users. It would certainly be more maintainable than the DogFood site - and it would be a trivialish task to move the great DogFood efforts over to it. DogFood really is something of a silo - exactly what Linked Data is meant to avoid. And “we” might actually contribute to the wider community by enhancing the Open Source project with Linked Data enhancements that were useful out there! Or a more challenging thing would be to make http://www.dspace.org do what we want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)! (c) Workflows and Datasets. I have mentioned http://www.myexperiment.org before, but can’t remember if I have mentioned http://www.wf4ever-project.org Again, these are Linked Data platforms for publishing; in this case workflows and datasets etc. They are seriously mature, certainly compared with what we might build - see, for example, https://github.com/wf4ever/ro And exactly the same as the Repositories: what would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or one for each series? …ditto… Who knows, maybe the Crawl, as well as the Challenge entries, might be able to usefully describe what they did using these ontologies etc.? 
Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. Hugh On 4 Oct 2014, at 03:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote: [Daniel's message quoted in full; trimmed here - see earlier in the thread]
Re: scientific publishing process (was Re: Cost and access)
PDFs are surprisingly flexible and open containers for transporting around Stuff Hi, I'm feeling tempted to add something provocative ;-) PDFs are surprisingly mature in disguising all the 'bla bla' and making it look nice... = http://tractatus-online.appspot.com/Tractatus/jonathan/index.html wkr turnguard | Jürgen Jakobitsch, | Software Developer | Semantic Web Company GmbH | Mariahilfer Straße 70 / Neubaugasse 1, Top 8 | A - 1070 Wien, Austria | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22 COMPANY INFORMATION | web : http://www.semantic-web.at/ | foaf : http://company.semantic-web.at/person/juergen_jakobitsch PERSONAL INFORMATION | web : http://www.turnguard.com | foaf : http://www.turnguard.com/turnguard | g+ : https://plus.google.com/111233759991616358206/posts | skype : jakobitsch-punkt | xmlns:tg = http://www.turnguard.com/turnguard#; 2014-10-04 14:47 GMT+02:00 Norman Gray nor...@astro.gla.ac.uk: Bernadette, hello. On 2014 Oct 4, at 00:36, Bernadette Hyland bhyl...@3roundstones.com wrote: ... a really useful message which pulls several of these threads together. The following is a rather fragmentary response. As a reference point, I tend to think publication = LaTeX -> PDF. To pre-dispel a misconception here: I'm not being a cheerleader for PDF below, but a fair fraction of the antagonism directed towards PDF in this thread is, I think, misplaced -- PDF is not the problem. We'd do ourselves a huge favor if we showed (STM) publishing executives why this Linked Data stuff matters anyway. They know. A surprisingly large fraction of the Article Processing Charge we pay to them goes on extracting, managing and sharing metadata. That includes DOIs, Crossref feeds, ScienceDirect, and so on and so on, and so (it seems) on. It also includes conversion to XML: if you submit a LaTeX file to a big publisher, the first thing they'll do is convert it to XML+MathML (using workflows based on, for example, LaTeXML or TeX4ht) and preserve that; several of them then re-generate LaTeX for final production. To a large extent, I suspect publishers now regard metadata management as their Job -- in the sense of their contribution to the scholarly endeavour -- and they could do without the dead trees. If you can offer them a way of making metadata _insertion_ easier, which is cost-effective, can be scaled up, and which a _broad_ range of authors will accept (the hard bit), they'll rip your arm off. 1) PDF works well for (STM) publishers who require fixed page display; Yes, and for authors. Given an alternative between an HTML version of a paper and a PDF version, I will _always_ choose the PDF, because it's zero-hassle, more reliably faithful to the author's original, more readable, and I can read it in the bath. 2) PDF doesn't take advantage of the advances we've made in machine readability; If by this you mean RDF, then yes, the naive ways of generating PDFs are not RDF-aware. So we shouldn't be naive... XMP is an ISO standard (as PDF is, and like it originating from Adobe) and is a type of RDF (well, an irritatingly 90% profile of RDF, but let that pass). Though it's not trivial, it's not hard to generate an XMP packet and get it into a PDF, and once there, the metadata job is mostly done. 3) In fact, PDFs suck on eBook readers, which are all about flexible page layout; and Sure, but they're not intended for e-book readers, so of course they're poor at that. 4) We already have the necessary Web Standards to address the problem, so no need to recreate the wheel. 
If, again, you mean RDF, then I agree completely. -- Produce a Web-based tool that allows researchers to share their [privately | publicly] funded knowledge and produces a variety of outputs: LaTeX, PDF, and carries with it a machine-readable representation. Well, not web-based: I'd want something I can run on my own machine. Do people agree with the following SOLUTION approach? The international standards to solve this exist. Standards from W3C and the International Digital Publishing Forum (IDPF).[2] Use (X)HTML for generalized document creation/rendering. Use CSS for styling. Use MathML for formulas. Use JS for action. Use RDF to model the metadata within HTML. PDF and XMP are both ISO standards, too. LaTeX isn't a Standard standard, but it's pretty damn stable. MathML one would _not_ want to type. The only ways of generating MathML that I'm even slightly familiar with start with TeX syntax. There are presumably GUI-based ones, too *shudder*. I propose a 'walk before we run' approach, but do better than basic metadata (i.e., title, author name, institution, abstract). Link to other scholarly communities/projects such as Vivo.[3] I generate Atom feeds for my PDF lecture notes. The feed content is extracted from the XMP and from the /Author, /Title, etc, metadata within the PDF. That metadata gets there […]
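For concreteness, a minimal XMP packet of the sort Norman describes is just RDF/XML in a fixed envelope; the title and name below are placeholders. Embed it in the PDF and generic tools can read the metadata back out:

    <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>
            <rdf:Alt><rdf:li xml:lang="x-default">An Example Paper</rdf:li></rdf:Alt>
          </dc:title>
          <dc:creator>
            <rdf:Seq><rdf:li>A. N. Author</rdf:li></rdf:Seq>
          </dc:creator>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    <?xpacket end="w"?>

(The id attribute is the fixed magic string the XMP spec mandates; the dc: properties are the same Dublin Core terms the rest of this thread keeps reaching for.)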
Re: scientific publishing process (was Re: Cost and access)
On 10/4/14 7:14 AM, Hugh Glaser wrote: Executive summary: 1) Bring up an ePrints repository for “our” conferences, and a myExperiment instance, or equivalents; 2) Start to contribute to the Open Source community. Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. [Hugh's longer version quoted in full; trimmed here - see above]
+1 -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this