Re: Publication of scientific research
On 2013-04-25 15:42, Daniel Schwabe wrote: Sarven and all, I don't have the answers to your questions. But I find it interesting that we could at least do a survey with authors. We would really have to at least mention some *reasonable* tools that are available, though, otherwise I'm afraid their positions won't change from before. I will discuss this within IW3C2 and see if we can include a question about this in one of the pre- or post-WWW conference surveys. In the meantime, perhaps SWSA (who promotes ISWC) might want to follow up on this idea as well. Cheers D

Hi Daniel, If you have any follow-up information on that, would you mind sharing? Sorry to bring this up a year and a half later, but I'm still interested. Thanks, -Sarven

On Apr 25, 2013, at 10:29, Sarven Capadisli i...@csarven.ca wrote: On 04/24/2013 09:39 PM, Daniel Schwabe wrote: Some years ago, IW3C2, which promotes the WWW conference, of which I am a member, and which is very interested in furthering the use of Web standards for all the reasons that have already been mentioned in this discussion, decided to ask authors to submit papers in (X)HTML. After all, WWW is a *Web* conference! (This was before RDF and its associated tools were available.) The bottom line was that authors REFUSED to submit in this format, partly because of lack of tools, partly because they were just comfortable with the existing tools. There were so many refusals that it would have simply ruined the conference if the organization had rejected these submissions. The objection was so strong that IW3C2 eventually had to change its mind and keep things the way they were, and currently are. Clearly, for some specialized communities, certain alternative formats may be acceptable - ontologies, in the context of Sepublica, make perfect sense as an acceptable submission format.
But when dealing with a more general audience, I do not believe we have the power to FORCE people to adopt any single specialized format - as with everything else, these things emerge from a community consensus over time, even if first spearheaded by a smaller core group. Before that happens, we need to have a very clear value proposition and, most of all, good tools for people to accept the change. Most people will not change their ways if not convinced that it's worth the additional effort - and having really good tools is a sine qua non requirement for this. On the other hand, efforts continue to at least provide metadata in RDF, which has been surprisingly hard to produce year after year without requiring hand coding and customization each time. But we will get there, I hope. Just my 2c...

Hi Daniel, thank you for that invaluable background. I'll ask the community: what is the real lesson from this and how can we improve? What's more important: keeping the conference running or some ideals? Was that reaction from authors expected? Will it ever be different? What would have happened if IW3C2 had stood its ground? What would happen if conferences took a stand - where would authors migrate? What would be the short and long term consequences? Not that I challenge this, but are we sure that it is the lack of good tools that's holding things back? What would make the authors happy? Was there a survey on this? -Sarven

Daniel Schwabe, Dept. de Informatica, PUC-Rio, R. M. de S. Vicente, 225, Rio de Janeiro, RJ 22453-900, Brasil, http://www.inf.puc-rio.br/~dschwabe
Re: scientific publishing process (was Re: Cost and access)
On 2014-10-04 04:14, Daniel Schwabe wrote: As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before. Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions. Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since.

Hi Daniel, here is my long reply as usual and I hope you'll give it a shot :) I've offered *a* solution that is compatible with the existing workflow without asking for any extra work from the OC/PCs, with the exception that the Web-native technologies for the submissions are officially encouraged. They will get their PDF in the end to cater to the existing pipeline. In the meantime, the community retains higher quality research documents.

And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated ones (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually semantic in some sense is still, in my view, a research topic, not a routine reality. Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing.

I disagree that we don't have sufficient or robust tools to author and publish web pages.
I find it ironic that we are still debating this issue as if we were in the early-to-mid 90s. Or ignoring [2], or the possibility of using a service which offers [3] to publish (pardon me for saying it) a friggin' web page. If it is about coding, I find it unreasonable or unprofessional to think that a Computer/Web Scientist in 2014 who is publicly funded to pursue their academic endeavors is incapable of grokking HTML. But, somehow, LaTeX is presumed to be okay for the new post-graduate that's coming in. Really? Or is the real reason that no one is asking them to do otherwise? They can randomly pick a WYSIWYG editor tool or an existing publishing service. No one is forcing anyone to hand-code anything. Just as no one is forced to hand-code LaTeX. We have the tools and even services to help us do all of that, both from and outside of SW. We had them for a long time. What was lacking was a continuous green light to use them. That light stopped flashing, as you've mentioned. But again, our core problems are not technical in nature.

I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences.

I disagree. The fact that workshops or tracks on linked science or semantic publishing didn't deliver is a clear sign that they have the wrong process at the root. When those workshops ask for submissions to be in PDF, that's the definition of irony. There are no useful machine-friendly research objects! An opportunity lost at every single CfP. Yet, we eloquently describe hypothetical systems or tools that will one day do all the magic for us instead of taking a good look at what's right in front of us. So, let's talk about putting the cart before the horse. A lot of time and energy (e.g., public funding) could have been better used simply by actually *having the data*, and then figuring out how to utilize it. There is no data, so what's there to analyze or learn from?
Some research trying to figure out what to do with trivial and limited metadata, e.g., title, abstract, authors, subjects? Is data.semanticweb.org (dog food) the best we can show for our dogfooding ability? I can't search/query for research knowledge on topic T, that used variables X and Y, which implemented a workflow step S, that's cited by or used those exact parameters, that happens to use the datasets that I'm planning to use in my research.

Reproducibility: 0. Comparability: 0. Discovery: 0. Reuse: 0. H-Index: +1?

Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).

Nothing is stopping us from doing things in parallel, and we are in fact. Close-by efforts range from workshops to force11, public-dwbp-wg, public-digipub-ig, .. to recommendations, e.g., PROV-O, OPMW, SIO, SPAR, besides the whole SW/LD stack, which benefits scientific research communication and
Re: scientific publishing process (was Re: Cost and access)
Hello Paul, On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote: Yes. We are setting the bar too low. The field of knowledge computing will only reach maturity when authors can publish their theses in such a manner that one can programmatically extract the concepts, propositions, and arguments;

I thought Kingsley is the only one seriously suggesting that we communicate in triples. Let's take one step back to the proposal of making research datasets machine readable with RDF. Please go to http://crcns.org/NWB and have a look at an example dataset: http://crcns.org/data-sets/hc/hc-3/about-hc-3 The total size of the data is about 433 GB compressed. Even if you do not use triples for all of that (which would be insane), specifying a structured data container is a very difficult task. So instead of talking about setting the bar higher, why not just help the people over there with their problem?

Regards, Michael Brunnbauer -- Michael Brunnbauer, netEstate GmbH, München, http://www.netestate.de/
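There is a middle ground between "everything in triples" (insane at 433 GB) and no RDF at all: describe the dataset in RDF and leave the bulk data in its native container. A hedged sketch using the DCAT vocabulary follows; the URIs, title wording, and size figure are illustrative placeholders, not taken from crcns.org's actual metadata:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical description of the hc-3 dataset. Only the *description*
# is in RDF; the hundreds of gigabytes stay in their native format.
<http://example.org/dataset/hc-3>
    a dcat:Dataset ;
    dct:title "hc-3 (example electrophysiology dataset hosted at crcns.org)" ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:mediaType "application/x-tar" ;          # assumed container format
        dcat:byteSize "465000000000"^^xsd:integer ;   # ~433 GB, approximate
        dcat:downloadURL <http://example.org/data/hc-3.tar.gz>
    ] .
```

The point of the sketch is the division of labour: the triples make the dataset discoverable and queryable, while the payload itself needs a structured container specification of its own.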
Re: Publication of scientific research
On 2013-04-29 19:29, Andrea Splendiani wrote: Hi, ok. Let's see if we can offer xhtml+RDFa as an additional format, and see how people react. I'll spread the idea a bit. best, Andrea

Hi Andrea, Care to share the feedback that you've received? Thanks, -Sarven

On 27 Apr 2013, at 23:05, Sarven Capadisli i...@csarven.ca wrote: On 04/27/2013 02:31 AM, Andrea Splendiani wrote: I'm involved in the organization of a couple of conferences and workshops. You do need a template, as without one it's hard to have homogeneous submissions (even for simple things such as page length, or its HTML equivalent). Other than this, the main issue I can see is that proceedings may require PDF anyway. But we could make a partial step and ask for abstracts in xhtml+RDFa, to be included in the online program, and full papers in xhtml+RDFa as an optional submission. It's a small step, but a first step. I'm very short on time, but if there is a template, I can see if the idea finds some interest. I have to admit that asking for PDFs sounds a bit retro!

Challenge accepted! Here is the egg in XHTML+RDFa with CSS for LNCS and ACM SIG: https://github.com/csarven/linked-research See live: http://linked-research.270a.info/ Will you provide the chicken? :D I got it as close to the LNCS template as possible for now. ACM SIG is on its way (change stylesheet from lncs.css to acmsig.css to see the current state). I know it is not perfect, but I think it is a decent start. Best viewed in Firefox/Chrom(e|ium). Print to paper, or to PDF from your browser, to see alternative views. What do you think? Now, before some of you pixel-perfectionist folks yell at me, please first chillax and create an issue on GitHub. Or better yet, contribute with pull requests. It is using Apache License 2.0. But, all feedback is most welcome! I still don't think this is the main challenge. We need go-aheads from conferences; then we can hack up the best templates and stylesheets that this universe has ever seen.
-Sarven
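For readers who have not looked at the linked-research repository, the kind of template being offered is roughly the following. This is a minimal sketch, not the repository's exact markup; the schema.org vocabulary choice is an assumption, though the lncs.css/acmsig.css stylesheet swap is as described above:

```html
<!DOCTYPE html>
<html prefix="schema: http://schema.org/">
<head>
  <meta charset="utf-8"/>
  <title>An Example Article</title>
  <!-- swap lncs.css for acmsig.css to change the conference layout -->
  <link rel="stylesheet" href="lncs.css"/>
</head>
<body typeof="schema:ScholarlyArticle">
  <h1 property="schema:name">An Example Article</h1>
  <p property="schema:author" typeof="schema:Person">
    <span property="schema:name">A. N. Author</span>
  </p>
  <div property="schema:abstract">
    <h2>Abstract</h2>
    <p>One paragraph summarising the contribution.</p>
  </div>
</body>
</html>
```

The same document prints to paper or PDF via the stylesheet, while the RDFa attributes keep the title, authorship, and abstract machine-readable.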
Visibility of the data (was Re: Formats and icing)
On 2014-10-02 00:48, Sarven Capadisli wrote: On 2014-10-01 21:51, Kingsley Idehen wrote: On 10/1/14 2:42 PM, Sarven Capadisli wrote: can't use them along with schema.org. I favour plain HTML+CSS+RDFa to get things going e.g.: https://github.com/csarven/linked-research What about: HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ? Basically, we have to get to: HTML+CSS+(Any RDF Notation). Sure, why not!

Actually, I'd like to make a brief comment on this. While I agree with (and enjoy) your eloquent explanations on RDF, Languages, and Notations, and that any RDF Notation is entirely reasonable (because we can go from one to another with relative ease), we shouldn't overlook one important dimension: *Visibility* of the data. Perhaps this is better left as a best practice than anything else, but in my opinion: RDFa is ideal when dealing with HTML for research knowledge because, if applied correctly, it will declare all of the visible portions of the research process and knowledge. It makes the information available as first-class data as opposed to metadata. It is less likely to be left behind or go stale because it is visible to the human at all times. This is in contrast to JSON-LD or Turtle, where they will be treated as dark metadata, or at least create duplicate information liable to fall out of sync. While JSON-LD and Turtle have their strengths, they are unnecessary for the most relevant parts of the document, which are already visible, e.g., concepts, hypothesis, methodological steps, variables, figures, tables, evaluation, conclusions.

Again, this is not meant to force anyone to use a particular RDF notation. Getting HTML+CSS into the picture is a huge win in itself as far as I'm concerned :) Then applying an RDF notation is a nice reasonable step forward.

* I am conveniently leaving out Microdata from this discussion because I don't feel it is relevant any longer.

-Sarven http://csarven.ca/#i
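The "visibility" argument can be made concrete with a small contrast. In the RDFa case the statement *is* the visible prose; in the JSON-LD case the same statement is duplicated in an invisible island that can drift out of sync when the prose is edited. A sketch, assuming a `schema:` prefix is declared on an ancestor element; the vocabulary and identifier are illustrative:

```html
<!-- Inline RDFa: the data is the visible sentence itself -->
<p resource="#hypothesis" typeof="schema:CreativeWork">
  <span property="schema:text">We hypothesise that X correlates with Y.</span>
</p>

<!-- JSON-LD island: the same statement repeated, invisible to readers,
     and a second copy to keep synchronized if the sentence changes -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@id": "#hypothesis",
  "@type": "CreativeWork",
  "text": "We hypothesise that X correlates with Y."
}
</script>
```

Both yield the same triples to a consumer; the difference is purely whether the author can forget about one copy.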
Re: scientific publishing process (was Re: Cost and access)
Executive summary: 1) Bring up an ePrints repository for “our” conferences, and a myExperiment instance, or equivalents; 2) Start to contribute to the Open Source community. Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone.

Longer version. I too have a deep sense of deja vu all over yet again :-) But I have learned something - no-one seems to collaborate with people outside the techie world. Most documents for me start as a (set of) collaborative Google Docs (unmentioned) or a Word or OpenOffice document (not mentioned much) on Dropbox. And the collaborators couldn’t possibly help me build a LaTeX document or even any interesting HTML. Anyway…

I see quite a few different things in this discussion, all of them deeply important for the research publishing world at the moment: a) Document format; b) Metadata about the publication, both superficial and deep; c) Data, systems and workflow about the research. But starting almost everything from scratch (the existing standards and a few tools) is rarely the way to go in this webby world. There is real stuff out there (as I have said more than once before) that could really benefit from the sort of activity that Bernadette describes. I know about a number of things, but there will be others.

(a) and (b) Repositories (because that is what we are talking about). http://eprints.org is an Open Source Linked Data publishing platform for publications that handles the document (in any format) and the shallow metadata, but could easily have deep metadata as well if people generated it. E.g. http://eprints.soton.ac.uk/id/eprint/271458 I even have an existing endpoint with all the ePrints RDF in it - http://foreign.rkbexplorer.com, currently with 24 GB / 182,854,666 triples, so such software can be used. What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or one for each conference or series?
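To give a flavour of what "such software can be used" means in practice, a query against an ePrints-style endpoint might look like the following. This is hedged: predicate choices vary per repository, and the generic Dublin Core terms below are an assumption, not the rkbexplorer endpoint's actual schema:

```sparql
PREFIX dct: <http://purl.org/dc/terms/>

# List the ten most recent eprints with their titles.
# dct:title / dct:date are illustrative; a given repository's
# RDF export may use different predicates.
SELECT ?eprint ?title ?date
WHERE {
  ?eprint dct:title ?title ;
          dct:date  ?date .
}
ORDER BY DESC(?date)
LIMIT 10
```

The same query works unchanged against any conference's repository, which is exactly the anti-silo argument being made here.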
And require the authors to enter their data into the site - it’s not hard, and there is existing documentation of what to do. It is mature technology with 100s of person-years invested. And perhaps most importantly, it has the buy-in of the library and similar communities, and has been field tested with users. It would certainly be more maintainable than the DogFood site - and it would be a trivialish task to move the great DogFood efforts over to it. DogFood really is something of a silo - exactly what Linked Data is meant to avoid. And “we” might actually contribute to the wider community by enhancing the Open Source project with Linked Data enhancements that were useful out there! Or a more challenging thing would be to make http://www.dspace.org do what we want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)!

(c) Workflows and Datasets. I have mentioned http://www.myexperiment.org before, but can’t remember if I have mentioned http://www.wf4ever-project.org Again, these are Linked Data platforms for publishing; in this case workflows and datasets etc. They are seriously mature, certainly compared with what we might build - see, for example, https://github.com/wf4ever/ro And exactly the same applies as for the Repositories: what would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all, or one for each or series? …ditto… Who knows, maybe the Crawl, as well as the Challenge entries, might be able to usefully describe what they did using these ontologies etc.?

Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. Hugh

On 4 Oct 2014, at 03:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote: As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before.
Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions. [...]
Re: Publication of scientific research
What was the proposal/document that did not meet the standard of 12 pages? Can you share it with me?

On Tue, Apr 30, 2013 at 4:01 AM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: Sarven Capadisli i...@csarven.ca writes: I made a simple proposal [1]. Zero direct responses. Why?

Your proposal didn't meet the standard 12 pages in LNCS, so everyone discarded it as having no weight. I like your proposal. I think it is very good. I especially like the idea of submitting a URL. The URLs could be harvested and stored at archive.org, which would help to address the digital preservation issue. Phil

-- Alexander Garcia http://www.alexandergarcia.name/ http://www.usefilm.com/photographer/75943.html http://www.linkedin.com/in/alexgarciac
Re: scientific publishing process (was Re: Cost and access)
PDFs are surprisingly flexible and open containers for transporting around Stuff

Hi, I'm feeling tempted to add something provocative ;-) PDFs are surprisingly mature in disguising all the 'bla bla' and making it look nice... = http://tractatus-online.appspot.com/Tractatus/jonathan/index.html wkr turnguard | Jürgen Jakobitsch, Software Developer, Semantic Web Company GmbH | web: http://www.semantic-web.at/ | foaf: http://www.turnguard.com/turnguard

2014-10-04 14:47 GMT+02:00 Norman Gray nor...@astro.gla.ac.uk: Bernadette, hello. On 2014 Oct 4, at 00:36, Bernadette Hyland bhyl...@3roundstones.com wrote: ... a really useful message which pulls several of these threads together. The following is a rather fragmentary response. As a reference point, I tend to think publication = LaTeX -> PDF. To pre-dispel a misconception here: I'm not being a cheerleader for PDF below, but a fair fraction of the antagonism directed towards PDF in this thread is, I think, misplaced -- PDF is not the problem.

We'd do ourselves a huge favor if we showed (STM) publishing executives why this Linked Data stuff matters anyway.

They know. A surprisingly large fraction of the Article Processing Charge we pay to them goes on extracting, managing and sharing metadata. That includes DOIs, Crossref feeds, Science Direct, and so on and so on, and so (it seems) on.
It also includes conversion to XML: if you submit a LaTeX file to a big publisher, the first thing they'll do is convert it to XML+MathML (using workflows based on, for example, LaTeXML or TeX4ht) and preserve that; several of them then re-generate LaTeX for final production. To a large extent, I suspect publishers now regard metadata management as their Job -- in the sense of their contribution to the scholarly endeavour -- and they could do without the dead trees. If you can offer them a way of making metadata _insertion_ easier, which is cost effective, can be scaled up, and which a _broad_ range of authors will accept (the hard bit), they'll rip your arm off.

1) PDF works well for (STM) publishers who require fixed page display;

Yes, and for authors. Given an alternative between an HTML version of a paper and a PDF version, I will _always_ choose the PDF, because it's zero-hassle, more reliably faithful to the author's original, more readable, and I can read it in the bath.

2) PDF doesn't take advantage of the advances we've made in machine readability;

If by this you mean RDF, then yes, the naive ways of generating PDFs are not RDF-aware. So we shouldn't be naive... XMP is an ISO standard (as PDF is, and like it originating from Adobe) and is a type of RDF (well, an irritatingly 90% profile of RDF, but let that pass). Though it's not trivial, it's not hard to generate an XMP packet and get it into a PDF, and once there, the metadata job is mostly done.

3) In fact, PDFs suck on eBook readers which are all about flexible page layout; and

Sure, but they're not intended for e-book readers, so of course they're poor at that.

4) We already have the necessary Web Standards to address the problem, so no need to recreate the wheel.

If, again, you mean RDF, then I agree completely.
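For readers who have never seen one, the XMP packet mentioned above is just a small RDF/XML document wrapped in a processing-instruction envelope and embedded in the PDF. A minimal sketch (the field values are placeholders; real packets usually carry more properties):

```xml
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">An Example Paper</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>A. N. Author</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
```

Once a packet like this is embedded in the PDF's metadata stream, any XMP-aware tool can recover the Dublin Core statements, which is the sense in which "the metadata job is mostly done."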
-- Produce a Web-based tool that allows researchers to share their [privately | publicly ] funded knowledge and produces a variety of outputs: LaTeX, PDF and carries with it a machine readable representation. Well, not web-based: I'd want something I can run on my own machine. Do people agree with the following SOLUTION approach? The international standards to solve this exist. Standards from W3C and the International Digital Publishing Forum (IDPF).[2] Use (X)HTML for generalized document creation/rendering. Use CSS for styling. Use MathML for formulas. Use JS for action. Use RDF to model the metadata within HTML. PDF and XMP are both ISO standards, too. LaTeX isn't a Standard standard, but it's pretty damn stable. MathML one would _not_ want to type. The only ways of generating MathML, that I'm slightly familiar with, start with TeX syntax. There are presumably GUI-based ones, too *shudder*. I propose a 'walk before we run' approach but do better than basic metadata (i.e., title, author name, institution, abstract). Link to other scholarly communities/projects such as Vivo.[3] I generate Atom feeds for my PDF lecture notes. The feed content is extracted from the XMP and from the /Author, /Title, etc, metadata within the PDF. That metadata gets there
Re: Visibility of the data (was Re: Formats and icing)
On 10/4/14 7:07 AM, Sarven Capadisli wrote: On 2014-10-02 00:48, Sarven Capadisli wrote: On 2014-10-01 21:51, Kingsley Idehen wrote: On 10/1/14 2:42 PM, Sarven Capadisli wrote: can't use them along with schema.org. I favour plain HTML+CSS+RDFa to get things going e.g.: https://github.com/csarven/linked-research What about: HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ? Basically, we have to get to: HTML+CSS+(Any RDF Notation). Sure, why not! Actually, I'd like to make a brief comment on this. While I agree with (and enjoy) your eloquent explanations on RDF, Languages, and Notations, and that any RDF Notation is entirely reasonable (because we can go from one to another with relative ease), we shouldn't overlook one important dimension: *Visibility* of the data. Perhaps this is better left as a best practice than anything else, but in my opinion: RDFa is ideal when dealing with HTML for research knowledge because if applied correctly, it will declare all of the visible portions of the research process and knowledge.

The trouble with being notation specific is that it always inadvertently opens up a distracting war. It doesn't work, and will never work. You have to apply the wisdom of Solomon in the realm of RDF --- something we (as a community) have failed to do, repeatedly, over the years. RDF is a notation-agnostic language, due to its abstract nature. We should really take more advantage of this RDF virtue.

It is to make the information available as first-class data as opposed to metadata.

Metadata isn't the issue at hand here. Raw data is the issue; it shouldn't ever be confined to any kind of silo, in regards to the Web.

It is less likely to be left behind or go stale because it is visible to the human at all times.

RDF (various notations) based structured data islands in HTML all share this quality. It isn't unique to RDFa, at all.
I simply see RDFa as more convenient in certain scenarios, e.g., when you want to mark up structured data inline (you allude to this usage scenario further down).

This is in contrast to JSON-LD or Turtle where they will be treated as dark metadata, or at least create duplicate information subject to desynchronize.

No, I think you aren't fully representing the nature of HTML+JSON-LD and HTML+Turtle in your comment above. You can effectively use them as raw data islands in HTML documents too, just as you can Microdata and RDFa. If everyone chooses to use RDFa, then fine; my point is that imposing (overtly or covertly) any RDF notation never works. In short, that's been RDF's problem since the days of RDF/XML. If RDFa is the most productive notation in a given scenario, its virtues will be obvious to those seeking to make their research data more accessible via HTML documents.

While JSON-LD and Turtle have their strengths, they are unnecessary when concerning the most relevant parts of the document which is already visible, e.g., concepts, hypothesis, methodological steps, variables, figures, tables, evaluation, conclusions.

I don't know how you are arriving at that conclusion when everything you mentioned above is an entity that's ultimately describable using RDF statements, in any notation.

Again, this is not meant to force anyone to use a particular RDF notation.

It's better to provide guidelines for different approaches and then let folks choose what works for them. The alternative to that is a form of imposition, no matter how much it's sugar-coated, unfortunately :)

Getting HTML+CSS in the picture is a huge win itself as far as I'm concerned :)

That's clear, from your vantage point, but we have to think about everyone else, too.

Then applying RDF notation is a nice reasonable step forward. * I am conveniently leaving out Microdata from this discussion because I don't feel it is still relevant.
As I've already stated, we shouldn't really be talking about any specific RDF notation when the goal is to set data free from these data silos using RDF.

-Sarven http://csarven.ca/#i

-- Regards, Kingsley Idehen, Founder & CEO, OpenLink Software. Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
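The "data islands" point in this exchange can be illustrated directly: the same statements can ride along in an HTML document in whichever notation the author prefers. A sketch, with illustrative identifiers; the `text/turtle` script type for Turtle islands is a convention rather than something HTML itself defines:

```html
<!-- Turtle island embedded in HTML -->
<script type="text/turtle">
@prefix schema: <http://schema.org/> .
<#paper> a schema:ScholarlyArticle ;
    schema:name "An Example Article" .
</script>

<!-- The equivalent statements as a JSON-LD island -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@id": "#paper",
  "@type": "ScholarlyArticle",
  "name": "An Example Article"
}
</script>
```

A consumer extracting either island ends up with the same RDF graph, which is the notation-agnostic point being argued: the choice of serialization is a convenience, not a commitment.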
Re: scientific publishing process (was Re: Cost and access)
On 10/4/14 7:14 AM, Hugh Glaser wrote: Executive summary: 1) Bring up an ePrints repository for “our” conferences, and a myExperiment instance, or equivalents; 2) Start to contribute to the Open Source community. Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone. [...]
Hugh

+1

--
Regards,
Kingsley Idehen
Founder & CEO, OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: scientific publishing process (was Re: Cost and access)
Hi Michael,

On Sat, 2014-10-04 at 11:19 +0200, Michael Brunnbauer wrote: Hello Paul, On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote: Yes. We are setting the bar too low. The field of knowledge computing will only reach maturity when authors can publish their theses in such a manner that one can programmatically extract the concepts, propositions, and arguments; I thought Kingsley was the only one seriously suggesting that we communicate in triples. Let's take one step back to the proposal of making research datasets machine-readable with RDF.

I certainly was not suggesting this. It would indeed be silly to publish large collections of empirical quantitative propositions in RDF. Nor do I think Kingsley would endorse such efforts (but he can speak for himself on that). I mostly admire and agree with Kingsley's indefatigable efforts to show how easy it is to harvest the low-hanging fruit of semantic web/linked data technologies. I just don't want that to be mistaken for the desired end state.

Please go to http://crcns.org/NWB Have a look at an example dataset: http://crcns.org/data-sets/hc/hc-3/about-hc-3 The total size of the data is about 433 GB compressed. Even if you do not use triples for all of that (which would be insane), specifying a structured data container is a very difficult task. So instead of talking about setting the bar higher, why not just help the people over there with their problem?

Creating, tracking, and publishing empirical quantitative propositions is not their biggest impediment to contributing to human knowledge. Connecting those propositions to significant conclusions through sound arguments is the more important problem. They will attempt to do so, presumably, by creating monographs in an electronic source format that has more or less structure to it. The structure will support many useful operations, including formatting the content for different media, hyperlinking to other resources, indexing, and metadata gleaning.
The structure will most likely *not* support any programmatic operations to expose the logical form of the arguments in such a way that another person could extract them and put them into his own logic machine to confirm, deny, strengthen, or weaken the arguments.

Take for example a research paper whose argument proceeded along the lines of: All men are mortal; Socrates is a man; therefore Socrates is mortal. Along comes a skeptic who purports to have evidence that Socrates is not a man. He publishes the evidence in such a way that other users can, if they wish, insert the conclusion from that evidence in place of the minor premise in the original researcher's argument. Then the conclusion cannot be affirmed. The original researcher must either find a different form of argument to prove his conclusion, overturn the skeptic's evidence (by further argument, also machine-processable), or withdraw his conclusion.

This simple model illustrates how human knowledge has progressed for millennia, mediated solely by oral, written, and visual and diagrammatic communication. I am suggesting we enlist computers to do something more for us in this realm than just speeding up the millennia-old mechanisms. Of course we don't need a program to help us determine whether or not Socrates is mortal. But what about the task of affirming or denying the proposition "Unchecked anthropogenic climate change will destroy human civilization"? Gigabytes of data do not constitute a logical argument. A sound chain of reasoning from empirical evidence and agreed universals is wanted. Yes, this can be done in academic prose supplemented by charts and diagrams, and backed by digital files containing lots of numbers. But, as Kingsley would say, that is not the best way ca. 2014.

Regards,
--Paul
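Paul's syllogism scenario can be made machine-checkable with almost no machinery. The sketch below (plain Python; the triple vocabulary "a"/"implies"/"has" is invented for illustration, not a real ontology or RDF toolkit) encodes the premises as triples, derives the conclusion with a single inference rule, and shows that retracting the minor premise withdraws the conclusion - exactly the skeptic's move described in the email:

```python
# Hypothetical mini knowledge base: premises are (subject, predicate, object)
# triples, with one inference rule - class membership propagates properties:
#   (x, "a", C) and (C, "implies", P)  =>  (x, "has", P)

def conclusions(triples):
    """Derive all (x, 'has', P) facts from membership and implication triples."""
    derived = set()
    for (x, pred1, cls) in triples:
        if pred1 != "a":
            continue
        for (cls2, pred2, prop) in triples:
            if pred2 == "implies" and cls2 == cls:
                derived.add((x, "has", prop))
    return derived

premises = {
    ("Socrates", "a", "Man"),      # minor premise
    ("Man", "implies", "mortal"),  # major premise
}
# The classical conclusion follows:
assert ("Socrates", "has", "mortal") in conclusions(premises)

# The skeptic retracts the minor premise; the conclusion can no longer be affirmed.
skeptical = premises - {("Socrates", "a", "Man")}
assert ("Socrates", "has", "mortal") not in conclusions(skeptical)
```

Real systems would of course use RDF plus an OWL or rule reasoner rather than tuples and a hand-rolled loop, but the mechanism - arguments as data that another party's machine can re-derive or refute - is the same one the email is asking for.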