Re: Publication of scientific research

2014-10-04 Thread Sarven Capadisli

On 2013-04-25 15:42, Daniel Schwabe wrote:

Sarven and all,
I don't have the answers to your questions, but I find it an interesting idea 
that we could at least do a survey with authors. We would, however, have to 
mention some *reasonable* tools that are available; otherwise I'm afraid their 
positions won't change from before.
I will discuss this within IW3C2 and see if we can include a question about 
this in one of the pre- or post-conference WWW surveys.
In the meantime, perhaps SWSA (which promotes ISWC) might want to follow up on 
this idea as well.
Cheers
D


Hi Daniel,

If you have any follow-up information on that, would you mind sharing?

Sorry to bring this up a year and a half later, but I'm still interested.

Thanks,

-Sarven



On Apr 25, 2013, at 10:29, Sarven Capadisli i...@csarven.ca wrote:


On 04/24/2013 09:39 PM, Daniel Schwabe wrote:

Some years ago, IW3C2, which promotes the WWW conference, of which I am a 
member, and which is very interested in furthering the use of Web standards for 
all the reasons that have already been mentioned in this discussion, decided to 
ask authors to submit papers in (X)HTML. After all, WWW is a *Web* conference! 
(This was before RDF and its associated tools were available.)
The bottom line was that authors REFUSED to submit in this format, partly 
because of the lack of tools, partly because they were simply comfortable with 
their existing tools. There were so many refusals that it would have ruined the 
conference if the organization had simply rejected these submissions.
The objection was so strong that IW3C2 eventually had to change its mind and 
keep things the way they were, and currently are.
Clearly, for some specialized communities, certain alternative formats may be 
acceptable - ontologies, in the context of sepublica, make perfect sense as an 
acceptable submission format. But when dealing with a more general audience, I 
do not believe we have the power to FORCE people to adopt any single 
specialized format - as with everything else, these things emerge from a 
community consensus over time, even if first spearheaded by a smaller core group.
Before that happens, we need to have a very clear value proposition and, most 
of all, good tools, for people to accept the change. Most people will not 
change their ways unless convinced that it's worth the additional effort - and 
having really good tools is a sine qua non for this.
On the other hand, efforts continue to at least provide metadata in RDF, which 
has been surprisingly hard to produce year after year without hand coding and 
customization each time. But we will get there, I hope.
Just my 2c...


Hi Daniel, thank you for that invaluable background.

I'll ask the community: what is the real lesson from this and how can we 
improve?

What's more important: keeping the conference running or some ideals?

Was that reaction from authors expected? Will it ever be different?

What would have happened if IW3C2 had stood its ground? What would happen if 
conferences took a stand - where would authors migrate?

What would be the short and long term consequences?

Not that I challenge this, but are we sure that it is the lack of good tools 
that's holding things back? What would make the authors happy? Was there a 
survey on this?

-Sarven




Daniel Schwabe
Dept. de Informatica, PUC-Rio
R. M. de S. Vicente, 225
Rio de Janeiro, RJ 22453-900, Brasil
Tel: +55-21-3527 1500 r. 4356
Fax: +55-21-3527 1530
http://www.inf.puc-rio.br/~dschwabe














Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Sarven Capadisli

On 2014-10-04 04:14, Daniel Schwabe wrote:

As is often the case on the Internet, this discussion gives me a terrible sense 
of déjà vu. We've had this discussion many times before.
Some years back the IW3C2 (the steering committee for the WWW conference 
series, of which I am part) first tried to require HTML for the WWW conference 
paper submissions, then was forced to make it optional because authors simply 
refused to write in HTML, and eventually dropped it because NO ONE (ok, very 
very few hardy souls) actually sent in HTML submissions.
Our conclusion at the time was that the tools simply were not there, and it was 
too much of a PITA for people to produce HTML instead of using the text editors 
they are used to. Things don't seem to have changed much since.


Hi Daniel, here is my long reply as usual and I hope you'll give it a 
shot :)


I've offered *a* solution that is compatible with the existing workflow 
without asking for any extra work from the OC/PCs, with the exception 
that the Web-native technologies for the submissions are officially 
encouraged. They will get their PDF in the end to cater to the existing 
pipeline. In the meantime, the community retains higher quality research 
documents.



And this is simply looking at formatting the pages, never mind the whole issue of 
actually producing hypertext (i.e., turning the article's text into linked hypertext), 
beyond the easily automated ones (e.g., links to authors, references to papers, etc.). 
Producing good hypertext, and consuming it, is much harder than writing plain text. And 
most authors are not trained in producing this kind of content. Making this actually 
semantic in some sense is still, in my view, a research topic, not a routine 
reality.
Until we have robust tools that make it as easy for authors to write papers 
with the advantages afforded by PDF, without its shortcomings, I do not see 
this changing.


I disagree that we don't have sufficient or robust tools to author and 
publish web pages. I find it ironic that we are still debating this 
issue as if we were in the early-to-mid 90s. Or ignoring [2], or the 
possibility of using a service which offers [3] to publish (pardon me 
for saying it) a friggin' web page.


If it is about coding, I find it unreasonable, even unprofessional, to 
think that a Computer/Web Scientist in 2014 who is publicly funded for 
their academic endeavors is incapable of grokking HTML. But, somehow, 
LaTeX is presumed to be okay for the new post-graduate that's coming in. 
Really? Or is the real reason that no one is asking them to do otherwise?


They can pick any WYSIWYG editor or an existing publishing service. No 
one is forcing anyone to hand-code anything, just as no one is forced 
to hand-code LaTeX.


We have the tools and even the services to help us do all of that, both 
from within and outside of the SW community. We have had them for a long 
time. What was lacking was a continuous green light to use them. That 
light stopped flashing, as you've mentioned.


But again, our core problems are not technical in nature.


I would love to see experiments (e.g., certain workshops) to try it out before 
making this a requirement for whole conferences.


I disagree. The fact that workshops or tracks on linked science or 
semantic publishing didn't deliver is a clear sign that they have the 
wrong process at the root. When those workshops ask for submissions to 
be in PDF, that's the definition of irony. There are no useful 
machine-friendly research objects! An opportunity lost at every single CfP.


Yet, we eloquently describe hypothetical systems or tools that will one 
day do all the magic for us instead of taking a good look at what's 
right in front of us.


So, let's talk about putting the cart before the horse. A lot of time and 
energy (e.g., public funding) could have been better spent simply by 
actually *having the data*, and then figuring out how to utilize it. 
There is no data, so what's there to analyze or learn from? Some 
research trying to figure out what to do with trivial and limited 
metadata, e.g., title, abstract, authors, subjects? Is 
data.semanticweb.org (dog food) the best we can show for our 
dogfooding ability?


I can't search/query for research knowledge on topic T, that used 
variables X and Y, which implemented a workflow step S, that's cited by 
work that uses those exact parameters, and that happens to use the 
datasets that I'm planning to use in my research.
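
To make that concrete, here is the kind of query I have in mind. It is a 
hypothetical SPARQL sketch: every term in the ex: namespace is made up, 
precisely because no such vocabulary is in use in our published output 
(cito: is real, from SPAR):

    PREFIX ex:   <http://example.org/vocab#>
    PREFIX cito: <http://purl.org/spar/cito/>

    SELECT ?article ?workflow
    WHERE {
      ?article  ex:topic          ex:T ;
                ex:variable       ex:X, ex:Y ;
                ex:usesDataset    ?dataset .
      ?workflow ex:partOf         ?article ;
                ex:implementsStep ex:S .
      ?citing   cito:cites        ?article ;
                ex:variable       ex:X, ex:Y .
    }

The query is trivial; what is missing is the data behind it.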


Reproducibility: 0
Comparability: 0
Discovery: 0
Reuse: 0
H-Index: +1?


Bernadette's suggestions are a good step in this direction, although I suspect 
it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).


Nothing is stopping us from doing things in parallel, and in fact we are: 
close-by efforts from workshops to force11, public-dwbp-wg, 
public-digipub-ig, ... to recommendations, e.g., PROV-O, OPMW, SIO, SPAR, 
besides the whole SW/LD stack, which benefits scientific research 
communication and 

Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Michael Brunnbauer

Hello Paul,

On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote:
 Yes. We are setting the bar too low. The field of knowledge computing
 will only reach maturity when authors can publish their theses in such a
 manner that one can programmatically extract the concepts, propositions,
 and arguments;

I thought Kingsley was the only one seriously suggesting that we communicate in
triples. Let's take a step back to the proposal of making research datasets
machine-readable with RDF.

Please go to http://crcns.org/NWB

Have a look at an example dataset:

 http://crcns.org/data-sets/hc/hc-3/about-hc-3

The total size of the data is about 433 GB compressed

Even if you do not use triples for all of that (which would be insane),
specifying a structured data container is a very difficult task.
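
To be fair, the dataset-level description is the easy part. A DCAT sketch 
like this (all values illustrative, not taken from the actual dataset pages) 
takes minutes to write:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .

    <http://crcns.org/data-sets/hc/hc-3>
        a dcat:Dataset ;
        dct:title "hc-3" ;
        dct:publisher <http://crcns.org> ;
        dcat:distribution [
            a dcat:Distribution ;
            dct:format "HDF5" ;        # illustrative - check the actual format
            dcat:byteSize 465000000000 # roughly 433 GB compressed
        ] .

The hard, domain-specific work is specifying what is *inside* that 
distribution.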

So instead of talking about setting the bar higher, why not just help the 
people over there with their problem?

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel




Re: Publication of scientific research

2014-10-04 Thread Sarven Capadisli

On 2013-04-29 19:29, Andrea Splendiani wrote:

Hi,

ok. Let's see if we can offer xhtml+RDFa as an additional format, and see how 
people react. I'll spread the idea a bit.

best,
Andrea


Hi Andrea,

Care to share the feedback that you've received?

Thanks,

-Sarven


On 27 Apr 2013, at 23:05, Sarven Capadisli i...@csarven.ca wrote:


On 04/27/2013 02:31 AM, Andrea Splendiani wrote:

I'm involved in the organization of a couple of conferences and workshops.
You do need a template, as without one it's hard to have homogeneous 
submissions (even for simple things such as page length, or its html 
equivalent).
Other than this, the main issue I can see is that proceedings may require pdf 
anyway.
But we could make a partial step and ask for abstracts in xhtml+RDFa, to be 
included in the online program, and full papers in xhtml+RDFa as an optional 
submission.
It's a small step, but a first step.
I'm very short on time, but if there is a template, I can see if the idea finds 
some interest.
I have to admit that asking for PDFs sounds a bit retro!


Challenge accepted!

Here is the egg in XHTML+RDFa with CSS for LNCS and ACM SIG:

https://github.com/csarven/linked-research

See live:

http://linked-research.270a.info/

Will you provide the chicken? :D
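
To give a flavour of the markup, here is a trimmed sketch - not the exact 
template, and the schema.org property names are merely illustrative; see the 
repo for the real thing:

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Paper title</title>
        <link rel="stylesheet" href="lncs.css"/>
      </head>
      <body prefix="schema: http://schema.org/">
        <article typeof="schema:ScholarlyArticle">
          <h1 property="schema:name">Paper title</h1>
          <div rel="schema:author" typeof="schema:Person">
            <span property="schema:name">Author Name</span>
          </div>
          <p property="schema:abstract">Abstract goes here.</p>
        </article>
      </body>
    </html>

Plain old XHTML; the stylesheet does the LNCS look.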

I got it as close to the LNCS template as possible for now. ACM SIG is on its 
way (change the stylesheet from lncs.css to acmsig.css to see the current state).

I know it is not perfect; however, I think it is a decent start. Best viewed in 
Firefox/Chrom(e|ium). Print to paper, or to PDF from your browser, to see 
alternative views. What do you think?

Now, before some of you pixel-perfectionist folks yell at me, please first 
chillax and create an issue on GitHub. Or better yet, contribute with pull 
requests. It uses the Apache License 2.0. But, all feedback is most welcome!

I still don't think this is the main challenge. We need go-aheads from 
conferences; then we can hack up the best templates and stylesheets that this 
universe has ever seen.

-Sarven













Visibility of the data (was Re: Formats and icing)

2014-10-04 Thread Sarven Capadisli

On 2014-10-02 00:48, Sarven Capadisli wrote:

On 2014-10-01 21:51, Kingsley Idehen wrote:

On 10/1/14 2:42 PM, Sarven Capadisli wrote:

can't use them along with schema.org.

I favour plain HTML+CSS+RDFa to get things going e.g.:

https://github.com/csarven/linked-research


What about:

HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ?

Basically, we have to get to:

HTML+CSS+(Any RDF Notation) .


Sure, why not!


Actually, I'd like to make a brief comment on this. While I agree with 
(and enjoy) your eloquent explanations on RDF, Languages, and Notations, 
and agree that any RDF Notation is entirely reasonable (because we can go 
from one to another with relative ease), we shouldn't overlook one 
important dimension:


*Visibility* of the data.

Perhaps this is better left as a best practice than anything else, but 
in my opinion:


RDFa is ideal when dealing with HTML for research knowledge because, if 
applied correctly, it will declare all of the visible portions of the 
research process and knowledge. The point is to make the information 
available as first-class data, as opposed to metadata. It is less likely 
to be left behind or go stale because it is visible to the human at all times.


This is in contrast to JSON-LD or Turtle, which will be treated as 
dark metadata, or at least create duplicate information liable to fall 
out of sync. While JSON-LD and Turtle have their strengths, they are 
unnecessary for the most relevant parts of the document, which are 
already visible, e.g., concepts, hypothesis, methodological 
steps, variables, figures, tables, evaluation, conclusions.
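
A quick sketch of the difference (terms illustrative):

    <!-- RDFa: the statement *is* the visible text -->
    <p prefix="ex: http://example.org/vocab#">We observed
      <span property="ex:sampleSize" datatype="xsd:integer">42</span>
      subjects.</p>

    <!-- JSON-LD: a second, invisible copy that can silently drift -->
    <script type="application/ld+json">
    { "@context": { "ex": "http://example.org/vocab#" },
      "ex:sampleSize": 42 }
    </script>

Revise the prose, forget the script block, and the "data" now lies.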


Again, this is not meant to force anyone to use a particular RDF 
notation. Getting HTML+CSS into the picture is a huge win in itself as 
far as I'm concerned :) Then applying an RDF notation is a nice, 
reasonable step forward.


* I am conveniently leaving Microdata out of this discussion because I 
don't feel it is relevant any more.


-Sarven
http://csarven.ca/#i





Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Hugh Glaser
Executive summary:
1) Bring up an ePrints repository for “our” conferences, and a myExperiment 
instance, or equivalents;
2) Start to contribute to the Open Source community.

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

Longer version.
I too have a deep sense of déjà vu all over yet again :-)

But I have learned something - no-one seems to collaborate with people outside 
the techy world.
Most documents for me start as a (set of) collaborative Google Docs 
(unmentioned) or a Word or OpenOffice document (not mentioned much) on Dropbox.
And the collaborators couldn’t possibly help me build a LaTeX document or even 
any interesting HTML.

Anyway…
I see quite a few different things in this discussion, and all of them deeply 
important for the research publishing world at the moment.
a) Document format;
b) Metadata about the publication, both superficial and deep;
c) Data, systems and workflow about the research.

But starting almost everything from scratch (the existing standards and a few 
tools) is rarely the way to go in this webby world.

There is real stuff out there (as I have said more than once before), that 
could really benefit from the sort of activity that Bernadette describes.
I know about a number of things, but there will be others.

(a) and (b) Repositories (because that is what we are talking about)
http://eprints.org is an Open Source Linked Data publishing platform for 
publications that handles the document (in any format) and the shallow 
metadata, but could easily have deep metadata as well if people generated it.
E.g. http://eprints.soton.ac.uk/id/eprint/271458
I even have an existing endpoint with all the ePrints RDF in it - 
http://foreign.rkbexplorer.com, currently 24 GB and 182,854,666 triples, so 
such software can be used.
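
If anyone wants to poke at it, even a trivial SPARQL query like this one - 
I’d check the data itself for the exact predicates, but Dublin Core titles 
are a fair first guess - shows the sort of thing that’s in there:

    SELECT ?eprint ?title
    WHERE { ?eprint <http://purl.org/dc/terms/title> ?title }
    LIMIT 10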

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences, one for all, or for each or series?
And require the authors to enter their data into the site - it’s not hard, and 
there is existing documentation of what to do.
It is mature technology with 100s of person-years invested.
And perhaps most importantly, it has the buy in of the library and similar 
communities, and has been field tested with users.
It would certainly be more maintainable than the DogFood site - and it would be 
a trivialish task to move the great DogFood efforts over to it. DogFood really 
is something of a silo - exactly what Linked Data is meant to avoid.
And “we” might actually contribute to the wider community by enhancing the Open 
Source Project with Linked Data enhancements that were useful out there!
Or a more challenging thing would be to make http://www.dspace.org do what we 
want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)!

(c) Workflows and Datasets
I have mentioned http://www.myexperiment.org before, but can’t remember if I 
have mentioned http://www.wf4ever-project.org
Again, these are Linked Data platforms for publishing; in this case workflows 
and datasets etc.
They are seriously mature, certainly compared with what we might build - see, 
for example https://github.com/wf4ever/ro
And exactly the same as the Repositories.

What would be wrong with bringing up such a repository for SemWeb/Web 
conferences, one for all, or for each or series?
…ditto…
Who knows, maybe the Crawl, as well as the Challenge entries, might be able to 
usefully describe what they did using these ontologies etc.?

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

Hugh

 On 4 Oct 2014, at 03:14, Daniel Schwabe dschw...@inf.puc-rio.br wrote:
 
 As is often the case on the Internet, this discussion gives me a terrible 
 sense of déjà vu. We've had this discussion many times before.
 Some years back the IW3C2 (the steering committee for the WWW conference 
 series, of which I am part) first tried to require HTML for the WWW 
 conference paper submissions, then was forced to make it optional because 
 authors simply refused to write in HTML, and eventually dropped it because NO 
 ONE (ok, very very few hardy souls) actually sent in HTML submissions.
 Our conclusion at the time was that the tools simply were not there, and it 
 was too much of a PITA for people to produce HTML instead of using the text 
 editors they are used to. Things don't seem to have changed much since.
 And this is simply looking at formatting the pages, never mind the whole 
 issue of actually producing hypertext (i.e., turning the article's text into 
 linked hypertext), beyond the easily automated ones (e.g., links to authors, 
 references to papers, etc.). Producing good hypertext, and consuming it, is 
 much harder than writing plain text. And most authors are not trained in 
 producing this kind of content. Making this actually semantic in some 

Re: Publication of scientific research

2014-10-04 Thread Alexander Garcia Castro
What was the proposal/document that did not meet the standard of 12 pages?
Can you share it with me?

On Tue, Apr 30, 2013 at 4:01 AM, Phillip Lord phillip.l...@newcastle.ac.uk
wrote:

 Sarven Capadisli i...@csarven.ca writes:
  I made a simple proposal [1]. Zero direct responses. Why?
 

 Your proposal didn't meet the standard 12 pages in LNCS, so everyone
 discarded it as having no weight.

 I like your proposal. I think it is very good. I especially like the
 idea of submitting a URL. The URLs could be harvested and stored at
 archive.org, which would help to address the digital preservation issue.

 Phil




-- 
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac


Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Jürgen Jakobitsch
PDFs are surprisingly flexible and open containers for transporting around
Stuff

hi, i'm feeling tempted to add something provocative ;-)

PDFs are surprisingly mature in disguising all the 'bla bla' and making it
look nice...

=> http://tractatus-online.appspot.com/Tractatus/jonathan/index.html

wkr turnguard




| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web   : http://www.semantic-web.at/
| foaf  : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web   : http://www.turnguard.com
| foaf  : http://www.turnguard.com/turnguard
| g+: https://plus.google.com/111233759991616358206/posts
| skype : jakobitsch-punkt
| xmlns:tg  = http://www.turnguard.com/turnguard#

2014-10-04 14:47 GMT+02:00 Norman Gray nor...@astro.gla.ac.uk:


 Bernadette, hello.

 On 2014 Oct 4, at 00:36, Bernadette Hyland bhyl...@3roundstones.com
 wrote:

 ... a really useful message which pulls several of these threads
 together.  The following is a rather fragmentary response.

 As a reference point, I tend to think publication = LaTeX -> PDF.  To
 pre-dispel a misconception here: I'm not being a cheerleader for PDF
 below, but a fair fraction of the antagonism directed towards PDF in this
 thread is, I think, misplaced -- PDF is not the problem.

  We'd do ourselves a huge favor if we showed (STM) publishing executives
 why this Linked Data stuff matters anyway.

 They know.  A surprisingly large fraction of the Article Processing Charge
 we pay to them goes on extracting, managing and sharing metadata.  That
 includes DOIs, Crossref feeds, science direct, and so on and so on, and so
 (it seems) on.  It also includes conversion to XML: if you submit a LaTeX
 file to a big publisher, the first thing they'll do is convert it to
 XML+MathML (using workflows based on for example LaTeXML or TeX4ht) and
 preserve that; several of them then re-generate LaTeX for final production.

 To a large extent, I suspect publishers now regard metadata management as
 their Job -- in the sense of their contribution to the scholarly endeavour
 -- and they could do without the dead trees.  If you can offer them a way
 of making metadata _insertion_ easier, which is cost effective, can be
 scaled up, and which a _broad_ range of authors will accept (the hard bit),
 they'll rip your arm off.

  1) PDF works well for (STM) publishers who require fixed page display;

 Yes, and for authors.  Given an alternative between an HTML version of a
 paper and a PDF version, I will _always_ choose the PDF, because it's
 zero-hassle, more reliably faithful to the author's original, more
 readable, and I can read it in the bath.

  2) PDF doesn't take advantage of the advances we've made in machine
 readability;

 If by this you mean RDF, then yes, the naive ways of generating PDFs are
 not RDF-aware.  So we shouldn't be naive...

 XMP is an ISO standard (as PDF is, and like it originating from Adobe) and
 is a type of RDF (well, an irritatingly 90% profile of RDF, but let that
 pass).  Though it's not trivial, it's not hard to generate an XMP packet
 and get it into a PDF, and once there, the metadata job is mostly done.
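
 (For concreteness: an XMP packet is just a blob of RDF/XML between two
 processing-instruction markers, embedded in the PDF.  A minimal sketch,
 with illustrative values:

     <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
     <x:xmpmeta xmlns:x="adobe:ns:meta/">
       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                xmlns:dc="http://purl.org/dc/elements/1.1/">
         <rdf:Description rdf:about="">
           <dc:title>
             <rdf:Alt><rdf:li xml:lang="x-default">Paper title</rdf:li></rdf:Alt>
           </dc:title>
           <dc:creator>
             <rdf:Seq><rdf:li>A. N. Author</rdf:li></rdf:Seq>
           </dc:creator>
         </rdf:Description>
       </rdf:RDF>
     </x:xmpmeta>
     <?xpacket end="w"?>

 The LaTeX package hyperxmp, for instance, automates exactly this.)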

  3) In fact, PDFs suck on eBook readers which are all about flexible page
 layout; and

 Sure, but they're not intended for e-book readers, so of course they're
 poor at that.

  4) We already have the necessary Web Standards to address the problem,
 so no need to recreate the wheel.

 If, again, you mean RDF, then I agree completely.

  -- Produce a Web-based tool that allows researchers to share their
 [privately | publicly ] funded knowledge and produces a variety of outputs:
 LaTeX, PDF and carries with it a machine readable representation.

 Well, not web-based: I'd want something I can run on my own machine.

  Do people agree with the following SOLUTION approach?
 
  The international standards to solve this exist. Standards from W3C and
 the International Digital Publishing Forum (IDPF).[2]  Use (X)HTML for
 generalized document creation/rendering. Use CSS for styling. Use MathML
 for formulas. Use JS for action. Use RDF to model the metadata within HTML.

 PDF and XMP are both ISO standards, too.  LaTeX isn't a Standard standard,
 but it's pretty damn stable.

 MathML one would _not_ want to type.  The only ways of generating MathML
 that I'm slightly familiar with start with TeX syntax.  There are
 presumably GUI-based ones, too *shudder*.
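
 (To see why: even x^2 = y -- seven characters of TeX -- comes out in
 presentation MathML as roughly:

     <math xmlns="http://www.w3.org/1998/Math/MathML">
       <msup><mi>x</mi><mn>2</mn></msup>
       <mo>=</mo>
       <mi>y</mi>
     </math>

 and that is a *small* formula.)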

  I propose a 'walk before we run' approach but do better than basic
 metadata (i.e., title, author name, institution, abstract).  Link to other
 scholarly communities/projects such as Vivo.[3]

 I generate Atom feeds for my PDF lecture notes.  The feed content is
 extracted from the XMP and from the /Author, /Title, etc, metadata within
 the PDF.  That metadata gets there 

Re: Visibility of the data (was Re: Formats and icing)

2014-10-04 Thread Kingsley Idehen

On 10/4/14 7:07 AM, Sarven Capadisli wrote:

On 2014-10-02 00:48, Sarven Capadisli wrote:

On 2014-10-01 21:51, Kingsley Idehen wrote:

On 10/1/14 2:42 PM, Sarven Capadisli wrote:

can't use them along with schema.org.

I favour plain HTML+CSS+RDFa to get things going e.g.:

https://github.com/csarven/linked-research


What about:

HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ?

Basically, we have to get to:

HTML+CSS+(Any RDF Notation) .


Sure, why not!


Actually, I'd like to make a brief comment on this. While I agree with 
(and enjoy) your eloquent explanations on RDF, Languages, and 
Notations, and agree that any RDF Notation is entirely reasonable (because 
we can go from one to another with relative ease), we shouldn't overlook 
one important dimension:


*Visibility* of the data.

Perhaps this is better left as a best practice than anything else, 
but in my opinion:


RDFa is ideal when dealing with HTML for research knowledge because, if 
applied correctly, it will declare all of the visible portions of 
the research process and knowledge. 


The trouble with being notation-specific is that it always inadvertently 
opens up a distracting war. It doesn't work, and will never work. You 
have to apply the wisdom of Solomon in the realm of RDF --- something 
we (as a community) have failed to do, repeatedly, over the years.


RDF is a notation-agnostic language, due to its abstract nature. We 
should really take more advantage of this RDF virtue.


The point is to make the information available as first-class data as 
opposed to metadata.


Metadata isn't the issue at hand here. Raw data is the issue; it 
shouldn't ever be confined to any kind of silo, in regards to the Web.


It is less likely to be left behind or go stale because it is visible 
to the human at all times.


RDF (in various notations) based structured-data islands in HTML all share 
this quality. It isn't unique to RDFa at all. I simply see RDFa as more 
convenient in certain scenarios, e.g., when you want to mark up structured 
data inline (you allude to this usage scenario further down).




This is in contrast to JSON-LD or Turtle, which will be treated as 
dark metadata, or at least create duplicate information liable to fall 
out of sync. 


No, I think you aren't fully representing the nature of HTML+JSON-LD 
and HTML+Turtle in your comment above. You can effectively use them as 
raw data islands in HTML documents too, just as you can Microdata and RDFa.
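
For instance - content illustrative; JSON-LD's HTML script-tag embedding is 
defined in its spec, while a Turtle script block is a convention rather than 
a standard:

    <script type="application/ld+json">
    { "@context": "http://schema.org/",
      "@type": "ScholarlyArticle",
      "name": "Paper title" }
    </script>

    <script type="text/turtle">
    @prefix schema: <http://schema.org/> .
    <> a schema:ScholarlyArticle ;
       schema:name "Paper title" .
    </script>

Same statements, different notations. A consumer that speaks RDF shouldn't care.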


If everyone chooses to use RDFa then fine, my point is that imposing 
(overtly or covertly) any RDF notation never works. In short, that's 
been RDF's problem since the days of RDF/XML.


If RDFa is the most productive notation, in a given scenario, its 
virtues will be obvious to those seeking to make their research data 
more accessible via HTML documents.


While JSON-LD and Turtle have their strengths, they are unnecessary 
for the most relevant parts of the document, which are 
already visible, e.g., concepts, hypothesis, methodological steps, 
variables, figures, tables, evaluation, conclusions.


I don't know how you are arriving at that conclusion when everything you 
mentioned above is an entity that's ultimately describable using RDF 
statements, in any notation.


Again, this is not meant to force anyone to use a particular RDF 
notation. 


It's better to provide guidelines for different approaches and then let 
folks choose what works for them. The alternative is a form of 
imposition, no matter how much it's sugar-coated, unfortunately :)


Getting HTML+CSS into the picture is a huge win in itself as far as I'm 
concerned :) 


That's clear, from your vantage point, but we have to think about 
everyone else, too.



Then applying an RDF notation is a nice, reasonable step forward.

* I am conveniently leaving Microdata out of this discussion because 
I don't feel it is relevant any more.


As I've already stated, we shouldn't really be talking about any 
specific RDF notation, when the goal is to set data free from these data 
silos using RDF.





-Sarven
http://csarven.ca/#i




--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Kingsley Idehen

On 10/4/14 7:14 AM, Hugh Glaser wrote:

Executive summary:
1) Bring up an ePrints repository for “our” conferences, and a myExperiment 
instance, or equivalents;
2) Start to contribute to the Open Source community.

Please, please, let’s not build anything ourselves - if we are to do anything, 
then let’s choose and join suitable existing activity and make it better for 
everyone.

[...]

Hugh


+1


--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Paul Tyson
Hi Michael,

On Sat, 2014-10-04 at 11:19 +0200, Michael Brunnbauer wrote:
 Hello Paul,
 
 On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote:
  Yes. We are setting the bar too low. The field of knowledge computing
  will only reach maturity when authors can publish their theses in such a
  manner that one can programmatically extract the concepts, propositions,
  and arguments;
 
 I thought Kingsley was the only one seriously suggesting that we communicate in
 triples. Let's take a step back to the proposal of making research datasets
 machine-readable with RDF.

I certainly was not suggesting this. It would indeed be silly to publish
large collections of empirical quantitative propositions in RDF.

Nor do I think Kingsley would endorse such efforts (but he can speak for
himself on that). I mostly admire and agree with Kingsley's
indefatigable efforts to show how easy it is to harvest the low-hanging
fruit of semantic web/linked data technologies. I just don't want that
to be mistaken for the desired end state.

 
 Please go to http://crcns.org/NWB
 
 Have a look at an example dataset:
 
  http://crcns.org/data-sets/hc/hc-3/about-hc-3
 
 The total size of the data is about 433 GB compressed
 
 Even if you do not use triples for all of that (which would be insane),
 specifying a structured data container is a very difficult task.
 
 So instead of talking about setting the bar higher, why not just help the 
 people over there with their problem?

Creating, tracking, and publishing empirical quantitative propositions
is not their biggest impediment to contributing to human knowledge.

Connecting those propositions to significant conclusions through sound
arguments is the more important problem. They will attempt to do so,
presumably, by creating monographs in an electronic source format that
has more or less structure to it. The structure will support many useful
operations, including formatting the content for different media,
hyperlinking to other resources, indexing, and metadata gleaning. The
structure will most likely *not* support any programmatic operations to
expose the logical form of the arguments in such a way that another
person could extract them and put them into his own logic machine to
confirm, deny, strengthen, or weaken the arguments.

Take for example a research paper whose argument proceeded along the
lines of "All men are mortal; Socrates is a man; therefore Socrates is
mortal." Along comes a skeptic who purports to have evidence that
Socrates is not a man. He publishes the evidence in such a way that
other users can, if they wish, insert the conclusion from such evidence in
place of the minor premise in the original researcher's argument. Then
the conclusion cannot be affirmed. The original researcher must either
find a different form of argument to prove his conclusion, overturn the
skeptic's evidence (by further argument, also machine-processable), or
withdraw his conclusion.
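
To sketch what that might look like in machine-processable form (the
vocabulary below is invented purely for illustration; efforts such as the
Argument Interchange Format point in this direction):

    @prefix arg: <http://example.org/argument#> .
    @prefix ex:  <http://example.org/claims#> .

    ex:mortalityArgument a arg:Syllogism ;
        arg:majorPremise ex:allMenAreMortal ;
        arg:minorPremise ex:socratesIsAMan ;
        arg:conclusion   ex:socratesIsMortal .

    # The skeptic's challenge targets the minor premise:
    ex:skepticsChallenge a arg:Rebuttal ;
        arg:disputes ex:socratesIsAMan ;
        arg:evidence ex:skepticsEvidence .

A program can now determine, mechanically, that the conclusion rests on a
disputed premise.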

This simple model illustrates how human knowledge has progressed for
millennia, mediated solely by oral, written, and visual and diagrammatic
communication. I am suggesting we enlist computers to do something more
for us in this realm than just speeding up the millennia-old mechanisms.

Of course we don't need a program to help us determine whether or not
Socrates is mortal. But what about the task of affirming or denying the
proposition, "Unchecked anthropogenic climate change will destroy human
civilization"? Gigabytes of data do not constitute a logical argument. A
sound chain of reasoning from empirical evidence and agreed universals
is wanted. Yes, this can be done in academic prose supplemented by
charts and diagrams, and backed by digital files containing lots of
numbers. But, as Kingsley would say, that is not the best way ca. 2014.

Regards,
--Paul