Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Couture Marc
Heather wrote : "An author wishing to pre-authorize translations but only under 
particular conditions [...] should [...] grant additional permissions [...] 
with a CC+ license."

First, note that CC+ is not a CC license, but a CC protocol (or tool). The 
distinction is important because what's in the "+" doesn't involve Creative 
Commons at all: it only redirects users to where these permissions are 
described and/or can be obtained.

But if one believes that one of the goals of CC licenses was to simplify the 
life of users who need to understand the permissions granted to them, by 
limiting the number of permissions and standardizing their wording, this works 
directly against that goal. Statements of supplementary permissions will raise 
new uncertainties, on top of those already present in the standard CC 
conditions (-NC, above all). For instance, I would like to see a definition of 
"appropriate professional translator" that wouldn't raise uncertainties, 
considering that translation will involve different countries, with their own 
traditions, educational systems, professional organizations, etc.

A more appropriate use of CC+ in this specific example would be to (1) indicate 
that translations are welcomed and (2) provide the way to seek permission (for 
instance, giving the email of the person who can grant it, or confirming that 
it's the author, who is normally identified through attribution). A potential 
translator could then explain why he or she can do the job adequately.

By the way, although CC+ has been around for more than nine years, does anyone 
know whether it's widely used? A Google search didn't bring me much 
information. Creative Commons asks users of this protocol to let them know, 
but there seem to be only a handful of them, none related to the issues 
discussed here:
https://wiki.creativecommons.org/wiki/Category:License_and_CCPlus 
https://wiki.creativecommons.org/wiki/Category:CC%2B

Marc Couture


De : goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] De la part de 
Heather Morrison
Envoyé : 24 janvier 2017 09:27
À : Global Open Access List (Successor of AmSci)
Objet : Re: [GOAL] How much of the content in open repositories is able to meet 
the definition of open access?

hi Fiona,

It seems we have been thinking along the same lines - I have a similar proposal 
that tries to address the same issue.

An author wishing to pre-authorize translations but only under particular 
conditions, e.g. that the translation is done by an appropriately qualified 
translator and a disclaimer is used, should use a restrictive license (either 
All Rights Reserved but free-to-read, CC-BY-NC-ND or CC-BY-ND) but grant 
additional permissions. In the case of CC licenses, this can be done with a CC+ 
license.

As explained on the Creative Commons page: "You have the option of granting 
permissions above and beyond what the license allows; for example, allowing 
licensees to translate ND-licensed material. If so, consider using CC+ to 
indicate the additional permissions offered."

The reason you need to start with the more restrictive license and then grant 
additional permissions is because you cannot use a more open license and then 
attach additional restrictions - this defeats the purpose of open licensing.

Here is a boilerplate approach to put on a terms and conditions website to 
explain these permissions:

Translations can be made without explicitly seeking permission under the 
following conditions:

Professional qualifications of translator: [insert definition of what you 
consider to be an appropriate professional]

A disclaimer must be prominently placed on the work as follows [insert 
disclaimer language and any other terms such as placement]

Certification by the original publisher - provide instructions for the 
translator in case they wish to have the translation certified.

If the author (or publisher) does not want to grant blanket commercial rights 
but is willing to grant some rights that others might consider commercial, this 
can also be specified here. For example, if the author or publisher of a book 
expects royalties if a downstream for-profit publisher actually makes money, 
details might be specified here so people know what to expect, e.g. after costs 
of producing the translation are covered, royalties of x are due to y.

I don't suggest that this is the final answer on how to handle translations but 
hope that this is a useful discussion.

best,

Heather Morrison

On 2017-01-24, at 8:37 AM, Fiona Bradley wrote:


Hi Heather,

I think there's too much variation in copyright arrangements and agreements for 
me to comment on that, but indeed, should authors prefer and there's no other 
arrangements in place stating otherwise, you could put authors in place of 
institution/publisher in my comment.

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Peter Murray-Rust
The statement:
"Copyright is only invoked if you want to actually copy an original table
for inclusion in a publication"

is completely wrong.

The question of whether it is legal to point to another work depends on the
jurisdiction. This is known as ancillary copyright; see
http://www.communia-association.org/2016/08/25/eu-commission-yes-will-create-new-ancillary-copyright-news-publishers-please-stop-calling-link-tax/
and
https://en.wikipedia.org/wiki/Ancillary_copyright_for_press_publishers
Laws have been passed (e.g. in Spain) to copyright and tax hyperlinks.
There have been indications of pressure from some scholarly
publishers to do the same for scholarly articles
http://ancillarycopyright.eu/news/2016-06-15/beware-neighbouring-right-publishers-ancillary-copyright-steroids-potential-consequences-general-nei
which could require scholars to pay publishers for the right to link to
their sites.

This is not hypothetical - it is being fought bitterly in Europe.



On Tue, Jan 24, 2017 at 4:08 PM, Heather Morrison <
heather.morri...@uottawa.ca> wrote:

> hi Peter,
>
> On 2017-01-24, at 10:10 AM, Peter Murray-Rust wrote:
>
>
>
> On Tue, Jan 24, 2017 at 2:10 PM, Heather Morrison <
> heather.morri...@uottawa.ca> wrote:
>
>> Another critique that may be more relevant to this argument: I challenge
>> PMR's contention that it is necessary to limit this kind of research to
>> works that are licensed CC-BY. If you gather data from a great many
>> different tables and analyze it, what you will be publishing is your own
>> work.
>>
>> This is no different from doing a great deal of reading and thinking and
>> writing a new work that draws on this knowledge, with appropriate citations
>> to the works that you have read.
>>
>> Copyright is only invoked if you want to actually copy an original table
>> for inclusion in a publication. If you are drawing on data from thousands
>> of tables it is not clear how often this will happen. If what you want to
>> copy is an insubstantial amount this would be covered under fair dealing.
>> If the work is free-to-read, whether All Rights Reserved or under an open
>> license, you can point readers to the original. At worst, this is a minor
>> inconvenience.
>>
>
> This is completely wrong. The problem is that this is a legal issue and
> copyright law, by default, covers all aspects of copying. Copying material
> into a machine for the purpose of mining involves copyright. Whether it
> seems reasonable or fair is irrelevant. If you carry out mining then you
> should be prepared to answer in court.
>
>
> "This is completely wrong" is a rather broad statement. Can you explain
> what is wrong with my statement "if the work is free to read…you can point
> readers to the original"? Are you arguing that it is illegal to point
> people to a free-to-read work?
>
> As Marc Couture noted on the GOAL list yesterday, with respect to internet
> search engine's mining and reproduction of portions of work, Google has won
> a lawsuit:
> http://mailman.ecs.soton.ac.uk/pipermail/goal/2017-January/004340.html
>
> From the Wikipedia entry: "*Field v. Google, Inc.*, 412 F.Supp. 2d 1106
> (D. Nev. 2006) is a case where Google Inc. successfully defended a lawsuit
> for copyright infringement. Field argued that Google infringed his
> exclusive right to reproduce his copyrighted works when it "cached" his
> website and made a copy of it available on its search engine. Google
> raised multiple defenses: fair use, implied license, estoppel, and Digital
> Millennium Copyright Act safe harbor protection. The court granted
> Google's motion for summary judgment and denied Field's motion for summary
> judgment."
>
> I myself am not familiar with this case. Is the Wikipedia entry wrong?
>
>
> The problem is compounded by:
> * it is jurisdiction-dependent. Fair-use only exists in certain domains.
> It is not the same as fair dealing which is generally weaker. What is
> permissible in the US may not be in UK and vice versa.
>
>
> Agreed. I argue that universal strong fair use / fair dealing is something
> we need to fight for, not something to take for granted.
>
> * It is extremely complex. Guessing the law will not be useful.
>
>
> I am aware. I teach and publish in the area of information policy and
> participate in government consultations.
>
> * Much of the law has not been tested in court. "Non-commercial" is not
> what you or I would like it to mean. It is what a court finds when I or
> others are summoned before it.
>
>
> I do not argue that "non-commercial" has a specific meaning.

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Heather Morrison
hi Peter,

On 2017-01-24, at 10:10 AM, Peter Murray-Rust wrote:



On Tue, Jan 24, 2017 at 2:10 PM, Heather Morrison wrote:
Another critique that may be more relevant to this argument: I challenge PMR's 
contention that it is necessary to limit this kind of research to works that 
are licensed CC-BY. If you gather data from a great many different tables and 
analyze it, what you will be publishing is your own work.

This is no different from doing a great deal of reading and thinking and 
writing a new work that draws on this knowledge, with appropriate citations to 
the works that you have read.

Copyright is only invoked if you want to actually copy an original table for 
inclusion in a publication. If you are drawing on data from thousands of tables 
it is not clear how often this will happen. If what you want to copy is an 
insubstantial amount this would be covered under fair dealing. If the work is 
free-to-read, whether All Rights Reserved or under an open license, you can 
point readers to the original. At worst, this is a minor inconvenience.

This is completely wrong. The problem is that this is a legal issue and 
copyright law, by default, covers all aspects of copying. Copying material into 
a machine for the purpose of mining involves copyright. Whether it seems 
reasonable or fair is irrelevant. If you carry out mining then you should be 
prepared to answer in court.

"This is completely wrong" is a rather broad statement. Can you explain what 
is wrong with my statement "if the work is free to read…you can point readers 
to the original"? Are you arguing that it is illegal to point people to a 
free-to-read work?

As Marc Couture noted on the GOAL list yesterday, with respect to internet 
search engine's mining and reproduction of portions of work, Google has won a 
lawsuit:
http://mailman.ecs.soton.ac.uk/pipermail/goal/2017-January/004340.html

From the Wikipedia entry: "Field v. Google, Inc., 412 F.Supp. 2d 1106 (D. Nev. 
2006) is a case where Google Inc. successfully defended a lawsuit for 
copyright infringement. Field argued that Google infringed his exclusive right 
to reproduce his copyrighted works when it "cached" his website and made a 
copy of it available on its search engine. Google raised multiple defenses: 
fair use, implied license, estoppel, and Digital Millennium Copyright Act safe 
harbor protection. The court granted Google's motion for summary judgment and 
denied Field's motion for summary judgment."

I myself am not familiar with this case. Is the Wikipedia entry wrong?


The problem is compounded by:
* it is jurisdiction-dependent. Fair-use only exists in certain domains. It is 
not the same as fair dealing which is generally weaker. What is permissible in 
the US may not be in UK and vice versa.

Agreed. I argue that universal strong fair use / fair dealing is something we 
need to fight for, not something to take for granted.

* It is extremely complex. Guessing the law will not be useful.

I am aware. I teach and publish in the area of information policy and 
participate in government consultations.

* Much of the law has not been tested in court. "Non-commercial" is not what 
you or I would like it to mean. It is what a court finds when I or others are 
summoned before it.

I do not argue that "non-commercial" has a specific meaning. Your argument (if 
I understand correctly) is that non-commercial is overly broad and by not 
granting commercial rights we may be restricting uses that one might actually 
like to permit. I agree with this analysis, just not the solution. That is, I 
am comfortable with some vagueness in the terminology and consider it more 
important not to grant blanket commercial rights. In other words, I think we 
agree on the facts, just not what to do about them.


I have been involved in this for over 4 years in the UK and in Europe 
(Parliament and Commission). There is no consensus on what should be allowed 
and what will ultimately be decided by the Commission and Member States. I have 
taken legal opinion on some of this and consulted with other experts and the 
answers are often unclear.

The legality of Text and Data Mining is formally unrelated to whether the miner 
publishes the results or not.


If you prefer to limit your research to works that are CC-BY licensed, it is 
your right to make this choice. Many other researchers, myself included, work 
with a wide range of data and do not choose to limit what we gather to works 
that are licensed CC-BY.

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Peter Murray-Rust
On Tue, Jan 24, 2017 at 2:10 PM, Heather Morrison <
heather.morri...@uottawa.ca> wrote:

> Another critique that may be more relevant to this argument: I challenge
> PMR's contention that it is necessary to limit this kind of research to
> works that are licensed CC-BY. If you gather data from a great many
> different tables and analyze it, what you will be publishing is your own
> work.
>
> This is no different from doing a great deal of reading and thinking and
> writing a new work that draws on this knowledge, with appropriate citations
> to the works that you have read.
>
> Copyright is only invoked if you want to actually copy an original table
> for inclusion in a publication. If you are drawing on data from thousands
> of tables it is not clear how often this will happen. If what you want to
> copy is an insubstantial amount this would be covered under fair dealing.
> If the work is free-to-read, whether All Rights Reserved or under an open
> license, you can point readers to the original. At worst, this is a minor
> inconvenience.
>

This is completely wrong. The problem is that this is a legal issue and
copyright law, by default, covers all aspects of copying. Copying material
into a machine for the purpose of mining involves copyright. Whether it
seems reasonable or fair is irrelevant. If you carry out mining then you
should be prepared to answer in court.

The problem is compounded by:
* it is jurisdiction-dependent. Fair-use only exists in certain domains. It
is not the same as fair dealing which is generally weaker. What is
permissible in the US may not be in UK and vice versa.
* It is extremely complex. Guessing the law will not be useful.
* Much of the law has not been tested in court. "Non-commercial" is not
what you or I would like it to mean. It is what a court finds when I or
others are summoned before it.

I have been involved in this for over 4 years in the UK and in Europe
(Parliament and Commission). There is no consensus on what should be
allowed and what will ultimately be decided by the Commission and Member
States. I have taken legal opinion on some of this and consulted with other
experts and the answers are often unclear.

The legality of Text and Data Mining is formally unrelated to whether the
miner publishes the results or not.


> If you prefer to limit your research to works that are CC-BY licensed, it
> is your right to make this choice. Many other researchers, myself included,
> work with a wide range of data and do not choose to limit what we gather to
> works that are licensed CC-BY. One example from my own research: if a
> publisher has a table listing APCs, I screen scrape the table, pop the data
> into a spreadsheet, and work with it.
>

The primary issue for Text and Data Mining is automated analysis of many
tables. This is an inconsistency in the law that we are trying to get
legislators to change.


> Even publishers that use CC-BY for articles usually have All Rights
> Reserved for pages that contain this type of information.
>

Do you have metrics for this? This is incompatible with the licence and
should be challenged, as I frequently do.


> If I limited myself to data sources that are CC-BY I could not do this
> kind of research.
>

I agree that this is limiting and that is why it would be useful for
scientific material to be licensed CC BY.

In summary this is a complex legal question and the answers have to be
based on law not guesswork.


-- 
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Heather Morrison
Another critique that may be more relevant to this argument: I challenge PMR's 
contention that it is necessary to limit this kind of research to works that 
are licensed CC-BY. If you gather data from a great many different tables and 
analyze it, what you will be publishing is your own work.

This is no different from doing a great deal of reading and thinking and 
writing a new work that draws on this knowledge, with appropriate citations to 
the works that you have read.

Copyright is only invoked if you want to actually copy an original table for 
inclusion in a publication. If you are drawing on data from thousands of tables 
it is not clear how often this will happen. If what you want to copy is an 
insubstantial amount this would be covered under fair dealing. If the work is 
free-to-read, whether All Rights Reserved or under an open license, you can 
point readers to the original. At worst, this is a minor inconvenience.

If you prefer to limit your research to works that are CC-BY licensed, it is 
your right to make this choice. Many other researchers, myself included, work 
with a wide range of data and do not choose to limit what we gather to works 
that are licensed CC-BY. One example from my own research: if a publisher has a 
table listing APCs, I screen scrape the table, pop the data into a spreadsheet, 
and work with it. Even publishers that use CC-BY for articles usually have All 
Rights Reserved for pages that contain this type of information. If I limited 
myself to data sources that are CC-BY I could not do this kind of research.
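[Editor's illustration] The screen-scraping step described above could be sketched roughly as follows. This is a minimal stand-in, not the actual workflow of the APC study; the sample table, its columns, and the journal names are invented for illustration, since publishers' real APC pages vary widely in layout.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical stand-in for a publisher's APC listing page.
SAMPLE_HTML = """
<table>
  <tr><th>Journal</th><th>APC (USD)</th></tr>
  <tr><td>Journal of Examples</td><td>1500</td></tr>
  <tr><td>Annals of Placeholders</td><td>2000</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of each <tr> as a list of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

def scrape_table(html):
    """Return the table as a list of rows (lists of cell strings)."""
    parser = TableScraper()
    parser.feed(html)
    return parser.rows

def to_csv(rows):
    """Serialize scraped rows to CSV text, ready for a spreadsheet."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

rows = scrape_table(SAMPLE_HTML)
print(rows[1])  # ['Journal of Examples', '1500']
```

The CSV output can then be opened in any spreadsheet tool for the kind of analysis described.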

best,

Heather Morrison

On 2017-01-24, at 4:17 AM, Peter Murray-Rust wrote:

There are many activities where CC BY or a more liberal licence (CC 0) is the 
only way that modern science can be done.

Many knowledge-based projects in science, technology, and medicine use 
thousands of documents a day to extract and publish science. (We started one 
yesterday at https://github.com/ContentMine/cm-ucl/ to extract data from 
tables in PDF. This will aim to analyse 1000 papers per day - and that limit 
is set by the licences; if we were allowed, we could index 10,000 papers/day 
in all disciplines.)

To do reproducible science it is critical that the raw data (in this case 
scientific articles) are made publicly available so that others can reproduce 
the work. Any friction such as writing to the author, reading a non-standard 
licence, etc. makes the project impossible. We are often limited to using the 
Open subset (CC BY) in EuropePMC. We cannot afford to put a single CC NC, CC 
ND, "unlicensed freely available" manuscript in the repository in case we are 
sent a take-down notice. That would destroy the whole experiment.
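[Editor's illustration] The licence gate described here - admitting only explicitly open items into the mining corpus - might be sketched as a simple filter like the one below. The record fields and licence labels are hypothetical assumptions; EuropePMC's actual metadata schema differs and would need mapping to this form.

```python
# Licence strings assumed to permit safe automated reuse (assumption:
# real metadata uses varied labels that would need normalization first).
MINEABLE = {"cc-by", "cc0"}

def mineable(record):
    """Admit a record only if it carries an explicitly open licence.

    Unlicensed free-to-read items fail this test, mirroring the policy
    of excluding anything that could attract a take-down notice.
    """
    licence = (record.get("licence") or "").strip().lower()
    return licence in MINEABLE

papers = [
    {"id": "p1", "licence": "CC-BY"},
    {"id": "p2", "licence": "CC-BY-NC"},  # excluded: NC restriction
    {"id": "p3", "licence": ""},          # excluded: unlicensed free-to-read
    {"id": "p4", "licence": "CC0"},
]

corpus = [p["id"] for p in papers if mineable(p)]
print(corpus)  # ['p1', 'p4']
```

The point of the sketch is the asymmetry the email describes: the filter rejects by default, so a single ambiguous licence never enters the repository.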

These experiments are part of the science of the future. If we had been allowed 
to use them it is likely that the Ebola outbreak in Liberia would have been 
predicted (the Liberian government's assessment, not mine). Whether it would 
have been prevented we don't know, but at least it would not have been impeded 
by copyright and paywalls.

Put simply. Unless the scientific material is CC BY or CC 0 we cannot use it 
for knowledge-driven STM. I have estimated that the opportunity cost of this 
can run into billions of dollars.

Repositories do not work for science. They are fragmented, non-interoperable 
and covered with prohibitions on automatic re-use. I have not met scientists 
who are systematically using institutional repositories for data mining.

It seems that the desires of the arts and humanities are in direct conflict 
with the needs of STM. I note that there are few scientists posting on this 
list. Maybe this division should be recognised: the STM community should 
continue with its own policies of CC BY, and the rest use whatever commonality 
they can achieve.

There are no simple solutions where the law is concerned. Only CC BY gives 
certainty. CC NC and CC ND may be valuable for A+H but they are very difficult 
to operate in any area of endeavour.

I was told 12 years ago on this list that I should be patient, that the Green 
program would deliver universal access, and that then I could start mining the 
literature. I have been patient but it hasn't happened. I am told that OpenAIRE 
still doesn't expose full-text. We should recognize this and look for 
alternative solutions.




On Mon, Jan 23, 2017 at 7:55 PM, Heather Morrison wrote:
With all due respect to the people who created and shared the "how open is it" 
spectrum tool, I find some of the underlying assumptions to be problematic.

For example the extreme of closed access assumes that having to pay 
subscriptions, membership, pay per view etc. is the far end of closed. My 
perspective is that the opposite of open is closure of knowledge. Climate 
change denied, climate scientists muzzled, fired or harassed, climate change 
science defunded, climate data taken down and destroyed, 

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Heather Morrison
hi Fiona,

It seems we have been thinking along the same lines - I have a similar proposal 
that tries to address the same issue.

An author wishing to pre-authorize translations but only under particular 
conditions, e.g. that the translation is done by an appropriately qualified 
translator and a disclaimer is used, should use a restrictive license (either 
All Rights Reserved but free-to-read, CC-BY-NC-ND or CC-BY-ND) but grant 
additional permissions. In the case of CC licenses, this can be done with a CC+ 
license.

As explained on the Creative Commons page: "You have the option of granting 
permissions above and beyond what the license allows; for example, allowing 
licensees to translate ND-licensed material. If so, consider using CC+ to 
indicate the additional permissions offered."

The reason you need to start with the more restrictive license and then grant 
additional permissions is because you cannot use a more open license and then 
attach additional restrictions - this defeats the purpose of open licensing.

Here is a boilerplate approach to put on a terms and conditions website to 
explain these permissions:

Translations can be made without explicitly seeking permission under the 
following conditions:

Professional qualifications of translator: [insert definition of what you 
consider to be an appropriate professional]

A disclaimer must be prominently placed on the work as follows [insert 
disclaimer language and any other terms such as placement]

Certification by the original publisher - provide instructions for the 
translator in case they wish to have the translation certified.

If the author (or publisher) does not want to grant blanket commercial rights 
but is willing to grant some rights that others might consider commercial, this 
can also be specified here. For example, if the author or publisher of a book 
expects royalties if a downstream for-profit publisher actually makes money, 
details might be specified here so people know what to expect, e.g. after costs 
of producing the translation are covered, royalties of x are due to y.

I don't suggest that this is the final answer on how to handle translations but 
hope that this is a useful discussion.

best,

Heather Morrison

On 2017-01-24, at 8:37 AM, Fiona Bradley wrote:

Hi Heather,

I think there’s too much variation in copyright arrangements and agreements for 
me to comment on that, but indeed, should authors prefer and there’s no other 
arrangements in place stating otherwise, you could put authors in place of 
institution/publisher in my comment.

I think license choice and translation inform but shouldn’t necessarily drive 
each other. Personally speaking, I tend to start with an assumption of best 
intentions (e.g. CC-BY) but make room for exceptions. Under CC-BY or another 
open license it can still be helpful to have some basic procedures in place as 
a professional courtesy for translation (as simple as emailing the author or 
as formal as a permissions procedure), as it avoids situations like doubled 
work if more than one person decides to translate a paper into the same 
language but they are unaware of each other – I have had that happen before!

Fiona

From: Heather Morrison
Reply-To: "Global Open Access List (Successor of AmSci)"
Date: Tuesday, 24 January 2017 at 11:59 am
To: "Global Open Access List (Successor of AmSci)"
Subject: Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?

Fiona,

Thank you for this information about professional translation services, and the 
importance of certification and disclaimers with translated works. If the 
original author wishes to ensure that translations are done by appropriately 
trained professionals, certified, etc., this is a reason to avoid using 
licenses granting blanket downstream rights to create derivatives, ie if using 
CC licenses the No Derivatives should be used.

I refer to "author" rather than "institution/publisher" as you did. If you 
support CC licensing it seems odd to me that you would assume the 
institution/publisher should make choices relating to copyright. Do you see 
authors as the copyright holders / CC licensors or do you assume copyright 
transfer to an institution / publisher who is then the CC-BY licensor (it 
appears this is similar to what Elsevier is currently doing)?

best,

Heather Morrison


 Original message 
From: Fiona Bradley
Date: 2017-01-24 6:43 AM (GMT-05:00)

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Fiona Bradley
Hi Heather,

I think there’s too much variation in copyright arrangements and agreements for 
me to comment on that, but indeed, should authors prefer and there’s no other 
arrangements in place stating otherwise, you could put authors in place of 
institution/publisher in my comment.

I think license choice and translation inform but shouldn’t necessarily drive 
each other. Personally speaking, I tend to start with an assumption of best 
intentions (e.g. CC-BY) but make room for exceptions. Under CC-BY or another 
open license it can still be helpful to have some basic procedures in place as 
a professional courtesy for translation (as simple as emailing the author or 
as formal as a permissions procedure), as it avoids situations like doubled 
work if more than one person decides to translate a paper into the same 
language but they are unaware of each other – I have had that happen before!

Fiona

From:  on behalf of Heather Morrison 

Reply-To: "Global Open Access List (Successor of AmSci)" 
Date: Tuesday, 24 January 2017 at 11:59 am
To: "Global Open Access List (Successor of AmSci)" 
Subject: Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?

Fiona,

Thank you for this information about professional translation services, and the 
importance of certification and disclaimers with translated works. If the 
original author wishes to ensure that translations are done by appropriately 
trained professionals, certified, etc., this is a reason to avoid using 
licenses granting blanket downstream rights to create derivatives, ie if using 
CC licenses the No Derivatives should be used.

I refer to "author" rather than "institution/publisher" as you did. If you 
support CC licensing it seems odd to me that you would assume the 
institution/publisher should make choices relating to copyright. Do you see 
authors as the copyright holders / CC licensors or do you assume copyright 
transfer to an institution / publisher who is then the CC-BY licensor (it 
appears this is similar to what Elsevier is currently doing)?

best,

Heather Morrison


 Original message 
From: Fiona Bradley 
Date: 2017-01-24 6:43 AM (GMT-05:00)
To: "Global Open Access List (Successor of AmSci)" 
Subject: Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?

Hi all,

For a similar approach in terms of data, the Open Data Institute has a data 
spectrum that also looks at closed -> open pathways: 
https://theodi.org/data-spectrum

Regarding translations, in my experience having managed these processes, the 
most important policies aren’t those relating to how the original work is 
licensed, but whether the originating institution/publisher makes agreements 
with translators, insists on certified translations, makes disclaimers about 
whether translations are considered official or not (in the case of legal 
texts, for instance), and provides for a notice and takedown procedure. This 
risk mitigation acknowledges that however a work is licensed, whether OA or 
under a formal license by the publisher, there is always the potential for 
issues in translation to occur, and there need to be procedures in place to 
handle that and to address where liability resides. I wouldn’t see this as a 
risk inherent to OA or a reason not to license CC-BY.

In the case of medical instructions, organisations such as 
http://translatorswithoutborders.org/ and many others work extensively in this 
area to provide professional and/or certified translation, whereas journal 
articles are often translated by researchers in the field who are fluent in 
both languages but not translators.

Kind regards,
Fiona

--
Fiona Bradley
Deputy Executive Director
RLUK
Office: 020 7862 8463 Mobile: +44 7432 768 566
www.rluk.ac.uk
RLUK Twitter feed: @RL_UK
Registered Office: Senate House Library, Senate House, Malet Street, London 
WC1E 7HU
Registered Company no: 2733294
Registered Charity no: 1026543




From:  on behalf of Heather Morrison 

Reply-To: "Global Open Access List (Successor of AmSci)" 
Date: Monday, 23 January 2017 at 7:55 pm
To: "Global Open Access List (Successor of AmSci)" 
Subject: Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?

With all due respect to the people who created and shared the "how open is it" 
spectrum tool, I find some of the underlying assumptions to be problematic.

For example the extreme of closed access assumes that having to pay 
subscriptions, membership, pay per view etc. is the far end of closed. My 
perspective is that the opposite of open is closure of knowledge. Climate 
change denied, climate scientists muzzled, fired or 

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Peter Murray-Rust
On Tue, Jan 24, 2017 at 12:19 PM, Heather Morrison <
heather.morri...@uottawa.ca> wrote:

> hi Peter,
>
> If many knowledge projects are advancing our knowledge through the means
> that you have described, surely there are others than the one you started
> yesterday? Can you provide a list or literature review of such studies?
>

There are literally thousands. In biomedicine alone there are many
conferences and competitions. An overview is given in
https://en.wikipedia.org/wiki/Biomedical_text_mining .


>
> My OA APC study uses data from different sources that do not have a common
> set of terms:
> dataverse.scholarsportal.info/dataverse
>
> I would like to note some methodological concerns with the approach
> described by PMC
>

I assume you mean me, PMR, not (Europe)PubMedCentral.


> (automatically gathering data from tables). Taking data from different
> studies without fully accounting for difference in methods (eg definition
> or measurement) could easily lead to false conclusions. Worse, such false
> conclusions would be highly replicable leading to false confidence in
> results, ie anyone could repeat the same mistakes and come to the same
> conclusion of unknown external validity.
>

It is very sad to be severely criticised by a scholar who has not read my
work, proposal, and website and does not understand what I am doing. There
are many cases where the data format I extract from allows precise metrics
on recall and precision of the character stream (in the current case I
expect >> 99%). You do not know my purpose - which you describe as "false
conclusions". In fact the output will be routed to expert human reviewers
and will save 90% of their time.
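Character-stream recall and precision of the kind mentioned above can be computed along these lines. This is an illustrative sketch only: the function name and the multiset-matching definition are my assumptions, not a description of Peter's actual pipeline.

```python
from collections import Counter

def char_metrics(extracted: str, reference: str):
    """Character-level precision/recall of an extracted text vs. a reference.

    Each character is counted with multiplicity; a match is a character
    occurrence present in both texts (multiset intersection).
    """
    ext, ref = Counter(extracted), Counter(reference)
    matched = sum((ext & ref).values())  # size of the multiset intersection
    precision = matched / max(sum(ext.values()), 1)  # matched / extracted
    recall = matched / max(sum(ref.values()), 1)     # matched / reference
    return precision, recall

# Example: extraction missed one value from the reference table row.
p, r = char_metrics("1.23 4.56", "1.23 4.56 7.89")
```

A real evaluation would compare aligned character streams rather than bags of characters, but the precision/recall framing is the same.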

>
> For the 2016/17 OA APC dataset I am adding a "providence"
>

I assume you mean "provenance"


> column because the data in the 2016 APC column comes from different
> researchers with some differences in data collection. Even in a single
> dataset, to analyze one needs to understand when you are comparing apples
> with apples or McIntoshes with Spartans. Automating data analysis without
> full comprehension of the data strikes me as problematic.
>

This assertion that I do not have full comprehension and that my work is
problematic is unworthy. I have pioneered automatic extraction of chemistry
and of crystallography over 40 years and have been honoured by scientific
societies for doing so. I have defined the data extraction process, shown
how it can be aggregated, provided metrics and pioneered technology that
has led to several thousand papers (by people who have built on my work).

Peter



-- 
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Heather Morrison
hi Peter,

If many knowledge projects are advancing our knowledge through the means that 
you have described, surely there are others than the one you started yesterday? 
Can you provide a list or literature review of such studies?

My OA APC study uses data from different sources that do not have a common set 
of terms:
dataverse.scholarsportal.info/dataverse

If we had to restrict data collection to CC-BY licensed works this research 
could not be done, and to the extent it could be done, publishers who do not 
want us to study them could easily opt out by not using CC-BY licenses on the 
pages where this information is found. In other words, CC-BY licenses raise 
issues for data collection and analysis.

I would like to note some methodological concerns with the approach 
described by PMC (automatically gathering data from tables). Taking data from 
different studies without fully accounting for difference in methods (eg 
definition or measurement) could easily lead to false conclusions. Worse, such 
false conclusions would be highly replicable leading to false confidence in 
results, ie anyone could repeat the same mistakes and come to the same 
conclusion of unknown external validity.

For the 2016/17 OA APC dataset I am adding a "providence" column because the 
data in the 2016 APC column comes from different researchers with some 
differences in data collection. Even in a single dataset, to analyze one needs 
to understand when you are comparing apples with apples or McIntoshes with 
Spartans. Automating data analysis without full comprehension of the data 
strikes me as problematic.

best,

Heather Morrison



 Original message 
From: Peter Murray-Rust 
Date: 2017-01-24 4:27 AM (GMT-05:00)
To: "Global Open Access List (Successor of AmSci)" 
Subject: Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?

There are many activities where CC BY or a more liberal licence (CC 0) is the 
only way that modern science can be done.

Many knowledge-based projects in science , technology, medicine, use thousands 
of documents a day to extract and publish science. (We started one yesterday at 
https://github.com/ContentMine/cm-ucl/ to extract data from tables in PDF. This 
will aim to analyse 1000 papers per day - and that limit is set by the licences 
- if we were allowed we could index 10,000 papers/day in all disciplines.)

To do reproducible science it is critical that the raw data (in this case 
scientific articles) are made publicly available so that others can reproduce 
the work. Any friction such as writing to the author, reading a non-standard 
licence, etc. makes the project impossible. We are often limited to using the 
Open subset (CC BY) in EuropePMC. We cannot afford to put a single CC NC, CC 
ND, "unlicensed freely available" manuscript in the repository in case we are 
sent a take-down notice. That would destroy the whole experiment.

These experiments are part of the science of the future. If we had been allowed 
to use them it is likely that the Ebola outbreak in Liberia would have been 
predicted (The Liberian government's assessment, not mine). Whether it would 
have been prevented we don't know, but at least it would not have been impeded 
by copyright and paywalls.

Put simply. Unless the scientific material is CC BY or CC 0 we cannot use it 
for knowledge-driven STM. I have estimated that the opportunity cost of this 
can run into billions of dollars.

Repositories do not work for science. They are fragmented, non-interoperable 
and covered with prohibitions on automatic re-use. I have not met scientists 
who are systematically using institutional repositories for data mining.

It seems that the desires of the arts and humanities are in direct conflict 
with the needs of STM. I note that there are few scientists posting on this list. Maybe 
this division should be recognised and the STM community should continue with 
its own policies of CC BY and the rest use whatever commonality they can 
achieve.

There are no simple solutions where the law is concerned. Only CC BY gives 
certainty. CC NC and CC ND may be valuable for A+H but they are very difficult 
to operate in any area of endeavour.

I was told 12 years ago on this list that I should be patient and the Green 
program would deliver universal access and then I could start mining the 
literature. I have been patient but it hasn't happened. I am told that OpenAIRE 
still doesn't expose full-text. We should recognize this and look for 
alternative solutions.




On Mon, Jan 23, 2017 at 7:55 PM, Heather Morrison wrote:
With all due respect to the people who created and shared the "how open is it" 
spectrum tool, I find some of the underlying assumptions to be problematic.

For example the extreme of closed access assumes that having to pay 
subscriptions, membership, pay per view 

Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

2017-01-24 Thread Peter Murray-Rust
There are many activities where CC BY or a more liberal licence (CC 0) is
the only way that modern science can be done.

Many knowledge-based projects in science , technology, medicine, use
thousands of documents a day to extract and publish science. (We started
one yesterday at https://github.com/ContentMine/cm-ucl/ to extract data
from tables in PDF. This will aim to analyse 1000 papers per day - and that
limit is set by the licences - if we were allowed we could index 10,000
papers/day in all disciplines.)

To do reproducible science it is critical that the raw data (in this case
scientific articles) are made publicly available so that others can
reproduce the work. Any friction such as writing to the author, reading a
non-standard licence, etc. makes the project impossible. We are often
limited to using the Open subset (CC BY) in EuropePMC. We cannot afford to
put a single CC NC, CC ND, "unlicensed freely available" manuscript in the
repository in case we are sent a take-down notice. That would destroy the
whole experiment.

These experiments are part of the science of the future. If we had been
allowed to use them it is likely that the Ebola outbreak in Liberia would
have been predicted (The Liberian government's assessment, not mine).
Whether it would have been prevented we don't know, but at least it would
not have been impeded by copyright and paywalls.

Put simply. Unless the scientific material is CC BY or CC 0 we cannot use
it for knowledge-driven STM. I have estimated that the opportunity cost of
this can run into billions of dollars.

Repositories do not work for science. They are fragmented,
non-interoperable and covered with prohibitions on automatic re-use. I have
not met scientists who are systematically using institutional repositories
for data mining.

It seems that the desires of the arts and humanities are in direct conflict
with the needs of STM. I note that there are few scientists posting on this
list. Maybe this division should be recognised and the STM community should
continue with its own policies of CC BY and the rest use whatever
commonality they can achieve.

There are no simple solutions where the law is concerned. Only CC BY gives
certainty. CC NC and CC ND may be valuable for A+H but they are very
difficult to operate in any area of endeavour.

I was told 12 years ago on this list that I should be patient and the Green
program would deliver universal access and then I could start mining the
literature. I have been patient but it hasn't happened. I am told that
OpenAIRE still doesn't expose full-text. We should recognize this and look
for alternative solutions.




On Mon, Jan 23, 2017 at 7:55 PM, Heather Morrison <
heather.morri...@uottawa.ca> wrote:

> With all due respect to the people who created and shared the "how open is
> it" spectrum tool, I find some of the underlying assumptions to be
> problematic.
>
> For example the extreme of closed access assumes that having to pay
> subscriptions, membership, pay per view etc. is the far end of closed. My
> perspective is that the opposite of open is closure of knowledge. Climate
> change denied, climate scientists muzzled, fired or harassed, climate
> change science defunded, climate data taken down and destroyed, deliberate
> spread of misinformation.
>
> This is not a moot point. This end of the spectrum is a reality today, one
> that is far more concerning for many researchers than pay walls (not that I
> support paywalls).
>
> Fair use is listed in a row named closed access. I argue that fair use /
> fair dealing is essential to academic work and journalism, and must apply
> to all works, not just those that can be subject to academic OA policy.
>
> There is an underlying assumption about the importance and value of re-use
> / remix that omits any discussion of the pros, cons, or desirability of
> re-use / remix that I argue we should be having. Earlier today I mentioned
> some of the potential pitfalls. Now I would like to note two potential pitfalls:
> mistranslation and errors in instructions for dangerous procedures.
>
> There are dangers of poor published translations to knowledge per se (ie
> introduce errors) and to the author's reputation, ie an author could easily
> be indirectly misquoted due to a poor translation. There are good reasons
> why some authors and journals hesitate to grant downstream translation
> permissions. Reader-side translations (eg automated translation tools) are
> not the same as downstream published translations, although readers should
> be made aware of the current limitations of automated translation.
>
> If people are copying instructions for potentially dangerous procedures
>  (surgery, chemicals, engineering techniques), and they are not at least as
> expert as the original author, it might be in everyone's best interests if
> downstream readers are not invited and encouraged to manipulate the text,
> images, etc.
>
> In creative works, eg to prepare a horror flick, by all means take this
> 

Re: [GOAL] Elsevier as an open access publisher

2017-01-24 Thread Dirk Pieper

Hi,

reading the discussion about Elsevier as an "OA publisher" and the 
discussion about CC-BY as a "requirement" for OA, we analysed the 
Elsevier metadata in Crossref.


Harvesting the data a few days ago, the most frequently used license URLs 
were:


675,343 : http://www.elsevier.com/open-access/userlicense/1.0/

191,530 : http://creativecommons.org/licenses/by-nc-nd/4.0/

122,013 : http://creativecommons.org/licenses/by-nc-nd/3.0/

The first one is not CC-BY but according to

https://www.elsevier.com/about/company-information/policies/open-access-licenses

the users at our universities have access to these articles, and that's 
what counts, I would say.


Out of about 15.2 million Elsevier article metadata records, about 989,000 
point to freely accessible articles.
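Counts like these can be reproduced against the public Crossref REST API, which can return per-license facet tallies for a member's works. A minimal sketch, assuming Crossref member ID 78 corresponds to Elsevier; the endpoint, filter, and facet syntax are Crossref's, while the helper names are mine:

```python
import json
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"

def license_facet_url(member_id: int) -> str:
    """Build a Crossref query that returns only per-license facet counts.

    rows=0 suppresses the records themselves; facet=license:* asks
    Crossref to tally every distinct license URL for the member.
    """
    return f"{CROSSREF_WORKS}?filter=member:{member_id}&rows=0&facet=license:*"

def license_counts(member_id: int) -> dict:
    """Fetch {license URL: record count} for one Crossref member."""
    with urllib.request.urlopen(license_facet_url(member_id)) as resp:
        message = json.load(resp)["message"]
    return message["facets"]["license"]["values"]

# Usage (performs a live HTTP request; member 78 is assumed to be Elsevier):
#   counts = license_counts(78)
#   for lic, n in sorted(counts.items(), key=lambda kv: -kv[1])[:3]:
#       print(f"{n:>9,} : {lic}")
```

Note that a license URL in the metadata does not by itself mean the article is freely accessible, which is why the facet counts and the accessible-article total are reported separately above.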


I don't want to judge these numbers, but I have heard of publishers 
that have 100% OA.


Best,

Dirk



Dirk Pieper
Bielefeld UL - Deputy Director

www.uni-bielefeld.de
base-search.net




