[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Marcin Wojnarski
Heather,
Thank you for this deep analysis. I don't consider myself an expert on 
licensing issues, so I will let others comment, but every new idea on how 
to fund academic services like Paperity is more than welcome. 
Whoever finally discovers a satisfactory solution should get 
a Nobel Prize at the very least.

Best
Marcin


On 10/12/2014 10:22 PM, Heather Morrison wrote:
 Thank you for providing the information, Marcin. Since there is a subset of 
 the open access community that demands blanket permissions for commercial 
 rights downstream (a position I strongly disagree with), it is important to 
 discuss what the potential commercial uses might be to determine whether 
 these actually advance open access or scholarly knowledge or not.

 Some comments on these options for Paperity:

 In the subscriptions model, aggregators (such as EBSCO and ProQuest), 
 typically pay journals to include their content, or in the case of open 
 access journals, at least do not charge the journals. Charging journals to 
 include them in an aggregated service changes a revenue stream to an expense 
 stream for the journals. This makes it harder to find the revenue to produce 
 journals; a barrier to publishing journals in the first place is not in the 
 interests of advancing scholarly knowledge.

 Advertising is one of the potential revenue streams for open access journals 
 (and one that some journals are currently using). If Paperity is using 
 journal content to sell advertising, then Paperity could easily be competing 
 with the journals for this revenue.

 It is lovely to hear of Paperity's good intentions starting out to be fair, 
 efficient and acceptable for everyone. But what can happen with services like 
 this down the road when there are bills to be paid, journals are less than 
 keen to pay for this service and advertisers continue to prefer Google?

 The following is addressed to my fellow open access advocates as this is a 
 good discussion about open access downstream, and these comments are not 
 intended to apply to Paperity:

 If the purpose of insisting on re-use and commercial rights downstream is 
 to facilitate the design of services such as Paperity, let's discuss 
 these downstream possibilities, which I argue are facilitated by CC-BY and/or 
 CC-BY-SA licenses:

 - aggregator takes CC-BY content and develops a toll-access value-added 
 service

 By way of illustration of this: Elsevier's Scopus claims to include 2,800 
 gold open access journals. Scopus is a subscription-based service.

 - aggregator takes CC-BY content, initially develops an open access 
 value-added search service, then sells the service to a for-profit company 
 that changes the business model to toll access

 By way of illustration of the sales aspect, consider that Elsevier bought 
 Mendeley and Springer bought BioMedCentral. Both are still free services, but 
 offered by largely subscription-based companies; why would we assume that 
 they would never change the business model?

 - aggregator follows the Paperity suggestion of charging journals, but 
 with a twist: does not include journals that do not pay and/or returns 
 results based on payments by journals (i.e. pay-to-play)

 Are these models seen as desirable by advocates of requiring CC-BY and/or 
 CC-BY-SA licenses? Are any of these scenarios aligned with the Budapest 
 vision? If you agree that they are not, can you explain why you think these 
 are unlikely or how the licenses would prevent this from happening? For 
 example, perhaps someone can explain how it is that Elsevier is able to 
 charge to direct people to OA journals through Scopus?

 A comment on SA: although ShareAlike is the most copyleft of the CC license 
 elements, it does not come with an obligation to share in the same way, 
 but rather an obligation to use the same license when including re-used content. 
 One can take a work that is licensed SA and is freely available on the web 
 and include it in a work that is limited in any of a variety of fashions 
 (part of a presentation to an audience limited to those who are willing and 
 able to pay to attend; a toll access work, etc.) - as long as the work 
 downstream uses the same license. In other words, CC-BY-SA does not do as much to 
 protect OA downstream as one might think.

 best,

 Heather Morrison



[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Marcin Wojnarski

Dear Stevan,

We started with Gold, because we believe that journals play a 
fundamental role in the system of scholarly communication and every 
service that tries to facilitate access to literature must start with 
journals, not only with a flat collection of papers like the one found 
in repositories. For 400 years, journals have been the backbone of the 
system, the main structural element. They provide a brand name for 
papers, create consistent editorial policy and take responsibility for 
the quality and relevance of articles they publish. These features are 
of utmost importance for readers; without them, navigating through 
millions of articles becomes infeasible.


That said, we're fully aware how much great unique content there is in 
repositories and we'd like very much to merge these two streams - Gold 
and Green - in Paperity at some point. Although there are some tensions 
inside the OA community between the Gold and Green camps, I think they are 
unjustified, because these routes are complementary, not competitive. As 
to indexing, it is actually much easier to do for repositories than 
for journals, because most repos expose standardized interfaces. So we 
don't need Google Scholar for this purpose; it is only that, as I said, we 
believe that the right order is journals first.
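
As an illustration of what a standardized repository interface can look like, here is a minimal sketch. It assumes the interface is OAI-PMH serving Dublin Core records; the message above does not name a protocol, so that is an assumption. The sample response, the identifier, and the function name parse_records are hypothetical. Reading the journal from dc:source is also an assumption, since repositories differ in where (and whether) they record it.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response, shortened to one record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Self-Archived Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:source>Journal of Examples, 12(3), 45-67</dc:source>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict per record found in an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
            # Assumption: the journal citation, if present, is in dc:source.
            # It stays None when the depositor left it out.
            "source": rec.findtext(".//dc:source", default=None, namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

The "source" field is left as None when absent rather than guessed, so a harvester can tell records with a stated journal from records without one.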


Best
Marcin


On 10/12/2014 01:51 PM, Stevan Harnad wrote:
Harvesting Gold OA journal articles is a piece of cake. How will 
Paperity/redex harvest Green OA articles published in non-OA journals 
but made OA somewhere on the Web - via Google Scholar?

Sounds like a splendid idea if it can be done… But not if it is just 
Gold-biassed, because most refereed research is not Gold, and the 
fastest growing form of OA is Green (because of mandates, and absence 
of extra cost).

SH




--
Marcin Wojnarski, Founder of Paperity, www.paperity.org
www.linkedin.com/in/marcinwojnarski
www.facebook.com/Paperity
www.twitter.com/Paperity

Paperity. Open science aggregated.

___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Stevan Harnad
On Oct 12, 2014, at 4:50 PM, Marcin Wojnarski mwojnar...@paperity.org wrote:

 Dear Stevan,
 We started with Gold, because we believe that journals play a fundamental 
 role in the system
 of scholarly communication and every service that tries to facilitate access 
 to literature must
 start with journals, not only with a flat collection of papers like the one 
 found in repositories.

Dear Marcin,

I think there may be a fundamental misunderstanding here.

Green OA consists of self-archived journal articles and their bibliographic 
metadata — including
journal name.

And institutional repositories consist of an institution’s journal article 
output.

Nothing “flat” about those!

Were you perhaps thinking that repositories just contain unpublished preprints 
and gray
literature?

 For 400 years, journals have been the backbone of the system, the main 
 structural element.

I don’t understand why you are pointing this out: From the very outset the Open 
Access movement 
has been very specifically about opening access to journal articles. Please see 
the original BOAI statement:
http://www.budapestopenaccessinitiative.org/read

The literature that should be freely accessible online is that which scholars 
give to the world without expectation of payment. Primarily, this category 
encompasses their peer-reviewed journal articles…

 They provide a brand name for papers, create consistent editorial policy and 
 take responsibility for the quality and relevance of articles they publish - 
 these features are of topmost importance for readers, without them navigating 
 through millions of articles becomes infeasible.

Marcin, it remains unclear why you are telling us this. We all know it. What I 
asked you was:

 Harvesting Gold OA journal articles is a piece of cake. How will 
 Paperity/redex harvest Green OA articles published in non-OA journals 
 but made OA somewhere on the Web

 That said, we're fully aware how much great unique content there is in 
 repositories and we’d
 like very much to merge these two streams - Gold and Green - in Paperity at 
 some point.

The great unique content in repositories is the very same great unique content 
that there is in journals.
Gold OA and Green OA both consist of journal articles. There are many more 
non-Gold journals
and non-Gold journal-articles than Gold ones. 

Why is Paperity focusing on Gold?

Why is all the rest only to be merged “at some point”?

And how, exactly?

 Although there are some tensions inside OA community between the Gold and 
 Green camps,
 I think they are unjustified, because these routes are complementary, not 
 competitive.

You are quite right, the two roads to OA are complementary, not competitive.

But in order to complement one another they must both be clearly understood, 
and much
of the tension is about misunderstandings, for example, that OA = Gold OA while 
Green OA
is about something else (preprints, gray literature).

And another point of tension is about priorities: Which needs to come first, 
Gold or Green?

(My own reply is that, for many important reasons, it is Green that must come 
first: (1) because Green does not cost the author money, (2) because Green can 
be mandated by institutions and funders, and (3) because by coming first Green 
will make subscriptions unsustainable, force journals to cut obsolete costs, 
downsize to providing peer review alone, and convert to affordable, sustainable 
Fair Gold instead of today’s over-priced, double-paid, pre-Green Fool’s Gold: 
http://j.mp/fairgoldOA)

 As to indexing, it is actually much easier to be done for repositories than 
 for journals,
 because most repos expose standardized interfaces.

Then why is Paperity starting with Gold OA journal articles instead of Green OA 
journal
articles in repositories?

 So we don't need Google Scholar for this purpose, only as I said, we believe 
 that the
 right order is journals first.

What you have said is that you believe the right order is Gold OA first, but 
you have certainly not explained why, apart from the fact that Gold OA is 
certainly much easier to access and aggregate:

Gold OA journal article bibliographic data can be harvested from the journals’ 
websites using DOAJ to identify all the journals.

But how are you going to find all the Green OA journal articles, if not with 
Google Scholar? (WoS or SCOPUS can find you all journal articles, but 
won’t tell you which ones are Green OA.)

(BASE provides some of these data; ROAR 2.0 will soon provide them all.)

Best wishes,
Stevan

 

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread BAUIN Serge
Many thanks, indeed

Your answer is clear, and I wish you success

Cheers
Serge

From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
Marcin Wojnarski
Sent: Sunday, October 12, 2014 21:20
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of 
OA journals & papers

Hi Serge,

We're working on this. Paperity started as a non-profit academic project, but 
yes, we need to develop a business model to make it sustainable and to achieve 
the goal of 100% OA aggregated. Most likely we'll expect participating journals 
to support our services, which we think is a fair solution when many of them 
charge APCs and we actually help them do their job (dissemination). We're aware 
however that there are also many small non-profit journals which don't charge 
APC at all, and we definitely want to aggregate them all, too. So the details 
are still to be sorted out, but I'm confident that over time we'll come up with 
a good solution: one that's fair, efficient and acceptable for everybody. Of 
course, there are also more traditional solutions that we'll investigate, like 
adverts.

Cheers
Marcin

On 10/11/2014 09:07 PM, BAUIN Serge wrote:
Marcin,

May I ask what is the economic model of Paperity?
I didn't find any information about that on your web site.

Cheers

Serge

Sent from a mobile phone, sorry for the inelegant style...

On 10 Oct 2014, at 08:22, Marcin Wojnarski 
mwojn...@ns.onet.pl wrote:
Jeroen,

Thanks, it's great to hear that you like Paperity!

True peer-reviewed means published in a peer-reviewed journal, in contrast to 
a pdf just posted somewhere on the web (think Google Scholar), which can be 
anything: a peer-reviewed paper or not, published or not, even randomly 
generated to resemble a scholarly article, for example to pump up G Scholar 
citations (http://arxiv.org/abs/1212.0638).

The new technology is called REgular Document EXpressions (redex). It is a 
computer language for analyzing long and complex documents, particularly 
those written in a markup language like HTML or XML. It facilitates analysis of 
the web context where the paper occurred, which is critical for maintaining the 
link between the paper and its journal. Redex builds on the fundamental 
technology of regular expressions (regex), but redefines the language entirely 
to make it suitable for large structured texts.
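
Redex itself is not shown in this thread, so the following is not redex. It is a plain-regex sketch of the underlying idea of reading a paper's journal from the page that surrounds it. It assumes the page carries citation_* meta tags, a convention many journal sites follow for scholarly indexing. The sample page, its URL, and the function name extract_citation_context are hypothetical.

```python
import re

# Matches tags of the form <meta name="citation_..." content="...">.
# Assumption: attributes appear in this order and use double quotes;
# real pages vary, which is part of why plain regex gets brittle
# on large structured documents.
META_TAG = re.compile(
    r'<meta\s+name="(?P<name>citation_[a-z_]+)"\s+content="(?P<content>[^"]*)"',
    re.IGNORECASE,
)

# Hypothetical article landing page.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="An Example Article">
<meta name="citation_journal_title" content="Journal of Examples">
<meta name="citation_pdf_url" content="http://journal.example.org/a1.pdf">
</head><body>...</body></html>
"""


def extract_citation_context(html):
    """Collect citation_* meta fields, keeping the first value of each."""
    fields = {}
    for match in META_TAG.finditer(html):
        fields.setdefault(match.group("name").lower(), match.group("content"))
    return fields


if __name__ == "__main__":
    context = extract_citation_context(SAMPLE_PAGE)
    print(context.get("citation_journal_title"))
    print(context.get("citation_pdf_url"))
```

Keeping the journal title and the PDF URL in one dictionary preserves the paper-journal link that the message describes as critical.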

Best,
Marcin
On 10/09/2014 05:02 PM, Bosman, J.M. (Jeroen) wrote:
Marcin,

This is a great initiative. I had been hoping BASEsearch would take on this 
task, but it is good to see others are stepping in.

Congrats on the initiative. Still, a long way to go

Could you elaborate on how your technology is able to recognize “true peer 
reviewed papers” and what you consider to be “true peer reviewed papers”?

Best,
Jeroen Bosman
@jeroenbosman
Utrecht University Library
From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
Marcin Wojnarski
Sent: Thursday, 9 October 2014 14:51
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of OA 
journals & papers

(press release, apologies for cross-posting)

With the beginning of the new academic year, Paperity (http://paperity.org), the 
first multidisciplinary aggregator of Open Access journals and papers, has been 
launched. Paperity will connect authors with readers, boost dissemination of 
new discoveries and consolidate academia around open literature.

Right now, Paperity (http://paperity.org/) includes over 
160,000 open articles, gold and hybrid, from 2,000 scholarly journals, and 
growing. The goal of the team is to cover - with the support of journal editors 
and publishers - 100% of Open Access literature within 3 years. In order 
to achieve this, Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist of the 
International Mathematical Olympiad. This technology indexes only true 
peer-reviewed scholarly papers and filters out irrelevant entries, which easily 
make it into other aggregators and search engines.

The amount of scholarly literature has grown enormously in recent decades. 
Successful dissemination has become a big issue. New tools are needed to help 
readers access vast amounts of literature dispersed all over the web and to 
help authors reach their target audience. Moreover, research is now 
interdisciplinary and scholars need broad access to literature from many 
fields, including those outside their core research area. This is the reason why 
Paperity covers all subjects, from Sciences, Technology, Medicine, through 
Social Sciences, to Humanities and Arts.

- There are lots of great articles out there which report new significant 
findings, yet attract no attention, only because they are hard to find. No more 
than

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Marcin Wojnarski

Stevan,

Repositories are not an authoritative source of metadata about the 
paper-journal relation. Metadata is put there by authors themselves and 
it can be missing, incomplete or erroneous, in extreme cases even fake. 
Thus in practice repository collections are flat even if metadata is 
present.


If you think that finding Green articles is impossible, then you should 
not be surprised that we focus on Gold first, right?


Best
Marcin



[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Stevan Harnad
On Oct 13, 2014, at 1:06 PM, Marcin Wojnarski mwojnar...@paperity.org wrote:

 Repositories are not an authoritative source of metadata about the 
 paper-journal relation. Metadata is put there by authors themselves and it 
 can be missing, incomplete or erroneous, in extreme cases even fake. Thus in 
 practice repository collections are flat even if metadata is present.

Are you looking for “authoritative metadata” or metadata of OA journal articles?

The majority of OA journal articles are Green, not Gold. Focussing on the Gold
because it is more “authoritative” calls to mind the joke about the drunkard who
prefers to keep looking for his keys by the lamp-post because it is brighter 
there.

 If you think that finding Green articles is impossible, then you should not be 
 surprised that we focus on Gold first, right?

I certainly did not say it was impossible! (We do it all the time! So does 
Google Scholar.) 
I only said it was not as easy as it is to just go to DOAJ journal websites 
(the lamp-post)
for only the Gold.

And I think the preoccupation with “authoritative” sources of metadata is 
monumentally misplaced. (In fact, the notion of “aggregation” is probably 
obsolescent too.) We have journal articles all over the web, and all that’s 
needed is a way to find them. Google Scholar’s pretty good, and can potentially 
be made even better. But what’s missing now is not a better harvester or more 
“authoritative” metadata, but more OA articles (whether Gold or Green). Only 
about 30% of journal articles published today are OA (the majority of them 
Green). The fastest and surest (and cheapest) way to provide the remaining 70% 
is to mandate and provide Green.

Stevan Harnad


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Dana Roth
It would be nice if 'Paperity' would maintain a listing of the publishers of 
the journals they index.
T-R does this for Web of Science Journal Citation Reports, and it is very 
helpful.

Dana L. Roth
Millikan Library / Caltech 1-32
1200 E. California Blvd. Pasadena, CA 91125
626-395-6423 fax 626-792-7540
dzr...@library.caltech.edu
http://library.caltech.edu/collections/chemistry.htm

From: goal-boun...@eprints.org [goal-boun...@eprints.org] on behalf of BAUIN 
Serge [serge.ba...@cnrs.fr]
Sent: Saturday, October 11, 2014 12:07 PM
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of 
OA journals & papers

Marcin,

May I ask what is the economic model of Paperity?
I didn't find any information about that on your web site.

Cheers

Serge

Sent from a mobile phone, sorry for the inelegant style...

On 10 Oct 2014, at 08:22, Marcin Wojnarski 
mwojn...@ns.onet.pl wrote:

Jeroen,

Thanks, it's great to hear that you like Paperity!

True peer-reviewed means published in a peer-reviewed journal, in contrast to 
a pdf just posted somewhere on the web (think Google Scholar), which can be 
anything: a peer-reviewed paper or not, published or not, even randomly 
generated to resemble a scholarly article, for example to pump up G Scholar 
citations (http://arxiv.org/abs/1212.0638).

The new technology is called REgular Document EXpressions (redex). It is a 
computer language for analyzing long and complex documents, particularly 
those written in a markup language like HTML or XML. It facilitates analysis of 
the web context where the paper occurred, which is critical for maintaining the 
link between the paper and its journal. Redex builds on the fundamental 
technology of regular expressions (regex), but redefines the language entirely 
to make it suitable for large structured texts.

Best,
Marcin

On 10/09/2014 05:02 PM, Bosman, J.M. (Jeroen) wrote:
Marcin,

This is a great initiative. I had been hoping BASEsearch would take on this 
task, but it is good to see others are stepping in.

Congrats on the initiative. Still, a long way to go

Could you elaborate on how your technology is able to recognize “true peer 
reviewed papers” and what you consider to be “true peer reviewed papers”?

Best,
Jeroen Bosman
@jeroenbosman
Utrecht University Library
From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
Marcin Wojnarski
Sent: Thursday, 9 October 2014 14:51
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of OA 
journals & papers

(press release, apologies for cross-posting)

With the beginning of the new academic year, Paperity (http://paperity.org), the 
first multidisciplinary aggregator of Open Access journals and papers, has been 
launched. Paperity will connect authors with readers, boost dissemination of 
new discoveries and consolidate academia around open literature.

Right now, Paperity (http://paperity.org/) includes over 
160,000 open articles, gold and hybrid, from 2,000 scholarly journals, and 
growing. The goal of the team is to cover - with the support of journal editors 
and publishers - 100% of Open Access literature within 3 years. In order 
to achieve this, Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist of the 
International Mathematical Olympiad. This technology indexes only true 
peer-reviewed scholarly papers and filters out irrelevant entries, which easily 
make it into other aggregators and search engines.

The amount of scholarly literature has grown enormously in recent decades. 
Successful dissemination has become a big issue. New tools are needed to help 
readers access vast amounts of literature dispersed all over the web and to 
help authors reach their target audience. Moreover, research is now 
interdisciplinary and scholars need broad access to literature from many 
fields, including those outside their core research area. This is the reason why 
Paperity covers all subjects, from Sciences, Technology, Medicine, through 
Social Sciences, to Humanities and Arts.

- There are lots of great articles out there which report new significant 
findings, yet attract no attention, only because they are hard to find. No more 
than top 10% of research institutions have good access to communication 
channels and can share their findings efficiently. The remaining 90%, 
especially authors from developing countries and early-career researchers, 
start from a much lower stand and often stay unnoticed despite high quality of 
their work – says Wojnarski. He adds that it is not by accident that Paperity 
partners right now with the EU Contest for Young Scientists, the biggest 
science fair in Europe

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Peter Murray-Rust
On Sun, Oct 12, 2014 at 2:08 AM, Dana Roth dzr...@library.caltech.edu
wrote:

  It would be nice if 'Paperity' would maintain a listing of the
 publishers of the journals they index.
 T-R does this for Web of Science Journal Citation Reports, and it is very
 helpful.


Is this listing
(a) publicly visible - or only available to WoS subscribers?
(b) re-usable without further permission from T-R? (CC-BY or weaker?)

If it's not re-usable then we need a fully Open equivalent for indexable
journals.



 Dana L. Roth
 Millikan Library / Caltech 1-32
 1200 E. California Blvd. Pasadena, CA 91125
 626-395-6423 fax 626-792-7540
 dzr...@library.caltech.edu
 http://library.caltech.edu/collections/chemistry.htm
   --


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Stevan Harnad
Harvesting Gold OA journal articles is a piece of cake. How will Paperity/redex
harvest Green OA articles published in non-OA journals but made OA somewhere on
the Web - via Google Scholar?

Sounds like a splendid idea if it can be done… But not if it is just
Gold-biassed, because most refereed research is not Gold, and the fastest
growing form of OA is Green (because of mandates, and absence of extra cost).

SH

On Oct 11, 2014, at 9:08 PM, Dana Roth dzr...@library.caltech.edu wrote:

 It would be nice if 'Paperity' would maintain a listing of the publishers of 
 the journals they index.
 T-R does this for Web of Science Journal Citation Reports, and it is very 
 helpful.
 
 Dana L. Roth
 Millikan Library / Caltech 1-32
 1200 E. California Blvd. Pasadena, CA 91125
 626-395-6423 fax 626-792-7540
 dzr...@library.caltech.edu
 http://library.caltech.edu/collections/chemistry.htm
 From: goal-boun...@eprints.org [goal-boun...@eprints.org] on behalf of BAUIN 
 Serge [serge.ba...@cnrs.fr]
 Sent: Saturday, October 11, 2014 12:07 PM
 To: Global Open Access List (Successor of AmSci)
 Subject: [GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator 
 of OA journals  papers
 
 Marcin,
 
 May I ask what is the economic model of Paperity?
 I didn't find any information about that on your web site.
 
 Cheers
 
 Serge
 
 Sent from a mobile phone, sorry for the inelegant style...
 
 On 10 Oct 2014, at 08:22, Marcin Wojnarski mwojn...@ns.onet.pl wrote:
 
 Jeroen,
 
 Thanks, it's great to hear that you like Paperity!
 
 True peer-reviewed means published in a peer-reviewed journal, in contrast 
 to a pdf just posted somewhere on the web (think Google Scholar), which can 
 be anything: a peer-reviewed paper or not, published or not, even randomly 
 generated to resemble a scholarly article, for example to pump up G Scholar 
 citations (http://arxiv.org/abs/1212.0638).
 
 The new technology is called REgular Document EXpressions (redex). It is a 
 computer language for analyzing long and complex documents, particularly 
 those written in a markup language, like HTML or XML. It facilitates analysis 
 of the web context where the paper occurred, which is critical for maintaining 
 the link between the paper and its journal. Redex builds on top of the very 
 fundamental technology of regular expressions (regex), but redefines the 
 language entirely to make it suitable for large structured texts.
 
 Best,
 Marcin
 
 On 10/09/2014 05:02 PM, Bosman, J.M. (Jeroen) wrote:
 Marcin,
  
 This is a great initiative. I had been hoping BASEsearch would take on this 
 task, but it is good to see others are stepping in.
  
 Congrats on the initiative. Still, a long way to go
  
 Could you elaborate on how your technology is able to recognize “true peer 
 reviewed papers” and what you consider to be “true peer reviewed papers”?
  
 Best,
 Jeroen Bosman
 @jeroenbosman
 Utrecht University Library
 From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf 
 Of Marcin Wojnarski
 Sent: Thursday, 9 October 2014 14:51
 To: Global Open Access List (Successor of AmSci)
 Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of 
 OA journals  papers
  
 (press release, apologies for cross-posting)
 
 With the beginning of the new academic year, Paperity, the first 
 multidisciplinary aggregator of Open Access journals and papers, has been 
 launched. Paperity will connect authors with readers, boost dissemination 
 of new discoveries and consolidate academia around open literature.
 Right now, Paperity (http://paperity.org/) includes over 160,000 open 
 articles, gold and hybrid, from 2,000 scholarly journals, and growing. 
 The goal of the team is to cover - with the support of journal editors and 
 publishers - 100% of Open Access literature in 3 years from now. In order 
 to achieve this, Paperity utilizes an original technology for article 
 indexing, designed by Marcin Wojnarski, a data geek from Poland and a 
 medalist of the International Mathematical Olympiad. This technology 
 indexes only true peer-reviewed scholarly papers and filters out irrelevant 
 entries, which easily make it into other aggregators and search engines.
 The amount of scholarly literature has grown enormously in the last 
 decades. Successful dissemination became a big issue. New tools are needed 
 to help readers access vast amounts of literature dispersed all over the 
 web and to help authors reach their target audience. Moreover, research is 
 interdisciplinary now and scholars need broad access to literature from 
 many fields, also from outside of their core research area. This is the 
 reason why Paperity covers all subjects, from Sciences, Technology, 
 Medicine, through Social Sciences, to Humanities and Arts.
 - There are lots of great articles out there which report new significant 
 findings, yet attract no attention, only because they are hard to find. No 
 more than top 10% of research institutions have good access

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Jan Velterop

On 12 Oct 2014, at 12:51, Stevan Harnad har...@ecs.soton.ac.uk wrote:

 Harvesting Gold OA journal articles is a piece of cake.

Indeed. Not just for Paperity, but for anybody else. It's one of the 
attractions and benefits of open access via the 'gold' route. Another is that 
most articles can be harvested in XML-format, which enables sophisticated and 
worthwhile services to be added to aggregations. And aggregations enable 
researchers to conveniently make large-scale pattern- and meta-analyses without 
first having to gather all the material from different and disparate sources. 
Few 'green' repositories that I'm aware of have XML-versions (correct me if I'm 
wrong – and should I be wrong, is there a list of such repositories?). 
Aggregations, by the way, cannot be made without clarity about rights and 
licences, since they are a form of re-use. Those rights are clear, and properly 
included in metadata, for proper 'gold', but often not for 'green' versions of 
paywalled articles in repositories.

 How will Paperity/redex harvest
 Green OA articles published in non-OA journals but made OA somewhere on the
 Web — via Google Scholar?

Indeed, how will they? Or anybody else?

JV

 
 Sounds like a splendid idea if it can be done… But not if it is just 
 Gold-biassed,
 because most refereed research is not Gold, and the fastest growing form of
 OA is Green (because of mandates, and absence of extra cost).
 
 SH
 

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Peter Murray-Rust
On Sun, Oct 12, 2014 at 1:44 PM, Jan Velterop velte...@gmail.com wrote:


 On 12 Oct 2014, at 12:51, Stevan Harnad har...@ecs.soton.ac.uk wrote:

 Harvesting Gold OA journal articles is a piece of cake.


 Indeed. Not just for Paperity, but for anybody else. It's one of the
 attractions and benefits of open access via the 'gold' route.


Yes,

It's noteworthy that almost all modern text and data mining exercises are
carried out on the Open Access subset of the literature. In some cases this
is an attempt to get the whole Open literature - in others it's a subsubset
such as EuropePubMedCentral. (The alternatives to this are (a) to ignore
rights and mine anyway - something we are legally allowed to do in the UK
but almost nowhere else - or (b) to do it in private, hoping you won't be
found out and afraid to publish your sources as a good scholar should).

 Another is that most articles can be harvested in XML-format, which enables
 sophisticated and worthwhile services to be added to aggregations.


This is true for born-Open publishers such as BioMedCentral, PLOS*, eLife,
PeerJ, Ubiquity ... This is a straightforward sale - author payment =
freedom for re-use. It works very well for text miners. (And please don't
tell us that mining is a minority sport which has to tread water for
another 5-10 years).

I have not systematically surveyed whether XML is offered in the Gold
Open Access journals of other major publishers nor whether the licence is
always permissive. Those people who argue that CC-NC-ND protects authors
(it doesn't) should realise that it has a massive negative impact on useful
re-use including mining.

Hybrid journals almost certainly do not offer XML. It's hard enough for
them to offer CC-BY for Open Access.

It works less well for born-Closed publishers (such as Elsevier, NPG, ACS,
etc.). Rather than having the simple

 And aggregations enable researchers to conveniently make large-scale
 pattern- and meta-analyses without first having to gather all the material
 from different and disparate sources.


Yes - we have built the apparatus to do this in contentmine.org


 Few 'green' repositories that I'm aware of have XML-versions (correct me
 if I'm wrong – and should I be wrong, is there a list of such
 repositories?). Aggregations, by the way, cannot be made without clarity
 about rights and licences, since they are a form of re-use. Those rights
 are clear, and properly included in metadata, for proper 'gold', but often
 not for 'green' versions of paywalled articles in repositories.


Exactly. Most Green repositories make it very hard to re-use material.
This is primarily due to copyright - the default library approach is to say
this may be copyright and you cannot use it unless you write to the author
and get permission in writing with real ink. Then there is the technology.
University repositories are constructed on the basis that each document is
a priceless artefact that scholars will spend hours discovering and
reading. The reality of science is that most of these documents will
probably only be read by machines. Some countries (NL, FR for example) at
least aggregate some documents - such as theses - and the UK has CORE to
try to remedy the situation, but even so it's extremely difficult to index
and search repositories.

I wrote to Bernard Rentier offering to index his repository for scientific
terms but was told - sadly - that there was a new phase of investment
required before this would be possible.

Another problem with most repositories is that they insist on transforming
DOCX or LaTeX into PDF. Even for their own theses. This is an act of
barbarism. PDF has no semantics and it destroys about 50-75% of the science
in the document.

Anyway we expect to announce our own Open indexing of the literature RSN.


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Marcin Wojnarski

Hi Serge,

We're working on this. Paperity started as a non-profit academic 
project, but yes, we need to develop a business model to make it 
sustainable and to achieve the goal of 100% OA aggregated. Most likely 
we'll expect participating journals to support our services, which we 
think is a fair solution when many of them charge APCs and we actually 
help them do their job (dissemination). We're aware however that there 
are also many small non-profit journals which don't charge APCs at all, 
and we definitely want to aggregate them all, too. So the details are 
still to be sorted out, but I'm confident that over time we'll come up 
with a good solution: one that's fair, efficient and acceptable for 
everybody. Of course, there are also more traditional solutions that 
we'll investigate, like adverts.


Cheers
Marcin


On 10/11/2014 09:07 PM, BAUIN Serge wrote:

Marcin,

May I ask what is the economic model of Paperity?
I didn't find any information about that on your web site.

Cheers

Serge


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Marcin Wojnarski

Thanks Dana. On our to-do list. :)
Marcin

On 10/12/2014 03:08 AM, Dana Roth wrote:
It would be nice if 'Paperity' would maintain a listing of the 
publishers of the journals they index.
T-R does this for Web of Science Journal Citation Reports, and it is 
very helpful.


Dana L. Roth
Millikan Library / Caltech 1-32
1200 E. California Blvd. Pasadena, CA 91125
626-395-6423 fax 626-792-7540
dzr...@library.caltech.edu
http://library.caltech.edu/collections/chemistry.htm


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-12 Thread Heather Morrison
Thank you for providing the information, Marcin. Since there is a subset of the 
open access community that demands blanket permissions for commercial rights 
downstream (a position I strongly disagree with), it is important to discuss 
what the potential commercial uses might be to determine whether these actually 
advance open access or scholarly knowledge or not.

Some comments on these options for Paperity:

In the subscription model, aggregators (such as EBSCO and ProQuest) typically 
pay journals to include their content, or in the case of open access journals, 
at least do not charge the journals. Charging journals to include them in an 
aggregated service changes a revenue stream to an expense stream for the 
journals. This makes it harder to find the revenue to produce journals; a 
barrier to publishing journals in the first place is not in the interests of 
advancing scholarly knowledge. 

Advertising is one of the potential revenue streams for open access journals 
(and one that some journals are currently using). If Paperity is using journal 
content to sell advertising, then Paperity could easily be competing with the 
journals for this revenue. 

It is lovely to hear of Paperity's good intentions starting out to be fair, 
efficient and acceptable for everyone. But what can happen with services like 
this down the road when there are bills to be paid, journals are less than keen 
to pay for this service and advertisers continue to prefer Google?

The following is addressed to my fellow open access advocates as this is a good 
discussion about open access downstream, and these comments are not intended to 
apply to Paperity:

If the purpose of insisting on re-use and commercial rights downstream is to 
facilitate the design of services such as Paperity, let's discuss these 
downstream possibilities, which I argue are facilitated by CC-BY and/or 
CC-BY-SA licenses:

-   aggregator takes CC-BY content and develops a toll-access value-added 
service 

By way of illustration of this: Elsevier's Scopus claims to include 2,800 gold 
open access journals. Scopus is a subscription-based service. 

-   aggregator takes CC-BY content, initially develops an open access 
value-added search service, then sells the service to a for-profit company that 
changes the business model to toll access

By way of illustration of the sales aspect, consider that Elsevier bought 
Mendeley and Springer bought BioMedCentral. Both are still free services, but 
offered by largely subscription-based companies; why would we assume that they 
would never change the business model? 

-   aggregator follows the Paperity suggestion of charging journals, but 
with a twist: does not include journals that do not pay and/or returns results 
based on payments by journals (i.e. pay-to-play)

Are these models seen as desirable by advocates of requiring CC-BY and/or 
CC-BY-SA licenses? Are any of these scenarios aligned with the Budapest vision? 
If you agree that they are not, can you explain why you think these are 
unlikely or how the licenses would prevent this from happening? For example, 
perhaps someone can explain how it is that Elsevier is able to charge to direct 
people to OA journals through Scopus? 

A comment on SA: although ShareAlike is the most copyleft of the CC license 
elements, it does not come with an obligation to share in the same way, only 
an obligation to use the same license when including re-used content. One can 
take a work that is licensed SA and is freely available on the web and include 
it in a work whose access is limited in any of a variety of ways (part of a 
presentation to an audience limited to those who are willing and able to pay to 
attend; a toll access work, etc.), as long as the work downstream uses the 
same license. In other words, CC-BY-SA does not do as much to protect OA 
downstream as one might think.

best,

Heather Morrison


On 2014-10-12, at 3:20 PM, Marcin Wojnarski wrote:

 Hi Serge,
 
 We're working on this. Paperity started as a non-profit academic project, but 
 yes, we need to develop a business model to make it sustainable and to 
 achieve the goal of 100% OA aggregated. Most likely we'll expect 
 participating journals to support our services, which we think is a fair 
 solution when many of them charge APCs and we actually help them do their job 
 (dissemination). We're aware however that there are also many small 
 non-profit journals which don't charge APC at all, and we definitely want to 
 aggregate them all, too. So the details are still to be sorted out, but I'm 
 confident that over time we'll come up with a good solution: one that's fair, 
 efficient and acceptable for everybody. Of course, there are also more 
 traditional solutions that we'll investigate, like adverts.
 
 Cheers
 Marcin
 
 
 On 10/11/2014 09:07 PM, BAUIN Serge wrote:
 Marcin,
 
 May I ask what is the economic model of Paperity?
 I didn't find any information about that on your web 

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-11 Thread BAUIN Serge
Marcin,

May I ask what is the economic model of Paperity?
I didn't find any information about that on your web site.

Cheers

Serge

Sent from a mobile phone, sorry for the inelegant style...

On 10 Oct 2014, at 08:22, Marcin Wojnarski mwojn...@ns.onet.pl wrote:

Jeroen,

Thanks, it's great to hear that you like Paperity!

True peer-reviewed means published in a peer-reviewed journal, in contrast to 
a pdf just posted somewhere on the web (think Google Scholar), which can be 
anything: a peer-reviewed paper or not, published or not, even randomly 
generated to resemble a scholarly article, for example to pump up G Scholar 
citations (http://arxiv.org/abs/1212.0638).

The new technology is called REgular Document EXpressions (redex). It is a 
computer language for analyzing long and complex documents, particularly those 
written in a markup language such as HTML or XML. It facilitates analysis of the 
web context in which the paper occurred, which is critical for maintaining the 
link between the paper and its journal. Redex builds on the fundamental 
technology of regular expressions (regex), but redefines the language entirely 
to make it suitable for large structured texts.

Best,
Marcin

On 10/09/2014 05:02 PM, Bosman, J.M. (Jeroen) wrote:
Marcin,

This is a great initiative. I had been hoping BASEsearch would take on this 
task, but it is good to see others are stepping in.

Congrats on the initiative. Still, a long way to go

Could you elaborate on how your technology is able to recognize “true peer 
reviewed papers” and what you consider to be “true peer reviewed papers”?

Best,
Jeroen Bosman
@jeroenbosman
Utrecht University Library
From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
Marcin Wojnarski
Sent: Thursday, 9 October 2014 14:51
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of OA 
journals  papers

(press release, apologies for cross-posting)

With the beginning of the new academic year, Paperity (http://paperity.org), the 
first multidisciplinary aggregator of Open Access journals and papers, has been 
launched. Paperity will connect authors with readers, boost dissemination of 
new discoveries and consolidate academia around open literature.

Right now, Paperity (http://paperity.org/) includes over 
160,000 open articles, gold and hybrid, from 2,000 scholarly journals, and 
growing. The goal of the team is to cover - with the support of journal editors 
and publishers - 100% of Open Access literature in 3 years from now. In order 
to achieve this, Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist of the 
International Mathematical Olympiad. This technology indexes only true 
peer-reviewed scholarly papers and filters out irrelevant entries, which easily 
make it into other aggregators and search engines.

The amount of scholarly literature has grown enormously in the last decades. 
Successful dissemination became a big issue. New tools are needed to help 
readers access vast amounts of literature dispersed all over the web and to 
help authors reach their target audience. Moreover, research is 
interdisciplinary now and scholars need broad access to literature from many 
fields, also from outside of their core research area. This is the reason why 
Paperity covers all subjects, from Sciences, Technology, Medicine, through 
Social Sciences, to Humanities and Arts.

- There are lots of great articles out there which report new significant 
findings, yet attract no attention, only because they are hard to find. No more 
than top 10% of research institutions have good access to communication 
channels and can share their findings efficiently. The remaining 90%, 
especially authors from developing countries and early-career researchers, 
start from a much lower stand and often stay unnoticed despite high quality of 
their work – says Wojnarski. He adds that it is not by accident that Paperity 
partners right now with the EU Contest for Young Scientists, the biggest 
science fair in Europe. With the help of Paperity, the Contest wants to improve 
dissemination of discoveries authored by its participants – top young talents 
from all over the continent.

Paperity is the first service of this kind. The most similar existing website, 
PubMed Central, aggregates open journals, too, but is limited to life sciences 
alone. Another related service, the Directory of Open Access Journals, does 
index articles from multiple periodicals and different disciplines, but does 
not provide aggregation, only pure indexing: it shows metadata of articles, but 
for fulltext access redirects to external sites. Moreover, both PMC and DOAJ 
impose strict technical requirements on participating journals, which limits 
the scope of aggregation. Paperity adapts to 

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-10 Thread Marcin Wojnarski

Jeroen,

Thanks, it's great to hear that you like Paperity!

True peer-reviewed means published in a peer-reviewed journal, in 
contrast to a pdf just posted somewhere on the web (think Google 
Scholar), which can be anything: a peer-reviewed paper or not, published 
or not, even randomly generated to resemble a scholarly article, for 
example to pump up G Scholar citations (http://arxiv.org/abs/1212.0638).


The new technology is called REgular Document EXpressions (redex). It is 
a computer language for analyzing long and complex documents, 
particularly those written in a markup language such as HTML or XML. It 
facilitates analysis of the web context in which the paper occurred, 
which is critical for maintaining the link between the paper and its 
journal. Redex builds on the fundamental technology of regular 
expressions (regex), but redefines the language entirely to make it 
suitable for large structured texts.
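
As a rough illustration of the underlying idea only, and not of redex itself 
(its syntax is not shown in this thread), the Python sketch below uses 
ordinary regular expressions to keep each paper linked to the journal block 
it appears in. The HTML fragment, the class names and the function name 
extract_papers are all hypothetical, chosen just for this example.

```python
import re

# Hypothetical page fragment; the markup and class names are assumptions
# made for this illustration, not taken from any real journal site.
SAMPLE_PAGE = """
<div class="journal"><h1>Journal of Examples</h1>
  <div class="article"><a href="/papers/1.pdf">First paper</a></div>
  <div class="article"><a href="/papers/2.pdf">Second paper</a></div>
</div>
"""

# Outer pattern: capture the journal title and the block that follows it.
JOURNAL_RE = re.compile(
    r'<div class="journal"><h1>(?P<journal>[^<]+)</h1>(?P<body>.*?)\n</div>',
    re.DOTALL,
)

# Inner pattern: capture each PDF link and its title inside a journal block.
ARTICLE_RE = re.compile(r'<a href="(?P<url>[^"]+\.pdf)">(?P<title>[^<]+)</a>')


def extract_papers(page):
    """Return one record per paper, each carrying its surrounding journal."""
    records = []
    for block in JOURNAL_RE.finditer(page):
        journal = block.group("journal").strip()
        for article in ARTICLE_RE.finditer(block.group("body")):
            records.append(
                {
                    "journal": journal,
                    "title": article.group("title").strip(),
                    "url": article.group("url"),
                }
            )
    return records


if __name__ == "__main__":
    for record in extract_papers(SAMPLE_PAGE):
        print(record)
```

Nesting two plain regex passes like this quickly becomes brittle on real 
pages, which is presumably the kind of difficulty a language designed for 
large structured texts is meant to address.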


Best,
Marcin

On 10/09/2014 05:02 PM, Bosman, J.M. (Jeroen) wrote:


Marcin,

This is a great initiative. I had been hoping BASEsearch would take on 
this task, but it is good to see others are stepping in.


Congrats on the initiative. Still, a long way to go

Could you elaborate on how your technology is able to recognize “true 
peer reviewed papers” and what you consider to be “true peer reviewed 
papers”?


Best,

Jeroen Bosman

@jeroenbosman

Utrecht University Library

*From:*goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] *On 
Behalf Of *Marcin Wojnarski

*Sent:* Thursday, 9 October 2014 14:51
*To:* Global Open Access List (Successor of AmSci)
*Subject:* [GOAL] Paperity launched. The 1st multidisciplinary 
aggregator of OA journals  papers


(press release, apologies for cross-posting)

*With the beginning of the new academic year, Paperity 
(http://paperity.org), the first multidisciplinary aggregator of Open 
Access journals and papers, has been launched. Paperity will connect 
authors with readers, boost dissemination of new discoveries and 
consolidate academia around open literature.*


Right now, Paperity (http://paperity.org/) 
includes over 160,000 open articles, gold and hybrid, from 2,000 
scholarly journals, and growing. The goal of the team is to cover - 
with the support of journal editors and publishers - 100% of Open 
Access literature in 3 years from now. In order to achieve this, 
Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist 
of the International Mathematical Olympiad. This technology indexes 
only true peer-reviewed scholarly papers and filters out irrelevant 
entries, which easily make it into other aggregators and search engines.


The amount of scholarly literature has grown enormously in the last 
decades. Successful dissemination became a big issue. New tools are 
needed to help readers access vast amounts of literature dispersed all 
over the web and to help authors reach their target audience. 
Moreover, research is interdisciplinary now and scholars need broad 
access to literature from many fields, also from outside of their core 
research area. This is the reason why Paperity covers all subjects, 
from Sciences, Technology, Medicine, through Social Sciences, to 
Humanities and Arts.


- /There are lots of great articles out there which report new 
significant findings, yet attract no attention, only because they are 
hard to find. No more than top 10% of research institutions have good 
access to communication channels and can share their findings 
efficiently. The remaining 90%, especially authors from developing 
countries and early-career researchers, start from a much lower stand 
and often stay unnoticed despite high quality of their work/ – says 
Wojnarski. He adds that it is not by accident that Paperity partners 
right now with the EU Contest for Young Scientists, the biggest 
science fair in Europe. With the help of Paperity, the Contest wants 
to improve dissemination of discoveries authored by its participants – 
top young talents from all over the continent.


Paperity is the first service of this kind. The most similar existing 
website, PubMed Central, aggregates open journals, too, but is limited 
to life sciences alone. Another related service, the Directory of Open 
Access Journals, does index articles from multiple periodicals and 
different disciplines, but does not provide aggregation, only pure 
indexing: it shows metadata of articles, but for fulltext access 
redirects to external sites. Moreover, both PMC and DOAJ impose strict 
technical requirements on participating journals, which limits the 
scope of aggregation. Paperity adapts to whatever technology a given 
periodical employs.


Paperity website: http://paperity.org/




--
Marcin Wojnarski, Founder of Paperity, www.paperity.org
www.linkedin.com/in/marcinwojnarski

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals papers

2014-10-09 Thread Bosman, J.M. (Jeroen)
Marcin,

This is a great initiative. I had been hoping BASEsearch would take on this 
task, but it is good to see others are stepping in.

Congrats on the initiative. Still, a long way to go

Could you elaborate on how your technology is able to recognize “true peer 
reviewed papers” and what you consider to be “true peer reviewed papers”?

Best,
Jeroen Bosman
@jeroenbosman
Utrecht University Library
From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
Marcin Wojnarski
Sent: Thursday, 9 October 2014 14:51
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of OA 
journals  papers

(press release, apologies for cross-posting)

With the beginning of the new academic year, Paperity (http://paperity.org), the 
first multidisciplinary aggregator of Open Access journals and papers, has been 
launched. Paperity will connect authors with readers, boost dissemination of 
new discoveries and consolidate academia around open literature.

Right now, Paperity (http://paperity.org/) includes over 
160,000 open articles, gold and hybrid, from 2,000 scholarly journals, and 
growing. The goal of the team is to cover - with the support of journal editors 
and publishers - 100% of Open Access literature in 3 years from now. In order 
to achieve this, Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist of the 
International Mathematical Olympiad. This technology indexes only true 
peer-reviewed scholarly papers and filters out irrelevant entries, which easily 
make it into other aggregators and search engines.

The amount of scholarly literature has grown enormously in the last decades. 
Successful dissemination became a big issue. New tools are needed to help 
readers access vast amounts of literature dispersed all over the web and to 
help authors reach their target audience. Moreover, research is 
interdisciplinary now and scholars need broad access to literature from many 
fields, also from outside of their core research area. This is the reason why 
Paperity covers all subjects, from Sciences, Technology, Medicine, through 
Social Sciences, to Humanities and Arts.

- There are lots of great articles out there which report new significant 
findings, yet attract no attention, only because they are hard to find. No more 
than top 10% of research institutions have good access to communication 
channels and can share their findings efficiently. The remaining 90%, 
especially authors from developing countries and early-career researchers, 
start from a much lower stand and often stay unnoticed despite high quality of 
their work – says Wojnarski. He adds that it is not by accident that Paperity 
partners right now with the EU Contest for Young Scientists, the biggest 
science fair in Europe. With the help of Paperity, the Contest wants to improve 
dissemination of discoveries authored by its participants – top young talents 
from all over the continent.

Paperity is the first service of this kind. The most similar existing website, 
PubMed Central, aggregates open journals, too, but is limited to life sciences 
alone. Another related service, the Directory of Open Access Journals, does 
index articles from multiple periodicals and different disciplines, but does 
not provide aggregation, only pure indexing: it shows metadata of articles, but 
for fulltext access redirects to external sites. Moreover, both PMC and DOAJ 
impose strict technical requirements on participating journals, which limits 
the scope of aggregation. Paperity adapts to whatever technology a given 
periodical employs.

Paperity website: http://paperity.org/




--

Marcin Wojnarski, Founder of Paperity, www.paperity.org
www.linkedin.com/in/marcinwojnarski
www.facebook.com/Paperity
www.twitter.com/Paperity



Paperity. Open science aggregated.
___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal