Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Ray Saintonge
Samuel Klein wrote:
 There is a wealth of work done all the time by primary source
 researchers and publishers, which could be improved on by having
 wikisource entries, translations, &c.

 Related question: how well would large numbers of public
 domain texts, with page scans and the best available OCR [and
 translations of same], fit with what Wikisource does now?  This is
 clearly a wiki project that needs to happen: OCR even at its best
 misses rare meaning-bearing words.   If not Wikisource, where should
 this work take place?
   
 From my perspective it fits perfectly with the vision that I had of 
Wikisource on the first day of its existence.  Tim Armstrong 
[[User:Tarmstro99]] has already done a considerable amount of valuable 
work relating to law on Wikisource.  That has been mostly a one-man 
project to deal with a massive amount of material.  Some have even 
proposed deleting all the US Code material on the grounds that we don't 
have the ability to keep it up to date. That has prompted some very 
interesting questions and ideas about how this kind of stuff might be 
handled, but taking those questions to the next level requires lots of 
work.  Most regular Wikisourcerors already have long personal to-do 
lists to keep them busy.  So the question is not really about whether 
Wikisource should host these goods, it's about recruiting volunteers to 
do the hard work.

Ec

 On Sat, Jun 20, 2009 at 11:41 AM, David Gerard dger...@gmail.com wrote:
   
 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?


 - d.

 


___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


Re: [Foundation-l] Iran?

2009-06-21 Thread Ray Saintonge
Robert Rohde wrote:
 On Sat, Jun 20, 2009 at 2:07 PM, Ray Saintonge sainto...@telus.net wrote:
   
  While there may very well have been widespread fraud, that alone
 wouldn't be enough to explain away a 29 percentage point spread.  A
 strong line of national security scare-mongering is always a good source
 of votes in the less educated parts of a country. We hear a lot about
 what is happening in Tehran, but very little about the rest of the country.
 
 It's easy to explain any margin you want when there are no monitors, no
 reporting of local tallies, and vote aggregation is controlled by a small
 group in one government agency.  It's basically a matter of changing numbers
 in a spreadsheet.

 Regardless of what actually happened, it is pretty clear that the process of
 voting in Iran lacks the fundamental transparency necessary to provide
 confidence in the results.
Sure, transparency is a problem, but its absence alone does not imply 
fraud.  It hurts the Iranian authorities even more if the vote count is 
accurate because nobody believes them. 

Ec



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Nikola Smolenski
On Saturday 20 June 2009 18:29:24 Brian wrote:
 This has reminded me to complain about Google Books. Google has the world's
 best OCR (by virtue of having the largest OCR'able dataset) and also has a
 mission to scan in all the public domain books they can get their hands on.
 They recently updated their interface to, as they put it, "make it easier
 to find our plain text versions of public domain books". If a book is
 available in full view, you can click the 'Plain text' button in the
 toolbar. Unfortunately the only way I've found to download the full text
 of a public domain book from Google is to flip through the book a page at a
 time, copying the text to your clipboard.

Often, these books are available in the Million Books Project too.



Re: [Foundation-l] Google Translate now assists with human translations of Wikipedia articles

2009-06-21 Thread Nikola Smolenski
On Saturday 13 June 2009 18:20:36 picus-viridis wrote:
 IMHO automatic translations into Polish are useless, as they only allow
 rough orientation in the contents of an article. It concerns  not only

How is rough orientation in the contents of an article useless?



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 1:41 AM, David Gerard dger...@gmail.com wrote:

 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?

Tim Armstrong is a sysop on Wikisource ... :-)  more below..

On Sun, Jun 21, 2009 at 4:17 PM, Ray Saintonge sainto...@telus.net wrote:

 Samuel Klein wrote:
  There is a wealth of work done all the time by primary source
  researchers and publishers, which could be improved on by having
  wikisource entries, translations, &c.
 
  Related question: how well would large numbers of public
  domain texts, with page scans and the best available OCR [and
  translations of same], fit with what Wikisource does now?  This is
  clearly a wiki project that needs to happen: OCR even at its best
  misses rare meaning-bearing words.   If not Wikisource, where should
  this work take place?

If it was published, Wikisource accepts it.  Notability is not a consideration.

The only other open project of comparable size is [[Distributed
Proofreaders]].  Here are our statistics:

http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics

Most of the Wikisource projects accept free translations.

http://wikisource.org/wiki/WS:COORD

The two English Wikisource featured translations are:

  http://en.wikisource.org/wiki/Balade_to_Rosemounde
  http://en.wikisource.org/wiki/J%27accuse
  (also translated into Dutch)

The two biggest translation projects that I know of are:

  http://en.wikisource.org/wiki/Romance_of_the_Three_Kingdoms
  http://en.wikisource.org/wiki/Bible_(Wikisource)

Another good one is

  http://en.wikisource.org/wiki/Max_Havelaar_(Wikisource)

We also have translations of laws, usually relating to copyright.

  
http://en.wikisource.org/wiki/Ordinance_93-027_of_30_March_1993_on_copyright,_related_rights_and_expressions_of_folklore

  From my perspective it fits perfectly with the vision that I had of
 Wikisource on the first day of its existence.  Tim Armstrong
 [[User:Tarmstro99]] has already done a considerable amount of valuable
 work relating to law on Wikisource.

Tim has been doing high impact work in this area.

   H.R. Rep. No. 94-1476

http://blogs.law.harvard.edu/infolaw/2008/06/17/an-open-access-success-story-just-in-time-for-cali/

   U.S. Statutes at Large

http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/

   http://en.wikisource.org/wiki/United_States_Statutes_at_Large

As regards the USC, the majority of it is a mess, but Title 17 is a
great example of where we are heading.

  http://en.wikisource.org/wiki/United_States_Code/Title_17

We also have transcription projects for the UK 1911 copyright act,
which has influenced so many other countries.

http://en.wikisource.org/wiki/Index:The_copyright_act,_1911,_annotated.djvu
http://en.wikisource.org/wiki/Index:A_treatise_upon_the_law_of_copyright.djvu

More can be found from our freshly minted Law index:

   http://en.wikisource.org/wiki/Wikisource:Law

Our two featured texts are:
   http://en.wikisource.org/wiki/South_Africa_Act_1909
   http://en.wikisource.org/wiki/ACLU_v._NSA_(District_Court_opinion)

 Most regular Wikisourcerors already have long personal to-do
 lists to keep them busy.  So the question is not really about whether
 Wikisource should host these goods, it's about recruiting volunteers to
 do the hard work.

If people want to help, but don't know where to start, my
recommendation is that they start proofreading Stat. volume 1, as
this is a goldmine of interesting documents, and will be an excellent
example of crowdsourcing of transcription.

http://en.wikisource.org/wiki/Index:United_States_Statutes_at_Large/Volume_1

Enjoy,
John Vandenberg



Re: [Foundation-l] [Wikitech-l] flagged revisions

2009-06-21 Thread Yaroslav M. Blanter
 But on the other end of the spectrum, on projects like the one
 I am active on (the Finnish Wikipedia)...

 Disruptive behaviour is not wired into our genes or our
 culture, but quietly coöperative behaviour has been. Our
 wikipedia is a paradise in comparison to many. For this
 reason personally I would consider it a great shame if
 we were to be granted flagged revs before say 750 000
 or one million. And even as I say this, I know there are
 the chance brothers (Fat and Slim) that this devout wish
 will be observed. I consider it a great problem that
 solutions for problems larger wikis have are nearly
 without exception foisted on smaller wikis without much
 consideration of what their real effect there will be, and
 whether they are really ready for it.

Could you maybe motivate your opinion? Are you saying that there are no
vandals on fi.wp (which I can buy) and that novices on fi.wp first read
the rules and learn them by heart, and only then start creating articles?

Cheers
Yaroslav




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge sainto...@telus.net wrote:

 Stephen Bain wrote:
  On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote:
 
  Except google isn't asserting any kind of copyright control over these
  books, they're just not making it convenient to download them in your
  preferred format.  Maybe not The Right Thing, but not as boneheaded as
  suing a party who reprints public domain material, as was the case in
  Feist v. Rural (the supreme court case you mention.)
 
  They want people to use their service. Fair enough, given that the
  scanning and OCRing happened on their dime.
 
 
 How does that give them any special rights?  There are no database
 protection laws in the US, and sweat-of-the-brow has been rejected as a
 basis for new copyrights.


You're right, it doesn't give them any *special* rights.  They have the same
rights as any other computer owner.  Specifically, they have the right to
choose who uses their computers, and how they use them.  Whether or not a
terms of service is legally binding is really not the issue. (*)  The issue
is whether or not they have a duty to make it *convenient* for you to
download the data.  Of course they don't.  Why should they be required to
help you put them out of business?  That kind of twisted logic might make
sense in the non-profit world (although I still haven't seen the WMF step up
to the plate and make it easy for people to make a full history fork, or
even to download all the images), but Google is not a non-profit
organization.  Google would be Evil if it *didn't* protect itself against
this, as it'd be breaking a promise to its shareholders.

(*) Personally, I'm of the opinion that merely accessing a website is not
sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
not have to even click agree to is a unilateral contract which can only
impose promises upon the offeror, though this is not a legal opinion but
merely my opinion of what the law should be.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 7:17 AM, Anthony wikim...@inbox.org wrote:

 (*) Personally, I'm of the opinion that merely accessing a website is not
 sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
 not have to even click agree to is a unilateral contract which can only
 impose promises upon the offeror, though this is not a legal opinion but
 merely my opinion of what the law should be.


You know what, after further thought I'm going to withdraw that.  First of
all, I think Google does require you to click agree before you can access
the service we're talking about.  But more importantly, I'm going to cast
doubt on my previously held opinion of whether or not a TOS should be able
to bind someone who didn't click on anything.  If I leave a bunch of apples
on the table at work and put next to them a sign that says "Apples: $.25
each"...  I don't know, I'll have to think about it.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 9:17 PM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge sainto...@telus.net wrote:

  Stephen Bain wrote:
   On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote:
  
   Except google isn't asserting any kind of copyright control over these
   books, they're just not making it convenient to download them in your
   preferred format.  Maybe not The Right Thing, but not as boneheaded as
   suing a party who reprints public domain material, as was the case in
   Feist v. Rural (the supreme court case you mention.)
  
   They want people to use their service. Fair enough, given that the
   scanning and OCRing happened on their dime.
  
  
  How does that give them any special rights?  There are no database
  protection laws in the US, and sweat-of-the-brow has been rejected as a
  basis for new copyrights.


 You're right, it doesn't give them any *special* rights.  They have the same
 rights as any other computer owner.  Specifically, they have the right to
 choose who uses their computers, and how they use them.  Whether or not a
 terms of service is legally binding is really not the issue. (*)  The issue
 is whether or not they have a duty to make it *convenient* for you to
 download the data.  Of course they don't.  Why should they be required to
 help you put them out of business?  That kind of twisted logic might make
 sense in the non-profit world (although I still haven't seen the WMF step up
 to the plate and make it easy for people to make a full history fork, or
 even to download all the images), but Google is not a non-profit
 organization.  Google would be Evil if it *didn't* protect itself against
 this, as it'd be breaking a promise to its shareholders.

 (*) Personally, I'm of the opinion that merely accessing a website is not
 sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
 not have to even click agree to is a unilateral contract which can only
 impose promises upon the offeror, though this is not a legal opinion but
 merely my opinion of what the law should be.

Whether Google is good or evil is off-topic, and irrelevant to boot.

There are nearly _750,000_ books from Google that are available on
archive.org, available in DJVU format with OCR.

  http://www.archive.org/details/googlebooks

Microsoft donated many texts directly to IA, but that approach only
netted 440,000 books.

  http://www.archive.org/details/msn_books

See here for more of the collections:
   http://www.archive.org/details/texts

Also worth noting, Project Gutenberg has digitised fewer than 30,000
books since 1971.  Distributed Proofreaders has done 15,000 of those
since 2000, so throughput is picking up.  But, there are more than
enough to keep everyone busy for a very long time.

--
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote:

 Whether Google is good or evil is off-topic, and irrelevant to boot.


Whether or not they have a right to exclude bots isn't.

 Also worth noting, Project Gutenberg has digitised fewer than 30,000
 books since 1971.  Distributed Proofreaders has done 15,000 of those
 since 2000, so throughput is picking up.  But, there are more than
 enough to keep everyone busy for a very long time.


The interesting thing is, even if you don't use a bot, it's still faster to
copy/paste from Google manually than it is to get the book and scan it in
yourself (assuming you don't want to destroy the original, anyway).

If you're going to make a project out of OCRing books that Google has already
OCRed, I don't see any point in reinventing the scanning or first pass
OCRing part.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 10:07 PM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote:

  Whether Google is good or evil is off-topic, and irrelevant to boot.
 

 Whether or not they have a right to exclude bots isn't.

Actually, it is.  This mailing list is about the Wikimedia Foundation
and its projects, and this thread is about Wikisource.  Anyone who has
done significant amounts of Wikisource work will tell you that they
don't consider the Google Books click-through license to be a problem
that needs discussing at this level.

Do you think that 750,000 Google Books were manually converted to
DJVU, and copied over to Internet Archive?

Is there a book that you seek that isn't available at Internet Archive?

I wrote a Greasemonkey user script to scrape the text from Google
Books; it is now broken and unmaintained because I no longer need to
take text from Google Books, as the vast majority of the texts I want
are now on Internet Archive, and that is a more productive workflow.

  Also worth noting, Project Gutenberg has digitised fewer than 30,000
  books since 1971.  Distributed Proofreaders has done 15,000 of those
  since 2000, so throughput is picking up.  But, there are more than
  enough to keep everyone busy for a very long time.


 The interesting thing is, even if you don't use a bot, it's still faster to
 copy/paste from Google manually than it is to get the book and scan it in
 yourself (assuming you don't want to destroy the original, anyway).

No, it is quicker to download the DJVU file from Internet Archive,
upload it to Wikisource, set up a transcription project, and fix the
OCR text there, and copy and paste it wherever you like.

It takes about 10 minutes unless there is some copyright concern.
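
The first and last steps of that workflow can be sketched in Python. This is only an illustration under stated assumptions: the `https://archive.org/download/<identifier>/<identifier>.djvu` naming scheme is assumed rather than confirmed in this thread (not every item has a DjVu derivative), and both helper names are hypothetical. The `Index:` title form follows the Wikisource index links posted earlier in the thread.

```python
from urllib.parse import quote

# Assumption: Internet Archive serves item files under this base URL.
IA_DOWNLOAD_BASE = "https://archive.org/download"


def djvu_download_url(identifier: str) -> str:
    """Guess the DjVu URL for an Internet Archive item.

    Assumes the file is named <identifier>.djvu, which may not hold
    for every item.
    """
    safe_id = quote(identifier)
    return f"{IA_DOWNLOAD_BASE}/{safe_id}/{safe_id}.djvu"


def wikisource_index_title(filename: str) -> str:
    """Build the title of the Wikisource Index: page for an uploaded file."""
    return "Index:" + filename.replace(" ", "_")


if __name__ == "__main__":
    item = "happinessessays00hiltgoog"  # identifier mentioned in this thread
    print(djvu_download_url(item))
    print(wikisource_index_title(item + ".djvu"))
```

The upload itself and the OCR clean-up still happen on Wikisource by hand; the sketch only shows where the file would come from and what the transcription project page would be called.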

 If you're going to make a project out of OCRing books that Google has already
 OCRed, I don't see any point in reinventing the scanning or first pass
 OCRing part.

I suggest you take a look at a few of the DJVU files provided by
Internet Archive.  Then you can point out real faults that you see.

--
John Vandenberg



Re: [Foundation-l] Iran?

2009-06-21 Thread David Gerard
2009/6/21 Ray Saintonge sainto...@telus.net:

 Sure, transparency is a problem, but its absence alone does not imply
 fraud.  It hurts the Iranian authorities even more if the vote count is
 accurate because nobody believes them.


Evidence the numbers were made up: humans are not very good at picking
random numbers:

http://www.washingtonpost.com/wp-dyn/content/article/2009/06/20/AR200906204.html

(This is way off-topic ...)


- d.



Re: [Foundation-l] Google Translate now assists with human translations of Wikipedia articles

2009-06-21 Thread David Gerard
2009/6/21 Nikola Smolenski smole...@eunet.yu:
 On Saturday 13 June 2009 18:20:36 picus-viridis wrote:

 IMHO automatic translations into Polish are useless, as they only allow
 rough orientation in the contents of an article. It concerns  not only

 How is rough orientation in the contents of an article useless?


It's not useless, but it's not all that useful. I find when
translating from other Wikipedias to add to the English version of an
article that it's the subtle and important details that get mashed to
uncertainty.


- d.



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:

 I suggest you take a look at a few of the DJVU files provided by
 Internet Archive.  Then you can point out real faults that you see.


I will.  My apologies for misunderstanding your email.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:

 I suggest you take a look at a few of the DJVU files provided by
 Internet Archive.  Then you can point out real faults that you see.


 I will.  My apologies for misunderstanding your email.


Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
be the first book I randomly picked from Google Book Search.  There's no
text version.

And the text version I find of other editions seems to be much much worse
than the google OCR results.


Re: [Foundation-l] Google Translate now assists with human translations of Wikipedia articles

2009-06-21 Thread Mark Williamson
It also depends on the language pair. For Chinese to English, I
wouldn't even bother with such a process (having a machine translate
and then correcting the errors); for Spanish to English I do this very
frequently and it's a great timesaver.

Mark

skype: node.ue



On Sun, Jun 21, 2009 at 7:05 AM, David Gerard dger...@gmail.com wrote:
 2009/6/21 Nikola Smolenski smole...@eunet.yu:
 On Saturday 13 June 2009 18:20:36 picus-viridis wrote:

 IMHO automatic translations into Polish are useless, as they only allow
 rough orientation in the contents of an article. It concerns  not only

 How is rough orientation in the contents of an article useless?


 It's not useless, but it's not all that useful. I find when
 translating from other Wikipedias to add to the English version of an
 article that it's the subtle and important details that get mashed to
 uncertainty.


 - d.





Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 10:55 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:

 I suggest you take a look at a few of the DJVU files provided by
 Internet Archive.  Then you can point out real faults that you see.


 I will.  My apologies for misunderstanding your email.


 Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
 be the first book I randomly picked from Google Book Search.  There's no
 text version.

 And the text version I find of other editions seems to be much much worse
 than the google OCR results.


http://books.google.com/books?id=TZ0UYAAJ strike two, not even there.
http://books.google.com/books?id=PYAaYAAJ strike three
http://www.archive.org/details/happinessessays00hiltgoog finally...let's
compare the OCR:

Great numbers of thoughtful people are just now much perplexed to know what
to make of the faffs of life, and are looking about them for some reasonable
interpretation of the modern world. They cannot abandon the work of the
world, but they are conscious that they have not learned the art of work.

Greaf numbers of thoughtful people are just now much perplexed to know what
to make of thefaSls of life^ and are looking about them for some reasonable
interpretation of the modem world. They cannot abandon the work of the
worlds but they are conscious that they have not learned the art of work.
---
Few people, however, really know how to work, and even in an age when
oftener perhaps than ever before we hear of work and workers one cannot
observe that the art of work makes much positive progress. On the contrary,
the general inclination seems to be to work as little as possible, or to
work for a short time in order to pass the remainder of one's life in rest.

Few people, however,  really know how to work, and even in an age when
oftener perhaps than ever before we hear of work  and  workers  one
cannotobserve that the art of work makes much positive progress. On the
contrary, the general inclination seems to be to work as little as possible,
or to work for a short time in order to pass the remainder of one's life in
rest. 
---
I guess that's acceptable.  The Catholic encyclopedia results were much
worse, though.  Maybe it was a font thing, but I'm not quite interested
enough to bother doing a more in depth study right now.
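
For anyone who does want a slightly more systematic comparison, here is a minimal Python sketch that diffs two OCR outputs word by word with the standard-library `difflib` module. The two samples are the opening sentence of the first pair of excerpts quoted above; the names `SAMPLE_A`, `SAMPLE_B` and `word_diffs` are made up for this illustration, and a word-level diff is just one rough way to compare OCR quality.

```python
import difflib

# Opening sentence of the two OCR excerpts quoted above, copied verbatim.
SAMPLE_A = (
    "Great numbers of thoughtful people are just now much perplexed to know what "
    "to make of the faffs of life, and are looking about them for some reasonable "
    "interpretation of the modern world."
)
SAMPLE_B = (
    "Greaf numbers of thoughtful people are just now much perplexed to know what "
    "to make of thefaSls of life^ and are looking about them for some reasonable "
    "interpretation of the modem world."
)


def word_diffs(a: str, b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs for every stretch where the words differ."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


def word_similarity(a: str, b: str) -> float:
    """Word-level similarity between 0.0 and 1.0 (1.0 means identical)."""
    return difflib.SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


if __name__ == "__main__":
    for left, right in word_diffs(SAMPLE_A, SAMPLE_B):
        print(f"{left!r} -> {right!r}")
    print(f"similarity: {word_similarity(SAMPLE_A, SAMPLE_B):.2f}")
```

Note that a diff only shows where two outputs disagree, not which one is right: "faffs" in the first sample is itself probably a misread word, so a human proofreader is still needed.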


Re: [Foundation-l] Iran?

2009-06-21 Thread Nathan
On Sun, Jun 21, 2009 at 10:00 AM, David Gerard dger...@gmail.com wrote:



 Evidence the numbers were made up: humans are not very good at picking
 random numbers:


 http://www.washingtonpost.com/wp-dyn/content/article/2009/06/20/AR200906204.html

 (This is way off-topic ...)


Convincing, surely, but not as definitive as reports that the Interior
Ministry in Tehran (where votes are counted) remained closed during and
after the election, with doors locked against employees who would otherwise
be tallying ballots.


[Foundation-l] Google Books

2009-06-21 Thread John Vandenberg
subject line changed

On Mon, Jun 22, 2009 at 12:55 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote:

  On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:
 
  I suggest you take a look at a few of the DJVU files provided by
  Internet Archive.  Then you can point out real faults that you see.
 
 
  I will.  My apologies for misunderstanding your email.
 

 Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
 be the first book I randomly picked from Google Book Search.  There's no
 text version.

Lucky you.  Most of the other CE1913 volumes on Internet Archive have
a DJVU file.

http://www.archive.org/search.php?query=The%20Catholic%20Encyclopedia%20AND%20mediatype%3Atexts

 And the text version I find of other editions seems to be much much worse
 than the google OCR results.

The OCR engines, especially Tesseract, which Google uses, have only
recently started to handle multiple columns well, so older OCR output
is of lesser quality.  If an old DJVU has been copied over to
Internet Archive, Google Books may have reprocessed that book,
resulting in better OCR being available that way.  Internet Archive
also reprocesses its DJVU files, and Wikisource has its own OCR
button which allows per-page reprocessing to be done by an OCR bot in
the background.

However, CE1913 is not a good example as it would be a bit silly to
use OCR from _anywhere_: there are multiple complete proof-read
editions on the web, including on Wikisource ;-)

http://en.wikisource.org/wiki/CE1913

Also note that Google Books shows the volumes of CE1913 as mostly "No
preview available" to me, probably because I am in Australia, and only
one or two are "Snippet view".

http://books.google.com.au/books?q=intitle%3ACatholic+Encyclopedia;

--
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 1:41 AM, David Gerard dger...@gmail.com wrote:

 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?

Here are seven articles from PLoS One.

http://en.wikisource.org/wiki/Category:Plosone

We have other published material that has been released under CC licenses:

http://en.wikisource.org/wiki/Unhappy_Thought

And books under various licenses:

http://en.wikisource.org/wiki/Bulgarian_Policies_on_the_Republic_of_Macedonia
http://en.wikisource.org/wiki/A_Short_History_of_Russian_%22Fantastica%22
http://en.wikisource.org/wiki/Free_as_in_Freedom

--
John Vandenberg



Re: [Foundation-l] [Wikitech-l] flagged revisions

2009-06-21 Thread Jussi-Ville Heiskanen
Yaroslav M. Blanter wrote:
 But on the other end of the spectrum, on projects like the one
 I am active on (the Finnish Wikipedia)...

 Disruptive behaviour is not wired into our genes or our
 culture, but quietly coöperative behaviour has been. Our
 wikipedia is a paradise in comparison to many. For this
 reason personally I would consider it a great shame if
 we were to be granted flagged revs before say 750 000
 or one million. And even as I say this, I know there are
 the chance brothers (Fat and Slim) that this devout wish
 will be observed. I consider it a great problem that
 solutions for problems larger wikis have are nearly
 without exception foisted on smaller wikis without much
 consideration of what their real effect there will be, and
 whether they are really ready for it.

 
 Could you maybe motivate your opinion? Are you saying that there are no
 vandals on fi.wp (which I can buy) and that novices on fi.wp first read
 the rules and learn them by heart, and only then start creating articles?

   

No, that is definitely *not* what I am saying. Admins on the
Finnish wikipedia have on occasion had to go to the
lengths of blocking whole grammar schools from editing.

What I *am* saying - and I suspect none of my countrymen
would dispute me in this - is that in Finland vandals are
vastly outnumbered by people of good faith editing and cleaning
up after the vandals. So much so that the vandals' effect is
easily negligible. Negligible over the long term, but also negligible
in the moment.

And thus, flaggedrevs would provide hardly any added
disincentive for vandals, but would add workload for the
good faith editors, and slow down content production.


Yours,

Jussi-Ville Heiskanen




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Ray Saintonge
Anthony wrote:
 On Sun, Jun 21, 2009 at 10:55 AM, Anthony wrote:
   
 Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
 be the first book I randomly picked from Google Book Search.  There's no
 text version.

 And the text version I find of other editions seems to be much much worse
 than the google OCR results.
 
 http://books.google.com/books?id=TZ0UYAAJ strike two, not even there.
 http://books.google.com/books?id=PYAaYAAJ strike three
 http://www.archive.org/details/happinessessays00hiltgoog finally...let's
 compare the OCR:

 Great numbers of thoughtful people are just now much perplexed to know what
 to make of the faffs of life, and are looking about them for some reasonable
 interpretation of the modern world. They cannot abandon the work of the
 world, but they are conscious that they have not learned the art of work.

 Greaf numbers of thoughtful people are just now much perplexed to know what
 to make of thefaSls of life^ and are looking about them for some reasonable
 interpretation of the modem world. They cannot abandon the work of the
 worlds but they are conscious that they have not learned the art of work.
 ---
 Few people, however, really know how to work, and even in an age when
 oftener perhaps than ever before we hear of work and workers one cannot
 observe that the art of work makes much positive progress. On the contrary,
 the general inclination seems to be to work as little as possible, or to
 work for a short time in order to pass the remainder of one's life in rest.

 Few people, however,  really know how to work, and even in an age when
 oftener perhaps than ever before we hear of work  and  workers  one
 cannotobserve that the art of work makes much positive progress. On the
 contrary, the general inclination seems to be to work as little as possible,
 or to work for a short time in order to pass the remainder of one's life in
 rest. 
 ---
 I guess that's acceptable.  The Catholic encyclopedia results were much
 worse, though.  Maybe it was a font thing, but I'm not quite interested
 enough to bother doing a more in depth study right now.
Who is expecting OCR to be perfect anywhere?  In the absence of real 
human proofreading I assume any OCR material to be fraught with errors. 
Wikisource aims to accurately reproduce what was published, including 
original errors.  Scans alone provide the needed accuracy, but they are 
not suitable for the added value of wikification.

Ec
