Re: [Wikisource-l] Systems for proofreading scanned books

Mateusz Malinowski Sun, 27 Dec 2020 22:29:37 -0800

"There is also a difference in how we view copyright,
as my own website can cut corners and scan some books
that are "most likely" out of copyright, which is
something Wikimedia's user communities never accept."


Some of the community does accept this. The Polish Wikisource project
uploaded a translation of one of Montgomery's books as a "pseudonymous"
work, without any proof that it really is pseudonymous (and even if it is,
the upload goes against COM:PRP). It is still on Commons and, AFAIK, the
deletion request was either rejected by admins or has not been decided yet.

Mateusz Malinowski

On Sun, 27 Dec 2020, 13:02, <[email protected]> wrote:

>
> Today's Topics:
>
>    1. Systems for proofreading scanned books (Lars Aronsson)
>    2. Re: Systems for proofreading scanned books (J Hayes)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 26 Dec 2020 19:23:02 +0100
> From: Lars Aronsson <[email protected]>
> To: Wikimedia developers <[email protected]>
> Cc: Wikisource <[email protected]>
> Subject: [Wikisource-l] Systems for proofreading scanned books
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> In 2005, at the first Wikimania in Frankfurt, Germany,
> Magnus Manske asked me if I could open up my Scandinavian
> book scanning website Project Runeberg to German and
> other languages, or release the software as open source.
>
> I refused, as my software is just a rapid prototype that
> would need to be rewritten from scratch anyway. But I
> said that Wikisource could be used for this purpose. At
> the time, Wikisource was only a wiki for e-text. As a
> proof of concept, I put up "Meyers Blitz-Lexikon" as
> the first book with scanned page images in Wikisource,
> https://de.wikisource.org/wiki/Seite:LA2-Blitz-0005.jpg
> and soon after the "New Student's Reference Work",
> https://en.wikisource.org/wiki/Page:LA2-NSRW-1-0013.jpg
>
> This was the basic inspiration for the "Proofread Page"
> extension, now used in Wikisource.
>
> In 2010-2011 I tried to use Wikisource, but I thought
> this extension was too hard to work with. From scanner
> to finished presentation, Wikisource was so much slower
> to work with than my own system. My primary gripes are:
> It is too hard to upload PDF files to Commons, it's too
> hard to create the Index page, each page is not created
> immediately (making the raw OCR text searchable), and
> pages hidden in the Page: namespace are not always
> indexed by search engines. Unfortunately, the system
> hasn't improved much in the last decade.
>
> (My criticism of my own website's system is a lot
> harsher, but hits different targets.)
>
> There is also a difference in how we view copyright,
> as my own website can cut corners and scan some books
> that are "most likely" out of copyright, which is
> something Wikimedia's user communities never accept.
>
> In 2012, I thought the time had finally come to rewrite
> my software, but I failed to organize a project around
> this, and instead I continued to use the existing system,
> just adding volume. Indeed, Project Runeberg has grown
> from 0.75 million book pages in 2012 to 3.1 million
> pages today.
>
> Now in 2020, I'm finally tired of my existing system's
> limitations. What should I do? It's not 2005 or 2012
> anymore. What has changed in that time?
>
> I can't move everything over to Wikisource, because of
> the copyright differences.
>
> Should I start to use Mediawiki + ProofreadPage and
> convert my collection to that format?
>
> Should I develop my own modification of Mediawiki?
> Is that a stable ground to work from?
>
> It seems to me that PHP, MariaDB and the architecture
> of Mediawiki with extensions have now been the same for
> a long time. Will this last for the next 20 years?
>
> Or are there today some other existing systems that
> solve the same problem, that weren't available in 2005?
> (And that Wikisource would have picked up, if it were
> started today, instead of developing its own extension.)
>
>
> --
>    Lars Aronsson ([email protected])
>    Project Runeberg - free Nordic literature - http://runeberg.org/
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 26 Dec 2020 18:20:06 -0500
> From: J Hayes <[email protected]>
> To: "discussion list for Wikisource, the free library"
>         <[email protected]>
> Subject: Re: [Wikisource-l] Systems for proofreading scanned books
> Message-ID:
>         <CAN38RzKojj9K=
> [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> My suggestions:
> A simplified UX for uploading works is on the wishlist.
> But a tool that led the user to interact on multiple projects to produce a
> “rough draft” work from a scan would be a great step forward.
> Copyright might be eased for a local copy at Wikisource, not on Commons,
> but you would need some community consensus. If you were bringing tools,
> they might work with you; you should reach out to them. You could also
> transfer the works with easy copyright status over to Wikisource, and
> retain the loose ones at your site. (The value of using Wikisource is the
> increased visibility from being integrated in Wikipedia, and the
> community-building potential.)
> So I would brainstorm some goals, and begin a conversation / partnership
> with your Wikisource language community toward an action plan.
> If I can be of help, let me know.
> Cheers
> Jim hayes
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>
>
> ------------------------------
>
> End of Wikisource-l Digest, Vol 1044, Issue 1
> *********************************************
>

_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
