Re: [Wikisource-l] Assessing OCR quality

2019-03-12 Thread scann
I don't know how it works inside Wikisource, but at the very least
Tesseract has a confidence value (also called a confidence score or level)
that rates how well it recognized each word (it also works at the character
level). To assess it, though, you normally need the hOCR output.
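
As a rough sketch of the idea (not how Wikisource does it): Tesseract's
hOCR output records a per-word confidence as "x_wconf NN" in the title
attribute of each ocrx_word span, so a page can be summarised by reading
those values. The function names and the threshold of 60 below are my own
assumptions for illustration, not part of Tesseract.

```python
import re
from statistics import mean

# hOCR stores the per-word confidence (0-100) as "x_wconf NN" inside the
# title attribute of each ocrx_word span.
WCONF_RE = re.compile(r"x_wconf\s+(\d+(?:\.\d+)?)")


def word_confidences(hocr_text):
    """Return every x_wconf value found in an hOCR document, in order."""
    return [float(value) for value in WCONF_RE.findall(hocr_text)]


def page_score(hocr_text, low=60.0):
    """Summarise one page: word count, mean confidence, share of weak words.

    `low` is an arbitrary cut-off (an assumption); tune it on your own scans.
    """
    confs = word_confidences(hocr_text)
    if not confs:
        return {"words": 0, "mean": None, "low_share": None}
    n_low = sum(1 for c in confs if c < low)
    return {
        "words": len(confs),
        "mean": mean(confs),
        "low_share": n_low / len(confs),
    }


if __name__ == "__main__":
    # Hand-written hOCR fragment standing in for a real Tesseract output file.
    sample = (
        "<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 96'>free</span> "
        "<span class='ocrx_word' title='bbox 60 10 120 30; x_wconf 42'>Nordlc</span>"
    )
    print(page_score(sample))
```

Sorting pages by mean confidence, or by the share of low-confidence words,
would give a first list of candidates for re-OCR or proofreading. It only
works for pages where the hOCR file was kept, of course.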

cheers,

On Tue, 12 Mar 2019 at 17:27, Lars Aronsson wrote:

> If you have a large digitization project, such as Wikisource,
> with many pages and books of scanned images and OCR text
> (originating from different sources and times),
> how do you assess the OCR quality and determine which pages
> are in most need of improved OCR or proofreading?
>
> Is spell checking (and a normal dictionary) the only useful tool?
> Would you count the number of spelling errors, or the ratio
> of errors to correct words? Has anyone done this?
>
>
> --
>Lars Aronsson (l...@aronsson.se)
>Project Runeberg - free Nordic literature - http://runeberg.org/
>
>
>
> ___
> Wikisource-l mailing list
> Wikisource-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>


Re: [Wikisource-l] (no subject)

2018-10-01 Thread scann
share the videos in the openglam account if you haven't yet :)

congrats!

On Mon, 1 Oct 2018 at 11:40, Bodhisattwa Mandal wrote:

> Hi Alex,
>
> Glad you liked the initiative.
>
> Sure, we will gather the data and submit in the grant report on 30
> October, 2018.
>
> Regards,
> Bodhisattwa
>
>
> On Mon, 1 Oct 2018, 20:57 Alex Stinson,  wrote:
>
>> Congratulations Bodhi! It is awesome to have such great materials
>> coming out of the Bengali community! Please let us know how the circulation
>> of the video affects the viewership/participation in Wikisource.
>>
>> Cheers,
>>
>> Alex
>>
>> On Mon, Oct 1, 2018 at 12:17 AM Bodhisattwa Mandal <
>> bodhisattwa.rg...@gmail.com> wrote:
>>
>>> Sorry,
>>>
>>> The links of the youtube videos were not correctly shared by me
>>> previously. I am sharing them again.
>>>
>>> * লে হালুয়া (What just happened)
>>> * মায়ের নজর (Watchful eyes)
>>> * দুনিয়া যখন দোরগোড়ায় (World at your doorstep)
>>>
>>> Thanks Abhinav for pointing that out.
>>>
>>> Many apologies,
>>> Bodhisattwa
>>>
>>>
>>> On Mon, 1 Oct 2018 at 08:59, Bodhisattwa Mandal <
>>> bodhisattwa.rg...@gmail.com> wrote:
>>>
 Hi all,

 We are happy to announce that three Bengali Wikisource Awareness
 Campaign videos have been released on the YouTube channel of the West
 Bengal Wikimedians User Group. This campaign was the first such
 initiative to promote Wikisource among new readers.

 The project was first proposed in IdeaLab as part of the Inspire Campaign
 and was later funded by a Wikimedia Foundation rapid grant.
 The details can be found here.

 The YouTube links of the short videos can be found here -
 * লে হালুয়া (What just happened)
 * মায়ের নজর (Watchful eyes)
 * দুনিয়া যখন দোরগোড়ায় (World at your doorstep)

 The videos were also uploaded to Commons. The links are as follows -
 * লে হালুয়া (What just happened)
 * মায়ের নজর (Watchful eyes)
 * দুনিয়া যখন দোরগোড়ায় (World at your doorstep)

 The video production would not have been possible without -
 * Director - Ishan Sharma (Satyajit Ray Film and Television Institute,
 Kolkata)
 * Cinematographer - Sukhansaar Singh (do)
 * Executive Producer - Aryaman Nath (do)
 * Sound Director - Sukrit Sen (do)
 * Focus Puller - Theodros Tadesse (Ethiopia)
 * Assistant Director - Sneha Das (do)
 * Editor - Himangshu Kamble (Mumbai)
 * Colorist - Pankaj Nelson (Mumbai)
 * Plot proposer - Deepanjan Ghosh (Radio Mirchi Kolkata)
 * Cast - Sujan Ghosh, Suavo Mukherjee, Ritam Sarkar, Raja Chakravorty,
 Shrabanti Bhattacharya and Avinash

 * Bengali Wikisource team - Bodhisattwa Mandal, Hrishikes Sen,
 Mahir Morshed, Jayanta Nath
 * WMF liaison - Satdeep Gill (Thanks a lot, buddy!)


 It was a completely new experience for us, and we have learnt a lot
 from the entire project. We will try to promote the videos for the next
 month and submit our grant report within the deadline of 30 October, 2018.

 Please do watch the videos and feel free to create subtitles in your
 languages. The English and Bengali subtitles are already available on
 YouTube and Commons.

 Enjoy!!

 Thanks,
 Bodhisattwa
 Bengali Wikisource community
 West Bengal Wikimedians User Group


 --
 Bodhisattwa


>>>
>>> --
>>> Bodhisattwa
>>>
>>>
>>
>>
>> --
>> Alex Stinson
>> GLAM-Wiki Strategist
>> Wikimedia Foundation
>> Twitter:@glamwiki/@sadads
>>
>> Learn more about how the communities behind Wikipedia, Wikidata and other
>> Wikimedia projects partner with cultural heritage organizations:
>> https://outreach.wikimedia.org/wiki/GLAM
>> 

Re: [Wikisource-l] EAP books

2018-07-27 Thread scann
I don't know if this is a relevant discussion or not, but as far as I
understand most of the books digitized under the EAP program have a
restrictive copyright notice, such as this one:
https://eap.bl.uk/archive-file/EAP704-1-2#?c=0=0=0=
23=-409%2C-162%2C5067%2C3173

Most of them are clearly PD books, but maybe it wouldn't hurt to
contact Jody Butterworth first (she's in charge of the EAP program) and
ask her about the permissions. It is one of those tricky situations where
you might have your stuff taken down (or worse, face a legal case over
it). The email is: endangeredarchi...@bl.uk. When I reached out to them they
were quite responsive.

best,
scann


2018-07-25 18:08 GMT-04:00 Bodhisattwa Mandal :

> Hi,
>
> During the Wikimania hackathon, User:Pmlineditor created two Python scripts
> for British Library Endangered Archives Program books upon request from
> the Bengali Wikisource community.
>
> One of the scripts mass-downloads the books; the other selectively
> downloads books and uploads them directly to Commons.
>
> The scripts are on GitHub - https://github.com/prachatos/eap2pdf
>
> Thanks,
> Bodhisattwa


Re: [Wikisource-l] Scanner for you?

2017-05-08 Thread scann
Hi Nicolas,

That's a Hackerspace scanner (Carles seems to have The Archivist one, which
has some differences).

I'm part of the DIY Book Scanner project, so if you have any feedback on
why the project didn't work, I would be more than interested to hear it.

Maybe you also want to check out the new software that has been released to
control the cameras: https://github.com/Tenrec-Builders/pi-scan

It works with more cameras than the Powershot IXUS160. It depends a lot on
how CHDK was built, but right now I know that some other projects have
tested PiScan with other camera models and it works pretty well. Using
PiScan in combination with MarkersCrop:
https://github.com/Tenrec-Builders/marker-crop makes a lot of the workflow
easier.

I'm also more than happy to provide some help on how to set up a decent
software workflow for the BS. I know this tends to be a major bottleneck
(especially if you are using Linux and the library is used to working with
Windows), but it can be sorted out.

Best,
Scann

2017-05-08 9:50 GMT-03:00 Nicolas VIGNERON <vigneron.nico...@gmail.com>:

> Hi,
>
> Thanks for the proposal but Wikimédia France already has one (is it the
> same? https://commons.wikimedia.org/wiki/Category:Book_scanners_of_Wikimedia_France )
> that is not often used (indeed it needs some tech skills and time).
>
> I don't recall if Wikimedia CH already has one or is currently
> buying/building one (I know it was discussed). Yann: can you tell us?
>
> Cdlt, ~nicolas


Re: [Wikisource-l] Scanner purchase

2014-09-09 Thread scann
2014-09-09 12:01 GMT-03:00 David Cuenca dacu...@gmail.com:


 Have you tried the Book Uploader Bot? It is a project created by Rohit (in
 CC) for his GSoC.
 http://tools.wmflabs.org/bub/index


Wow, thank you! This seems like a tool that we could use for the project,
definitely.

Do you know if it's possible to add Europeana as a Library? They also have
a lot of interesting stuff that I think would be worth uploading to
Commons.


Re: [Wikisource-l] Scanner purchase

2014-09-08 Thread scann
Hello everyone!

In Wikimedia Argentina we have been using these scanners (the ones
that you're pointing to in your e-mails) since 2012.

There are a lot of technical insights I can provide about this issue
(both about what we have been doing and about the scanners) if you want,
but I'd like to make some comments on the future of this project.

Right now, in the DIY Book Scanner project we've come up with a new
design: http://www.diybookscanner.org/forum/viewtopic.php?f=14&t=3063
that has a lot of new features (it's able to scan bigger books,
deals better with lighting, etc.), but its real novelty is that it now
comes with a control system for the whole scanner, using a Raspberry
Pi and a piece of software called Spreads:
http://spreads.readthedocs.org/en/latest/ that helps to control the
cameras. We use Canon cameras with CHDK (Canon Hack Development
Kit) to do remote and synchronized shooting. Spreads is written in Python.

Since in Wikimedia Argentina we have seven of these scanners (not all
of them are operational, btw), what I would like to do is add a
Raspberry Pi to each of the seven so they can be used with Spreads, BUT I
would also like to have someone write the code needed to scan and upload:
that is, to scan your book and upload it automatically to Wikimedia
Commons and Wikisource from the same control system, ideally without
touching the command line. The idea is also to have a Spreads server
running by that time, probably installed in the Wikimedia Argentina
office. My idea is to write up this project and submit it for an IEG, so
it would be really interesting if other chapters wanted to participate
and join us in the development.

I know that Wikimedia Indonesia bought one of these scanners (and
they're using it); CC Uruguay, which has a close relationship with
Wikimedia Uruguay, also bought one, and now Wikimedia Brazil as well.

Well, that's pretty much it. I can answer any questions about the DIY
Book Scanners, and especially about how I see the project, GLAM and the
Wikimedia movement interacting.

Best,
Scann

2014-09-07 0:26 GMT-03:00 Luiz Augusto lugu...@gmail.com:
 You'll certainly be interested in what Wikimedia Argentina is doing with DIY
 Book Scanners [1].

 They tried to submit a Wikimania presentation, but the team behind its
 organization simply refused the paper with no further details (sigh). At
 least, you can check some interesting info on both [2] and [3].

 I'm CC'ing Evelin. She is the person behind this successful approach.

 Best,
 [[m:User:555]]

 

 [1] -
 https://meta.wikimedia.org/wiki/Wikimedia_Argentina/Reportes/2014-02#New_scanner_at_our_office

 [2] -
 https://wikimania2014.wikimedia.org/wiki/Submissions/Open_hardware_and_Open_Source_for_Open_Content:_GLAM,_the_DIY_Book_Scanner_community_and_Wikimedia_digitalize_public_domain_books._Case_study_from_Argentina

 [3] -
 https://wikimania2014.wikimedia.org/wiki/Talk:Submissions/Open_hardware_and_Open_Source_for_Open_Content:_GLAM,_the_DIY_Book_Scanner_community_and_Wikimedia_digitalize_public_domain_books._Case_study_from_Argentina



 On Sat, Sep 6, 2014 at 1:58 PM, Pierre-Yves Beaudouin
 pierre.beaudo...@gmail.com wrote:

 Hi Carles,

 Wikimedia France bought a bookscanner in June [1]. It is not yet
 operational but I've tried a similar book scanner. It is easy and very fast
 (500p/hour).

 Pyb

 [1] http://www.bookscanner.fr/about-this-bookscanner.html


 2014-09-06 18:20 GMT+02:00 Carles Paredes Lanau carlespare...@gmail.com:

 Hi,

 Amical Wikimedia is examining the possibility of purchasing a
 professional scanner for the digitalization of collections in local
 libraries.

 I have seen this cheap prototype but I dunno if it's the best option:

 http://diybookscanner.org/forum/viewtopic.php?f=1&t=1192

 Does anyone have a professional scanner? Any chapter? Do you have any
 recommendations?

 Regards,

 Carles
