Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread Pradeep Mohandas
I got the 80% success figure from Sankarshan's posterous -
http://sankarshan.posterous.com/the-plan-to-create-a-digital-library-of-100-c

The problem that Ashwin Baindur raised was the improper digitisation effort. A
rough Google search tells me that C-DAC is doing the digitisation for the
Maharashtra Archives - http://www.cdac.in/html/egov/mda.aspx - and, as Ashwin
pointed out, the output is stored on compact discs. Interestingly, they are
using SQL and Visual Basic under Windows NT. I am not sure if this is a good
thing. I also do not know when this project was done, so I am not sure
whether those were current technologies at the time.

We discussed yesterday that the Maharashtra Archives, being a public
institution (or for that matter any public institution), should ideally
either place these documents in the public domain or release them under an
open copyright licence (do correct me if I am wrong with the terminology).

warm regards,
Pradeep

On 14 February 2011 11:29, Pradeep Mohandas wrote:

> hi,
>
> At the discussion yesterday, we were told that OCR did not work at all
> for many Indian languages. Also, as a person who does not understand
> OCR at all, can anyone help me understand what they mean by an 80%
> successful OCR?
>
> The other end of the process is the digitisation machine needed to convert
> the physical text into images. Any ideas on the availability and cost of a
> museum-grade digitisation machine? I am sure you cannot use, and the
> archives will not let you use, an ordinary device to handle these documents.
>
> thanks in advance,
> Pradeep
>
___
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l


Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread Pradeep Mohandas
hi,

At the discussion yesterday, we were told that OCR did not work at all
for many Indian languages. Also, as a person who does not understand
OCR at all, can anyone help me understand what they mean by an 80%
successful OCR?

The other end of the process is the digitisation machine needed to convert
the physical text into images. Any ideas on the availability and cost of a
museum-grade digitisation machine? I am sure you cannot use, and the
archives will not let you use, an ordinary device to handle these documents.

thanks in advance,
Pradeep


Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread jayanta nath
Dear all

Regarding Indic OCR, I want to share our open source Bengali OCR project from
BRAC University, Bangladesh. It is still at the development stage, with an
accuracy of 80-90%.

http://crblp.bracu.ac.bd/
http://crblpocr.blogspot.com/

On Mon, Feb 14, 2011 at 10:31 AM, sankarshan wrote:

> On Mon, Feb 14, 2011 at 10:24 AM, Gautam John wrote:
>
> > What ever became of the Digital Library of India project?
> > http://www.dli.ernet.in/
>
> Whatever happens to projects like that ... (there's a tweet from
> @abhaga in this regard)
>
> > Wasn't OCR high on their to-do list, as such?
>
> The point I was making is that most of the code that enables Indic OCR
> to reach higher percentages of accuracy isn't available under FOSS
> licenses. Debayan had been working on this for a while. There is a
> reference to the "technology" (as requested by Nagarjuna in a later
> mail) at
> <http://sankarshan.posterous.com/the-plan-to-create-a-digital-library-of-100-c>
>
> --
> sankarshan mukhopadhyay
> 



-- 
With Warm Regards,
*Jayanta Nath*
Calcutta,West Bengal
+91 9836294438
Facebook :http://www.facebook.com/jayantanth
Wikipedia :http://en.wikipedia.org/wiki/User:Jayantanth
Let us build a piracy-free India; let us all use free software and
encourage others to use it.


Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread sankarshan
On Mon, Feb 14, 2011 at 10:24 AM, Gautam John  wrote:

> What ever became of the Digital Library of India project?
> http://www.dli.ernet.in/

Whatever happens to projects like that ... (there's a tweet from
@abhaga in this regard)

> Wasn't OCR high on their to-do list, as such?

The point I was making is that most of the code that enables Indic OCR
to reach higher percentages of accuracy isn't available under FOSS
licenses. Debayan had been working on this for a while. There is a
reference to the "technology" (as requested by Nagarjuna in a later
mail) at
http://sankarshan.posterous.com/the-plan-to-create-a-digital-library-of-100-c


-- 
sankarshan mukhopadhyay




Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread Gautam John
On 14 February 2011 10:18, sankarshan  wrote:

> Indic OCR, at least the bits that are available under an appropriate
> FOSS license, has an accuracy of around 80%. Considering the volume
> and fragility of what you will OCR, that's remarkably low.

What ever became of the Digital Library of India project?
http://www.dli.ernet.in/

Wasn't OCR high on their to-do list, as such?

Thank you.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread sankarshan
2011/2/14 shirish शिरीष :

> In Pune, around this time a lot of colleges have their technical weeks
> where they show projects. Last year and a couple of years before, I had
> seen students who had made nice OCRs which could work with Indic
> languages but obviously required a lot of polish and getting into the
> whole 'code maintenance' thing. The students' motivation had been to do
> it as a project, not to get things 'maintained', which is unglamorous
> grunt work. Also, documentation is something that would need to be
> looked at and fine-tuned.

Indic OCR, at least the bits that are available under an appropriate
FOSS license, has an accuracy of around 80%. Considering the volume
and fragility of what you will OCR, that's remarkably low.
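
For anyone wondering what a figure like 80% means in practice: the thread
does not say how it was measured, so the sketch below assumes one common
convention, character-level accuracy, computed as 1 minus the edit distance
between the OCR output and a hand-typed reference, divided by the length of
the reference. The function names and the sample strings are made up for
illustration. Note that this counts Unicode code points, which for Indic
scripts is not the same as counting visible glyphs.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: share of reference characters recovered correctly."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    truth = "digital library"   # hand-typed reference (hypothetical sample)
    ocr = "digjtal 1ibrory"     # imagined OCR output with three wrong characters
    print(f"{character_accuracy(truth, ocr):.0%}")
```

Under this reading, 80% means roughly one character in five needs manual
correction. If errors were spread independently (a simplifying assumption),
a five-character word would come out fully correct only about a third of
the time (0.8 to the power 5 is about 0.33), which is why the figure is
low for archival work.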

-- 
sankarshan mukhopadhyay




Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread shirish शिरीष
In-line :-

On Mon, Feb 14, 2011 at 09:13, Pradeep Mohandas wrote:
> hi,
> Thanks Arun for posting it here. I was really tired when I posted this
> yesterday to the Mumbai mailing list. It does seem more or less accurate. A
> few additions covering what I missed last night.
> 1. With the museum (Chhatrapati Shivaji Maharaj Vastu Sangrahalaya - CSMVS),
> we spoke with the Museum Director, the Chief Art Conservator and the
> Business Development Executive. After Liam provided the requisite
> background (it perhaps also helped that the British Museum Director, Neil
> MacGregor, visited the museum this week), we gave them a rough idea of
> possible activities we could do together. We also explained to the
> Conservator how a typical session would be conducted, to help them
> understand which responsibilities would be theirs and which would be ours.
> 2. At Jnanapravaha, Bishaka and Liam visited. After the usual pitch, the
> feedback was that they were interested in getting students to write and
> improve articles on Indian art and aesthetics on Wikipedia. They have asked
> for help before the next semester in July on how such a thing can be
> organised.
> 3. Ashwin Baindur asked how to work with institutions like the Maharashtra
> Archives, which are facing the brunt of the budget cuts (they get their
> money only after the song and dance shows, museums etc. have all taken
> their cut) and have trouble with the upkeep of their archives. Liam replied
> that this would mainly involve helping them digitise records. The trouble,
> Liam said, was deciding where to begin and how to prioritise the work.
> Citing the example of the National Library, Kolkata, he said that some
> books were not even docketed (I forget the original word - something to do
> with giving the books numbers and classifying them appropriately).

Hi all,
 I wanted to come but, for many reasons, couldn't be there. Anyway,
the word you are searching for is 'cataloguing', along with having some
sort of 'Integrated Library Management' system.

https://secure.wikimedia.org/wikipedia/en/wiki/Integrated_library_system

https://secure.wikimedia.org/wikipedia/en/wiki/Library_catalog

In fact, building or acquiring a good cataloguing system is a major pain point.

There are many FOSS tools that could be used as an ILS, but all of that
will need funding, I guess. For inspiration, people could look at Delhi
Public Library, as they have used FOSS tools. Koha is what they use.

https://secure.wikimedia.org/wikipedia/en/wiki/Koha_%28software%29

> We agreed that libraries and archives also
> suffered because there was no good Optical Character Recognition (OCR)
> software for Indic languages. Liam gave a French example in which the old
> cursive script of a French text made it un-OCR-able (new word - mine!), so
> Wikipedians helped by manually typing the text into Wikisource.

In Pune, around this time a lot of colleges have their technical weeks
where they show projects. Last year and a couple of years before, I had
seen students who had made nice OCRs which could work with Indic
languages but obviously required a lot of polish and getting into the
whole 'code maintenance' thing. The students' motivation had been to do
it as a project, not to get things 'maintained', which is unglamorous
grunt work. Also, documentation is something that would need to be
looked at and fine-tuned.

> 4. Bishaka raised the point that all of the GLAM activities could also be
> simultaneously done in various languages locally. So, during a Backstage
> Pass event in Mumbai, we could improve the English, Hindi and Marathi (say)
> articles at once. Editors in any language are welcome to contribute.
> 5. There has been an influx of new people, along with requests for a
> basic editing session. Perhaps it is time to interlace the meetups with
> WikiAcademy.
> With that, I hope I've more or less covered all Wikipedia territory from the
> meetup. If I missed out on anything, feel very free to jump in and add your
> notes.
> warm regards,
> Pradeep

-- 
          Regards,
          Shirish Agarwal  शिरीष अग्रवाल
  My quotes in this email licensed under CC 3.0
http://creativecommons.org/licenses/by-nc/3.0/
http://flossexperiences.wordpress.com
065C 6D79 A68C E7EA 52B3  8D70 950D 53FB 729A 8B17



Re: [Wikimediaindia-l] Fwd: [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

2011-02-13 Thread Pradeep Mohandas
hi,

Thanks Arun for posting it here. I was really tired when I posted this
yesterday to the Mumbai mailing list. It does seem more or less accurate. A
few additions covering what I missed last night.

1. With the museum (Chhatrapati Shivaji Maharaj Vastu Sangrahalaya - CSMVS),
we spoke with the Museum Director, the Chief Art Conservator and the
Business Development Executive. After Liam provided the requisite
background (it perhaps also helped that the British Museum Director, Neil
MacGregor, visited the museum this week), we gave them a rough idea of
possible activities we could do together. We also explained to the
Conservator how a typical session would be conducted, to help them
understand which responsibilities would be theirs and which would be ours.
2. At Jnanapravaha, Bishaka and Liam visited. After the usual pitch, the
feedback was that they were interested in getting students to write and
improve articles on Indian art and aesthetics on Wikipedia. They have asked
for help before the next semester in July on how such a thing can be
organised.
3. Ashwin Baindur asked how to work with institutions like the Maharashtra
Archives, which are facing the brunt of the budget cuts (they get their
money only after the song and dance shows, museums etc. have all taken
their cut) and have trouble with the upkeep of their archives. Liam replied
that this would mainly involve helping them digitise records. The trouble,
Liam said, was deciding where to begin and how to prioritise the work.
Citing the example of the National Library, Kolkata, he said that some
books were not even docketed (I forget the original word - something to do
with giving the books numbers and classifying them appropriately). We
agreed that libraries and archives also suffered because there was no good
Optical Character Recognition (OCR) software for Indic languages. Liam gave
a French example in which the old cursive script of a French text made it
un-OCR-able (new word - mine!), so Wikipedians helped by manually typing
the text into Wikisource.
4. Bishaka raised the point that all of the GLAM activities could also be
simultaneously done in various languages locally. So, during a Backstage
Pass event in Mumbai, we could improve the English, Hindi and Marathi (say)
articles at once. Editors in any language are welcome to contribute.
5. There has been an influx of new people, along with requests for a
basic editing session. Perhaps it is time to interlace the meetups with
WikiAcademy.

With that, I hope I've more or less covered all Wikipedia territory from the
meetup. If I missed out on anything, feel very free to jump in and add your
notes.

warm regards,
Pradeep