I explored ABBYY gx files, the full XML output from the ABBYY OCR engine
running at Internet Archive, and I've been astonished by the amount of data
they contain. They are stored at the XCA_Extended detail level (as documented at
http://www.abbyy-developers.com/en:tech:features:xml ).
Something that
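One way to get a feel for how much such a file holds is to count its elements by tag name. The sketch below streams an XML document and tallies the tags. The function name count_tags and the tiny sample document (including its namespace and element names) are made up for illustration and are not taken from a real ABBYY file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(stream):
    """Count elements by local tag name while streaming through the XML."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Drop a leading "{namespace}" prefix if the document uses one.
        local_name = elem.tag.rsplit("}", 1)[-1]
        counts[local_name] += 1
        # Free the element's children and text so large files stay cheap.
        elem.clear()
    return counts


# Hypothetical sample standing in for a real OCR output file.
sample = (
    b'<document xmlns="urn:example">'
    b'<page><line>'
    b'<charParams l="1" t="2" r="3" b="4">a</charParams>'
    b'<charParams l="3" t="2" r="5" b="4">b</charParams>'
    b'</line></page>'
    b'</document>'
)
counts = count_tags(io.BytesIO(sample))
print(counts.most_common())
```

For a gzip-compressed file, the same function should work on the object returned by gzip.open(path, "rb"), since it only needs a readable binary stream.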
On Sat, Jul 11, 2015 at 8:44 AM, Nicolas VIGNERON
vigneron.nico...@gmail.com wrote:
Hi,
I'm not a techie, so I'm not sure I know what OCR-as-service is, but you
should ask Tpt and Phe, who have OCR stuff on the Tool Labs (to know what is
behind tools like
On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni zanni.andre...@gmail.com
wrote:
uh, that sounds very interesting.
Right now, we mainly use the OCR from the DjVu files from Internet Archive (that means
ABBYY FineReader, which is very nice).
Yes, the output is generally good. But as far as I can tell, the
Does anyone mind that I keep posting these things? This time it's on es:
[0] = Pedagogía_Tolteca - Categoría:ES-P
[1] = Pedagogía_Tolteca - Categoría:Ensayos
[2] = Pedagogía_Tolteca - Categoría:Ensayos_de_Guillermo_Marín_Ruiz
[3] = Pedagogía_Tolteca - Categoría:Historia_de_México
Hi all, what is required to have de there as well? Arnd
On 12/07/15 17:29, Nicolas VIGNERON wrote:
2015-07-12 4:59 GMT+02:00 Sam Wilson s...@samwilson.id.au:
It only re-runs the script weekly, or when I hit 'go'. I've hit
go... and it's found another
Nicolas, 1 and 3 are fine; for 2 and 4 the semantics are not clear to me.
What do they mean? Arnd
2015-07-12 13:48 GMT+02:00 Arnd arnd.schroe...@gmail.com:
Hi all, what is required to have de there as well? Arnd
Arnd, could you confirm that this is right:
The 'index_root' is the category in which Indexes are put when they're
validated (i.e. proofread by at least two people).
Perhaps for German it's actually Kategorie:Korrigiert? Or is that what
precedes Fertig?
If the correct site link is added to
https://www.wikidata.org/wiki/Q15634466 then
On 12/07/15 19:48, Arnd wrote:
Hi all, what is required to have de there as well? Arnd
Good question!
An addition to https://www.wikidata.org/wiki/Q15634466 is all that's needed.
I'm afraid I don't know more about that Item. ricordisamoa pointed it out.
It'd be great to get all Wikisources added there.
2015-07-12 4:59 GMT+02:00 Sam Wilson s...@samwilson.id.au:
It only re-runs the script weekly, or when I hit 'go'. I've hit go... and
it's found another loop! This one on br:
(
[0] = Jezuz-Krist_en_Breiz-Izel - Rummad:Contes_bretons
[1] = Rummad:Contes_bretons - Rummad:Levrioù
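A loop like the ones reported in this thread can be found with a depth-first search over the "page or category is in category" links. This is a minimal sketch and not the script discussed here: the function name find_cycle and the sample graph are invented for illustration, and the graph is assumed to fit in memory as a dict mapping each title to its parent categories.

```python
def find_cycle(parents):
    """Return one category loop as a list of titles, or None if there is none.

    `parents` maps a page or category title to the categories it belongs to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for parent in parents.get(node, ()):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:
                # The parent is already on the current path: that is a loop.
                return path[path.index(parent):] + [parent]
            if parent_state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for start in list(parents):
        if state.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Hypothetical data: a work whose category chain loops back on itself.
edges = {
    "Work_X": ["Category:A"],
    "Category:A": ["Category:B"],
    "Category:B": ["Category:C"],
    "Category:C": ["Category:A"],
}
cycle = find_cycle(edges)
print(" - ".join(cycle))
```

The recursion is fine for shallow category trees. A very deep hierarchy would need an explicit stack instead.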
OCR is available via a JavaScript script. A number of Wikisources have it enabled as
a gadget, though I cannot speak for all the wikis. I presume it depends on
the languages available in the OCR.
Script is noted at
https://wikisource.org/wiki/Wikisource:Shared_Scripts
Regards, Billinghurst
On Sun, Jul
2015-07-12 13:48 GMT+02:00 Arnd arnd.schroe...@gmail.com:
Hi all, what is required to have de there as well? Arnd
Arnd, could you confirm that this is right:
'cat_label' = 'Kategorie',
'cat_root' = '!Hauptkategorie',
'index_ns' = 104,
'index_root' =
Kategorie:Fertig is correct, but it contains both indexes and pages.
Thus, I get an error when updating the Wikidata item.
The 'index_root' is the category in which Indexes are put when they're
validated (i.e. proofread by at least two people).
Perhaps for German it's actually
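When a category mixes indexes with other pages, as described above for Kategorie:Fertig, one option is to keep only the members in the index namespace. The sketch below assumes the members are already available as dicts with "ns" and "title" keys. The helper name only_indexes, the sample titles, and the namespace number 102 for the non-index page are all hypothetical; only the value 104 for 'index_ns' comes from the thread.

```python
INDEX_NS = 104  # the 'index_ns' value quoted in the thread


def only_indexes(members, index_ns=INDEX_NS):
    """Return the titles of category members that live in the index namespace."""
    return [member["title"] for member in members if member["ns"] == index_ns]


# Hypothetical category members mixing index pages with an ordinary page.
members = [
    {"ns": 104, "title": "Index:Example_A.djvu"},
    {"ns": 102, "title": "Seite:Example_A.djvu/1"},
    {"ns": 104, "title": "Index:Example_B.pdf"},
]
indexes = only_indexes(members)
print(indexes)
```

If the members come from the MediaWiki API, the categorymembers query also accepts a cmnamespace parameter, so the same filtering can be done on the server side.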