Re: [sword-devel] Bible in Myanmar

2019-05-16 David Haslam
Cyrille writes, "Do you know which mark?" I've yet to do the more detailed analysis, but these 3 are initial candidates:
U+1038 း 7,959 MYANMAR SIGN VISARGA
U+104A ၊ 601 MYANMAR SIGN LITTLE SECTION
U+104B ။ 1,489 MYANMAR SIGN SECTION
But as I observed before, where each verse ends requires more
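A tally like the one above can be reproduced with a short script. This is a minimal sketch, not the tool David used: it assumes the converted text is available as a UTF-8 string, counts every occurrence of the three candidate marks, and takes the character names from Python's `unicodedata` rather than hard-coding them. The script name and input file in the usage comment are hypothetical.

```python
import unicodedata
from collections import Counter

# The three candidate verse-final marks listed in the message above.
CANDIDATES = ("\u1038", "\u104A", "\u104B")


def count_candidates(text: str) -> Counter:
    """Count every occurrence of each candidate mark in the text."""
    return Counter(ch for ch in text if ch in CANDIDATES)


def report(text: str) -> list[str]:
    """One line per candidate: code point, character, count, Unicode name."""
    counts = count_candidates(text)
    return [
        f"U+{ord(ch):04X} {ch} {counts[ch]:,} {unicodedata.name(ch)}"
        for ch in CANDIDATES
    ]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_marks.py Mat_utf8.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        print("\n".join(report(f.read())))
```

Raw counts only show how common each mark is; they do not by themselves say which one ends verses, which is why the verse-end question still needs the closer analysis mentioned above.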

Re: [sword-devel] Bible in Myanmar

2019-05-15 Cyrille
On 15/05/2019 19:18, David Haslam wrote: > Each of the last 1 or 2 characters of each verse is a regular Myanmar > punctuation mark. > Do you know which mark? > We need to be careful how we apply this. There may well be > some exceptions. > > Windows users should install BabelPad. This free

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Each of the last 1 or 2 characters of each verse is a regular Myanmar punctuation mark. We need to be careful how we apply this. There may well be some exceptions. Windows users should install BabelPad. This free Unicode text editor is highly recommended. http://www.babelstone.co.uk/Software/
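Once a stretch of text has been split into verse lines (for example by hand, for a chapter whose verse numbers are known from the PDF), the rule above can be checked and its exceptions listed. A minimal sketch, assuming the verse-final marks are the three candidates discussed in this thread; the names `MARKS`, `ends_with_mark`, `final_chars` and `exceptions` are hypothetical.

```python
from collections import Counter

# Assumption: the verse-final marks are the three candidates discussed in
# this thread; adjust once the analysis settles on the real set.
MARKS = frozenset({"\u1038", "\u104A", "\u104B"})


def ends_with_mark(line: str, marks=MARKS) -> bool:
    """True if either of the last two non-blank characters is a mark."""
    tail = line.rstrip()[-2:]
    return any(ch in marks for ch in tail)


def final_chars(lines) -> Counter:
    """Tally the last non-blank character of each non-empty line."""
    return Counter(line.rstrip()[-1:] for line in lines if line.strip())


def exceptions(lines, marks=MARKS) -> list[int]:
    """0-based indices of lines that do NOT end in a mark, i.e. the
    exceptions that need checking by hand."""
    return [i for i, line in enumerate(lines) if not ends_with_mark(line, marks)]
```

The two-character window follows the "last 1 or 2 characters" wording; it is deliberately loose, so a line whose second-to-last character happens to be a mark will pass even if the verse really ends elsewhere.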

Re: [sword-devel] Bible in Myanmar

2019-05-15 Cyrille
I haven't understood everything yet... but I trust you. If you have the patience to explain it to me, I'd like to learn :) What I don't understand is how you can find the marker for each verse and chapter in the UTF-8 text. What is this marker? On 15/05/2019 19:03, David Haslam wrote

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Michael’s description matches how I imagined the method during my waking moments this morning. :) David On Wed, May 15, 2019 at 17:33, Michael H wrote: > I've been working long hours and emailing in my break time. David has the > basics of converting to VPL. > >

Re: [sword-devel] Bible in Myanmar

2019-05-15 Michael H
I've been working long hours and emailing in my break time. David has the basics of converting to VPL (verse per line). I would then make the entire work a column in a spreadsheet. Then, in other columns, insert a list of book/chapter/verse references in order. The BCV and verse text columns should align and can be verifie
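The spreadsheet step can also be scripted so the two columns are produced side by side. A minimal sketch, assuming the references and the verse-per-line text are already Python lists; the helper names `align` and `write_sheet` and the column headers are hypothetical. Padding the shorter list with empty cells makes any count mismatch visible at the bottom of the sheet instead of silently truncating.

```python
import csv
from itertools import zip_longest


def align(references, verses):
    """Pair references with verse lines row by row, as two spreadsheet
    columns would. The shorter list is padded with empty strings so a
    count mismatch shows up as blank cells at the end."""
    return list(zip_longest(references, verses, fillvalue=""))


def write_sheet(rows, stream):
    """Write the aligned rows as CSV that a spreadsheet program can open."""
    writer = csv.writer(stream)
    writer.writerow(["reference", "text"])
    writer.writerows(rows)
```

When writing to a real file, open it with `open(path, "w", encoding="utf-8", newline="")` so the csv module controls line endings and the Myanmar text survives intact.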

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
The attachment contains a counted list of Myanmar words containing a font conversion error. NB. We need to match these words with what they are in the legacy font. This issue should be discussed with the current maintainer of the SIL TECkit converter, whoever that may be. It may be worthwhile a
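A counted list like this can be generated by flagging words that mix Myanmar letters with characters from outside the Myanmar block (U+1000 to U+109F), which is where unconverted legacy-font bytes tend to show up. This is a sketch under that assumption, not the script that produced the attachment; `suspicious_words` is a hypothetical name, and zero-width characters such as U+200B would also be flagged unless added to an allow-list.

```python
from collections import Counter


def in_myanmar_block(ch: str) -> bool:
    """True for code points in the Myanmar block, U+1000..U+109F."""
    return "\u1000" <= ch <= "\u109F"


def suspicious_words(text: str) -> Counter:
    """Count whitespace-separated words that mix Myanmar characters with
    characters from outside the Myanmar block (e.g. Latin leftovers)."""
    counts = Counter()
    for word in text.split():
        has_myanmar = any(in_myanmar_block(ch) for ch in word)
        has_other = any(not in_myanmar_block(ch) for ch in word)
        if has_myanmar and has_other:
            counts[word] += 1
    return counts
```

`suspicious_words(text).most_common()` then gives the list ordered by frequency, ready to match against the legacy-font spellings.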

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Observations (continued):
5. The string "Kd;" also looks anomalous. It's found only once, in ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
6. It's evident from the PDF file that the text is paragraphed with indented first lines. See https://www.dropbox.com/s/do5e675i19xfomf/

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Observations (continued):
4. In addition to the reported instances of the 3 anomalous characters (È, Ø, ò) found after the font conversion, there are 6 instances of the string "m;" that are also probably due to bugs in the converter. Best regards, David
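To count such strings and see each one in context (useful when asking someone to check the legacy-font original), a plain substring search is enough. A minimal sketch; the pattern list is taken from the observations in this thread, while the function names and the default context width are hypothetical choices.

```python
from collections import Counter

# Strings reported in this thread as likely converter bugs.
ANOMALIES = ("È", "Ø", "ò", "m;", "Kd;")


def find_anomalies(text: str, patterns=ANOMALIES, width: int = 15):
    """Yield (pattern, offset, context) for every occurrence of each
    pattern, with `width` characters of context on either side."""
    for pat in patterns:
        start = 0
        while (i := text.find(pat, start)) != -1:
            yield pat, i, text[max(0, i - width): i + len(pat) + width]
            start = i + 1


def count_anomalies(text: str, patterns=ANOMALIES) -> Counter:
    """Tally the hits per pattern."""
    return Counter(pat for pat, _, _ in find_anomalies(text, patterns))
```

The walrus operator needs Python 3.8 or later. Because the search is literal, "m;" would also match inside any longer string containing it, so the context column is worth reading before counting every hit as a bug.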

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Yep, sure, I can do that later. David On Wed, May 15, 2019 at 11:26, Cyrille wrote: > David, I don't have a Dropbox account, and I don't want to create one. Can you upload it to > https://framadrop.org/ instead? It's totally free and secure (and private). > Thank you. > > On 15/05/2019

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Cyrille writes: “My question is, can you do something with the txt file for adding the verse number?” Well, yes, that’s my intention. It was, after all, an “interim progress report”. ;) David On Wed, May 15, 2019 at 11:24, Cyrille wrote: > On 15/05/2019 11:46, D

Re: [sword-devel] Bible in Myanmar

2019-05-15 Cyrille
David, I don't have a Dropbox account, and I don't want to create one. Can you upload it to https://framadrop.org/ instead? It's totally free and secure (and private). Thank you. On 15/05/2019 11:46, David Haslam wrote: > Interim progress report. > > I downloaded the file Mat_utf8.zip from Cyrille's link and unzippe

Re: [sword-devel] Bible in Myanmar

2019-05-15 Cyrille
On 15/05/2019 11:46, David Haslam wrote: > Interim progress report. > > I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the > contents to Mat_utf8-odt > > I opened the .odt file using 7-Zip from the Windows Explorer context menu, > and extracted the file content.xml >

Re: [sword-devel] Bible in Myanmar

2019-05-15 Cyrille
Sorry, I was busy! On 15/05/2019 02:34, Michael H wrote: > I don't read Burmese and I don't know anyone who does. I was > suggesting you contact whoever provided you the files and permission, > and ask them if they can verify that the Unicode text reads correctly. > I did it, and the conversion

Re: [sword-devel] Bible in Myanmar

2019-05-15 David Haslam
Interim progress report.
I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt.
I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file content.xml.
I used the Notepad++ plug-in XMLTools to pretty print the XML fil
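The same extraction can be scripted without 7-Zip or Notepad++, since an .odt file is a zip package whose document body is stored in `content.xml`. A minimal sketch using only the Python standard library; the function names are hypothetical, and `TEXT_NS` is the namespace URI of ODF `<text:...>` elements.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import zipfile

# Namespace URI of ODF <text:...> elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_content_xml(odt) -> bytes:
    """An .odt file is a zip package; the body lives in content.xml."""
    with zipfile.ZipFile(odt) as package:
        return package.read("content.xml")


def pretty_print(xml_bytes: bytes) -> str:
    """Indent the XML so its structure is readable."""
    return xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  ")


def paragraphs(xml_bytes: bytes) -> list[str]:
    """Plain text of every <text:p>, including text nested in spans."""
    root = ET.fromstring(xml_bytes)
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

Note that `itertext()` skips elements such as `<text:s/>` (encoded runs of spaces) and `<text:tab/>`, so some whitespace may be lost, and headings stored as `<text:h>` are not included.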

Re: [sword-devel] Bible in Myanmar

2019-05-14 David Haslam
If the language is ordinary Burmese, we can use Google Translate to check that any particular verse has the text we should expect for its initially assigned reference. Machine translation is often adequate for such a limited-purpose task. We can also check against how PageMaker rendered it to

Re: [sword-devel] Bible in Myanmar

2019-05-14 Michael H
If the text doesn't come from an active team with people who read Burmese: I am on another group list which has relationships with people working on minority languages related to Burmese, but not on Burmese itself. If that's the only option, I can ask there whether anyone can verify that the Unicode conversion works

Re: [sword-devel] Bible in Myanmar

2019-05-14 Michael H
I don't read Burmese and I don't know anyone who does. I was suggesting you contact whoever provided you the files and permission, and ask them if they can verify that the Unicode text reads correctly. To a native speaker, the text will quickly be recognized as readable or misspelled (letters are out

Re: [sword-devel] Bible in Myanmar

2019-05-14 Cyrille
On 14/05/2019 22:48, Michael H wrote: > You should be able to configure a regex search to find the verse > boundaries. > > Once you have verse boundaries, if you configure the text into Verse > per line it should be possible to assign each row a chapter and verse > number from a reference. T

Re: [sword-devel] Bible in Myanmar

2019-05-14 Cyrille
On 14/05/2019 22:55, Cyrille wrote: > > > On 14/05/2019 22:45, Michael H wrote: >> Cyrille, did you start from the PDF or the PageMaker file? > PMaker >> Either way, you should send a snippet to your source and validate that the >> words are still readable. As small as 30 words should be eno

Re: [sword-devel] Bible in Myanmar

2019-05-14 Michael H
You should be able to configure a regex search to find the verse boundaries. Once you have verse boundaries, if you arrange the text as verse per line, it should be possible to assign each row a chapter and verse number from a reference. That is, the 3341st verse in the New Testament is usually J
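Assigning references by position only needs the number of verses in each chapter of the target versification. A minimal sketch: the function names are hypothetical, and the example counts (25, 23 and 17 verses for Matthew 1 to 3, as in the usual KJV-style versification) stand in for a full table, which should come from whatever versification the module will declare.

```python
from bisect import bisect_right
from itertools import accumulate


def index_to_ref(book: str, chapter_counts: list[int], n: int) -> str:
    """Map a 0-based verse index within a book to 'Book chapter:verse'.

    chapter_counts[i] is the number of verses in chapter i+1 under the
    chosen versification.
    """
    totals = list(accumulate(chapter_counts))  # running verse totals
    if not 0 <= n < totals[-1]:
        raise IndexError(f"verse index {n} outside {book}")
    chapter = bisect_right(totals, n)          # 0-based chapter index
    before = totals[chapter - 1] if chapter else 0
    return f"{book} {chapter + 1}:{n - before + 1}"


def label_lines(book, chapter_counts, lines):
    """Attach a reference to each verse-per-line row, in order.
    Raises IndexError if there are more lines than verses."""
    return [(index_to_ref(book, chapter_counts, i), line)
            for i, line in enumerate(lines)]
```

For example, with the counts `[25, 23, 17]`, index 25 (the 26th line) maps to `Matt 2:1`. A single missed or extra verse break shifts every later reference, so spot checks against the PDF at chapter starts are still needed.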

Re: [sword-devel] Bible in Myanmar

2019-05-14 Michael H
Cyrille, did you start from the PDF or the PageMaker file? Either way, you should send a snippet to your source and validate that the words are still readable. As small as 30 words should be enough. On Tue, May 14, 2019 at 8:09 AM Cyrille wrote: > I'm resending my message because it was too big. > > Th

Re: [sword-devel] Bible in Myanmar

2019-05-14 Cyrille
On 14/05/2019 22:26, David Haslam wrote: > If Michael’s observations are anything to go by, then maybe I can > script the recovery of chapter & verse tags. > > We shall see. > > Even if I’m not immediately successful, valuable lessons can be > learned in the attempt. Very well, I'll w

Re: [sword-devel] Bible in Myanmar

2019-05-14 David Haslam
If Michael’s observations are anything to go by, then maybe I can script the recovery of chapter & verse tags. We shall see. Even if I’m not immediately successful, valuable lessons can be learned in the attempt. David On Tue, May 14, 2019 at 21:21, Cyrille

Re: [sword-devel] Bible in Myanmar

2019-05-14 Cyrille
OK, thank you! I already have all the text in Unicode, but without the verse and chapter numbers... I started doing it manually... On 14/05/2019 22:17, David Haslam wrote: > Hi Cyrille > > If I can find the time tomorrow or later, I’ll have a look at what > might be feasible. > > Thanks for all these

Re: [sword-devel] Bible in Myanmar

2019-05-14 David Haslam
Hi Cyrille, If I can find the time tomorrow or later, I’ll have a look at what might be feasible. Thanks for all these useful links. David On Tue, May 14, 2019 at 14:08, Cyrille wrote: > I'm resending my message because it was too big. > > The conversion to UTF-8 is

Re: [sword-devel] Bible in Myanmar

2019-05-14 David Haslam
The ThanLwinSoft software was indeed developed by Keith Stribley (1976-2011). Screenshot posted to my Facebook timeline. https://m.facebook.com/story.php?story_fbid=10213794210749822&id=1243443528 We had exchanged emails during the year before he died. Best regards David

Re: [sword-devel] Bible in Myanmar

2019-05-14 Cyrille
I'm resending my message because it was too big. The conversion to UTF-8 is 99% solved!! I used an online converter: https://thanlwinsoft.github.io/www.thanlwinsoft.org/ThanLwinSoft/MyanmarUnicode/Conversion/myanmarConverter.html or: http://burglish.my-mm.org/latest/trunk/web/fontconv.htm See the re

Re: [sword-devel] Bible in Myanmar

2019-05-14 Michael H
Cyrille, (Peter), Maybe further discussion on this belongs in GitLab as issues. Can I get added to this project? Here are the first few lines of Matthew copied from the PDF: -- &Sifrmaw;OD; {0Ha*vdusrf; The Gospel According to Matthew ed'gef; usr;f ûyy*k Kd¾v f &iS rf maw;O;D \b0rwS wf r;f u

Re: [sword-devel] Bible in Myanmar

2019-05-14 Cyrille
Yesterday I thought: if a PDF tool could cut the pages down the middle, then a raw conversion to text would be possible, and then we would only need to convert it to UTF-8. Any ideas? On 13/05/2019 17:40, Michael H wrote: > I unzipped the pagemaker file, and when I open NT_Proverb/Pagemaker >

Re: [sword-devel] Bible in Myanmar

2019-05-13 Michael H
I unzipped the PageMaker file, and when I open NT_Proverb/Pagemaker (10.1 MB) with a hex editor, I can 'find' all of the book names and see the text there. To see the raw text: rename NT_Proverb.pmd to NT_Proverb.zip and open it with a zip archive program. The text is in the PageMaker file at th
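If, as reported here, the .pmd file opens as a zip archive, Python's `zipfile` can read it directly (it looks at the content, not the extension), and the hex-editor search can be repeated across all entries. A minimal sketch under that assumption; the function names are hypothetical, and the byte encoding of the text inside the PageMaker entries is not known from this thread.

```python
import zipfile


def list_members(path):
    """Name and uncompressed size of every entry in the container."""
    with zipfile.ZipFile(path) as z:
        return [(info.filename, info.file_size) for info in z.infolist()]


def find_bytes(path, needle: bytes):
    """Which entries contain `needle`, and how many times."""
    hits = []
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            data = z.read(name)
            if needle in data:
                hits.append((name, data.count(needle)))
    return hits
```

For instance, `find_bytes("NT_Proverb.pmd", b"Matthew")` would look for the English title line seen in the PDF; if that finds nothing, trying `"Matthew".encode("utf-16-le")` is a cheap check in case the stories are stored as UTF-16.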

Re: [sword-devel] Bible in Myanmar

2019-05-13 Cyrille
David, you are probably right about TECkit: if we get the text, it will help us convert it to Unicode. As for how to get the text, your method is beyond my skills :) If you succeed, please let me know. On 13/05/2019 16:21, David

Re: [sword-devel] Bible in Myanmar

2019-05-13 David Haslam
Given the insights from Michael Hart, it may be feasible to temporarily rearrange the main text stream as follows:
1. Replace every EOL by a horizontal tab.
2. Insert an EOL after each verse end character.
Observe that the above two steps are wholly reversible such that the original text strea
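The two steps and their inverse can be sketched as a pair of functions. This assumes the verse-end character is U+104B MYANMAR SIGN SECTION (the thread had not settled this, so it is a parameter) and that the text contains no tabs to begin with; otherwise those tabs would turn into line breaks on the way back. The function names are hypothetical.

```python
# Assumption: U+104B MYANMAR SIGN SECTION marks verse ends; the thread
# has not settled this yet, so treat it as a parameter.
VERSE_END = frozenset({"\u104B"})


def to_verse_lines(text: str, ends=VERSE_END) -> str:
    """Step 1: every EOL becomes a tab. Step 2: an EOL goes after each
    verse-end character."""
    if "\t" in text:
        # A pre-existing tab would be turned into a newline on the way back.
        raise ValueError("text already contains tabs; transform not reversible")
    out = []
    for ch in text.replace("\n", "\t"):
        out.append(ch)
        if ch in ends:
            out.append("\n")
    return "".join(out)


def from_verse_lines(vpl: str) -> str:
    """Undo both steps: drop the inserted EOLs, then turn tabs back into EOLs."""
    return vpl.replace("\n", "").replace("\t", "\n")
```

Reversal works because, after step 1, every newline in the result is one that step 2 inserted. Files with Windows line endings should be normalized to `\n` first, or the stray `\r` characters will end up inside the verse lines.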

Re: [sword-devel] Bible in Myanmar

2019-05-13 Cyrille
Thank you, Michael, for your help! Let me know if you manage to do something. On 13/05/2019 15:57, Michael H wrote: > Cyrille > > LibreOffice Draw attempts to open the pagemaker file, with limited > success. But it confirms that even in the pagemaker source, the verse > numbers are a separate

Re: [sword-devel] Bible in Myanmar

2019-05-13 Michael H
Cyrille, LibreOffice Draw attempts to open the PageMaker file, with limited success. But it confirms that even in the PageMaker source, the verse numbers are a separate text stream. With this source, there is no way to copy the text with verse numbers intact. It appears to be stored with each book

Re: [sword-devel] Bible in Myanmar

2019-05-13 Cyrille
On 13/05/2019 11:26, David Haslam wrote: > Last resort would be to reverse engineer a mapping table for the > legacy Myanmar font. Yes, this is a good idea; I had thought of it too, even though I can't do it myself. But first, how do we get the text... > > I once did a similar task for Times New Armenian. > > David

Re: [sword-devel] Bible in Myanmar

2019-05-13 David Haslam
A last resort would be to reverse-engineer a mapping table for the legacy Myanmar font. I once did a similar task for Times New Armenian. David On Mon, May 13, 2019 at 10:10, Cyrille wrote: > Yes, I also found this site; I did some quick tests, but the font used in
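Once a mapping table has been reverse-engineered, applying it is a greedy longest-match lookup. The sketch below is not TECkit and contains no real mapping for this font: `apply_mapping` is a hypothetical name, and the table passed to it must be built by hand. Unmapped characters are copied through unchanged so that gaps in the table stay visible in the output.

```python
def apply_mapping(text: str, table: dict[str, str]) -> str:
    """Convert text with a greedy longest-match lookup: at each position,
    try the longest legacy sequence in the table first; characters with
    no entry are copied through unchanged so gaps stay visible."""
    longest = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:  # no table entry matched at this position
            out.append(text[i])
            i += 1
    return "".join(out)
```

A pure lookup is probably not enough for Myanmar: legacy fonts store some vowel signs (such as ေ) before the consonant in visual order, while Unicode stores them after it, so a full converter also needs reordering rules, the kind of contextual rule that TECkit mapping files are meant to express.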

Re: [sword-devel] Bible in Myanmar

2019-05-13 Cyrille
On 13/05/2019 11:10, David Haslam wrote: > It’s bad enough when a typeset Bible has free-floating margin notes > and xrefs instead of them being tied to tagged words. > > When the verse tags are also relegated to the column side margins, > it’s seldom a simple task to reverse engineer. > >

Re: [sword-devel] Bible in Myanmar

2019-05-13 David Haslam
It’s bad enough when a typeset Bible has free-floating margin notes and xrefs instead of them being tied to tagged words. When the verse tags are also relegated to the column side margins, it’s seldom a simple task to reverse engineer. Ask if they have a revision-controlled set of files that ar

Re: [sword-devel] Bible in Myanmar

2019-05-13 Cyrille
Yes, I also found this site; I did some quick tests, but the font used in this Bible seems to be a different one. But first I need to get the text ;) You can find everything here (if anyone wants to help): - Burmese_NT_Proverbs.pdf: https://framadrop.org/r/WYr7JpID2z#ZKKf1kToMWvyeDBOd3bv1aSzphfqAa+sZfl

Re: [sword-devel] Bible in Myanmar

2019-05-13 David Haslam
First, just the conversion of the legacy font to Unicode: I’d like to think this could be done with TECkit. http://my.duniakitab.com/ThanLwinSoft/ThanLwinSoft/MyanmarUnicode/Conversion/TECkit.php Looks like the old website of the late Keith Stribley has been rescued from oblivion. IIRC, he died

[sword-devel] Bible in Myanmar

2019-05-13 Cyrille
Hello, I recently received a modern Myanmar translation of the NT, Psalms and Proverbs, with permission to create a new module. But there are many problems... First, getting the text. I tested different ways, but it was made with PageMaker! I can get the text, but the problem is I don't have the verse num