Proofing Tool GUI - Hyphenation support
Hello! Just to let the development community know that, after five years of planning to add hyphenation support to Proofing Tool GUI, it is finally implemented. It took me only a couple of weeks to get it ready, since I was only recently told about the algorithm. You can give it a try (PTG 3.0 - build 123) at: www.proofingtoolgui.org Don't hesitate to report any issues or make suggestions. Thank you! Kind regards, >Marco A.G.Pinto
Fwd: Proofing Tool GUI 3.0 - beta - build 104 - 2017-03-21 - Released
Hello! Just to inform everyone that I have released an update for Proofing Tool GUI: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html The main changes are:
- GUI improvements and fixes;
- It is now possible to have custom "AFF Aid" files with 16x16 PNG flags;
- Support for FLAG NUM and LONG (no recursion yet);
- Modern-looking menus on Windows;
- The pop-up menu now has an extra option, "Clone" (very useful);
- Major speed gain in the .AFF optimising code (gl_ES).
When I have more time I need to add an "undo/redo" feature (Shantanu suggested it) and optimise the decoding of words from the .AFF, since I came up with a brilliant idea a day or two ago (just like the one I had for gl_ES). Thanks! Kind regards, >Marco A.G.Pinto
Proofing Tool GUI 3.0 - beta - build 100 - 2017-01-01 Released
Hello! Just to announce that I have updated PTG for Windows and Linux. It can be downloaded from its official site: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html The latest build has a much improved GUI, supports selecting multiple items, and lets you choose the window size: 1024x600 or 1280x600. In the next update I will try to implement GTK3 support (Linux) and add undo/redo, as Shantanu suggested. Also, notice the extra tabs: they do not work yet, but they mean that a future version will make it possible to edit/create rules for LanguageTool and to perform language-specific operations (text processing). Does anyone know where I can get a hyphenator, so that I can see how it works and code its tab in PTG? This tab has been blank for years. PS: Notice that I have added some people in Bcc to this e-mail. Thanks! Kind regards from the English Dictionaries maintainer, >Marco A.G.Pinto
Proofing Tool GUI 3.0 (beta) - build 96 - 2016-03-23
Hello! I have just released a bugfix build of PTG. The .AFF file would get garbage appended at the end. What is strange is that I only noticed it yesterday. It seems that, for a couple of versions or so, PureBasic has required a special "flag" in the command I was using to populate the EditorGadget after loading the .AFF file. Anyway, to repair an affected file, just delete the garbage and save it again. You can download PTG build 96 from: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html Thanks! Kind regards, >Marco A.G.Pinto
Proofing Tool GUI 3.0 (beta) - build 95 - 2016-03-02
Hello! I have just released build 95, downloadable from: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html I fixed a crash on Windows (caused by PureBasic 5.41; PB 5.42 has since been released and fixes it) and also improved the Autocorrect code, adding colours to it. So far I have added 200+ new words to the pt_PT autocorrect list. Before the official release of AOO 4.2 I will upload the most recent file. Thanks! Kind regards, >Marco A.G.Pinto
Proofing Tool GUI 3.0 - beta - build 94 - 2016-02-19 Released
Hello! *With Bcc to: l10n and QA.* Just to inform everyone that, as planned, I have released an update for my Hunspell tool, Proofing Tool GUI. It can be downloaded from: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html I had a previous build ready to ship but, while testing it on Ubuntu, I found some issues when using the Up and Down cursor keys to scroll the thesaurus, which I seem to have fixed in build 94. Please give it a try. Thanks! Kind regards, >Marco A.G.Pinto
Re: Proofing Tool GUI 3.0 - build 89 - 20160109 - Autocorrect working
On 09/01/2016 Marco A.G.Pinto wrote:
> I hereby offer myself to update the pt_PT since I have several grammar books with dozens/hundreds of examples of incorrect/correct words
The last time I looked at it was several years ago, to fix the Italian autocorrect list. Everything is documented in https://bz.apache.org/ooo/show_bug.cgi?id=48897 but it might be outdated by now. Regards, Andrea. - To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org For additional commands, e-mail: dev-h...@openoffice.apache.org
Proofing Tool GUI 3.0 - build 89 - 20160109 - Autocorrect working
Hello, I have just released a new build of PTG, which now lets you edit the "Autocorrect words".
[Screenshot: the en_GB autocorrect file]
You can download PTG from the official site: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html Special instructions for working with autocorrect files: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html#build89 The GUI makes editing a lot easier because it avoids using the command line. I have checked the en_GB and pt_PT XMLs and they are from 2010, so I was wondering whether someone could work on the English ones. I hereby offer myself to update the pt_PT since I have several grammar books with dozens/hundreds of examples of incorrect/correct words. What should I do to get the pt_PT file updated in both AOO and LO? Thanks for your time! Kind regards, >Marco A.G.Pinto
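For anyone who wants to check these lists with a script outside PTG: the message calls them "XMLs", but the thread does not describe their layout. The sketch below assumes (from the usual AOO file layout, not from this thread) that an autocorrect file such as acor_en-GB.dat is a ZIP archive holding a DocumentList.xml whose block elements carry the misspelling in an abbreviated-name attribute and the replacement in a name attribute. The function names are hypothetical, and attributes are matched by local name so the exact namespace URI does not matter.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def local_name(qualified):
    """Drop an ElementTree "{namespace}" prefix, if any."""
    return qualified.rsplit("}", 1)[-1]


def read_autocorrect_pairs(dat_bytes):
    """Return (misspelling, replacement) pairs from an autocorrect .dat file.

    Assumption: the .dat is a ZIP archive containing DocumentList.xml.
    """
    with zipfile.ZipFile(io.BytesIO(dat_bytes)) as archive:
        root = ET.fromstring(archive.read("DocumentList.xml"))
    pairs = []
    for element in root.iter():
        attrs = {local_name(k): v for k, v in element.attrib.items()}
        if local_name(element.tag) == "block" and "abbreviated-name" in attrs:
            pairs.append((attrs["abbreviated-name"], attrs.get("name", "")))
    return pairs
```

Reading the pairs in document order makes it easy to diff an old list against an updated one before resubmitting it.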
Re: Proofing Tool GUI 3.0 - Build 76 - It now saves .idx (Thesaurus) files
On 07/08/2015 12:13 AM, Marco A.G.Pinto wrote:
> although I only tested it with the pt_PT Thesaurus... Alexandro Colorado has
> said several times that I was a lazy ass and he was right :-(
It is extremely difficult to know in advance how much time a programming issue will take to solve. You obviously based the projected time on the worst-case scenario, whilst Alexandro bases it on the best-case scenario. jonathon
Proofing Tool GUI 3.0 - Build 76 - It now saves .idx (Thesaurus) files
Hello! Today I have released build 76. I have added .idx support, which means that when saving a Thesaurus (.dat) it also creates an .idx file. I know I have been trying to do this for several months, or even a year, but the script people told me about was very hard to understand. I also lacked free time because of my two jobs and my PhD. Today I received an e-mail from a person asking me about .idx support, and I replied that in August (the university is closed then, so I will only have one job) I would probably have free time. But I decided to give it a try today and it was done in around 10-15 minutes, although I only tested it with the pt_PT Thesaurus... Alexandro Colorado has said several times that I was a lazy ass and he was right :-( Please give it a try: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html Thanks! PS: To those complaining that an update notification appears at the top right of AOO every month, that happens because I update the English dictionaries monthly (so a new release comes out every month and everyone gets a notification in AOO). Kind regards from your friend, >Marco A.G.Pinto
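Writing the .idx alongside the .dat can be sketched in a few lines. This is a minimal Python illustration, not PTG's PureBasic code, and it rests on assumptions inferred from the excerpts in the original thread message: the .idx starts with the charset line and the number of entries, followed by one word|offset line per entry, where the offset is the byte position of that entry's word|count line in the .dat (the excerpt's "1|6" points just past the 6-byte "UTF-8" line and its newline). Sorting the index in plain code-point order is also an assumption; MyThes' own th_gen_idx.pl may order entries differently. The function name is hypothetical.

```python
def build_dat_and_idx(entries, charset="UTF-8"):
    """Serialise thesaurus entries to .dat bytes plus matching .idx text.

    entries: list of (word, meanings), each meaning being
    (description, [synonym, ...]); "-" is used when there is no description.
    """
    dat = bytearray((charset + "\n").encode(charset))
    index = []  # (word, byte offset of its "word|count" line in the .dat)
    for word, meanings in entries:
        index.append((word, len(dat)))
        lines = [f"{word}|{len(meanings)}"]
        lines += ["|".join([description, *synonyms]) for description, synonyms in meanings]
        dat += ("\n".join(lines) + "\n").encode(charset)
    index.sort()  # assumption: plain code-point order is acceptable
    idx_lines = [charset, str(len(index))] + [f"{w}|{o}" for w, o in index]
    return bytes(dat), "\n".join(idx_lines) + "\n"
```

Offsets are taken from the encoded bytes, not from character counts, so accented words such as "abaçanado" do not shift later entries.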
Proofing Tool GUI 3.0 - beta - build 76 - WIP
Hello! As you may recall, I am improving en_GB with my own Hunspell tool. I have decided to dedicate some time during my vacation to adding support for the three flag types:
1) chr
2) num
3) long
So far I have added the part that reads the .AFF and optimises it (removes all spaces and creates an array with the position of each code). With numbers it takes 18 seconds on my T4300 laptop with *gl_ES.aff*, after I optimised the code to the limit; before optimising, it took nearly one minute. Numbers are what make it slower, because they have a variable length and I have to check where each number ends (its last digit). I found dictionaries on my hard disk with *2) num* and *3) long*, which I had access to some time ago thanks to the LanguageTool team. I don't understand how recursivity works:
*gl_ES.aff*
SFX 520 er éren/666,134,135 er . is:infinitivo P6 + enclítico
*nl.aff*
SFX Yb 0 jes-/CaCbCp [^m] ts:NN2r
These two examples are from Mozilla dictionaries. Could one of you send me a small dictionary that I could test with Proofing Tool GUI, and explain how I should make it work? Thanks! Kind regards, >Marco A.G.Pinto
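The three flag encodings can be illustrated with a small splitter. This is a sketch following the conventions generally documented for Hunspell, not PTG code: by default each flag is one character, FLAG long uses two-character flags, and FLAG num uses comma-separated decimal numbers (hence the need to find where each number ends). The flags after the "/" in an affix field are what Hunspell's documentation calls continuation classes: they let an affixed form take further affixes, which is presumably the "recursivity" asked about. The function names are hypothetical.

```python
def split_flags(flags, mode="char"):
    """Split the flag string that follows "/" into individual flags.

    mode "char": one character per flag (the default; also covers FLAG UTF-8)
    mode "long": FLAG long, two characters per flag
    mode "num":  FLAG num, comma-separated decimal numbers
    """
    if mode == "char":
        return list(flags)
    if mode == "long":
        if len(flags) % 2:
            raise ValueError(f"odd-length long flag string: {flags!r}")
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if mode == "num":
        return [int(number) for number in flags.split(",") if number]
    raise ValueError(f"unknown flag mode: {mode!r}")


def continuation_flags(rule_line, mode="char"):
    """Return the flags attached to the affix field of a PFX/SFX rule line."""
    fields = rule_line.split()
    # Rule lines look like: SFX <flag> <strip> <affix>[/<flags>] <condition> ...
    _affix, _slash, flags = fields[3].partition("/")
    return split_flags(flags, mode)
```

Applied to the two examples above, the gl_ES rule yields the numeric flags 666, 134 and 135, and the nl rule yields "Ca", "Cb" and "Cp".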
Proofing Tool GUI V3.0 - beta - build 72 - 2015-03-05
Hello everyone, Tonight I have updated PTG to build 72. As you probably know, this is an open-source Hunspell tool which I am using to work on the British dictionary. The last build I released was 70 and, since then, I have been able to greatly optimise the processing of wordlists. en_GB now takes almost half the time to process (extract, count, etc.). This will also be visible with other dictionaries. I have also added the feature of exporting wordlists with just the hyphens, and that is why I decided to release this build today: Proofing Tool GUI 3.0 - beta - build 72 is available from its official site: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html Please notice that it only works with single-character (chr) flags in the .AFF files (some more recent .AFF files use other flag encodings, such as numbers or letters several characters long). I tried to improve this, but it is very hard to accomplish, even more so because that form of encoding also uses recursion in the rules, which is a mess without proper documentation. Thanks! Kind regards, >Marco A.G.Pinto
Re: Proofing Tool GUI 2.3 released - 14.Feb.2014
Thanks, my dear friend Andrea! :-P :-P ;-) :-)
On 14/02/2014 13:35, Andrea Pescetti wrote:
> The tool is called Hyphen and is part of Hunspell.
Re: Proofing Tool GUI 2.3 released - 14.Feb.2014
On 14/02/2014 Marco A.G.Pinto wrote:
> Also, I want to implement the Hyphenation. Does anyone know how it works, so that I can implement it? Maybe there is documentation somewhere?
The tool is called Hyphen and is part of Hunspell. So I think you should start from http://sourceforge.net/projects/hunspell/files/Hyphen/ The syntax of auxiliary files is very, very obscure! Most of the time, hyphenation patterns are not created from scratch but converted from the ones available for TeX. The Italian package http://extensions.openoffice.org/project/dict-it should contain README files with links to the TeX sources. Regards, Andrea.
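TeX-style patterns are applied with Frank Liang's algorithm: every pattern that matches a substring of the word (with "." marking the word edges) contributes digit weights to the gaps between letters, the highest weight at each gap wins, and an odd weight allows a break. The Python sketch below shows only that core idea; it is not the Hyphen library's code, it ignores Hyphen's extensions (such as non-standard hyphenation), and the tiny pattern list in the test is illustrative, chosen so that the textbook example comes out as hy-phen-ation. The function names are hypothetical.

```python
import re


def parse_pattern(pattern):
    """Split a TeX pattern like "hy3ph" into its letters and gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    position = 0
    for ch in pattern:
        if ch.isdigit():
            weights[position] = int(ch)
        else:
            position += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Split a word at the gaps whose highest pattern weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word.lower() + "."
    # scores[j] is the weight of the gap just before text[j].
    scores = [0] * (len(text) + 1)
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            weights = table.get(text[start:end])
            if weights is not None:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:  # gap before word[i], shifted by the leading "."
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces
```

The left_min and right_min parameters (hypothetical names) keep at least two letters on each side of a break, a common minimum for hyphenators.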
Proofing Tool GUI 2.3 released - 14.Feb.2014
Hello! I have released V2.3 of my tool [1]. My main goal in this version was to optimise the code to the limit. Unfortunately, only the "Advanced Find" became faster (three times faster, tested on x86; x64 should be much faster), and I also fixed a bug there. All the other improvements make it more user-friendly. I had access to a dictionary with nearly 600,000 words and tried to optimise the code, but I noticed that what slows things down is the code that adds the words to the listboxgadget: loading the 600,000 words into a dynamic array took only a couple of seconds. After some posts in the PureBasic forum I was advised how to proceed: I must only show a certain number of entries in the listboxgadget and manage the scroll bar myself. This implementation will take several weeks, or maybe months, of coding, since I only have the idea and still don't know how to code it. I expect to have this working in V3.0 and, if all goes well, it will work extremely fast on all OSes. Also, I want to implement the Hyphenation. Does anyone know how it works, so that I can implement it? Maybe there is documentation somewhere? [1]: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html Thanks! Kind regards, >Marco A.G.Pinto
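The advice from the PureBasic forum amounts to a "virtual" list: the whole wordlist stays in an array, and the gadget only ever receives the rows currently in view, with the scroll bar value mapped to the index of the first visible row. The language-neutral Python sketch below shows the bookkeeping only; the class and method names are hypothetical, and a PureBasic version would redraw the gadget from visible() whenever the scroll bar moves.

```python
class VirtualList:
    """Keep every item in memory but expose only the rows currently in view."""

    def __init__(self, items, visible_rows):
        self.items = items
        self.visible_rows = visible_rows
        self.top = 0  # index of the first visible item, i.e. the scroll bar value

    def max_top(self):
        """Largest valid scroll position (the scroll bar's maximum)."""
        return max(0, len(self.items) - self.visible_rows)

    def scroll_to(self, position):
        self.top = min(max(0, position), self.max_top())

    def scroll_by(self, rows):
        self.scroll_to(self.top + rows)

    def visible(self):
        """The only rows that would be handed to the list gadget."""
        return self.items[self.top:self.top + self.visible_rows]
```

With 600,000 words and a 20-row window, each redraw touches 20 items instead of 600,000, which is why the cost no longer grows with dictionary size.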
Re: Proofing Tool GUI V2.0 - Released
Marco A.G.Pinto wrote:
> I have released V2.0 of my tool. I have edited the Dictionary Wiki page inserting the information and link.
I had a look at the dictionary editing functionality and it works, nice! Maybe (for version 3, let's say...) it would be more intuitive to introduce an area where the tool shows all derived words, so that people know what to put after the "/". This would require implementing quite complex capabilities, since the tool would need to understand .aff files, but it would help people find the right classification of words by providing real examples. Regards, Andrea.
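The "show all derived words" idea can be prototyped for plain suffix rules. The sketch below assumes SFX rules already parsed into (strip, add, condition) triples, with "0" meaning "nothing" as in .aff files and the condition matched against the end of the stem; it ignores prefixes, cross-products and continuation classes, and the rule set in the test is a made-up English-plural example. Hunspell conditions use a restricted pattern syntax, so reusing Python's re for them is a simplification. The function name is hypothetical.

```python
import re


def derived_forms(stem, flags, suffix_rules):
    """List the stem plus every form produced by its suffix flags.

    suffix_rules maps a flag to (strip, add, condition) triples taken from
    SFX lines; "0" means "nothing" for strip and add, as in .aff files.
    """
    forms = [stem]
    for flag in flags:
        for strip, add, condition in suffix_rules.get(flag, []):
            # The condition must match the end of the stem ("." matches anything).
            if not re.search(f"(?:{condition})$", stem):
                continue
            base = stem
            if strip != "0":
                if not stem.endswith(strip):
                    continue
                base = stem[: -len(strip)]
            forms.append(base + ("" if add == "0" else add))
    return forms
```

Showing these forms next to a dictionary entry such as "city/S" would let a volunteer see immediately whether the chosen flag produces "cities" or a wrong form.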
Proofing Tool GUI V2.0 - Released
Hello! I have released V2.0 of my tool. I have edited the Dictionary Wiki page inserting the information and link. The official date is 1-JUL-2013 but, since the tool was already ready, I released it one day early (in some parts of the world it is already Monday anyway). The site for my tool is: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html I believe it now works on all Linux distributions, even though there is a speed issue on Linux when opening the spellers and thesaurus. On Windows it works very well, though. Thanks for your time! Kind regards, >Marco A.G.Pinto
Re: Proofing Tool GUI V1.0 - Released
On 01/06/2013 Marco A.G.Pinto wrote:
> I have created and uploaded a Web page containing the manual, source and downloads for Proofing Tool GUI V1.0. ... Site: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html
Good! I still have the same issues as before (but I now receive a warning saying that ISO-8859-1 encoding is unsupported in your tool):
- The file takes 39 minutes to open with your binary Linux version
- The output is quite corrupted, see http://imagebin.org/260609
You already explained that you cannot reproduce the problems so I'll stop here, considering that my file is using an unsupported charset. This mailing list is not the place for discussing bugs in your tool: if you make an issue queue available on GitHub, SourceForge, Google Code or your own server I might consider posting bugs there.
> If you like the tool, maybe you could place it in the official AOO page of tools.
What page are you referring to? If it's on a wiki, feel free to update it yourself; if it is somewhere else, please give the precise URL of the page to be modified and I'll list your tool there: even though I prefer manual editing, I'm sure that several community members will appreciate that a tool like yours exists. Regards, Andrea.
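MyThes .dat files name their character set on the first line (the Portuguese file quoted in the original thread message says UTF-8; the Italian file Andrea used apparently declares ISO-8859-1). A minimal Python sketch of honouring that declaration instead of assuming UTF-8; the function name is hypothetical, and it relies on Python's codec lookup accepting names such as "UTF-8" and "ISO8859-1" directly, which other environments may need a mapping table for.

```python
def decode_thesaurus(raw):
    """Decode MyThes .dat bytes using the charset named on the first line.

    Returns (charset, text of the remaining lines).
    """
    first_line, _newline, rest = raw.partition(b"\n")
    charset = first_line.decode("ascii").strip()  # e.g. "UTF-8" or "ISO8859-1"
    return charset, rest.decode(charset)
```

Keeping the declared charset around also lets the tool write the file back in the same encoding, so an ISO-8859-1 thesaurus is not silently converted.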
Re: Proofing Tool GUI V1.0 - Released
Hello! Thanks for your feedback. In V1.1 I must enhance some things. PS: I haven't coded the Edit menu yet because I haven't looked deeply into the string-editing commands. But editing can be done with the normal Windows keys: Ctrl+C/X and Ctrl+V. Kind regards, >Marco A.G.Pinto
On 02/06/2013 15:07, Guy Waterval wrote:
> I have tried it and it's really easy to use. Just 2 remarks .
> - The edition commands of the menu Edit are not functional (Windows XP)
> - The command Find allows only a search in the "Synonyms" list and not in the "Meanings". Perhaps a command to extend to the "Meanings" (only my personal opinion).
Re: Proofing Tool GUI V1.0 - Released
Hello Marco,
I have tried it and it's really easy to use. Just two remarks:
- The editing commands in the Edit menu do not work (Windows XP).
- The Find command only searches the "Synonyms" list, not the "Meanings". Perhaps there could be an option to extend the search to the "Meanings" (only my personal opinion).
I don't know if it could be useful, but there is also another interesting extension in the languages area: Anaphraseus, which lets you translate directly in a Writer document, creating a TM and a glossary. Note that its author still has to update it for AOO 4.0, but on AOO 3.4.1 it runs fine. Many thanks for your work and a lot of success with your future developments. A+ -- gw
Proofing Tool GUI V1.0 - Released
Hello! I have created and uploaded a Web page containing the manual, source and downloads for Proofing Tool GUI V1.0. The date there is 2-JUN-2013 but I have everything ready and, in some parts of the world, it is already that date. Site: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html This tool lets you edit the thesauri of OpenOffice and LibreOffice. Later I want it to also edit the dictionary and hyphenation files, to make it compatible with Firefox and Thunderbird too, since their EN-UK dictionary is poor and no one has volunteered to make a better version. If you like the tool, maybe you could place it on the official AOO page of tools. Thanks! Kind regards, >Marco A.G.Pinto
Re: Proofing Tool GUI V1.0 - alpha 1- build 7 - 23.May.2013
On 23/05/2013 Marco A.G.Pinto wrote:
> Today I have spent several hours improving my tool and decided to release an alpha version just for people to feel how it works.
I've managed to open the Italian thesaurus from http://extensions.openoffice.org/en/project/dict-it and, while the import mostly worked:
- something went wrong on some words (if you try, look around "calamita")
- character encoding is not respected (this might also be a font problem)
- import is really slow: it took extremely long (30+ minutes) to open the thesaurus, so I couldn't actually try editing entries.
> On later releases I want to edit the speller and hyphenation.
I'm not sure that this is a good approach to edit the spell checker entries. At least, make sure you properly use the .aff rules (well, I've already sent you the relevant links for this and other topics). Regards, Andrea.
Re: Proofing Tool GUI V1.0 - alpha 1 - build 3
On 11/05/2013 Marco A.G.Pinto wrote:
> Here is another screenshot of the current status of the tool: http://i.imgur.com/Ir0Dloi.png
Good. It looks like this could be an interesting tool for the l10n volunteers who don't have a lot of technical knowledge.
> I would like to ask if the copyright I placed at the bottom of the window is the correct one: I want the tool to be freely copied and compiled on other platforms such as Linux, Mac, Amiga. But, I don't want people to change the code and then release the tool as if were them who coded it.
Many free software licenses will cover this basic requirement, and you can find plenty of information online about the differences among free software licenses. Since your tool is meant to be a companion to Apache OpenOffice, you might consider distributing it under the Apache License, version 2.0. See http://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN for a brief explanation (note: when it says that it requires to "provide clear attribution to The Apache Software Foundation", in your case it means "provide clear attribution to Marco A.G.Pinto", of course), and see http://www.apache.org/licenses/LICENSE-2.0.html for the full license text.
> I have made tests opening the English .DAT file with 145866 synonyms and it took the following time:
> - Windows EXE (x86): 1:47 min
> - Windows EXE (x64): 1:36 min (less 11 seconds)
This is still quite slow compared to OpenOffice. OpenOffice opens a thesaurus (at least for lookup) almost instantly. Maybe you can take advantage of the OpenOffice code for hints on how to speed things up; there is some parsing code in MyThes within the OpenOffice external components: http://opengrok.adfinis-sygroup.org/source/xref/aoo-trunk/ext_sources/067201ea8b126597670b5eff72e1f66c-mythes-1.2.0.tar.gz.dir/mythes-1.2.0/
> After I have a more stable version I would like it to be uploaded to the community for people to feel the touch. How can it be done?
For sure we will want to involve the Localization list and tell them that the tool is available: http://openoffice.apache.org/mailing-lists.html#localization-mailing-list-public Regards, Andrea.
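The near-instant lookup in OpenOffice is consistent with how the .idx is meant to be used: instead of parsing the whole .dat, a reader can binary-search the index and jump straight to one entry. The sketch below is a Python illustration of that idea, not a translation of MyThes; it assumes the .idx layout visible in the original thread message (charset line, entry count, then word|offset lines with byte offsets into the .dat), and the function names are hypothetical. For a real file, dat could be an mmap.mmap object, which supports the same find() and slicing used here.

```python
import bisect


def load_idx(idx_text):
    """Return sorted (words, offsets) lists from the text of a .idx file."""
    pairs = []
    for line in idx_text.splitlines()[2:]:  # skip the charset and count lines
        if not line:
            continue
        word, _, offset = line.rpartition("|")
        pairs.append((word, int(offset)))
    pairs.sort()
    return [w for w, _ in pairs], [o for _, o in pairs]


def read_line(dat, position):
    """Return the line starting at byte `position` and where the next one starts."""
    end = dat.find(b"\n", position)
    if end == -1:
        end = len(dat)
    return dat[position:end], end + 1


def lookup(word, index, dat, charset="UTF-8"):
    """Binary-search the index, then decode only that entry from the .dat bytes."""
    words, offsets = index
    i = bisect.bisect_left(words, word)
    if i == len(words) or words[i] != word:
        return None
    line, position = read_line(dat, offsets[i])
    count = int(line.decode(charset).rpartition("|")[2])
    meanings = []
    for _ in range(count):
        line, position = read_line(dat, position)
        meanings.append(line.decode(charset).split("|"))
    return meanings
```

Only the index is loaded up front; each lookup then reads a handful of lines, so opening even a 145866-entry thesaurus for lookup does not require parsing it all.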
Proofing Tool GUI V1.0 - alpha 1 - build 3
Hello! Here is another screenshot of the current status of the tool: http://i.imgur.com/Ir0Dloi.png I am progressing slowly, as I am involved in several other things. I would like to ask if the copyright I placed at the bottom of the window is the correct one: I want the tool to be freely copied and compiled on other platforms such as Linux, Mac, Amiga. But I don't want people to change the code and then release the tool as if they had coded it. I have made tests opening the English .DAT file with 145866 synonyms and it took the following time:
- Windows EXE (x86): 1:47 min
- Windows EXE (x64): 1:36 min (11 seconds less)
After I have a more stable version I would like it to be uploaded to the community so that people can try it. How can it be done? Thanks! Kind regards, >Marco A.G.Pinto
Re: Proofing Tool GUI
Rob, The PT-pt extension, oo3x-pt-PT-preao-13.3.31.1.oxt, is only 544 kB. Even the definition updates for Microsoft Windows Defender are larger, most of the time twice the size of the PT-pt OXT, and they are updated more than once a day. Kind regards, >Marco A.G.Pinto
On 07/05/2013 00:23, Rob Weir wrote:
> This makes me wonder... Does it still make sense, in the year 2013, for updates to dictionaries and thesauruses to require a download and install of a large file.
Re: Proofing Tool GUI
On Sun, May 5, 2013 at 4:45 PM, Marco A.G.Pinto <marcoagpi...@mail.telepac.pt> wrote:
> Hello my dear ones,
>
> A couple of days ago I was on IRC in #dev.openoffice.org chatting with JZA.
>
> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
> JZA told me the files were in TXT format and gave me a URL with several information but I gave a quick look and didn't find anything about the data dictionary of the thesaurus.
>
> The tool will be called "Proofing Tool GUI" and will be coded in PureBasic. Is this a good name? PureBasic allows to compile in Windows/Linux/Mac/Amiga.
>
> The reason why I want to code it is because months ago I contacted my friends at Minho University in Portugal who are in charge of PT-pt and I wanted to send them words to be used as synonymous but they didn't know how to add them.
This makes me wonder... Does it still make sense, in the year 2013, for updates to dictionaries and thesauruses to require a download and install of a large file. Is there a way to do this incrementally, even live, based on a feed (RSS or Atom)? So I could have AOO "subscribe" to a dictionary and receive new words as they become popular. Maybe there can even be the ability to have a custom subscription that is used only within a company, to publish special words used there, technical, product names, etc. You could even have a menu option as part of spell checking "Add to shared dictionary...". -Rob
> This made me think that there isn't a tool for doing that, so my idea is good because it can be used by the whole community of developers.
>
> I unziped the Portuguese .OXT and grabbed the files:
> - th_pt_PT.idx
> - th_pt_PT.dat
>
> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format but didn't understand completely how they work.
>
> For example, in the *.idx* one I had:
> UTF-8
> 12940
> 1|6
> a cerca de|16097
> a começar de|19986
> a favor|32934
> a partir de|67469
> a respeito de|77248
> ... etc...
>
> in the *.dat* one I had:
> UTF-8
> 1|3
> -|anuviado
> -|aperitivo
> -|sigla
> ababelado|1
> -|atrapalhado|baralhado|atarantado|desnorteado
> ababelar|1
> -|baralhar|atrapalhar
> abaçanado|1
> ... etc...
>
> It seems there are at least three levels of synonymous in the *.dat* one but I don't know how to interpret them if I create a GUI.
>
> Also, in the *.idx* one there are numbers too which I don't understand the meaning.
>
> Is there a URL which explains every detail of those files?
>
> Thanks!
>
> Kind regards from,
> >Marco A.G.Pinto
Proofing Tool GUI V1.0 - alpha 1
Hello! This is what I have coded of the tool so far: http://i.imgur.com/NwvkLhn.png I have more stuff drawn on paper, but I am still learning how to code the GUI in PureBasic. The executable is only around 40 kB, and PureBasic runs on Windows/Linux/Mac/Amiga. Ahhh... just telling the news! Kind regards, >Marco A.G.Pinto
Re: Proofing Tool GUI
Hello, You can see additional information in the slides on Linguistic Tools in OpenOffice.org (OpenOffice.org Conference 2005): http://danielnaber.de/publications/ooocon2005-lingucomponent.pdf -- Yakov Reztsov
Re: Proofing Tool GUI
Marco A.G.Pinto wrote:
> I came up with the idea of creating a GUI to edit the thesaurus of AOO. JZA told me the files were in TXT format and gave me a URL with several information but I gave a quick look and didn't find anything about the data dictionary of the thesaurus.
My FOSDEM presentation may have a little bit more, at least in terms of examples and links (make sure you check the existing web interfaces available, like the one mentioned by RGB: read through the whole presentation). https://fosdem.org/2013/schedule/event/apache_openoffice_dictionaries/ If you need more, ask here and I will be able to provide some guidance.
> The tool will be called "Proofing Tool GUI" and will be coded in PureBasic. Is this a good name? PureBasic allows to compile in Windows/Linux/Mac/Amiga.
We may disagree about the language choice (we already have a dozen languages in use in the OpenOffice sources!) but for the moment let's discuss the overall functionality. Then you may code the tool as part of OpenOffice (much harder, but it isn't totally unreasonable to have built-in dictionary editing capabilities!), as an independent tool, or whatever.
> I unziped the Portuguese .OXT and grabbed the files:
> - th_pt_PT.idx
> - th_pt_PT.dat
As far as your tool is concerned, you may disregard the .idx file. It is automatically generated from the .dat file. (To do so, you use th_gen_idx.pl from Kevin B. Hendricks' MyThes; if you need further information just ask.)
> in the *.dat* one I had:
> UTF-8
> 1|3
> -|anuviado
> -|aperitivo
> -|sigla
> ababelado|1
> -|atrapalhado|baralhado|atarantado|desnorteado
> ababelar|1
> -|baralhar|atrapalhar
> abaçanado|1
> ... etc...
Slide 20 in my presentation has a more understandable example. Anyway, your file means:
- The word (!) "1" has "3" possible meanings
- The word "ababelado" has "1" possible meaning:
  - This meaning has no description ("-")
  - Synonyms, in this meaning, are: atrapalhado, baralhado, atarantado, desnorteado
- And so on.
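This reading of the format translates directly into a parser. A minimal Python sketch, assuming exactly the layout described above (a charset line, then for each entry a word|count line followed by count lines of the form description|synonym|synonym...); it does no error recovery beyond skipping blank lines, and the name parse_dat is hypothetical.

```python
def parse_dat(text):
    """Parse MyThes .dat text into (charset, {word: [(description, synonyms)]})."""
    lines = text.splitlines()
    charset = lines[0].strip()
    thesaurus = {}
    i = 1
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between entries
            i += 1
            continue
        word, _, count = lines[i].rpartition("|")
        meanings = []
        for line in lines[i + 1:i + 1 + int(count)]:
            description, *synonyms = line.split("|")
            meanings.append((description, synonyms))
        thesaurus[word] = meanings
        i += 1 + int(count)
    return charset, thesaurus
```

For the excerpt above, "ababelado" maps to a single meaning whose description is "-" and whose synonyms are atrapalhado, baralhado, atarantado and desnorteado, which is the structure a GUI would need to display and edit.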
> Is there a URL which explains every detail of those files?
You should already have quite a few pointers by now... If you collect information, feel free to create a wiki page on http://wiki.openoffice.org/ to consolidate knowledge there! Regards, Andrea.
Re: Proofing Tool GUI
2013/5/5 RGB ES
> AFAIK, most AOO thesaurus are based on OpenThesaurus
> http://sourceforge.net/projects/openthesaurus/
The right URL is https://github.com/danielnaber/openthesaurus
> which is already a working web interface to add words to a thesaurus database that can be exported to several formats, included the one used by AOO.
> There are localized projects that use openthes like
> http://openthesaurus.caixamagica.pt/
> http://openthes-es.berlios.de/
> http://synonimy.sourceforge.net/
> http://www.openthesaurus.de/
> http://www.openthesaurus.tk
> http://synonymer.merg.net/
> The PT site seems quite old, but maybe you can find some tips there.
> There is an old article from Bruce Byfield here
> http://archive09.linux.com/articles/51675?tid=93
> The problem with thesaurus and dictionaries in general is that they are far more than a simple list of words: you need to tell the system the possible variants, if it is a noun, a verb, if it's a real synonymous or just a similar word...
> Regards
> Ricardo
Re: Proofing Tool GUI
2013/5/5 Marco A.G.Pinto
> Is there a URL which explains every detail of those files?

AFAIK, most AOO thesauri are based on OpenThesaurus

http://sourceforge.net/projects/openthesaurus/

which is already a working web interface for adding words to a thesaurus database that can be exported to several formats, including the one used by AOO.

There are localized projects that use openthes, like

http://openthesaurus.caixamagica.pt/
http://openthes-es.berlios.de/
http://synonimy.sourceforge.net/
http://www.openthesaurus.de/
http://www.openthesaurus.tk
http://synonymer.merg.net/

The PT site seems quite old, but maybe you can find some tips there.

There is an old article from Bruce Byfield here:

http://archive09.linux.com/articles/51675?tid=93

The problem with thesauri and dictionaries in general is that they are far more than a simple list of words: you need to tell the system the possible variants, whether it is a noun or a verb, whether it's a real synonym or just a similar word...

Regards
Ricardo
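Ricardo's point that a thesaurus is more than a word list shows up in the file layout AOO reads. Judging by the th_pt_PT sample posted in this thread, each .dat entry is `word|N` followed by N meaning lines, and the first field of a meaning line looks like a part-of-speech slot (`-` when unknown; other language packs put labels such as `(noun)` there). Below is a minimal Python sketch of how an editor like Proofing Tool GUI might write both files from an in-memory dictionary. It assumes the .idx holds the encoding name, the entry count, and `word|byte offset` lines sorted in byte order; `build_thesaurus` and `Meaning` are hypothetical names, not an existing API.

```python
# Minimal sketch of writing MyThes-style .dat/.idx files. The layout is
# inferred from the th_pt_PT sample, not taken from a spec.
# build_thesaurus and Meaning are hypothetical names.

Meaning = tuple[str, list[str]]  # (part-of-speech label, synonyms)


def build_thesaurus(entries: dict[str, list[Meaning]],
                    encoding: str = "UTF-8") -> tuple[bytes, str]:
    """Return (.dat contents as bytes, .idx contents as text)."""
    dat = bytearray(f"{encoding}\n".encode(encoding))  # line 1: encoding name
    idx_lines = [encoding, str(len(entries))]           # encoding, entry count
    # Assumption: the reader binary-searches the .idx, so sort by encoded bytes.
    for word in sorted(entries, key=lambda w: w.encode(encoding)):
        meanings = entries[word]
        idx_lines.append(f"{word}|{len(dat)}")  # byte offset of this entry
        block = f"{word}|{len(meanings)}\n"     # headword and meaning count
        for pos, synonyms in meanings:
            # First field: part of speech, "-" when unknown (as in pt_PT).
            block += "|".join([pos or "-", *synonyms]) + "\n"
        dat += block.encode(encoding)
    return bytes(dat), "\n".join(idx_lines) + "\n"
```

Fed the first two entries of the pt_PT sample (`1` and `ababelado`), this produces an .idx whose first entry line is `1|6`, matching the sample, because the `UTF-8` header plus its newline take exactly six bytes. Offsets are counted in bytes rather than characters, so words with accents such as `abaçanado` shift later offsets by more than their character count.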
Proofing Tool GUI
Hello my dear ones,

A couple of days ago I was on IRC in #dev.openoffice.org chatting with JZA, and I came up with the idea of creating a GUI to edit the AOO thesaurus.

JZA told me the files were in TXT format and gave me a URL with a lot of information, but I took a quick look and didn't find anything about the data dictionary of the thesaurus.

The tool will be called "Proofing Tool GUI" and will be coded in PureBasic. Is this a good name? PureBasic can compile for Windows/Linux/Mac/Amiga.

The reason I want to code it is that, months ago, I contacted my friends at Minho University in Portugal, who are in charge of pt-PT, and I wanted to send them words to be used as synonyms, but they didn't know how to add them.

This made me think that there isn't a tool for doing that, so my idea is good because it can be used by the whole community of developers.

I unzipped the Portuguese .OXT and grabbed the files:
- th_pt_PT.idx
- th_pt_PT.dat

I opened them with Microsoft Expression Web 4 to keep the UTF-8 encoding, but didn't completely understand how they work.

For example, in the .idx one I had:

    UTF-8
    12940
    1|6
    a cerca de|16097
    a começar de|19986
    a favor|32934
    a partir de|67469
    a respeito de|77248
    ... etc...

In the .dat one I had:

    UTF-8
    1|3
    -|anuviado
    -|aperitivo
    -|sigla
    ababelado|1
    -|atrapalhado|baralhado|atarantado|desnorteado
    ababelar|1
    -|baralhar|atrapalhar
    abaçanado|1
    ... etc...

It seems there are at least three levels of synonyms in the .dat one, but I don't know how to interpret them if I create a GUI.

Also, the .idx one contains numbers whose meaning I don't understand.

Is there a URL which explains every detail of those files?

Thanks!

Kind regards from,
>Marco A.G.Pinto
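For anyone answering the questions above, here is a minimal Python sketch of how the two files appear to fit together. The layout is inferred from this sample and from MyThes, the library AOO uses to read these files, so treat it as an assumption rather than a spec. The .idx starts with the encoding and the number of entries (12940 here), followed by `word|offset` lines, where the offset is the byte position of that word's entry in the .dat: `1|6` points just past the 6-byte `UTF-8` line plus its newline. In the .dat, each entry is `word|N` followed by N meaning lines, whose first field seems to be a part-of-speech label (`-` when unset) and whose remaining fields are synonyms, so the "three levels" would be headword, meaning, and synonym. The names `parse_idx` and `read_entry` are hypothetical.

```python
import io

# Minimal sketch of reading MyThes-style .idx/.dat files (layout inferred
# from the th_pt_PT sample; parse_idx and read_entry are hypothetical names).


def parse_idx(idx_text: str) -> tuple[str, dict[str, int]]:
    """Return (encoding, {word: byte offset into the .dat})."""
    lines = idx_text.splitlines()
    encoding, count = lines[0], int(lines[1])  # e.g. "UTF-8", 12940
    offsets = {}
    for line in lines[2:2 + count]:
        word, _, offset = line.rpartition("|")  # "a cerca de|16097"
        offsets[word] = int(offset)
    return encoding, offsets


def read_entry(dat: bytes, offset: int, encoding: str = "UTF-8"):
    """Read one entry: (word, [(part_of_speech, [synonyms]), ...])."""
    stream = io.BytesIO(dat)
    stream.seek(offset)  # jump straight to the entry named in the .idx

    def next_line() -> str:
        return stream.readline().decode(encoding).rstrip("\r\n")

    word, _, count = next_line().rpartition("|")  # "ababelado|1"
    meanings = []
    for _ in range(int(count)):
        pos, *synonyms = next_line().split("|")  # "-|atrapalhado|baralhado|..."
        meanings.append((pos, synonyms))
    return word, meanings


if __name__ == "__main__":
    sample_dat = "UTF-8\n1|3\n-|anuviado\n-|aperitivo\n-|sigla\n".encode("utf-8")
    enc, offsets = parse_idx("UTF-8\n1\n1|6\n")
    print(read_entry(sample_dat, offsets["1"], enc))
    # ('1', [('-', ['anuviado']), ('-', ['aperitivo']), ('-', ['sigla'])])
```

Seeking straight to the offset is what makes the .idx useful: a lookup reads only the few lines of one entry instead of scanning the whole .dat, which matters for a file with 12940 entries.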