Re: [Grammar checker] Undocumented change in the API for LO 4
Hi all, what's the supposed regression, exactly? Do we have only sentences as segmented by LO? This would be a serious drawback as ICU methods are less than perfect, and our results are much more reliable (the BreakIterator simply uses a static list of abbreviations which is a vast simplification that cannot really capture a lot of ambiguous dots, so it's broken by design). Best, Marcin On Mon, Mar 4, 2013 at 9:58 PM, Németh László nem...@numbertext.org wrote: Hi, If I right know, that was an intended change from the original author, Thomas Lange, supported by the contributors, eg. Marcin Miłkowski and Daniel Naber, for the real needs, better sentence boundary disambiguation and grammar checking by LanguageTool and other grammar checker components. So the recent state is a drawback. I suggest to revert it (maybe it would be fine to add some comments to the ProofreadingResult.idl to prevent from similar changes, too). Best regards, László 2013/3/4 Olivier R. olivier.nore...@gmail.com: Caolán McNamara wrote do you get the pre LO 4 behaviour ? Probably. With LO 3, in doProofreading: - nStartOfSentencePos was always the beginning of the paragraph (=0) - nSuggestedSentenceEndPos was always the end of the paragraph (=length of rText) And each paragraph was passed once to the GC. Assuming that you do, then it appears to me that the current LO4 behaviour is the original programmer intent and that the intermediate behaviour was a bug (from the programmer intent perspective anyway) in whatever versions got released between 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4 Yes, we can assume that was the original programmer intent. But it worked another way for 3 years and nobody complained about it. :) I prefer the unintended behavior, as LO does not assume wrongly what is the end of sentences. So what LO will do? Olivier -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4041580.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
I have zero insight into that area of the code, but from what I gather: GrammarCheckingIterator::GetSuggestedEndOfSentence(rText, ...) -- where rText apparently is one single paragraph -- used to be convoluted code that always returns rText.getLength() for the last few years, whether that change was intentional or not. (From http://cgit.freedesktop.org/libreoffice/core/commit/?id=9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 #i103496#: split svtools; improve ConfitItems it would look odd if it were really intentional -- why not clean the function up to a single line then? but who knows.) From what I understand of linguistic/source/gciterator.cxx, the two calls to n = GrammarCheckingIterator::GetSuggestedEndOfSentence are in two loops that each: use that n as nSuggestedBehindEndOfSentencePosition argument to a css.linguistic2.XProofreader.doProofreading call, and then determine whether to do further iterations of the loop based on the returned css.linguistic2.ProofreadingResult, esp. its nBehindEndOfSentencePosition. Now, it beats me why anybody designed css.linguistic2.ProofreadingResult that way, to contain all the data already passed into css.linguistic2.XProofreader.doProofreading anyway. But could it be that clients that observe that [with] LibreOffice 4, each paragraph of a text is passed several times to [doProofreading] fail to set nBehindEndOfSentencePosition in the css.linguistic2.ProofreadingResult they return, to properly reflect their idea of how much they have already consumed? Stephan On 03/05/2013 11:12 AM, Marcin Milkowski wrote: what's the supposed regression, exactly? Do we have only sentences as segmented by LO? This would be a serious drawback as ICU methods are less than perfect, and our results are much more reliable (the BreakIterator simply uses a static list of abbreviations which is a vast simplification that cannot really capture a lot of ambiguous dots, so it's broken by design). Best, Marcin On Mon, Mar 4, 2013 at 9:58 PM, Németh László nem...@numbertext.org mailto:nem...@numbertext.org wrote: Hi, If I right know, that was an intended change from the original author, Thomas Lange, supported by the contributors, eg. Marcin Miłkowski and Daniel Naber, for the real needs, better sentence boundary disambiguation and grammar checking by LanguageTool and other grammar checker components. So the recent state is a drawback. I suggest to revert it (maybe it would be fine to add some comments to the ProofreadingResult.idl to prevent from similar changes, too). Best regards, László 2013/3/4 Olivier R. olivier.nore...@gmail.com mailto:olivier.nore...@gmail.com: Caolán McNamara wrote do you get the pre LO 4 behaviour ? Probably. With LO 3, in doProofreading: - nStartOfSentencePos was always the beginning of the paragraph (=0) - nSuggestedSentenceEndPos was always the end of the paragraph (=length of rText) And each paragraph was passed once to the GC. Assuming that you do, then it appears to me that the current LO4 behaviour is the original programmer intent and that the intermediate behaviour was a bug (from the programmer intent perspective anyway) in whatever versions got released between 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4 Yes, we can assume that was the original programmer intent. But it worked another way for 3 years and nobody complained about it. :) I prefer the unintended behavior, as LO does not assume wrongly what is the end of sentences. So what LO will do? ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
On Sat, 2013-03-02 at 04:09 -0800, Olivier R. wrote: I bibisected from scratch again, on Linux Mint. I got no crash at all. Looking at the log, (but without testing anything myself) the likely relevant changes are... c0f865c9b5a34b272c9e0b22d18969554265914a on linguistic/source/gciterator.cxx where noelg noticed that GrammarCheckingIterator::GetSuggestedEndOfSentence set xBreakIterator but used m_xBreakIterator so that m_xBreakIterator was always unset. The follow up commit of a33dbc169037d985f104c83d01d5efd9982413de by Stephen then fixed the code to use the correct m_xBreakIterator like it used to do back before 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 accidentally broke it initially. So, to test that theory if you change GrammarCheckingIterator::GetSuggestedEndOfSentence to be just... sal_Int32 GrammarCheckingIterator::GetSuggestedEndOfSentence( const OUString rText, sal_Int32 /*nSentenceStartPos*/, const lang::Locale /*rLocale*/ ) { return rText.getLength(); } do you get the pre LO 4 behaviour ? Assuming that you do, then it appears to me that the current LO4 behaviour is the original programmer intent and that the intermediate behaviour was a bug (from the programmer intent perspective anyway) in whatever versions got released between 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4 C. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
Caolán McNamara wrote do you get the pre LO 4 behaviour ? Probably. With LO 3, in doProofreading: - nStartOfSentencePos was always the beginning of the paragraph (=0) - nSuggestedSentenceEndPos was always the end of the paragraph (=length of rText) And each paragraph was passed once to the GC. Assuming that you do, then it appears to me that the current LO4 behaviour is the original programmer intent and that the intermediate behaviour was a bug (from the programmer intent perspective anyway) in whatever versions got released between 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4 Yes, we can assume that was the original programmer intent. But it worked another way for 3 years and nobody complained about it. :) I prefer the unintended behavior, as LO does not assume wrongly what is the end of sentences. So what LO will do? Olivier -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4041580.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
Hi, If I right know, that was an intended change from the original author, Thomas Lange, supported by the contributors, eg. Marcin Miłkowski and Daniel Naber, for the real needs, better sentence boundary disambiguation and grammar checking by LanguageTool and other grammar checker components. So the recent state is a drawback. I suggest to revert it (maybe it would be fine to add some comments to the ProofreadingResult.idl to prevent from similar changes, too). Best regards, László 2013/3/4 Olivier R. olivier.nore...@gmail.com: Caolán McNamara wrote do you get the pre LO 4 behaviour ? Probably. With LO 3, in doProofreading: - nStartOfSentencePos was always the beginning of the paragraph (=0) - nSuggestedSentenceEndPos was always the end of the paragraph (=length of rText) And each paragraph was passed once to the GC. Assuming that you do, then it appears to me that the current LO4 behaviour is the original programmer intent and that the intermediate behaviour was a bug (from the programmer intent perspective anyway) in whatever versions got released between 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4 Yes, we can assume that was the original programmer intent. But it worked another way for 3 years and nobody complained about it. :) I prefer the unintended behavior, as LO does not assume wrongly what is the end of sentences. So what LO will do? Olivier -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4041580.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
I bibisected from scratch again, on Linux Mint. I got no crash at all. Result: 20bcb05266724b495f909e6bdc5b253b15515e7d is the first bad commit commit 20bcb05266724b495f909e6bdc5b253b15515e7d Author: Bjoern Michaelsen bjoern.michael...@canonical.com Date: Mon Dec 10 23:37:31 2012 + source-hash-f260c656da4457c5d87c161bdd43ad3023d07472 commit f260c656da4457c5d87c161bdd43ad3023d07472 Author: Michael Stahl mst...@redhat.com AuthorDate: Wed Oct 24 21:29:00 2012 +0200 Commit: Michael Stahl mst...@redhat.com CommitDate: Wed Oct 24 21:58:26 2012 +0200 vcl: uipreviewer: remove spurious return false Change-Id: Ie853d55f78f27a7249b8e960587c8d2bd833d1a7 :100644 100644 dd56205ffd7af4544acf767d928e0fb7c362b83d 7f44424346f71b2444850a3667fbcb6cf1ad374a M autogen.log :100644 100644 67df361b388bd5414a4ec178afbdeaadeb28ab12 c243de0266086f2cf364d025ede83dcb5f53fcbe M ccache.log :100644 100644 0cc69bf7010e61309aee815ea14e91c076f2c2a9 3d48a17ff896d1e27f4e0cd78c606bd6167043f6 M commitmsg :100644 100644 be0612023ec5f0a77e6afc9b23d0e3c96c66c4f4 141809713e042ff39b59de44fe7b570744c7279f M dev-install.log :100644 100644 ce6dd2ba02cd975b4350a4c14937a2101222a737 8b188733db3a9cf1f5810fac128a7a17acf5bf43 M make.log :04 04 ae9ac7f6e1ba77f1a9b4d8dcb3c04a4304a1c666 5a6c6e9d2d90c720a0b62142f2ce89bc713b42d7 M opt git bisect log: # bad: [5b4b36d87517a6ea96ff8c84c46b12f462fc9a1a] source-hash-8450a99c744e9005f19173e4df35d65640bcf5c4 # good: [65fd30f5cb4cdd37995a33420ed8273c0a29bf00] source-hash-d6cde02dbce8c28c6af836e2dc1120f8a6ef9932 git bisect start 'latest' 'oldest' # good: [16b0b88cbd4ef0f51816e97277e40c5cf78f7bf9] source-hash-099198a4224778fe6e43f5dc13b5b9b1b4dc828c git bisect good 16b0b88cbd4ef0f51816e97277e40c5cf78f7bf9 # good: [f28b8f9a6c47fa59bf98fffe937a2f2db7a2445a] source-hash-a581d31b227623e09d2970a91214fda398f98eda git bisect good f28b8f9a6c47fa59bf98fffe937a2f2db7a2445a # good: [114fd3b76bcba890e6d702d00cef910f1493c262] source-hash-64ab96cd15e52da88781e720d6f031dbcd0ba902 git bisect good 114fd3b76bcba890e6d702d00cef910f1493c262 # bad: [47498a36f7af8f54e6e3dda89cd4708802a409e6] source-hash-19f4ebd8a54da0ae03b9cc8481613e5cd20ee1e7 git bisect bad 47498a36f7af8f54e6e3dda89cd4708802a409e6 # bad: [f4e2d84db194943180f3e7ed4adce5f8e377d9bc] source-hash-806d18ae7b8c241fe90e49d3d370306769c50a10 git bisect bad f4e2d84db194943180f3e7ed4adce5f8e377d9bc # bad: [20bcb05266724b495f909e6bdc5b253b15515e7d] source-hash-f260c656da4457c5d87c161bdd43ad3023d07472 git bisect bad 20bcb05266724b495f909e6bdc5b253b15515e7d # good: [ae1c4bb3a840728755f4390429863629953961dd] source-hash-b0da54bec69f4931af0adbc15d230d3f4eea7b08 git bisect good ae1c4bb3a840728755f4390429863629953961dd # good: [ea3a78c17e9a3cb150206f82dd2f615f678731aa] source-hash-bec62421a45da89d2812bdff30fbbab73291cf91 git bisect good ea3a78c17e9a3cb150206f82dd2f615f678731aa Olivier -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4041086.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
On Thursday 28 of February 2013, Olivier R. wrote: Stephan Bergmann-2 wrote How exactly do you launch LO? If by calling install/program/soffice.bin instead of install/program/soffice then the behavior is expected (soffice starts and controls soffice.bin, which needs to re-start on first start after any upgrade to cater for potential changes in bundled extensions etc.). I followed the instructions here: https://wiki.documentfoundation.org/QA/HowToBibisect I have edited the page to not recommend running the binary without the wrapper. -- Lubos Lunak l.lu...@suse.cz ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
On 02/27/2013 06:58 PM, Olivier R. wrote: But I have to say that I am not really sure this result can be trusted. For each times I bibisect, LO didn't launch at the first try, only on the second try. How exactly do you launch LO? If by calling install/program/soffice.bin instead of install/program/soffice then the behavior is expected (soffice starts and controls soffice.bin, which needs to re-start on first start after any upgrade to cater for potential changes in bundled extensions etc.). Stephan ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
Stephan Bergmann-2 wrote How exactly do you launch LO? If by calling install/program/soffice.bin instead of install/program/soffice then the behavior is expected (soffice starts and controls soffice.bin, which needs to re-start on first start after any upgrade to cater for potential changes in bundled extensions etc.). I followed the instructions here: https://wiki.documentfoundation.org/QA/HowToBibisect I typed: ./opt/program/soffice.bin in the folder bibisect40 At first launch, I get this message: Fontconfig warning: /home/olivier/Téléchargements/bibisect40/opt/share/fonts/truetype/fc_local.conf, line 13: Having multiple family in alias isn't supported and may not works as expected The splash screen appears 1 second then disappears. At the second launch, I get: Fontconfig warning: /home/olivier/Téléchargements/bibisect40/opt/share/fonts/truetype/fc_local.conf, line 13: Having multiple family in alias isn't supported and may not works as expected /home/olivier/Téléchargements/bibisect40/opt/program/pythonloader.py:60: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal if 1 == os.access( encfile( path ), os.F_OK) and not path in sys.path: And LO starts. Olivier -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4040599.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
Caolán McNamara wrote Can you supply a simple to run test-kit that could be used by someone else to bibisect the change ? An easy to test/run macro/script/hack that works in 3.6 and fails in 4.0. C. I bibisected myself, then I got this result: 266724b495f909e6bdc5b253b15515e7d is the first bad commit commit 20bcb05266724b495f909e6bdc5b253b15515e7d Author: Bjoern Michaelsen bjoern.michael...@canonical.com Date: Mon Dec 10 23:37:31 2012 + source-hash-f260c656da4457c5d87c161bdd43ad3023d07472 commit f260c656da4457c5d87c161bdd43ad3023d07472 Author: Michael Stahl mst...@redhat.com AuthorDate: Wed Oct 24 21:29:00 2012 +0200 Commit: Michael Stahl mst...@redhat.com CommitDate: Wed Oct 24 21:58:26 2012 +0200 vcl: uipreviewer: remove spurious return false Change-Id: Ie853d55f78f27a7249b8e960587c8d2bd833d1a7 :100644 100644 dd56205ffd7af4544acf767d928e0fb7c362b83d 7f44424346f71b2444850a3667fbcb6cf1ad374a M autogen.log :100644 100644 67df361b388bd5414a4ec178afbdeaadeb28ab12 c243de0266086f2cf364d025ede83dcb5f53fcbe M ccache.log :100644 100644 0cc69bf7010e61309aee815ea14e91c076f2c2a9 3d48a17ff896d1e27f4e0cd78c606bd6167043f6 M commitmsg :100644 100644 be0612023ec5f0a77e6afc9b23d0e3c96c66c4f4 141809713e042ff39b59de44fe7b570744c7279f M dev-install.log :100644 100644 ce6dd2ba02cd975b4350a4c14937a2101222a737 8b188733db3a9cf1f5810fac128a7a17acf5bf43 M make.log :04 04 ae9ac7f6e1ba77f1a9b4d8dcb3c04a4304a1c666 5a6c6e9d2d90c720a0b62142f2ce89bc713b42d7 M op But I have to say that I am not really sure this result can be trusted. For each times I bibisect, LO didn't launch at the first try, only on the second try. Three times Ubuntu froze and Compiz crashed when I closed LO. The update system crashed while updating. The mouse pointer was frenetic. The mouse wheel didn't work. Etc. So if someone wants to check, here is how I proceed: Each time I launch LO, I looked in the Extension manager if the French grammar checker was installed. If it wasn't, I installed it in user profile (only for me). I closed LO and relaunched it. Then I typed: Mot. Mot. Mot. (With few spaces at the beginning.) Normally, the first spaces must be underlined. That's for checking if the grammar checker works properly. But if spaces between the words Mots. are underlined, that means the paragraph has been split as explained in my previous messages. Direct link to download the French grammar checker: http://www.dicollecte.org/download/fr/Grammalecte-v0.2.5.oxt Regards, Olivier -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4040542.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
[nSuggestedSentenceEndPos:nSuggestedSentenceEndPos+1] while l == : aRes.nStartOfNextSentencePosition = aRes.nStartOfNextSentencePosition + 1 l = rText[aRes.nStartOfNextSentencePosition:aRes.nStartOfNextSentencePosition+1] if aRes.nStartOfNextSentencePosition == nSuggestedSentenceEndPos and l!=: aRes.nStartOfNextSentencePosition = nSuggestedSentenceEndPos + 1 aRes.nBehindEndOfSentencePosition = aRes.nStartOfNextSentencePosition try: aRes.aErrors = lightproof_impl_en.proofread( nDocId, rText, rLocale, \ nStartOfSentencePos, aRes.nBehindEndOfSentencePosition, rProperties) except Exception as e: if len(rProperties) 0 and rProperties[0].Name == Debug and len(e.args) == 2: aRes.aText, aRes.nStartOfSentencePosition = e else: if 'PYUNO_LOGLEVEL' in os.environ: print(traceback.format_exc()) return aRes def ignoreRule(self, rid, aLocale): lightproof_impl_en.ignore[rid] = 1 def resetIgnoreRules(self): lightproof_impl_en.ignore = {} # XServiceDisplayName def getServiceDisplayName(self, aLocale): return lightproof_impl_en.name g_ImplementationHelper = unohelper.ImplementationHelper() g_ImplementationHelper.addImplementation( Lightproof, \ org.libreoffice.comp.pyuno.Lightproof. + pkg, (com.sun.star.linguistic2.Proofreader,),) g_ImplementationHelper.addImplementation( lightproof_handler_en.LightproofOptionsEventHandler, \ org.libreoffice.comp.pyuno.LightproofOptionsEventHandler. + pkg, (com.sun.star.awt.XContainerWindowEventHandler,),) -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639p4039657.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: [Grammar checker] Undocumented change in the API for LO 4
On Sun, 2013-01-20 at 08:25 -0800, Olivier R. wrote: With LibreOffice 4, each paragraph of a text is passed several times to this function in the parameter aText, and the parameters nStartOfSentencePosition and nSuggestedBehindEndOfSentencePosition are the beginning and the end of each sentence of this paragraph. Is this a bug ? a feature ? an undocumented change ? No idea, IMO it would be worth bibisecting this to find where it changed to re-examine if it was intentional or not. Would then likely be a good candidate for a unit test to lock in whichever is the right behaviour. C. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
[Grammar checker] Undocumented change in the API for LO 4
Hi *, It seems that there is a change in the grammar checker behavior which seems undocumented anywhere. The change is in this function: http://api.libreoffice.org/docs/common/ref/com/sun/star/linguistic2/XProofreader.html#doProofreading With LibreOffice 3, each paragraph of a text is passed once to this function in the parameter aText, and the parameters nStartOfSentencePosition and nSuggestedBehindEndOfSentencePosition are the beginning and the end of the paragraph to check. With LibreOffice 4, each paragraph of a text is passed several times to this function in the parameter aText, and the parameters nStartOfSentencePosition and nSuggestedBehindEndOfSentencePosition are the beginning and the end of each sentence of this paragraph. Is this a bug ? a feature ? an undocumented change ? This change provokes unexpected errors with the French grammar checker. Imho, the previous way to parse texts was better than the new one, as it’s up to each grammar checker to decide what to do with the paragraph, how to split it if necessary, etc. Regards, Olivier R. -- View this message in context: http://nabble.documentfoundation.org/Grammar-checker-Undocumented-change-in-the-API-for-LO-4-tp4030639.html Sent from the Dev mailing list archive at Nabble.com. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice