Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/24/2012 06:39 AM, Petr Viktorin wrote:
> On 07/24/2012 01:12 AM, John Dennis wrote:
>> On 07/23/2012 06:27 AM, Petr Viktorin wrote:
>>> As a translator (for another project), I don't like Transifex and prefer to send good old Git pull requests. I understand a traditional workflow is hard to coordinate with others that use Transifex, but still I'd hate it if we became dependent on Tx.
>>
>> For better or worse we are dependent on TX (Transifex). Fedora has adopted TX as its translation tool, RHEL's translation tools integrate with TX (as well as other translation portals). And SSSD and IPA have made a commitment to TX based on the direction of Fedora and RHEL.
>>
>> Given that we've adopted TX I don't see the value in maintaining tools that support both TX and non-TX workflows. I'd rather see us delete the non-TX elements. If we have just one workflow it's easier to understand and maintain the code. If we ever decide we need to go back to a non-TX workflow we can always retrieve the deleted code from git.
>
> This means you have to be a member of a Fedora translation team to translate.

Actually we're not using the Fedora TX instance, rather the transifex.net instance, so I don't think we're limited to translators who are members of a Fedora translation team.

> It makes it harder for people to fork the project. A workflow with a mandatory central repository makes it impossible to experiment locally. I'm all for having a standard way to receive contributions, but limiting how people can create those contributions isn't good. I'm all for deleting unused code, but here I think it would be a bad move.

Actually I don't have strong feelings about this one way or the other. My primary concern with two different workflows was that we have to test and maintain both, and one of them is currently unused. My other concern is the added complexity: most developers and release engineers don't understand this stuff, so keeping it simple to accommodate those less familiar with the process seemed like a win.

But you have a valid point about being flexible, so it's fine with me to keep the old code. We probably need better documentation.

John

--
John Dennis jden...@redhat.com

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/

___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/24/2012 04:17 AM, Petr Viktorin wrote:
> On 07/23/2012 10:46 PM, John Dennis wrote:
> [...]
>> The only thing holding up the ACK is the question of why po-files now has update_pot as a dependency.
>
> If files simply depend on $(DOMAIN).pot, then they are considered up-to-date even after they're changed (e.g. with strip-po). They need to depend on a rule that always runs so that they get merged.
>
> There's another alternative to achieve this: adding them to .PHONY. The attached version does that, perhaps it's cleaner.

ACK, thanks for the good work.

--
John Dennis jden...@redhat.com
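The dependency point discussed above can be illustrated with a toy model. This is only a minimal sketch of the timestamp rule make applies to a single target, written in Python; the `Target` class, the `needs_rebuild` function, and the file names `ipa.pot` and `cs.po` are hypothetical and are not part of the patch. It shows why a .po file that was rewritten by strip-po looks up-to-date relative to an older .pot, and why listing it in .PHONY forces the merge.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A file known to the toy build model (names are illustrative only)."""
    name: str
    mtime: float                      # last-modified time of the file
    deps: list = field(default_factory=list)
    phony: bool = False               # True if the target is listed in .PHONY


def needs_rebuild(target: Target) -> bool:
    """Simplified version of make's decision for one existing target."""
    if target.phony:
        # Phony targets are always treated as out of date.
        return True
    # Otherwise rebuild only if some prerequisite is newer than the target.
    return any(dep.mtime > target.mtime for dep in target.deps)


# The .pot was generated first; strip-po rewrote the .po file afterwards,
# so the .po carries the newer timestamp.
pot = Target("ipa.pot", mtime=100.0)
stripped_po = Target("cs.po", mtime=200.0, deps=[pot])
phony_po = Target("cs.po", mtime=200.0, deps=[pot], phony=True)

print(needs_rebuild(stripped_po))  # the merge recipe would be skipped
print(needs_rebuild(phony_po))     # the merge recipe always runs
```

The model ignores missing targets and transitive prerequisites, which real make also considers; it is only meant to show the timestamp comparison that the .PHONY change sidesteps.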
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
John Dennis wrote:
> On 07/24/2012 04:17 AM, Petr Viktorin wrote:
>> On 07/23/2012 10:46 PM, John Dennis wrote:
>> [...]
>>> The only thing holding up the ACK is the question of why po-files now has update_pot as a dependency.
>>
>> If files simply depend on $(DOMAIN).pot, then they are considered up-to-date even after they're changed (e.g. with strip-po). They need to depend on a rule that always runs so that they get merged.
>>
>> There's another alternative to achieve this: adding them to .PHONY. The attached version does that, perhaps it's cleaner.
>
> ACK, thanks for the good work.

pushed to master
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/23/2012 10:46 PM, John Dennis wrote:
[...]
> The only thing holding up the ACK is the question of why po-files now has update_pot as a dependency.

If files simply depend on $(DOMAIN).pot, then they are considered up-to-date even after they're changed (e.g. with strip-po). They need to depend on a rule that always runs so that they get merged.

There's another alternative to achieve this: adding them to .PHONY. The attached version does that, perhaps it's cleaner.

--
Petr³

From 87d94d673a7647ffe508a11c985e76f575180971 Mon Sep 17 00:00:00 2001
From: Petr Viktorin pvikt...@redhat.com
Date: Wed, 20 Jun 2012 06:38:16 -0400
Subject: [PATCH] Arrange stripping .po files

The .po files we use for translations have two shortcomings when used in Git:
- They include file locations, which change each time the source is updated.
  This results in large, unreadable diffs that don't merge well.
- They include source strings for untranslated messages, wasting space
  unnecessarily.

Update the Makefile so that the extraneous information is stripped when the
files are updated or pulled from Transifex, and empty translation files are
removed entirely.
Also, translations are normalized to a common style. This should help diffs
and merges.

The validator requires file location comments to identify the programming
language, and to produce good error reports.
To make this work, merge the comments in before validation.

First patch for: https://fedorahosted.org/freeipa/ticket/2435
---
 install/configure.ac   |    5 +
 install/po/Makefile.in |   22 --
 install/po/README      |   16 ++--
 tests/i18n.py          |   12 ++--
 4 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/install/configure.ac b/install/configure.ac
index 827ddbab411a4aa8abbdd4488e217ce67046bd6b..9e781a684429191b3c5eb46aed4fceecc9be6586 100644
--- a/install/configure.ac
+++ b/install/configure.ac
@@ -48,6 +48,11 @@ if test x$MSGCMP = xno; then
     AC_MSG_ERROR([msgcmp not found, install gettext])
 fi
 
+AC_PATH_PROG(MSGATTRIB, msgattrib, [no])
+if test x$MSGATTRIB = xno; then
+    AC_MSG_ERROR([msgattrib not found, install gettext])
+fi
+
 AC_PATH_PROG(TX, tx, [/usr/bin/tx])
 
 AC_ARG_WITH([gettext_domain],
diff --git a/install/po/Makefile.in b/install/po/Makefile.in
index 9a3dde78a20a6beb35ab08230331f28b7ea3161d..bc91a933b9e10e4178cb4190e62140549da06591 100644
--- a/install/po/Makefile.in
+++ b/install/po/Makefile.in
@@ -14,6 +14,7 @@ MSGFMT = @MSGFMT@
 MSGINIT = @MSGINIT@
 MSGMERGE = @MSGMERGE@
 MSGCMP = @MSGCMP@
+MSGATTRIB = @MSGATTRIB@
 TX = @TX@
 IPA_TEST_I18N = ../../tests/i18n.py
 
@@ -67,7 +68,7 @@ C_POTFILES = $(C_FILES) $(H_FILES)
 
 .SUFFIXES:
 .SUFFIXES: .po .mo
-.PHONY: all create-po update-po update-pot install mostlyclean clean distclean test mo-files debug
+.PHONY: all create-po update-po update-pot install mostlyclean clean distclean test mo-files debug strip-po merge-po $(po_files)
 
 all:
 
@@ -86,6 +87,19 @@ $(po_files): $(DOMAIN).pot
 	echo Merging $(DOMAIN).pot into $@; \
 	$(MSGMERGE) --no-fuzzy-matching -o $@ $@ $(DOMAIN).pot
 
+strip-po:
+	@for po_file in $(po_files); do \
+		echo Stripping $$po_file; \
+		$(MSGATTRIB) --translated --no-fuzzy --no-location $$po_file > $$po_file.tmp; \
+		mv $$po_file.tmp $$po_file; \
+	done
+	@export FILES_TO_REMOVE=`find . -name '*.po' -empty`; \
+	if [ "$$FILES_TO_REMOVE" != "" ]; then \
+		echo Removing empty translation files; \
+		rm -v $$FILES_TO_REMOVE; \
+		echo; echo Please remove the deleted files from LINGUAS!; echo; \
+	fi
+
 create-po: $(DOMAIN).pot
 	@for po_file in $(po_files); do \
 		if [ ! -e $$po_file ]; then \
@@ -98,10 +112,14 @@ create-po: $(DOMAIN).pot
 
 pull-po:
 	cd ../..; $(TX) pull -f
+	$(MAKE) strip-po
 
-update-po: update-pot
+merge-po: update-pot
 	$(MAKE) $(po_files)
 
+update-po: merge-po
+	$(MAKE) strip-po
+
 update-pot:
 	@rm -f $(DOMAIN).pot.update
 	@pushd ../.. ; \
diff --git a/install/po/README b/install/po/README
index ada7df40e3f294b204a5d44c267ee57ebe734042..6894a06337fac68675cb1a852ca828c54da74f96 100644
--- a/install/po/README
+++ b/install/po/README
@@ -6,28 +6,40 @@ A: Edit Makefile.in and add the source file to the appropriate *_POTFILES list.
    NOTE: Now this i only necessary for python files that lack the .py
    extension. All .py, .c and .h files are automatically sourced.
 
+Q: Untranslated strings and file locations are missing from my .po file.
+   How do I add them?
+
+A: make merge-po
+   Untranslated strings are left out of the files in SCM. The merge-po command
+   runs msgmerge to add them again.
+
 Q: How do I pick up new strings to translate from the source files after the
    source have been modified?
 
-A: make update-po
+A: make merge-po
    This regenerates the pot template file by scanning all the source files.
    Then the new strings are merged into each .po file from the new pot file.
 
 Q: How do I just regenerate the pot template file without
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/24/2012 01:12 AM, John Dennis wrote:
> On 07/23/2012 06:27 AM, Petr Viktorin wrote:
>> As a translator (for another project), I don't like Transifex and prefer to send good old Git pull requests. I understand a traditional workflow is hard to coordinate with others that use Transifex, but still I'd hate it if we became dependent on Tx.
>
> For better or worse we are dependent on TX (Transifex). Fedora has adopted TX as its translation tool, RHEL's translation tools integrate with TX (as well as other translation portals). And SSSD and IPA have made a commitment to TX based on the direction of Fedora and RHEL.
>
> Given that we've adopted TX I don't see the value in maintaining tools that support both TX and non-TX workflows. I'd rather see us delete the non-TX elements. If we have just one workflow it's easier to understand and maintain the code. If we ever decide we need to go back to a non-TX workflow we can always retrieve the deleted code from git.

This means you have to be a member of a Fedora translation team to translate.

It makes it harder for people to fork the project. A workflow with a mandatory central repository makes it impossible to experiment locally. I'm all for having a standard way to receive contributions, but limiting how people can create those contributions isn't good. I'm all for deleting unused code, but here I think it would be a bad move.

--
Petr³
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/20/2012 07:14 PM, John Dennis wrote:
> On 07/20/2012 12:28 PM, Petr Viktorin wrote:
>> On 07/20/2012 05:39 PM, John Dennis wrote:
>>> Great, I agree with everything you said. I'm happy to have the file list be derived from the directory contents. Are you planning on doing that in another patch?
>>
>> Yes, I want to do it in a new patch. It's a bit more complicated than it looks: creating a new translation will work differently than just adding it to LINGUAS and running create-po. The ticket is for beta 2 so I'd rather not start a new round of reviews.
>
> Fine with me to do that in another patch. As for create-po, I think that's also a holdover from pre-Transifex days. With Transifex I don't ever see a need to create an empty po file. Do you? Maybe we should just nuke the po creation in the Makefile.

As a translator (for another project), I don't like Transifex and prefer to send good old Git pull requests. I understand a traditional workflow is hard to coordinate with others that use Transifex, but still I'd hate it if we became dependent on Tx.

[...]

>>> But... I do have one final issue/question. I missed this in the first review. po_files is now dependent on update-pot instead of the pot file. We had decided that we were only going to regenerate the pot file on demand at specific times. Won't this dependency change cause the pot file to be updated frequently? (I realize only in the local tree). Note that when we run the validations we generate a temporary pot file from the current contents of the tree specifically to avoid overwriting the pot file.
>>
>> Are the po files updated more often? I don't really see a reason to merge the po files with an old pot.
>
> What merge are you referring to? The only merge I'm aware of at the moment is during validation, but that merge is done from a temporary updated pot file that is current with the tree.

I'm referring to a manual merge-po. po_files are only rebuilt from merge-po, which merges the po files with the pot and adds all the missing translations and line numbers. This is not needed with Transifex workflow, as Tx should do this internally when a pot is pushed to it.

> Any other merging is done by Transifex at the time we pull a po file. The frequency of po update doesn't seem relevant, what is your concern in this regard?

Is there another cause for the po_files to get rebuilt?

>>> I suppose having a conversation about when the pot file gets updated is a good one to have, we don't do it often enough IMHO. But I'm not sure it's correct to modify a file under SCM control if it wasn't intentional.
>>
>> How is Transifex set up here? If it automatically picks up changes when the pot file is modified, then we should back up the translations before changing the pot, so we can't do it automatically. Another wart is the line number cruft in the pot file -- any time it's updated we'll get a huge diff, so it makes sense to update sparingly.
>
> Transifex gives you two options for your pot file, either you tell TX the location of your pot file in a public SCM and it watches for updates and automatically pulls it when it changes in SCM -or- you manually push the pot file to TX. We've been using the "watch the pot file in git" option. Thus whenever we commit a new version of the pot file all developers and TX get it simultaneously (well sort of). If we do the manual push method the maintainer has to *both* commit to git *and* push to TX, so the former seems less error prone and more automated.

Well, if the pot file is not in the repo, the maintainer only has to push it to Tx (after building it of course, but that needs to be done anyway).

> The idea was we would have a string freeze prior to release and/or periodic intervals during branch development to update the pot. But we haven't been good about hitting these. However, note a manual push suffers from the same "somebody has to do it at the right moment" problem.

Is this idea documented anywhere? It's hard to do a string freeze if it's not enforced automatically, let alone if people don't know there should be one.

>> If Transifex is not wired to the pot, we could even go as far as removing it from SCM entirely -- it's entirely generated, and rebuilding it takes less than a second. We'd just have to update Transifex manually.
>
> It currently is wired to the pot. You make a valid point about currently not needing to maintain the pot in SCM. When we first set up translations we weren't using TX so having the pot file in SCM was a necessity.
>
> Personally I don't trust TX's data storage and I think there is value in having each pot we push to TX be recoverable from our SCM. When things blow up (and they do) it's really nice to be able to reassemble the pieces or at least follow the trail of how things changed. In the past I've had to answer questions like "How the heck did this string get into this po file?" Such questions can only be answered if we have the pot file we gave the translator. TX doesn't maintain it so we have
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/23/2012 06:27 AM, Petr Viktorin wrote:
> On 07/20/2012 07:14 PM, John Dennis wrote:
>> On 07/20/2012 12:28 PM, Petr Viktorin wrote:
>>> On 07/20/2012 05:39 PM, John Dennis wrote:
>>>> Great, I agree with everything you said. I'm happy to have the file list be derived from the directory contents. Are you planning on doing that in another patch?
>>>
>>> Yes, I want to do it in a new patch. It's a bit more complicated than it looks: creating a new translation will work differently than just adding it to LINGUAS and running create-po. The ticket is for beta 2 so I'd rather not start a new round of reviews.
>>
>> Fine with me to do that in another patch. As for create-po, I think that's also a holdover from pre-Transifex days. With Transifex I don't ever see a need to create an empty po file. Do you? Maybe we should just nuke the po creation in the Makefile.
>
> As a translator (for another project), I don't like Transifex and prefer to send good old Git pull requests. I understand a traditional workflow is hard to coordinate with others that use Transifex, but still I'd hate it if we became dependent on Tx.
>
> [...]
>
>>>> But... I do have one final issue/question. I missed this in the first review. po_files is now dependent on update-pot instead of the pot file. We had decided that we were only going to regenerate the pot file on demand at specific times. Won't this dependency change cause the pot file to be updated frequently? (I realize only in the local tree). Note that when we run the validations we generate a temporary pot file from the current contents of the tree specifically to avoid overwriting the pot file.
>>>
>>> Are the po files updated more often? I don't really see a reason to merge the po files with an old pot.
>>
>> What merge are you referring to? The only merge I'm aware of at the moment is during validation, but that merge is done from a temporary updated pot file that is current with the tree.
>
> I'm referring to a manual merge-po. po_files are only rebuilt from merge-po, which merges the po files with the pot and adds all the missing translations and line numbers. This is not needed with Transifex workflow, as Tx should do this internally when a pot is pushed to it.
>
>> Any other merging is done by Transifex at the time we pull a po file. The frequency of po update doesn't seem relevant, what is your concern in this regard?
>
> Is there another cause for the po_files to get rebuilt?

Using the TX model there is never a reason to build po files. We just pull them from TX.

>>>> I suppose having a conversation about when the pot file gets updated is a good one to have, we don't do it often enough IMHO. But I'm not sure it's correct to modify a file under SCM control if it wasn't intentional.
>>>
>>> How is Transifex set up here? If it automatically picks up changes when the pot file is modified, then we should back up the translations before changing the pot, so we can't do it automatically. Another wart is the line number cruft in the pot file -- any time it's updated we'll get a huge diff, so it makes sense to update sparingly.
>>
>> Transifex gives you two options for your pot file, either you tell TX the location of your pot file in a public SCM and it watches for updates and automatically pulls it when it changes in SCM -or- you manually push the pot file to TX. We've been using the "watch the pot file in git" option. Thus whenever we commit a new version of the pot file all developers and TX get it simultaneously (well sort of). If we do the manual push method the maintainer has to *both* commit to git *and* push to TX, so the former seems less error prone and more automated.
>
> Well, if the pot file is not in the repo, the maintainer only has to push it to Tx (after building it of course, but that needs to be done anyway).
>
>> The idea was we would have a string freeze prior to release and/or periodic intervals during branch development to update the pot. But we haven't been good about hitting these. However, note a manual push suffers from the same "somebody has to do it at the right moment" problem.
>
> Is this idea documented anywhere? It's hard to do a string freeze if it's not enforced automatically, let alone if people don't know there should be one.

It was discussed in the developer conference calls.

>>> If Transifex is not wired to the pot, we could even go as far as removing it from SCM entirely -- it's entirely generated, and rebuilding it takes less than a second. We'd just have to update Transifex manually.
>>
>> It currently is wired to the pot. You make a valid point about currently not needing to maintain the pot in SCM. When we first set up translations we weren't using TX so having the pot file in SCM was a necessity.
>>
>> Personally I don't trust TX's data storage and I think there is value in having each pot we push to TX be recoverable from our SCM. When things blow up (and they do) it's really nice to be able to reassemble the pieces or at least follow the trail of how things changed. In the past I've had to answer
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/23/2012 06:27 AM, Petr Viktorin wrote:
> As a translator (for another project), I don't like Transifex and prefer to send good old Git pull requests. I understand a traditional workflow is hard to coordinate with others that use Transifex, but still I'd hate it if we became dependent on Tx.

For better or worse we are dependent on TX (Transifex). Fedora has adopted TX as its translation tool, RHEL's translation tools integrate with TX (as well as other translation portals). And SSSD and IPA have made a commitment to TX based on the direction of Fedora and RHEL.

Given that we've adopted TX I don't see the value in maintaining tools that support both TX and non-TX workflows. I'd rather see us delete the non-TX elements. If we have just one workflow it's easier to understand and maintain the code. If we ever decide we need to go back to a non-TX workflow we can always retrieve the deleted code from git.

--
John Dennis jden...@redhat.com
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/19/2012 10:52 PM, John Dennis wrote:
> On 06/25/2012 07:17 AM, Petr Viktorin wrote:
>> The translation files we currently store in Git are full of redundant information: source strings for untranslated messages, and file locations. The first causes unnecessarily huge files. The second makes diffs unreadable: when code is edited and line numbers change, metadata for all messages shows up as changed. This makes reviewing translation patches, and merging possible conflicts, hard -- it requires specialized tools.
>>
>> This patch changes the Makefile to strip the unneeded data from .po files. Translators using Git must now run msgmerge (or, `make merge-po`) to get .po files they can work with. Transifex users are unaffected, as the source .pot file is not changed.
>>
>> The i18n tests use file locations for producing nice error reports¹. To make this work as before, the .pot is merged in before validation to restore comments. Currently this takes a noticeable amount of time, because polib uses a particularly naïve algorithm for merging. I've sent a patch to polib to resolve this; once that makes it downstream merging will be fast again.
>>
>> Updating the translations with the new Makefile will cause a 5MB patch. I don't want to pollute the mailing list with it, at least until the Makefile patch is reviewed. It's available at https://github.com/encukou/freeipa/commit/65e2e4.patch
>>
>> https://fedorahosted.org/freeipa/ticket/2435
>>
>> --
>> ¹ And for divining the programming language messages come from, but that is only done on the .pot file, unaffected by this patch.
>
> Good work and it's very close to getting an ACK.
>
> There is now a discrepancy between what the Makefile thinks is the list of po files and the actual list of po files after running strip-po. This causes confusing errors.
>
> I think the source of this problem is the Makefile has a list of po files in the variable $(po_files)
>
> For starters why is:
>
>     strip-po:
>             @for po_file in $$(ls *.po); do \
>
> instead of:
>
>     strip-po:
>             @for po_file in $(po_files); do \

Good catch, I'll update it to be consistent with the status quo. But see below.

> If you run make validate-po before running make strip-po you get:
>
>     5 errors in 21 files
>
> After stripping the po files make validate-po gives you:
>
>     14 errors in 21 files

I left updating the files to a subsequent patch (https://github.com/encukou/freeipa/commit/65e2e4.patch); the LINGUAS update is part of that.

> The extra 9 errors are due to the fact validate-po is being asked to validate a non-existent po file which it considers an error (which I believe is a correct check).
>
> make msg-stats gets confused for the same reason, it's asked to examine files that no longer exist.
>
> make mo-files now fails catastrophically for the same reason, it's being asked to operate on files that don't exist.
>
> In general large parts of the Makefile will now be confused or generate errors because the file list is incorrect.
>
> Somehow we have to align the list of po files. That presents all sorts of interesting questions:
>
> * does the list come from the LINGUAS file? (current method)
>
> * does the list come from git? Doesn't work if you're not in a git development tree. This problem is easily seen when the RPMs are built. No file list can be generated because there is no git repo so you end up with 0 files being passed to the validation commands. Since validation is not critical when building RPMs this hasn't been a show stopper but it really needs to be fixed in some way at some point.

I agree that tying ourselves to Git isn't a nice thing to do. I know I am never happy when I can't compile some project in Mercurial after importing it to Git :)
If we use the ls-files strategy then that should at least write the list to a version-controlled file, which we fall back to in case we're not in a git tree.

> * does the list come from the current directory contents? What you did with strip-po, but that also has a potential for errors. What if someone deletes or adds a file in their tree by mistake?

I personally would do this -- the most straightforward way to do it. If someone adds or deletes a file by mistake, a `git status` will reveal it. We could have a sanity check that refuses to build if there is a discrepancy between Git and the working tree (of course outside of a Git repo it would just warn).

There's one more reason for going with directory contents: when you're pulling from Transifex or otherwise adding/removing the translation files, you have to carefully keep LINGUAS in sync with the tree, otherwise the tools can either blow up or do too little. Debugging that could be frustrating. Having the tools look in the directory itself, and only doing sanity checking at a point where everything should be in order, should make everything easier.

> * should make strip-po edit the LINGUAS file? (maybe the best solution). Maybe when it detects an empty file and removes it, it should run a sed command to delete the line in LINGUAS?
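The "sanity check" idea raised above can be sketched as follows. This is a hypothetical illustration in Python, not code from the patch or from tests/i18n.py: the function names `read_linguas`, `po_languages` and `check_po_list` are invented here, and the sketch assumes that LINGUAS holds whitespace-separated language codes with `#` starting a comment. It compares the languages listed in LINGUAS against the .po files actually present in a directory and reports both kinds of mismatch.

```python
from pathlib import Path


def read_linguas(text: str) -> set:
    """Return the language codes in LINGUAS content.

    Assumption: codes are whitespace-separated and '#' starts a comment.
    """
    langs = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]   # drop any trailing comment
        langs.update(line.split())
    return langs


def po_languages(po_dir: Path) -> set:
    """Return the language codes implied by the *.po files in po_dir."""
    return {path.stem for path in po_dir.glob("*.po")}


def check_po_list(linguas_text: str, po_dir: Path):
    """Compare LINGUAS against the directory contents.

    Returns (missing, unlisted): languages listed without a .po file,
    and .po files whose language is not listed.
    """
    listed = read_linguas(linguas_text)
    present = po_languages(po_dir)
    return sorted(listed - present), sorted(present - listed)


if __name__ == "__main__":
    po_dir = Path(".")
    linguas = po_dir / "LINGUAS"
    if linguas.exists():
        missing, unlisted = check_po_list(linguas.read_text(), po_dir)
        for lang in missing:
            print(f"LINGUAS lists {lang} but {lang}.po does not exist")
        for lang in unlisted:
            print(f"{lang}.po exists but is not listed in LINGUAS")
```

A check like this would surface the stale entries behind the extra validate-po errors described above, whichever source the Makefile ends up treating as authoritative.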
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
Great, I agree with everything you said. I'm happy to have the file list be derived from the directory contents. Are you planning on doing that in another patch?

FWIW the LINGUAS file was a holdover from when we first set this up based exclusively on GNU gettext suggested examples. As things have evolved it no longer makes sense. Also the contributing translators file is now out of date and was from an earlier era when translators emailed .po files to us, so it was easy to maintain. Now that everything is TX based we should probably nuke that file or figure out some way to extract the contributors from either TX or the po files. I'm not sure we're even giving credit to the translators anymore, but we should.

But... I do have one final issue/question. I missed this in the first review. po_files is now dependent on update-pot instead of the pot file. We had decided that we were only going to regenerate the pot file on demand at specific times. Won't this dependency change cause the pot file to be updated frequently? (I realize only in the local tree). Note that when we run the validations we generate a temporary pot file from the current contents of the tree specifically to avoid overwriting the pot file.

I suppose having a conversation about when the pot file gets updated is a good one to have, we don't do it often enough IMHO. But I'm not sure it's correct to modify a file under SCM control if it wasn't intentional.

--
John Dennis jden...@redhat.com
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 07/20/2012 12:28 PM, Petr Viktorin wrote: On 07/20/2012 05:39 PM, John Dennis wrote: Great I agree with everything you said. I'm happy to have the file list be derived from the directory contents. Are you planning on doing that in another patch? Yes, I want to do it in a new patch. It's a bit more complicated than it looks: creating a new translation will work differently than just adding it to LINGUAS and running create-po. The ticket is for beta 2 so I'd rather not start a new round of reviews. Fine with me to do that in another patch. As for create-po, I think that's also holdover from pre-Transifex days. With Transifex I'd don't ever see a need to create an empty po file. Do you? Maybe we should just nuke the po creation in the Makefile. FWIW the LINGUAS file was a holdover from when we first set this up based exclusively on GNU gettext suggested examples. As things have evolved it no longer makes sense. Also the contributing translators file is now out of date and was from an earlier era when translators emailed .po files to us, so it was easy to maintain. Now that everything is TX based we should probably nuke that file or figure out someway to extract the contributors from either TX or the po files. I'm not sure we're even giving credit to the translators anymore, but we should. Noted; when the discussion's done I'll file a ticket. But... I do have one final issue/question. I missed this in the first review. po_files is now dependent on update-pot instead of the pot file. We had decided that we were only going to regenerate the pot file on demand at specific times. Won't this dependency change cause the pot file to be updated frequently? (I realize only in the local tree). Note that when we run the validations we generate a temporary pot file from the current contents of the tree specifically to avoid overwriting the pot file. Are the po files updated more often? I don't really see a reason to merge the po files with an old pot. 
What merge are you referring to? The only merge I'm aware of at the moment is during validation, but that merge is done from a temporary updated pot file that is current with the tree. Any other merging is done by Transifex at the time we pull a po file. The frequency of po update doesn't seem relevant, what is your concern in this regard? I suppose having a conversation about when the pot file gets updated is a good one to have, we don't do it often enough IMHO. But I'm not sure it's correct to modify a file under SCM control if it wasn't intentional. How is Transifex set up here? If it automatically picks up changes when the pot file is modified, then we should back up the translations before changing the pot, so we can't do it automatically. Another wart is the line number cruft in the pot file -- any time it's updated we'll get a huge diff, so it makes sense to update sparingly. Transifex gives you two options for your pot file, either you tell TX the location of your pot file in a public SCM and it watches for updates and automatically pulls it when it changes in SCM -or- you manually push the pot file to TX. We've been using the watch the pot file in git option. Thus whenever we commit a new version of the pot file all developers and TX get it simultaneously (well sort of). If we do the manual push method the maintainer has to *both* commit to git *and* push to TX, so the former seems less error prone and more automated. The idea was we would have a string freeze prior to release and/or periodic intervals during branch development to update the pot. But we haven't been good about hitting these. However, note a manual push suffers from the same somebody has to do it at the right moment problem. If Transifex is not wired to the pot, we could even go as far as removing it from SCM entirely -- it's entirely generated, and rebuilding it takes less than a second. We'd just have to update Transifex manually. It currently is wired to the pot. 
You make a valid point about currently not needing to maintain the pot in SCM. When we first set up translations we weren't using TX so having the pot file in SCM was a necessity. Personally I don't trust TX's data storage and I think there is value in having each pot we push to TX be recoverable from our SCM. When things blow up (and they do) it's really nice to be able to reassemble the pieces or at least follow the trail of how things changed. In the past I've had to answer questions like "How the heck did this string get into this po file?" Such questions can only be answered if we have the pot file we gave the translator. TX doesn't maintain it so we have to (or at least I think there is value in it).

Perhaps you can read between the lines and detect I don't view TX as the epitome of stability and robustness. It's still young and they are still adding features and changing how it works (kinda like IPA :-)

Oh, one thing I'll ask about: the Makefile is
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 06/25/2012 07:17 AM, Petr Viktorin wrote:

The translation files we currently store in Git are full of redundant information: source strings for untranslated messages, and file locations. The first causes unnecessarily huge files. The second makes diffs unreadable: when code is edited and line numbers change, metadata for all messages shows up as changed. This makes reviewing translation patches, and merging possible conflicts, hard -- it requires specialized tools.

This patch changes the Makefile to strip the unneeded data from .po files. Translators using Git must now run msgmerge (or `make merge-po`) to get .po files they can work with. Transifex users are unaffected, as the source .pot file is not changed.

The i18n tests use file locations for producing nice error reports¹. To make this work as before, the .pot is merged in before validation to restore comments. Currently this takes a noticeable amount of time, because polib uses a particularly naïve algorithm for merging. I've sent a patch to polib to resolve this; once that makes it downstream merging will be fast again.

Updating the translations with the new Makefile will cause a 5MB patch. I don't want to pollute the mailing list with it, at least until the Makefile patch is reviewed. It's available at https://github.com/encukou/freeipa/commit/65e2e4.patch

https://fedorahosted.org/freeipa/ticket/2435

--
¹ And for divining the programming language messages come from, but that is only done on the .pot file, unaffected by this patch.

Good work and it's very close to getting an ACK.

There is now a discrepancy between what the Makefile thinks is the list of po files and the actual list of po files after running strip-po. This causes confusing errors.
I think the source of this problem is the Makefile has a list of po files in the variable $(po_files). For starters, why is:

    strip-po:
            @for po_file in $$(ls *.po); do \

instead of:

    strip-po:
            @for po_file in $(po_files); do \

If you run make validate-po before running make strip-po you get:

    5 errors in 21 files

After stripping the po files make validate-po gives you:

    14 errors in 21 files

The extra 9 errors are due to the fact validate-po is being asked to validate a non-existent po file which it considers an error (which I believe is a correct check).

make msg-stats gets confused for the same reason: it's asked to examine files that no longer exist.

make mo-files now fails catastrophically for the same reason: it's being asked to operate on files that don't exist.

In general large parts of the Makefile will now be confused or generate errors because the file list is incorrect.

Somehow we have to align the list of po files. That presents all sorts of interesting questions:

* does the list come from the LINGUAS file? (current method)

* does the list come from git? Doesn't work if you're not in a git development tree. This problem is easily seen when the RPMs are built. No file list can be generated because there is no git repo so you end up with 0 files being passed to the validation commands. Since validation is not critical when building RPMs this hasn't been a show stopper but it really needs to be fixed in some way at some point.

* does the list come from the current directory contents? What you did with strip-po, but that also has a potential for errors. What if someone deletes or adds a file in their tree by mistake?

* should make strip-po edit the LINGUAS file? (maybe the best solution). Maybe when it detects an empty file and removes it, it should run a sed command to delete the line in LINGUAS?

It may not be evident from Makefile.in but over the years there have been competing strategies for how to get our list of files.
Simo added the git ls-files strategy because he didn't want to have an explicit list which had to be maintained (a valid concern). That still left us with the PY_EXPLICIT_FILES list, so how much did that really accomplish? Maybe PY_EXPLICIT_FILES can be removed in favor of a utility that tests the file type (or the hashbang interpreter line). But that still ties things to a git tree (ugh).

If you have any great ideas on how to address the file list issue it would be good to hear. However in the interim we have to somehow adjust the po file list after strip-po runs; once that's done I'm happy to ACK it.

I wouldn't be surprised if you responded, "Well, the file list discrepancy only occurs when a maintainer is explicitly stripping po files and they should know they have to adjust the LINGUAS file, therefore these confusing errors won't be seen by someone who would be confused by them." Maybe yes, maybe no. I can think of plenty of times I debugged some build/configure/make failure and groaned because it was in some area that was totally cryptic and unknown to me, took a long time to unravel and had a trivial adjustment fix that would have only been known to an expert in that part of the code.
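To make the file list discrepancy concrete, here is a rough sketch of a check that compares the languages named in LINGUAS with the .po files actually present in the directory. It is not part of the patch or of tests/i18n.py; the helper names are hypothetical, and it assumes LINGUAS holds whitespace-separated language codes with `#` comments, as in the GNU gettext examples.

```python
import os
import tempfile


def read_linguas(path):
    """Return language codes from a LINGUAS file, ignoring '#' comments."""
    langs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0]  # drop a trailing comment, if any
            langs.extend(line.split())
    return langs


def compare_po_lists(po_dir):
    """Return (listed_but_missing, present_but_unlisted) .po file names."""
    linguas = os.path.join(po_dir, "LINGUAS")
    listed = {lang + ".po" for lang in read_linguas(linguas)}
    present = {name for name in os.listdir(po_dir) if name.endswith(".po")}
    return sorted(listed - present), sorted(present - listed)


# Demo in a throwaway directory: fr.po was removed by stripping, and ja.po
# was added without touching LINGUAS.
with tempfile.TemporaryDirectory() as po_dir:
    with open(os.path.join(po_dir, "LINGUAS"), "w", encoding="utf-8") as f:
        f.write("# languages\nde\nfr\nuk\n")
    for name in ("de.po", "uk.po", "ja.po"):
        open(os.path.join(po_dir, name), "w").close()
    missing, unlisted = compare_po_lists(po_dir)

print(missing)   # files LINGUAS expects but that are gone
print(unlisted)  # files on disk that LINGUAS does not mention
```

Something along these lines could run after strip-po to warn a maintainer, without deciding which of the competing list sources is authoritative.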
Re: [Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
On 06/25/2012 01:17 PM, Petr Viktorin wrote:

The translation files we currently store in Git are full of redundant information: source strings for untranslated messages, and file locations. The first causes unnecessarily huge files. The second makes diffs unreadable: when code is edited and line numbers change, metadata for all messages shows up as changed. This makes reviewing translation patches, and merging possible conflicts, hard -- it requires specialized tools.

This patch changes the Makefile to strip the unneeded data from .po files. Translators using Git must now run msgmerge (or `make merge-po`) to get .po files they can work with. Transifex users are unaffected, as the source .pot file is not changed.

The i18n tests use file locations for producing nice error reports¹. To make this work as before, the .pot is merged in before validation to restore comments. Currently this takes a noticeable amount of time, because polib uses a particularly naïve algorithm for merging. I've sent a patch to polib to resolve this; once that makes it downstream merging will be fast again.

Updating the translations with the new Makefile will cause a 5MB patch. I don't want to pollute the mailing list with it, at least until the Makefile patch is reviewed. It's available at https://github.com/encukou/freeipa/commit/65e2e4.patch

https://fedorahosted.org/freeipa/ticket/2435

Could someone (John?) take some time to look at the patch? I'll be away from office, returning on Tuesday 17th before the beta. It would be nice to have a review when I return.

--
Petr³
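For readers unfamiliar with what stripping a .po file means here, the following is a simplified sketch of the effect described in the quoted summary: location comments are dropped and untranslated or fuzzy entries are removed. It is an illustration only. The patch itself uses msgattrib, the function names below are hypothetical, the sample catalog is made up, and the sketch assumes entries are separated by single blank lines and ignores obsolete entries and other corner cases.

```python
def is_translated(entry_lines):
    """True if a msgstr line or one of its continuation lines has text."""
    in_msgstr = False
    for line in entry_lines:
        if line.startswith("msgstr"):
            in_msgstr = True
            # Covers both 'msgstr "..."' and plural 'msgstr[0] "..."'.
            text = line.split(" ", 1)[1] if " " in line else '""'
        elif in_msgstr and line.startswith('"'):
            text = line  # continuation of a multi-line msgstr
        else:
            in_msgstr = False
            continue
        if text.strip() != '""':
            return True
    return False


def strip_po(po_text):
    """Drop '#:' locations plus untranslated and fuzzy entries."""
    kept = []
    for block in po_text.strip().split("\n\n"):
        lines = [l for l in block.splitlines() if not l.startswith("#:")]
        if any(l.startswith("#,") and "fuzzy" in l for l in lines):
            continue
        if not is_translated(lines):
            continue
        kept.append("\n".join(lines))
    return "\n\n".join(kept) + "\n"


# Made-up sample catalog: a header, one translated and one untranslated entry.
sample = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: ipalib/cli.py:10
msgid "Enter password"
msgstr "Passwort eingeben"

#: ipalib/cli.py:20
msgid "Cancel"
msgstr ""
'''

stripped = strip_po(sample)
print(stripped)
```

The header entry survives because its msgstr continuation lines are non-empty; a translator working from Git would then merge the .pot back in (msgmerge, or `make merge-po`) to recover the untranslated strings and locations.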
[Freeipa-devel] [PATCH] 0066 Arrange stripping .po files
The translation files we currently store in Git are full of redundant information: source strings for untranslated messages, and file locations. The first causes unnecessarily huge files. The second makes diffs unreadable: when code is edited and line numbers change, metadata for all messages shows up as changed. This makes reviewing translation patches, and merging possible conflicts, hard -- it requires specialized tools.

This patch changes the Makefile to strip the unneeded data from .po files. Translators using Git must now run msgmerge (or `make merge-po`) to get .po files they can work with. Transifex users are unaffected, as the source .pot file is not changed.

The i18n tests use file locations for producing nice error reports¹. To make this work as before, the .pot is merged in before validation to restore comments. Currently this takes a noticeable amount of time, because polib uses a particularly naïve algorithm for merging. I've sent a patch to polib to resolve this; once that makes it downstream merging will be fast again.

Updating the translations with the new Makefile will cause a 5MB patch. I don't want to pollute the mailing list with it, at least until the Makefile patch is reviewed. It's available at https://github.com/encukou/freeipa/commit/65e2e4.patch

https://fedorahosted.org/freeipa/ticket/2435

--
¹ And for divining the programming language messages come from, but that is only done on the .pot file, unaffected by this patch.

--
Petr³

From 16b20b737225908311f98e55db0938515e1abad6 Mon Sep 17 00:00:00 2001
From: Petr Viktorin pvikt...@redhat.com
Date: Wed, 20 Jun 2012 06:38:16 -0400
Subject: [PATCH] Arrange stripping .po files

The .po files we use for translations have two shortcomings when used in Git:
- They include file locations, which change each time the source is updated. This results in large, unreadable diffs that don't merge well.
- They include source strings for untranslated messages, wasting space unnecessarily.

Update the Makefile so that the extraneous information is stripped when the files are updated or pulled from Transifex, and empty translation files are removed entirely. Also, translations are normalized to a common style. This should help diffs and merges.

The validator requires file location comments to identify the programming language, and to produce good error reports. To make this work, merge the comments in before validation.

First patch for: https://fedorahosted.org/freeipa/ticket/2435
---
 install/configure.ac   |    5 +
 install/po/Makefile.in |   20 +---
 install/po/README      |   16 ++--
 tests/i18n.py          |   12 ++--
 4 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/install/configure.ac b/install/configure.ac
index 827ddbab411a4aa8abbdd4488e217ce67046bd6b..9e781a684429191b3c5eb46aed4fceecc9be6586 100644
--- a/install/configure.ac
+++ b/install/configure.ac
@@ -48,6 +48,11 @@ if test x$MSGCMP = xno; then
     AC_MSG_ERROR([msgcmp not found, install gettext])
 fi
 
+AC_PATH_PROG(MSGATTRIB, msgattrib, [no])
+if test x$MSGATTRIB = xno; then
+    AC_MSG_ERROR([msgattrib not found, install gettext])
+fi
+
 AC_PATH_PROG(TX, tx, [/usr/bin/tx])
 
 AC_ARG_WITH([gettext_domain],
diff --git a/install/po/Makefile.in b/install/po/Makefile.in
index 9a3dde78a20a6beb35ab08230331f28b7ea3161d..c1a9bc8b8962fa2f9c7ff2bf541f5996e34a642f 100644
--- a/install/po/Makefile.in
+++ b/install/po/Makefile.in
@@ -14,6 +14,7 @@ MSGFMT = @MSGFMT@
 MSGINIT = @MSGINIT@
 MSGMERGE = @MSGMERGE@
 MSGCMP = @MSGCMP@
+MSGATTRIB = @MSGATTRIB@
 TX = @TX@
 
 IPA_TEST_I18N = ../../tests/i18n.py
@@ -67,25 +68,34 @@ C_POTFILES = $(C_FILES) $(H_FILES)
 
 .SUFFIXES:
 .SUFFIXES: .po .mo
-.PHONY: all create-po update-po update-pot install mostlyclean clean distclean test mo-files debug
+.PHONY: all create-po update-po update-pot install mostlyclean clean distclean test mo-files debug strip-po merge-po
 
 all:
 
 SUFFIXES = .po .mo
 
 .po.mo:
 	@echo Creating $@; \
 	$(MSGFMT) -c -o t-$@ $< && mv t-$@ $@
 
-$(po_files): $(DOMAIN).pot
+$(po_files): update-pot
 	@if [ ! -f $@ ]; then \
 	    lang=`echo $@ | $(SED) -r -e 's/\.po$$//'` # Strip .po suffix ; \
 	    echo Creating nonexistent $@, you should add this file to your SCM repository; \
 	    $(MSGINIT) --locale $$lang --no-translator -i $(DOMAIN).pot -o $@; \
 	fi; \
 	echo Merging $(DOMAIN).pot into $@; \
 	$(MSGMERGE) --no-fuzzy-matching -o $@ $@ $(DOMAIN).pot
 
+strip-po:
+	@for po_file in $$(ls *.po); do \
+		echo Stripping $$po_file; \
+		$(MSGATTRIB) --translated --no-fuzzy --no-location $$po_file > $$po_file.tmp; \
+		mv $$po_file.tmp $$po_file; \
+	done
+	@echo Remove empty translation files; \
+	find . -name '*.po' -empty -exec rm -v {} \;
+
 create-po: $(DOMAIN).pot
 	@for po_file in $(po_files); do \
 	    if [ ! -e $$po_file ]; then \
@@ -98,10 +108,14 @@ create-po: $(DOMAIN).pot
 
 pull-po:
 	cd ../..; $(TX) pull -f
+	$(MAKE) strip-po
 
-update-po: