Re: [PATCH] Support generate poison .mo files for testing
On Wed, Aug 22, 2012 at 11:22 PM, Junio C Hamano gits...@pobox.com wrote: Nguyen Thai Ngoc Duy pclo...@gmail.com writes: But a better way could be replacing tracked with t r a c k e d. We know the rule so we can recreate the that string from tracked in test_i18n*. Or reverse the upper/lower case, whichever is easier for the recreation by test_i18n* That does not make much sense to me, so either one of us must be slightly confused. I thought the only purpose of testing with the poison was to find messages that must not be localized but were localized by mistake. For that, we have to make sure that anything that uses test_i18n* is reading from Porcelain, in other words, we must use the byte-for-byte comparison without using test_i18n* when verifying the plumbing output. And the primary requirement for this arrangement to work is that the expected output in C locale and the actual ouptut in the synthetic poison locale are reliably different. They do not have to be reversible (I was actually going to suggest rot13 of the original instead of cycling through the * gettext poison * in your patch --- prefixing with QQ would not work, as it is likely that the test with grep may not be anchored at the left end). Yes, for verifying plumbing output that should be enough. Teaching test_i18n* to fuzzily match the expected output in C locale against the actual output in synthetic poison locale may or may not be doable, but spending any cycle working on that sounds like a total waste of time (even though it might be fun). It does not test that we are translating Porcelain messages correctly. Right. I was aiming at testing translated messages but obviously went off track trying to test a fake locale instead. Thanks for stopping me. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Support generate poison .mo files for testing
On Wed, Aug 22, 2012 at 11:43 PM, Junio C Hamano gits...@pobox.com wrote: why are you special casing a run of non-blank letters that begin with a dollar sign (swapping two ints is done with %2$d %1$d, a percent still at the beginning, so there must be something else I am missing)? '$' is for shell variables. I should separate it from C messages. All these because now that we run the (fake) translation through gettext toolchain, it'll catch us if we don't preserve dynamic parts correctly. Also why do you stop at isspace()? Isn't a (space) a flag that means If the first character of a signed conversion is not a sign or if a signed conversion results in no characters, a space shall be prefixed to the result. A hurry attempt to get past msgfmt. I should refine these code, either by... As the flags, min-width, precision, and length do not share the same character as the conversion that has to come at the end, I think you only want to do something like /* * conversion specifier characters, taken from: * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html */ static const char printf_conversion[] = diouxXfFeEgGaAcspnCS%; ... while (msg end) { if (*msg == '%') { strbuf_addch(buf, *msg++); while (msg end) { int ch = *msg++; strbuf_addch(buf, ch); if (strchr(printf_conversion, ch)) break; } /* copied the printf part literally */ continue; } ... keep \n ... ... muck with string ... } perhaps? following this, or copying the matching logic from msgfmt source. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Support generate poison .mo files for testing
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: test-poisongen does a similar job to gettext poison feature except that it does it at build time. Gibberish .mo files are generated for all supported langauges and put in po/build/poison-locale. Target poison-locale is for this. What is the significance of this locale being Gibberish? Currently, for any string, we give ### gettext poison ### or something but the only thing we care about in the poison mode is that it is different from the message id, no? I was wondering if these phony translations can be something simple like Add QQ at the beginning of the message id string and still can catch mistakenly marked messages that come from the plumbing layer, or something. As you have already written a printf skipper that looks fairly conservative, perhaps I shouldn't be worried too much about it, but we seem to be spending considerable effort on the poison, and it makes me wonder (even though no better alternative comes to mind) if we could do better. The reason we do poison (be it the current one or locale based one) in the first place is so that we want to make sure messages from the plumbing are not marked for i18n, and we do so by running our test under the poison mode that produces output different from the in-code text that are marked for i18n, which somehow feels quite a roundabout way of doing so. User can run the test with these .mo files by setting POISON_LOCALE while running the test suite. User must also set LANG/LC_* correctly (and the system is supposed to support that locale). OK let me redo step one. test-poisongen requires libgettextpo. I'm not sure if this library if gnu specific. We may need another flag for it instead of NO_GETTEXT. We don't need a fake language code with this approach. OK. Makefile | 19 t/test-lib.sh| 10 +++- test-poisongen.c | 139 +++ wrap-for-bin.sh | 6 ++- 4 files changed, 171 insertions(+), 3 deletions(-) mode change 100644 = 100755 t/test-lib.sh create mode 100644 test-poisongen.c mode change 100644 = 100755 wrap-for-bin.sh Thanks. I suspect two mode changes weren't intentional? +static void translate(const char *msg, struct strbuf *buf) +{ + const char *end = msg + strlen(msg); + const char *text = * GETTEXT POISON *; + int text_len = strlen(text); + int t = 0; + + strbuf_reset(buf); + /* preserve \n and printf format specifiers because msgfmt +barfs otherwise. */ + while (msg end) { + /* printf specifiers and shell variables, it's a quite +relax check */ + if ((*msg == '%' || *msg == '$') msg+1 end) { + strbuf_addch(buf, *msg++); + do +strbuf_addch(buf, *msg); + while (msg end !isspace(*msg++)); + } else if (*msg == '\n') { + /* we only need to preserve trailing newlines, doing +more does not really harm */ + strbuf_addch(buf, '\n'); + msg++; + } else { + strbuf_addch(buf, text[t]); + t = (t + 1) % text_len; + msg++; + } + } +} -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Support generate poison .mo files for testing
On Wed, Aug 22, 2012 at 6:13 PM, Junio C Hamano gits...@pobox.com wrote: Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: test-poisongen does a similar job to gettext poison feature except that it does it at build time. Gibberish .mo files are generated for all supported langauges and put in po/build/poison-locale. Target poison-locale is for this. What is the significance of this locale being Gibberish? Currently, for any string, we give ### gettext poison ### or something but the only thing we care about in the poison mode is that it is different from the message id, no? I was wondering if these phony translations can be something simple like Add QQ at the beginning of the message id string and still can catch mistakenly marked messages that come from the plumbing layer, or something. I'm gradually getting there, partly thanks to your question about grepping tracked in another thread. This patch does not really generate random strings. It repeats the pattern * gettext poison * for evey character that can be replaced. But a better way could be replacing tracked with t r a c k e d. We know the rule so we can recreate the that string from tracked in test_i18n*. Or reverse the upper/lower case, whichever is easier for the recreation by test_i18n* mode change 100644 = 100755 t/test-lib.sh create mode 100644 test-poisongen.c mode change 100644 = 100755 wrap-for-bin.sh Thanks. I suspect two mode changes weren't intentional? It's an emacs hook gone bad. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Support generate poison .mo files for testing
Nguyen Thai Ngoc Duy pclo...@gmail.com writes: But a better way could be replacing tracked with t r a c k e d. We know the rule so we can recreate the that string from tracked in test_i18n*. Or reverse the upper/lower case, whichever is easier for the recreation by test_i18n* That does not make much sense to me, so either one of us must be slightly confused. I thought the only purpose of testing with the poison was to find messages that must not be localized but were localized by mistake. For that, we have to make sure that anything that uses test_i18n* is reading from Porcelain, in other words, we must use the byte-for-byte comparison without using test_i18n* when verifying the plumbing output. And the primary requirement for this arrangement to work is that the expected output in C locale and the actual ouptut in the synthetic poison locale are reliably different. They do not have to be reversible (I was actually going to suggest rot13 of the original instead of cycling through the * gettext poison * in your patch --- prefixing with QQ would not work, as it is likely that the test with grep may not be anchored at the left end). Teaching test_i18n* to fuzzily match the expected output in C locale against the actual output in synthetic poison locale may or may not be doable, but spending any cycle working on that sounds like a total waste of time (even though it might be fun). It does not test that we are translating Porcelain messages correctly. Am I missing something? Puzzled... -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Support generate poison .mo files for testing
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: +static void translate(const char *msg, struct strbuf *buf) +{ + const char *end = msg + strlen(msg); + const char *text = * GETTEXT POISON *; + int text_len = strlen(text); + int t = 0; + + strbuf_reset(buf); + /* preserve \n and printf format specifiers because msgfmt +barfs otherwise. */ + while (msg end) { + /* printf specifiers and shell variables, it's a quite +relax check */ + if ((*msg == '%' || *msg == '$') msg+1 end) { + strbuf_addch(buf, *msg++); + do +strbuf_addch(buf, *msg); + while (msg end !isspace(*msg++)); Aside from the Style: do { ... } while (); why are you special casing a run of non-blank letters that begin with a dollar sign (swapping two ints is done with %2$d %1$d, a percent still at the beginning, so there must be something else I am missing)? Also why do you stop at isspace()? Isn't a (space) a flag that means If the first character of a signed conversion is not a sign or if a signed conversion results in no characters, a space shall be prefixed to the result. As the flags, min-width, precision, and length do not share the same character as the conversion that has to come at the end, I think you only want to do something like /* * conversion specifier characters, taken from: * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html */ static const char printf_conversion[] = diouxXfFeEgGaAcspnCS%; ... while (msg end) { if (*msg == '%') { strbuf_addch(buf, *msg++); while (msg end) { int ch = *msg++; strbuf_addch(buf, ch); if (strchr(printf_conversion, ch)) break; } /* copied the printf part literally */ continue; } ... keep \n ... ... muck with string ... } perhaps? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html