Re: [PATCH] Support generate poison .mo files for testing

2012-08-23 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 22, 2012 at 11:22 PM, Junio C Hamano gits...@pobox.com wrote:
 Nguyen Thai Ngoc Duy pclo...@gmail.com writes:

 But a better way could be
 replacing tracked with t r a c k e d. We know the rule so we can
 recreate the that string from tracked in test_i18n*. Or reverse the
 upper/lower case, whichever is easier for the recreation by test_i18n*

 That does not make much sense to me, so either one of us must be
 slightly confused.  I thought the only purpose of testing with the
 poison was to find messages that must not be localized but were
 localized by mistake.  For that, we have to make sure that anything
 that uses test_i18n* is reading from Porcelain, in other words, we
 must use the byte-for-byte comparison without using test_i18n* when
 verifying the plumbing output.  And the primary requirement for this
 arrangement to work is that the expected output in C locale and the
 actual ouptut in the synthetic poison locale are reliably different.
 They do not have to be reversible (I was actually going to suggest
 rot13 of the original instead of cycling through the * gettext
 poison * in your patch --- prefixing with QQ would not work, as it
 is likely that the test with grep may not be anchored at the left
 end).

Yes, for verifying plumbing output that should be enough.

 Teaching test_i18n* to fuzzily match the expected output in C locale
 against the actual output in synthetic poison locale may or may not
 be doable, but spending any cycle working on that sounds like a
 total waste of time (even though it might be fun).  It does not test
 that we are translating Porcelain messages correctly.

Right. I was aiming at testing translated messages but obviously went
off track trying to test a fake locale instead. Thanks for stopping
me.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Support generate poison .mo files for testing

2012-08-23 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 22, 2012 at 11:43 PM, Junio C Hamano gits...@pobox.com wrote:
 why are you special casing a run of non-blank letters that begin
 with a dollar sign (swapping two ints is done with %2$d %1$d, a
 percent still at the beginning, so there must be something else I am
 missing)?

'$' is for shell variables. I should separate it from C messages. All
these because now that we run the (fake) translation through gettext
toolchain, it'll catch us if we don't preserve dynamic parts
correctly.

 Also why do you stop at isspace()?  Isn't a   (space) a flag that
 means If the first character of a signed conversion is not a sign
 or if a signed conversion results in no characters, a space shall
 be prefixed to the result.

A hurry attempt to get past msgfmt. I should refine these code, either by...

 As the flags, min-width, precision, and length do not share the same
 character as the conversion that has to come at the end, I think you
 only want to do something like

 /*
  * conversion specifier characters, taken from:
  * 
 http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
  */
 static const char printf_conversion[] = diouxXfFeEgGaAcspnCS%;

 ...

 while (msg  end) {
 if (*msg == '%') {
 strbuf_addch(buf, *msg++);
 while (msg  end) {
 int ch = *msg++;
 strbuf_addch(buf, ch);
 if (strchr(printf_conversion, ch))
 break;
 }
 /* copied the printf part literally */
 continue;
 }
 ... keep \n ...
 ... muck with string ...
 }

 perhaps?

following this, or copying the matching logic from msgfmt source.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Support generate poison .mo files for testing

2012-08-22 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 test-poisongen does a similar job to gettext poison feature except
 that it does it at build time. Gibberish .mo files are generated for
 all supported langauges and put in po/build/poison-locale. Target
 poison-locale is for this.

What is the significance of this locale being Gibberish?
Currently, for any string, we give ### gettext poison ### or
something but the only thing we care about in the poison mode is
that it is different from the message id, no?  I was wondering if
these phony translations can be something simple like Add QQ at the
beginning of the message id string and still can catch mistakenly
marked messages that come from the plumbing layer, or something.

As you have already written a printf skipper that looks fairly
conservative, perhaps I shouldn't be worried too much about it, but
we seem to be spending considerable effort on the poison, and it
makes me wonder (even though no better alternative comes to mind) if
we could do better.  The reason we do poison (be it the current
one or locale based one) in the first place is so that we want to
make sure messages from the plumbing are not marked for i18n, and we
do so by running our test under the poison mode that produces
output different from the in-code text that are marked for i18n,
which somehow feels quite a roundabout way of doing so.

 User can run the test with these .mo files by setting POISON_LOCALE
 while running the test suite. User must also set LANG/LC_* correctly
 (and the system is supposed to support that locale).

  OK let me redo step one. test-poisongen requires libgettextpo. I'm
  not sure if this library if gnu specific. We may need another flag
  for it instead of NO_GETTEXT. We don't need a fake language code with
  this approach.

OK.

  Makefile |  19 
  t/test-lib.sh|  10 +++-
  test-poisongen.c | 139 
 +++
  wrap-for-bin.sh  |   6 ++-
  4 files changed, 171 insertions(+), 3 deletions(-)
  mode change 100644 = 100755 t/test-lib.sh
  create mode 100644 test-poisongen.c
  mode change 100644 = 100755 wrap-for-bin.sh

Thanks.  I suspect two mode changes weren't intentional?

 +static void translate(const char *msg, struct strbuf *buf)
 +{
 + const char *end = msg + strlen(msg);
 + const char *text = * GETTEXT POISON *;
 + int text_len = strlen(text);
 + int t = 0;
 +
 + strbuf_reset(buf);
 + /* preserve \n and printf format specifiers because msgfmt
 +barfs otherwise. */
 + while (msg  end) {
 + /* printf specifiers and shell variables, it's a quite
 +relax check */
 + if ((*msg == '%' || *msg == '$')  msg+1  end) {
 + strbuf_addch(buf, *msg++);
 + do
 +strbuf_addch(buf, *msg);
 + while (msg  end  !isspace(*msg++));
 + } else if (*msg == '\n') {
 + /* we only need to preserve trailing newlines, doing
 +more does not really harm */
 + strbuf_addch(buf, '\n');
 + msg++;
 + } else {
 + strbuf_addch(buf, text[t]);
 + t = (t + 1) % text_len;
 + msg++;
 + }
 + }
 +}
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Support generate poison .mo files for testing

2012-08-22 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 22, 2012 at 6:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 test-poisongen does a similar job to gettext poison feature except
 that it does it at build time. Gibberish .mo files are generated for
 all supported langauges and put in po/build/poison-locale. Target
 poison-locale is for this.

 What is the significance of this locale being Gibberish?
 Currently, for any string, we give ### gettext poison ### or
 something but the only thing we care about in the poison mode is
 that it is different from the message id, no?  I was wondering if
 these phony translations can be something simple like Add QQ at the
 beginning of the message id string and still can catch mistakenly
 marked messages that come from the plumbing layer, or something.

I'm gradually getting there, partly thanks to your question about
grepping tracked in another thread. This patch does not really
generate random strings. It repeats the pattern * gettext poison *
for evey character that can be replaced. But a better way could be
replacing tracked with t r a c k e d. We know the rule so we can
recreate the that string from tracked in test_i18n*. Or reverse the
upper/lower case, whichever is easier for the recreation by test_i18n*

  mode change 100644 = 100755 t/test-lib.sh
  create mode 100644 test-poisongen.c
  mode change 100644 = 100755 wrap-for-bin.sh

 Thanks.  I suspect two mode changes weren't intentional?

It's an emacs hook gone bad.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Support generate poison .mo files for testing

2012-08-22 Thread Junio C Hamano
Nguyen Thai Ngoc Duy pclo...@gmail.com writes:

 But a better way could be
 replacing tracked with t r a c k e d. We know the rule so we can
 recreate the that string from tracked in test_i18n*. Or reverse the
 upper/lower case, whichever is easier for the recreation by test_i18n*

That does not make much sense to me, so either one of us must be
slightly confused.  I thought the only purpose of testing with the
poison was to find messages that must not be localized but were
localized by mistake.  For that, we have to make sure that anything
that uses test_i18n* is reading from Porcelain, in other words, we
must use the byte-for-byte comparison without using test_i18n* when
verifying the plumbing output.  And the primary requirement for this
arrangement to work is that the expected output in C locale and the
actual ouptut in the synthetic poison locale are reliably different.
They do not have to be reversible (I was actually going to suggest
rot13 of the original instead of cycling through the * gettext
poison * in your patch --- prefixing with QQ would not work, as it
is likely that the test with grep may not be anchored at the left
end).

Teaching test_i18n* to fuzzily match the expected output in C locale
against the actual output in synthetic poison locale may or may not
be doable, but spending any cycle working on that sounds like a
total waste of time (even though it might be fun).  It does not test
that we are translating Porcelain messages correctly.

Am I missing something?  Puzzled...
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Support generate poison .mo files for testing

2012-08-22 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 +static void translate(const char *msg, struct strbuf *buf)
 +{
 + const char *end = msg + strlen(msg);
 + const char *text = * GETTEXT POISON *;
 + int text_len = strlen(text);
 + int t = 0;
 +
 + strbuf_reset(buf);
 + /* preserve \n and printf format specifiers because msgfmt
 +barfs otherwise. */
 + while (msg  end) {
 + /* printf specifiers and shell variables, it's a quite
 +relax check */
 + if ((*msg == '%' || *msg == '$')  msg+1  end) {
 + strbuf_addch(buf, *msg++);
 + do
 +strbuf_addch(buf, *msg);
 + while (msg  end  !isspace(*msg++));

Aside from the Style:

do {
...
} while ();

why are you special casing a run of non-blank letters that begin
with a dollar sign (swapping two ints is done with %2$d %1$d, a
percent still at the beginning, so there must be something else I am
missing)?

Also why do you stop at isspace()?  Isn't a   (space) a flag that
means If the first character of a signed conversion is not a sign
or if a signed conversion results in no characters, a space shall
be prefixed to the result.

As the flags, min-width, precision, and length do not share the same
character as the conversion that has to come at the end, I think you
only want to do something like

/*
 * conversion specifier characters, taken from:
 * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
 */
static const char printf_conversion[] = diouxXfFeEgGaAcspnCS%;

...

while (msg  end) {
if (*msg == '%') {
strbuf_addch(buf, *msg++);
while (msg  end) {
int ch = *msg++;
strbuf_addch(buf, ch);
if (strchr(printf_conversion, ch))
break;
}
/* copied the printf part literally */
continue;
}
... keep \n ...
... muck with string ...
}

perhaps?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html