Re: [Haskell-cafe] haskell i18n best practices

2011-10-11 Thread Edward Kmett
On Thu, Sep 29, 2011 at 6:54 PM, Paulo Pocinho poci...@gmail.com wrote:

 Hello list.

 I've been trying to figure a nice method to provide localisation. An
 application is deployed using a conventional installer. The end-user
 is not required to have the Haskell runtimes, compiler or platform.
 The application should bundle ready to use translation data. What I am
 after is simple; an intuitive way that an interested translator, with
 little knowledge of Haskell, can look at and create valid translation
 data.


I've been meaning to bundle up some i18n/l10n code that I have lying around
from previous compiler projects.

What I was using was a gettext/printf Template Haskell function that xgettext
can find, and which expands to code that reads the translated .po files for the
current module at two different times: once at compile time, to check that any
printf-style format strings are compatible across each translation, and again
at runtime, to allow additional translations to be added.
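
The compatibility check described above can be illustrated with a small, plain function. This is only a sketch under stated assumptions, not the Template Haskell code from the message: the names `conversions` and `compatible` are hypothetical, and the parser is deliberately simplified (it skips flags, widths and precisions, and does not understand length modifiers or positional arguments).

```haskell
-- | Collect the conversion characters of a printf-style format string,
-- in order. "%%" is a literal percent sign and is skipped. Flags, widths
-- and precisions are ignored in this simplified sketch.
conversions :: String -> [Char]
conversions ('%' : '%' : rest) = conversions rest
conversions ('%' : rest) =
  case dropWhile (`elem` "-+ #0123456789.") rest of
    (c : rest') -> c : conversions rest'
    []          -> []
conversions (_ : rest) = conversions rest
conversions [] = []

-- | A translation is treated as compatible when it consumes the same
-- argument kinds, in the same order, as the original message.
compatible :: String -> String -> Bool
compatible original translated =
  conversions original == conversions translated
```

Run at compile time over every msgid/msgstr pair, a check like this would reject a translation that turns `%d` into `%s` before the program ever ships.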

The biggest headache I have is that doing all this requires a pretty hairy
.cabal file, and I haven't yet figured out how to package that up nicely for
use in libraries.

I'll admit I have only ever really tested this with a joke en@lolcat
translation, which I auto-translate with Perl. If I could find a nice Perl
module for generating zalgo-style text, en@zalgo would be pretty neat to
auto-generate as well.

I'm not sure it's considered best practice, since I haven't bundled it up
for third-party use yet, but it's *my* practice. ;)

-Edward Kmett



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: [Haskell-cafe] haskell i18n best practices

2011-10-11 Thread Felipe Almeida Lessa
On Tue, Oct 11, 2011 at 5:03 PM, Edward Kmett ekm...@gmail.com wrote:
 I'll admit I have only ever really tested this with a joke en@lolcat
 translation, which I auto-translate with perl, though I admit if I could
 find a nice perl module for generating zalgo-style text, en@zalgo would be
 pretty neat to auto-generate as well.

Using Yesod's approach and assuming

  lolspeak :: String -> String

you could have

  render_en_lolcat = lolspeak . render_en_US

Pretty neat! ;-D
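
Spelled out as a complete module, the composition might look like the sketch below. The two-constructor `Message` type is a cut-down stand-in for the full one, and the body of `lolspeak` is a hypothetical placeholder (it merely upper-cases the text); only the composition itself comes from the message above.

```haskell
import Data.Char (toUpper)

-- Cut-down stand-in for the Message type of the Yesod-style approach.
data Message = Hello | GoodBye

render_en_US :: Message -> String
render_en_US Hello   = "Hello!"
render_en_US GoodBye = "Good bye!"

-- Hypothetical toy transformation; a real lolcat translator would do more.
lolspeak :: String -> String
lolspeak = map toUpper

-- A derived "translation" is just function composition.
render_en_lolcat :: Message -> String
render_en_lolcat = lolspeak . render_en_US
```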

Cheers,

-- 
Felipe.



Re: [Haskell-cafe] haskell i18n best practices

2011-10-03 Thread Yitzchak Gale
Thanks for all the great information provided in this thread.

The wiki page that Paulo originally linked had Vasyl's
fantastic documentation for using his hgettext package,
but it did not mention any of the other methods we discussed.

I moved the gettext documentation to its own linked page
and tried to collect together the general information from
this thread.

Please take a moment and look it over. Correct any
mistakes I made.

http://haskell.org/haskellwiki/Internationalization_of_Haskell_programs

Rogan, especially, please look it over. I really had to
read between the lines to come up with a clear and concise
description of GF and what it does, so I may have gotten
it wrong.

Felipe, I put your wonderful example on its own linked
page.

Thanks,
Yitz



Re: [Haskell-cafe] haskell i18n best practices

2011-10-03 Thread Felipe Almeida Lessa
On Mon, Oct 3, 2011 at 10:05 AM, Yitzchak Gale g...@sefer.org wrote:
 Felipe, I put your wonderful example on its own linked
 page.

Thanks, I'm always lazy with wikis =).  I've corrected a few typos
I made in my e-mail there.

Cheers,

-- 
Felipe.



Re: [Haskell-cafe] haskell i18n best practices

2011-09-30 Thread Ertugrul Soeylemez
Paulo Pocinho poci...@gmail.com wrote:

 I don't know if this is a bad habit, but I had already separated the
 dialogue text in the code with variables holding the respective
 strings. At this time, I thought there could be some other way than
 gettext. Then I figured how to import localisation data, that the
 program loads, from external files. The data type is basically a tuple
 with variable-names associated with strings. This is bit like the
 file-embed package [3].

 Still uncomfortable with i18n, I learned about the article I18N in
 Haskell in yesod blog [4]. I'd like to hear more about it.

 What is considered the best practice for localisation?

I can't help you with best practice for Haskell, and I don't think there
is any.  Gettext is probably the easiest approach, because it integrates
nicely with the rest of the environment.  It automatically uses the
usual LANG and LC_* variables, which are used in Unix-like systems.

An even simpler (but not necessarily easier) approach is to hard-code
the languages in a Map and just look up the string you need.  In this
case you have to code the integration yourself.  It somewhat sounds like
you are targeting the Windows platform anyway.  Personally I'd likely
prefer Gettext for its integration and all the existing translation
tools.
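
A minimal sketch of the hard-coded Map approach, assuming a nested map from language code to message table. The names `catalogue` and `translate` and the sample strings are illustrative only; choosing the language (the "integration" mentioned above) is left to the caller.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

type Lang = String

-- | Hard-coded catalogue: language code -> (English message -> translation).
catalogue :: Map.Map Lang (Map.Map String String)
catalogue = Map.fromList
  [ ("de", Map.fromList [("This is a test.", "Dies ist ein Test.")])
  , ("es", Map.fromList [("This is a test.", "Esto es una prueba.")])
  ]

-- | Look up a message for a language, falling back to the English text
-- when either the language or the message is missing.
translate :: Lang -> String -> String
translate lang msg =
  fromMaybe msg (Map.lookup lang catalogue >>= Map.lookup msg)
```

Because missing entries fall back to the original string, a partial translation still yields a usable program.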

In either case, the best practice is not to work with variables, but
with a default language.  You write your text strings in your default
language (usually English), but wrap them in a certain function call.
The function will try to look up a translated message for the current
language.  This makes both programming and translating easier.  This is
how I imagine it works (or should work):

main :: IO ()
main = do
    tr <- getTranslator
    putStrLn (tr "This is a test.")

The 'tr' function is called just '_' in other languages, but you can't
use the underscore as a function name in Haskell.  A translator (person)
would use a program to search your entire source code for those
translatable strings.  They would then use a translation program, which
shows each English string in turn and asks for its translation, until
all strings are translated.
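
One way `getTranslator` could be written is sketched below. It is an assumption-laden illustration rather than an existing library function: the in-memory `translations` table stands in for real message catalogues, only the LANG variable is consulted (not the LC_* variables), and `languageOf` and `translatorFor` are hypothetical helper names.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- | Hypothetical in-memory catalogue; a gettext-based program would read
-- compiled message catalogues instead.
translations :: [(String, [(String, String)])]
translations =
  [ ("de", [("This is a test.", "Dies ist ein Test.")]) ]

-- | Reduce a locale such as "de_DE.UTF-8" to its language code "de".
languageOf :: String -> String
languageOf = takeWhile (`notElem` "_.@")

-- | Build a translator for a locale; untranslated strings pass through.
translatorFor :: String -> (String -> String)
translatorFor locale msg =
  fromMaybe msg (lookup (languageOf locale) translations >>= lookup msg)

-- | Pick the locale from the LANG environment variable, defaulting to "C".
getTranslator :: IO (String -> String)
getTranslator = translatorFor . fromMaybe "C" <$> lookupEnv "LANG"
```

The default-language strings double as lookup keys, so the source stays readable and an untranslated string degrades to English instead of failing.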


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/





Re: [Haskell-cafe] haskell i18n best practices

2011-09-30 Thread Felipe Almeida Lessa
On Thu, Sep 29, 2011 at 7:54 PM, Paulo Pocinho poci...@gmail.com wrote:
 Still uncomfortable with i18n, I learned about the article I18N in
 Haskell in yesod blog [4]. I'd like to hear more about it.

Yesod's approach is pretty nice [1].  The idea is to have a data type
with all your messages, like

  data Message =
    Hello |
    WhatsYourName |
    MyNameIs String |
    Ihave_apples Int |
    GoodBye

For each of your supported languages, you provide a rendering function
(they may be in separate source files)

  render_en_US :: Message -> String
  render_en_US Hello = "Hello!"
  render_en_US WhatsYourName = "What's your name?"
  render_en_US (MyNameIs name) = "My name is " ++ name ++ "."
  render_en_US (Ihave_apples 0) = "I don't have any apples."
  render_en_US (Ihave_apples 1) = "I have one apple."
  render_en_US (Ihave_apples n) = "I have " ++ show n ++ " apples."
  render_en_US GoodBye = "Good bye!"

  render_pt_BR :: Message -> String
  render_pt_BR Hello = "Olá!"
  render_pt_BR WhatsYourName = "Como você se chama?"
  render_pt_BR (MyNameIs name) = "Eu me chamo " ++ name ++ "."
  render_pt_BR (Ihave_apples 0) = "Não tenho nenhuma maçã."
  render_pt_BR (Ihave_apples 1) = "Tenho uma maçã."
  render_pt_BR (Ihave_apples 2) = "Tenho um par de maçãs."
  render_pt_BR (Ihave_apples n) = "Tenho " ++ show n ++ " maçãs."
  render_pt_BR GoodBye = "Tchau!"

Given those functions, you can construct something like

  type Lang = String

  render :: [Lang] -> Message -> String
  render ("pt"   :_) = render_pt_BR
  render ("pt_BR":_) = render_pt_BR
  render ("en"   :_) = render_en_US
  render ("en_US":_) = render_en_US
  render (_:xs) = render xs
  render _ = render_en_US

So 'r = render ["fr", "pt"]' will do the right thing.  You just need
to pass this 'r' around in your code.  Using it is easy and clear:

  putStrLn $ r Hello
  putStrLn $ r WhatsYourName
  name <- getLine
  putStrLn $ r (MyNameIs "Alice")
  putStrLn $ r (Ihave_apples $ length name `mod` 4)
  putStrLn $ r GoodBye

This approach is nice for several reasons:

 - Built-in support for complicated messages.  Making something like
Ihave_apples in gettext would be hard.  Each language has its own
rules, and you need to encode all of them in your code.  In this
example, my render_pt_BR recognizes the 2-apples case and treats it
differently.  If you hadn't thought about it when you wrote your code
(using gettext), you'd need to change your code for pt_BR.

 - Fast processing.  render as I've coded above looks at the
language list just once.  After that, it's just GHC's pattern
matching.

 - Fast startup.  No need to look for strings on the hard drive.

 - Flexible.  You may try several extensions, depending on your needs:

(a) Using a type class (like Yesod) if you don't want one big data type.

(b) Using Text instead of String.  Or even Builder.
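
Extension (a) might look like the following sketch. The class `Renderable` and both message types are hypothetical and are not Yesod's actual API; the point is only that each part of a program can own a small message type instead of sharing one big one.

```haskell
type Lang = String

-- | Hypothetical class (not Yesod's actual API): any message type can
-- say how it renders for a list of preferred languages.
class Renderable msg where
  render :: [Lang] -> msg -> String

-- Two independent message types, e.g. one per module of the application.
data Greeting = Hello | GoodBye
data ShopMsg  = Total Int

instance Renderable Greeting where
  render ("pt" : _) Hello   = "Oi!"
  render ("pt" : _) GoodBye = "Tchau!"
  render ("en" : _) Hello   = "Hello!"
  render ("en" : _) GoodBye = "Good bye!"
  render (_ : ls)   m       = render ls m      -- try the next preference
  render []         m       = render ["en"] m  -- default language

instance Renderable ShopMsg where
  render ("pt" : _) (Total n) = "Total: " ++ show n ++ " itens"
  render ("en" : _) (Total n) = "Total: " ++ show n ++ " items"
  render (_ : ls)   m         = render ls m
  render []         m         = render ["en"] m
```

Extension (b) would change the result type from String to Text (or a Builder) while leaving this structure as it is.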

The biggest drawback is lack of tool support and lack of translators'
expertise.  gettext has a lot of inertia and is used everywhere on a
FLOSS system.  But as Ertugrul Soeylemez said, if you're targeting
Windows, _not_ using gettext should be an advantage (less pain while
creating installers).

HTH,

[1] 
http://hackage.haskell.org/packages/archive/yesod-core/0.9.2/doc/html/Yesod-Message.html

-- 
Felipe.



Re: [Haskell-cafe] haskell i18n best practices

2011-09-30 Thread Rogan Creswick
On Thu, Sep 29, 2011 at 3:54 PM, Paulo Pocinho poci...@gmail.com wrote:
 Hello list.

 I've been trying to figure a nice method to provide localisation. [...]

The Grammatical Framework (GF) excels at translation and localization.
It probably has the steepest learning curve of the options, but it will
generate the best / most accurate text, depending on the target
language:

 * http://www.grammaticalframework.org

At first blush, it may seem like extreme overkill, but it is able to
handle many, many infuriating corner cases (e.g. properly forming
discontinuous constituents, updating case / tense and number to agree
with potentially variable quantities and genders, addressing the
absence of "yes" and "no" in some languages, etc.).

The language processing bits are expressed in a PMCFG grammar, which
uses a syntax similar to Haskell.  The PMCFG compiles to a PGF file
that can be loaded and used by a Haskell module that implements the
runtime, so it doesn't change your run-time requirements if you
already rely on Haskell.  There are also runtime implementations in
JavaScript, Java, C and Python.

--Rogan




Re: [Haskell-cafe] haskell i18n best practices

2011-09-30 Thread Felipe Almeida Lessa
On Fri, Sep 30, 2011 at 5:44 PM, Rogan Creswick cresw...@gmail.com wrote:
 The grammatical framework excels at translation and localization -- it
 probably has the highest learning curve of the options; but it will
 generate the best / most accurate text depending on the target
 language:

  * http://www.grammaticalframework.org

 At first brush, it may seem like extreme overkill; but it is able to
 handle many, many infuriating corner cases (eg: properly forming
 discontinuous constituents, updating case / tense and number to agree
 with potentially variable quantities and genders, addressing the
 absence of yes and no in some languages, etc...)

 The language processing bits are expressed in a PMCFG grammar, which
 uses a syntax similar to haskell.  The PMCFG compiles to a PGF file
 that can be loaded and used by a haskell module that implements the
 runtime, so it doesn't change your run-time requirements (if you
 already rely on haskell, there are also runtime implementations in
 javascript, java, c and python).

I've seen GF before, but I can't actually see how one would use it for
localization.  Are there any simple examples?

Cheers, =)

-- 
Felipe.



Re: [Haskell-cafe] haskell i18n best practices

2011-09-30 Thread Rogan Creswick
On Fri, Sep 30, 2011 at 2:09 PM, Felipe Almeida Lessa
felipe.le...@gmail.com wrote:
 I've seen GF before, but I can't actually see how one would use it for
 localization.  Are there any simple examples?

Here's a *very* simple example I just threw together, based on the
Foods grammar (so it's quite contrived), but hopefully it's sufficient
for the moment:

https://github.com/creswick/gfI8N

Updating it to use the Phrasebook example would make it much more
interesting... I think there are numbers in there, and iirc, it uses
the actual resource grammars, which is what you really want for a real
system.

Usage details are in the README.md, and I've commented the important
function in the Haskell source.  The rest of the magic is in the (also
ugly) Setup.hs.

You will also need to manually install gf, I believe, even if you use
cabal-dev, due to some annoyingly complex (but solvable) build-order
and PATH complications.

--Rogan



[Haskell-cafe] haskell i18n best practices

2011-09-29 Thread Paulo Pocinho
Hello list.

I've been trying to figure out a nice method to provide localisation. An
application is deployed using a conventional installer. The end-user
is not required to have the Haskell runtime, compiler or platform.
The application should bundle ready-to-use translation data. What I am
after is simple: an intuitive way for an interested translator, with
little knowledge of Haskell, to look at and create valid translation
data.

This is what I've been looking at lately. The first thing I noticed
was the GNU gettext implementation for Haskell. The wiki page [1] has
a nice explanation by Aufheben. The hgettext package is found here
[2].

I don't know if this is a bad habit, but I had already separated the
dialogue text in the code into variables holding the respective
strings. At this time, I thought there could be some other way than
gettext. Then I figured out how to have the program load localisation
data from external files. The data type is basically a tuple
with variable names associated with strings. This is a bit like the
file-embed package [3].
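
A minimal sketch of the external-file idea, assuming a hypothetical plain-text format with one `name = text` pair per line. The format, the `Entry` alias and the function names are illustrative assumptions; the message does not say what the real files look like.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | One entry: a variable name paired with its translated string.
type Entry = (String, String)

-- | Parse lines of the hypothetical form "name = text". Blank lines,
-- lines starting with '#', and lines without '=' are ignored.
parseEntries :: String -> [Entry]
parseEntries contents =
  [ (trim key, trim (drop 1 value))
  | line <- lines contents
  , not (null (trim line))
  , take 1 (trim line) /= "#"
  , let (key, value) = break (== '=') line
  , not (null value)
  ]
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Load a translation file from disk.
loadEntries :: FilePath -> IO [Entry]
loadEntries path = parseEntries <$> readFile path
```

A translator could then edit a file containing lines such as `greeting = Hello!` without touching any Haskell, and the program would use `lookup "greeting"` on the loaded list.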

Still uncomfortable with i18n, I learned about the article "I18N in
Haskell" on the Yesod blog [4]. I'd like to hear more about it.

What is considered the best practice for localisation?

--
[1] http://www.haskell.org/haskellwiki/Internationalization_of_Haskell_programs
[2] http://hackage.haskell.org/packages/archive/hgettext/
[3] http://hackage.haskell.org/package/file-embed
[4] http://www.yesodweb.com/blog/2011/01/i18n-in-haskell
