Re: Plugging self-hosted Joshua into mailman?

2017-01-19 Thread Karel Novotný


On 19.1.2017 15:15, Matt Post wrote:
> Karel — On this point, I don't think you should have to use the tutorials, 
> which tell you how to identify training data and build new translation models 
> yourself. I imagine that you would be more interested in downloading 
> pre-built models that don't really require you to be an expert in MT. See 
> this page:
>
>   https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs

Thanks, Matt, for the clarification. I actually downloaded the language packs
yesterday and tried to run them to test the webapp by doing:

./joshua -server-port 5674 -server-type http
and
firefox "web/index.html?server=localhost=5674"

However, it started consuming more and more memory until it froze my
computer completely (dual core, 8 GB RAM). It may have been a bad
configuration on my side, though, or some other omission.

Our sysadmin should be able to make use of the API you mentioned.

If all sentences must be sent separately, then I suppose there is
no way to automatically restore the original formatting
(paragraphs), right? Having the translated text in one big block, or as
separate sentences on separate lines, might make translating messages a
bit challenging.
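
One possible way to keep paragraphs while still sending one sentence at a time is to split on blank lines first, translate sentence by sentence inside each paragraph, and join the results back together. The sketch below is only an illustration under stated assumptions: the regex splitter is naive, and `translate_sentence` is a hypothetical callable standing in for whatever actually talks to Joshua.

```python
import re


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def translate_preserving_paragraphs(text, translate_sentence):
    """Translate sentence by sentence but keep paragraph boundaries.

    translate_sentence is a hypothetical callable (str -> str); it is an
    assumption of this sketch, not part of Joshua.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    translated = []
    for paragraph in paragraphs:
        # Undo hard line wrapping so a sentence is not cut at a line break.
        flat = " ".join(paragraph.split())
        sentences = split_sentences(flat)
        translated.append(" ".join(translate_sentence(s) for s in sentences))
    # Blank lines between paragraphs are restored here.
    return "\n\n".join(translated)
```

A real deployment would likely need a better sentence splitter (abbreviations, quoted text, signatures), but the paragraph bookkeeping stays the same.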

As for the volume: while this is difficult to estimate, I've made a
calculation based on the monthly volume in the list archives for the
absolute peak month. The average is approx. 1000 sentences per day, so it
might be around 3000 on peak days.
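
To put those figures in perspective, here is a small worked calculation of the time budget per sentence. It assumes, purely for illustration, that sentences arrive evenly over 24 hours; real list traffic comes in bursts, so this is a lower bound on the load rather than a sizing rule.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

AVERAGE_SENTENCES_PER_DAY = 1000  # estimate from the thread
PEAK_SENTENCES_PER_DAY = 3000     # estimate from the thread


def seconds_per_sentence(sentences_per_day):
    """Time available per sentence if the load were spread evenly."""
    return SECONDS_PER_DAY / sentences_per_day


average_budget = seconds_per_sentence(AVERAGE_SENTENCES_PER_DAY)  # 86.4 s
peak_budget = seconds_per_sentence(PEAK_SENTENCES_PER_DAY)        # 28.8 s
```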

thanks for your interest in this.

karel

>
> matt
>
>
>> On Jan 17, 2017, at 12:07 PM, lewis john mcgibbney  
>> wrote:
>>
>> Hi Karel,
>> The short answer is yes.
>> I would advise you to start at the Tutorial
>> https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
>> If you find anything which causes you problems then please write back here.
>> Once you have skipped through the tutorial then you will have a much better
>> feel for the workflow required.
>> I can see the Apache Tika language identification and translate APIs being
>> of particular use here when considered in a runtime context. We have a
>> Joshua implementation over in Tika which can aid you in this task; however,
>> try the Joshua tutorial first.
>> Lewis
>>
>> On Mon, Jan 16, 2017 at 7:41 AM, Chris Mattmann  wrote:
>>
>>> Hi Karel,
>>>
>>> I would recommend moving this thread to dev@joshua.incubator.apache.org
>>> instead of the private list. I’ve moved private to BCC.
>>>
>>> Thank you.
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>>
>>> On 1/16/17, 6:58 AM, wrote:
>>>
>>>Hello,
>>>
>>>We would like to build a self-hosted machine translation system that
>>>could be plugged into our mailman installs. The objective is that the
>>>members of our multicultural network would be able to send email in
>>>their mother language and it would be delivered to the list
>>>machine-translated (and vice versa).
>>>
>>>Are we on the right track with Joshua? I suppose that a lot of
>>>configuration would be needed, but at this point I want to know if I am
>>>not completely mistaken when considering your sw for this.
>>>
>>>Thanks
>>>
>>>karel
>>>
>>>
>>>--
>>>~~~
>>>Karel Novotny
>>>Knowledge Sharing & Network Development Coordinator
>>>APC - The Association for Progressive Communications
>>>https://www.apc.org
>>>GSM: +420 605 243 246 (GMT +1)
>>>jabber: ka...@riseup.net
>>>Working/online: Monday - Thursday
>>>~~~
>>>My public OpenPGP key:
>>>https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> -- 
>> http://home.apache.org/~lewismc/
>> @hectorMcSpector
>> http://www.linkedin.com/in/lmcgibbney
>

-- 
~~~
Karel Novotny 
Knowledge Sharing & Network Development Coordinator
APC - The Association for Progressive Communications 
https://www.apc.org
GSM: +420 605 243 246 (GMT +1)
jabber: ka...@riseup.net
Working/online: Monday - Thursday
~~~
My public OpenPGP key: 
https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA




Re: Plugging self-hosted Joshua into mailman?

2017-01-19 Thread Karel Novotný
Yes. Will respond there. Thanks


On 19.1.2017 15:13, Matt Post wrote:
> (Are you on the dev@joshua mailing list?)
>
>
>> On Jan 17, 2017, at 11:55 AM, Karel Novotný wrote:
>>
>> Hello Matt,
>>
>> Thanks for responding...
>>
>> On 17.1.2017 17:31, Matt Post wrote:
>>> Hello,
>>>
>>> Joshua would be suitable for this. We have models built for FR→EN and
>>> ES→EN. I want to improve these because some data was left
>>> out. I could also build ones for the other direction.
>> That's excellent news. Can you please tell me a bit more about what you
>> mean by having models for FR→EN and ES→EN ? Does this mean that the tool
>> is ready to be used by other applications (e.g. mailman) to
>> auto-translate?
>>
>> Have you had any previous experience with similar implementation as I
>> described?
>
> This just means we have pre-built models (which we call "language
> packs") that you can just download and immediately use to translate
> from French to English and from Spanish to English. For the complete
> list of language packs, along with instructions for how to use them, see
> this page:
>
> https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
>
> You can just download any of these, unpack them, and start
> translating. The quality will vary, but for these two languages should
> be reasonable.
>
> To translate, the data you send to Joshua has to have already been
> sentence-split, because Joshua expects to receive input one sentence
> at a time. Joshua provides an API that you can make use of. Do you
> have any kind of expectations about your volume requirements? How many
> sentences will you be translating per day?
>
> matt
>
>
>>>
>>> One question — What do you mean about 3rd party services being
>>> "untrustworthy"?
>>
>> We wish to auto-translate lists with private conversations, so we can
>> not run those by systems where we don't know (don't have control of)
>> what happens with the data. That's all, I didn't want to accuse anyone.
>
> Oh, that makes perfect sense. For some reason I assumed you were
> translating public mailing lists, but if you're doing private ones, it
> is reasonable to want to keep the data entirely in-house.
>
>
>> thanks
>>
>> karel
>>
>>>
>>> matt
>>>
>>>
>>>> On Jan 16, 2017, at 12:27 PM, Karel Novotný wrote:
>>>>
>>>> Hello developers,
>>>>
>>>> I am new to this list, so missing a lot of background. Apologies
>>>> in advance for any dumb questions...
>>>>
>>>> We would like to build a self-hosted machine translation system that
>>>> could be plugged into our mailman installs. The objective is that the
>>>> members of our multicultural network would be able to send email in
>>>> their mother language and it would be delivered to the list
>>>> machine-translated (and vice versa). The translation pairs we care about
>>>> most are EN<->FR and EN<->ES
>>>>
>>>> Our dream scenario is:
>>>>
>>>> 1. A translator machine is installed on our server, so the messages
>>>> don't need to be run through untrustworthy 3rd party services (googletrans)
>>>> 2. Mailman (or similar) is connected to such a translator
>>>> 3. Mailing list users can opt to receive messages sent to the mailing
>>>> list in following format:
>>>>
>>>>
>>>> Message body
>>>> --
>>>> Message body translated
>>>> -
>>>>
>>>> 4. Similarly, the system can be configured so that when receiving
>>>> messages from specific senders the messages get translated from FR or ES
>>>> into EN
>>>>
>>>> Our default language used on lists is EN
>>>>
>>>> Is Joshua relevant for this? Any previous experience with similar setup?
>>>> I suppose that a lot of configuration would be needed, but at this point
>>>> I want to know if I am not completely mistaken when considering your
>>>> Joshua for this.
>>>>
>>>> Thanks
>>>>
>>>> karel
>>>>
>>>> ---
>>>>
>>>> -- 
>>>> ~~~
>>>> Karel Novotny
>>>> Knowledge Sharing & Network Development Coordinator
>>>> APC - The Association for Progressive Communications
>>>> https://www.apc.org
>>>> GSM: +420 605 243 246 (GMT +1)
>>>> jabber: ka...@riseup.net
>>>> Working/online: Monday - Thursday
>>>> ~~~
>>>> My public OpenPGP key:
>>>> https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA


>>>
>>
>> -- 
>> ~~~
>> Karel Novotny 
>> Knowledge Sharing & Network Development Coordinator
>> APC - The Association for Progressive Communications 
>> https://www.apc.org 
>> GSM: +420 605 243 246 (GMT +1)
>> jabber: ka...@riseup.net 
>> Working/online: Monday - Thursday
>> ~~~
>> My public OpenPGP
>> key: https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>> 

Re: Plugging self-hosted Joshua into mailman?

2017-01-19 Thread Matt Post

> On Jan 17, 2017, at 11:55 AM, Karel Novotný  wrote:
> 
> Hello Matt,
> 
> Thanks for responding...
> 
> On 17.1.2017 17:31, Matt Post wrote:
>> Hello,
>> 
>> Joshua would be suitable for this. We have models built for FR→EN and ES→EN. 
>> I want to improve these because some data was left out. I could also 
>> build ones for the other direction.
> That's excellent news. Can you please tell me a bit more about what you
> mean by having models for FR→EN and ES→EN ? Does this mean that the tool
> is ready to be used by other applications (e.g. mailman) to auto-translate?
> 
> Have you had any previous experience with similar implementation as I
> described?

This just means we have pre-built models (which we call "language packs") that 
you can just download and immediately use to translate from French to English 
and from Spanish to English. For the complete list of language packs, along 
with instructions for how to use them, see this page:

https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs

You can just download any of these, unpack them, and start translating. The 
quality will vary, but for these two languages should be reasonable.

To translate, the data you send to Joshua has to have already been 
sentence-split, because Joshua expects to receive input one sentence at a time. 
Joshua provides an API that you can make use of. Do you have any kind of 
expectations about your volume requirements? How many sentences will you be 
translating per day?
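
As a rough illustration of the "one sentence at a time" contract, a client could loop over pre-split sentences and issue one request per sentence. The sketch below is not taken from the Joshua documentation: the `/translate` path and the `q` query parameter are assumptions made for the example, and only port 5674 comes from this thread. The Language Packs page should be checked for the real interface.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(sentence, host="localhost", port=5674):
    """Build a request URL for one sentence.

    ASSUMPTION: the "/translate" path and the "q" parameter are
    illustrative placeholders, not a confirmed Joshua API.
    """
    return f"http://{host}:{port}/translate?{urlencode({'q': sentence})}"


def http_fetch(url):
    """Default transport: plain HTTP GET, response decoded as UTF-8."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def translate_all(sentences, fetch=http_fetch):
    """Send already-split sentences one per request; return raw responses."""
    return [fetch(build_url(sentence)) for sentence in sentences]
```

Passing a different `fetch` callable makes it possible to try the loop without a running server, and keeps the response format, which is not described in this thread, out of the sketch.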

matt


>> 
>> One question — What do you mean about 3rd party services being 
>> "untrustworthy"?
> 
> We wish to auto-translate lists with private conversations, so we can
> not run those by systems where we don't know (don't have control of)
> what happens with the data. That's all, I didn't want to accuse anyone.

Oh, that makes perfect sense. For some reason I assumed you were translating 
public mailing lists, but if you're doing private ones, it is reasonable to 
want to keep the data entirely in-house.


> thanks
> 
> karel
> 
>> 
>> matt
>> 
>> 
>>> On Jan 16, 2017, at 12:27 PM, Karel Novotný  wrote:
>>> 
>>> Hello developers,
>>> 
>>> I am new to this list, so missing a lot of background. Apologies
>>> in advance for any dumb questions...
>>> 
>>> We would like to build a self-hosted machine translation system that
>>> could be plugged into our mailman installs. The objective is that the
>>> members of our multicultural network would be able to send email in
>>> their mother language and it would be delivered to the list
>>> machine-translated (and vice versa). The translation pairs we care about
>>> most are EN<->FR and EN<->ES
>>> 
>>> Our dream scenario is:
>>> 
>>> 1. A translator machine is installed on our server, so the messages
>>> don't need to be run through untrustworthy 3rd party services (googletrans)
>>> 2. Mailman (or similar) is connected to such a translator
>>> 3. Mailing list users can opt to receive messages sent to the mailing
>>> list in following format:
>>> 
>>> 
>>> Message body
>>> --
>>> Message body translated
>>> -
>>> 
>>> 4. Similarly, the system can be configured so that when receiving
>>> messages from specific senders the messages get translated from FR or ES
>>> into EN
>>> 
>>> Our default language used on lists is EN
>>> 
>>> Is Joshua relevant for this? Any previous experience with similar setup?
>>> I suppose that a lot of configuration would be needed, but at this point
>>> I want to know if I am not completely mistaken when considering your
>>> Joshua for this.
>>> 
>>> Thanks
>>> 
>>> karel
>>> 
>>> ---
>>> 
>>> -- 
>>> ~~~
>>> Karel Novotny 
>>> Knowledge Sharing & Network Development Coordinator
>>> APC - The Association for Progressive Communications 
>>> https://www.apc.org
>>> GSM: +420 605 243 246 (GMT +1)
>>> jabber: ka...@riseup.net
>>> Working/online: Monday - Thursday
>>> ~~~
>>> My public OpenPGP key: 
>>> https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>>> 
>>> 
>> 
> 
> -- 
> ~~~
> Karel Novotny 
> Knowledge Sharing & Network Development Coordinator
> APC - The Association for Progressive Communications 
> https://www.apc.org 
> GSM: +420 605 243 246 (GMT +1)
> jabber: ka...@riseup.net
> Working/online: Monday - Thursday
> ~~~
> My public OpenPGP key: 
> https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
> 


Re: Plugging self-hosted Joshua into mailman?

2017-01-19 Thread Matt Post
Karel — On this point, I don't think you should have to use the tutorials, 
which tell you how to identify training data and build new translation models 
yourself. I imagine that you would be more interested in downloading pre-built 
models that don't really require you to be an expert in MT. See this page:

https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs

matt


> On Jan 17, 2017, at 12:07 PM, lewis john mcgibbney  wrote:
> 
> Hi Karel,
> The short answer is yes.
> I would advise you to start at the Tutorial
> https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
> If you find anything which causes you problems then please write back here.
> Once you have skipped through the tutorial then you will have a much better
> feel for the workflow required.
> I can see the Apache Tika language identification and translate APIs being
> of particular use here when considered in a runtime context. We have a
> Joshua implementation over in Tika which can aid you in this task; however,
> try the Joshua tutorial first.
> Lewis
> 
> On Mon, Jan 16, 2017 at 7:41 AM, Chris Mattmann  wrote:
> 
>> Hi Karel,
>> 
>> I would recommend moving this thread to dev@joshua.incubator.apache.org
>> instead of the private list. I’ve moved private to BCC.
>> 
>> Thank you.
>> 
>> Cheers,
>> Chris
>> 
>> 
>> 
>> On 1/16/17, 6:58 AM, wrote:
>> 
>>Hello,
>> 
>>We would like to build a self-hosted machine translation system that
>>could be plugged into our mailman installs. The objective is that the
>>members of our multicultural network would be able to send email in
>>their mother language and it would be delivered to the list
>>machine-translated (and vice versa).
>> 
>>Are we on the right track with Joshua? I suppose that a lot of
>>configuration would be needed, but at this point I want to know if I am
>>not completely mistaken when considering your sw for this.
>> 
>>Thanks
>> 
>>karel
>> 
>> 
>>--
>>~~~
>>Karel Novotny
>>Knowledge Sharing & Network Development Coordinator
>>APC - The Association for Progressive Communications
>>https://www.apc.org
>>GSM: +420 605 243 246 (GMT +1)
>>jabber: ka...@riseup.net
>>Working/online: Monday - Thursday
>>~~~
>>My public OpenPGP key:
>>https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney



Re: mvn assembly issues

2017-01-19 Thread Matt Post
I have never seen this error before! It seems to be related to the build 
environment where this is being run. Maybe there are tar options to not store 
the user id, or to set it to some fixed value?
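
The limit in the error message, 2097151, equals 8^7 - 1: the largest value that fits in the 7 octal digits of the classic ustar header's uid field. The sketch below uses Python's standard `tarfile` module, not the Maven assembly plugin, to show the same behaviour: the ustar format rejects the large uid from the log, while the POSIX (pax) format stores it in an extended header. The file name and payload are made up for the example. On the Maven side, the plugin setting that selects posix mode is, as far as I know, `tarLongFileMode`, but that should be checked against the plugin documentation.

```python
import io
import tarfile

# Largest uid a ustar header can hold: 7 octal digits.
USTAR_MAX_ID = 8 ** 7 - 1  # 2097151, the limit quoted in the Maven error


def archive_with_uid(uid, tar_format):
    """Write one in-memory member owned by uid, then read the uid back."""
    buffer = io.BytesIO()
    payload = b"hello"  # hypothetical file content
    info = tarfile.TarInfo(name="example.txt")  # hypothetical file name
    info.size = len(payload)
    info.uid = uid
    with tarfile.open(fileobj=buffer, mode="w", format=tar_format) as tar:
        # Raises ValueError for ustar when uid exceeds USTAR_MAX_ID.
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        return tar.getmembers()[0].uid
```

Calling `archive_with_uid(498339010, tarfile.USTAR_FORMAT)` fails with a `ValueError`, while the same call with `tarfile.PAX_FORMAT` succeeds and returns the original uid.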


> On Jan 18, 2017, at 9:08 PM, David Meikle  wrote:
> 
> Hey Lewis,
> 
>> On 18 Jan 2017, at 22:02, lewis john mcgibbney  wrote:
>> 
>> Hi Folks,
>> Anyone know how to work through this issue? The code in question can be
>> found at
>> https://github.com/apache/incubator-joshua/blob/master/pom.xml#L287-L309
>> Lewis
>> 
>> [INFO]
>> 
>> [INFO] BUILD FAILURE
>> [INFO]
>> 
>> [INFO] Total time: 16.222 s
>> [INFO] Finished at: 2017-01-18T13:59:41-08:00
>> [INFO] Final Memory: 37M/639M
>> [INFO]
>> 
>> [ERROR] Failed to execute goal
>> org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single
>> (source-release-assembly) on project joshua-incubating: Execution
>> source-release-assembly of goal
>> org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single failed: user id
>> '498339010' is too big ( > 2097151 ). -> [Help 1]
>> [ERROR]
>> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
>> switch.
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> [ERROR]
>> [ERROR] For more information about the errors and possible solutions,
>> please read the following articles:
>> [ERROR] [Help 1]
>> http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
>> 
>> -- 
>> http://home.apache.org/~lewismc/
>> @hectorMcSpector
>> http://www.linkedin.com/in/lmcgibbney
> 
> 
> Switching tar to posix mode normally does the trick when I have had this 
> before, usually when logged into an AD domain on my Mac.  What is the full 
> log with -X saying?
> 
> Cheers,
> Dave
>