Re: [VOTE] Release Apache Joshua 6.1 RC#2

2016-11-23 Thread Tommaso Teofili
+1

Tommaso

On Wed, Nov 23, 2016 at 3:25 PM, kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> +1, many thanks Lewis.
>
> On Wed, Nov 23, 2016 at 2:34 PM, Matt Post  wrote:
>
> > +1 Thanks, Lewis!
> >
> >
> > > On Nov 23, 2016, at 12:15 AM, lewis john mcgibbney wrote:
> > >
> > > Hello user@ and dev,
> > > Please VOTE on the Apache Joshua 6.1 Release Candidate #2.
> > >
> > > We solved 50 issues: https://s.apache.org/joshua6.1
> > >
> > > Git source tag (29c8be650d53216f779a340d33f8f61af4d45629):
> > > https://s.apache.org/pk2t 
> > >
> > > > Staging repo:
> > > > https://repository.apache.org/content/repositories/orgapachejoshua-1001/
> > >
> > > > Source Release Artifacts:
> > > > https://dist.apache.org/repos/dist/dev/incubator/joshua/
> > >
> > > > PGP release keys (signed using 48BAEBF6):
> > > > https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS
> > >
> > > Vote will be open for 72 hours.
> > > Thank you to everyone that is able to VOTE as well as everyone that
> > > contributed to Apache Joshua 6.1.
> > >
> > > [ ] +1, let's get it released!!!
> > > > [ ] +/-0, fine, but consider fixing a few issues first...
> > > [ ] -1, nope, because... (and please explain why)
> > >
> > > P.S. here is my +1
> > >
> > > --
> > > http://home.apache.org/~lewismc/
> > > @hectorMcSpector
> > > http://www.linkedin.com/in/lmcgibbney
> >
> >
>


Re: Any symal experts?

2016-11-23 Thread Matt Post
I think it will be much less of a headache. The GIZA++ code is notorious for
being unreadable, and the Perl piece of that pipeline only makes things worse
(even though Philipp's Perl is unusually clear). I think adding atools to your
port is the way to go, and the fact that it's written in C++ should make that
easier.




> On Nov 23, 2016, at 12:25 PM, John Hewitt  wrote:
> 
> It'll be a headache because it also has no documentation, but to be fair it
> may be less of a headache / a better long-term solution than trying to move
> forward with this hackier solution.
> 
> I'll keep the symal use on the backburner and start putting together an
> atools port.
> 
> -John
> 
> On Wed, Nov 23, 2016 at 12:18 PM, Matt Post  wrote:
> 
>> John — I suggest trying to ditch those GIZA++ tools entirely. fast_align
>> indeed replaced them with "atools"; how much work would it be to port that?
>> 
>> 
>>> On Nov 23, 2016, at 12:11 PM, John Hewitt wrote:
>>> 
>>> Hey everyone,
>>> 
>>> I'm packaging up a Java port of Fast Align for Joshua and integrating it into
>>> the pipeline.
>>> Fast Align does not produce symmetrical alignments -- it relies on a tool
>>> that I haven't ported to Java.
>>> We package symal (which symmetrizes alignments) with Joshua right now for
>>> GIZA++, so I'm attempting to re-use that.
>>> However, symal uses the .bal format, which it fails to describe.
>>> It gets away with this because files from GIZA++ are piped through
>>> giza2bal.pl, which itself is not well documented.
>>> I'm attempting to write, say, fastalign2bal.py.
>>> With a bit of tinkering, I got at the .bal format:
>>> 
>>> 1
>>> 
>>> 7 jehovah said to moses and aaron :  # 3 2 2 4 5 6 8
>>> 
>>> 8 i řekl hospodin mojžíšovi a aronovi takto :  # 2 2 1 4 5 6 6 7
>>> 
>>> A template for which would be
>>> 
>>> 1
>>> 
>>> NUM_TGT_TOKENS [tgt_token1 tgt_token2 ... tgt_tokenN] # [alignment1
>>> alignment2 ... alignmentN]
>>> NUM_SRC_TOKENS [src_token1 src_token2 ... src_tokenN] # [alignment1
>>> alignment2 ... alignmentN]
>>> 
>>> 
>>> However, I'm hitting some pretty nasty errors with symal when I pipe in
>>> some fastalign2bal.py output.
>>> A few hours with gdb yielded some progress (as far as I can tell, the
>>> formats are identical), but if anyone has experience with symal, I would
>>> greatly appreciate some consultation.
>>> 
>>> -John
>> 
>> 



Re: Any symal experts?

2016-11-23 Thread John Hewitt
It'll be a headache because it also has no documentation, but to be fair it
may be less of a headache / a better long-term solution than trying to move
forward with this hackier solution.

I'll keep the symal use on the backburner and start putting together an
atools port.

-John

On Wed, Nov 23, 2016 at 12:18 PM, Matt Post  wrote:

> John — I suggest trying to ditch those GIZA++ tools entirely. fast_align
> indeed replaced them with "atools"; how much work would it be to port that?
>
>
> > On Nov 23, 2016, at 12:11 PM, John Hewitt wrote:
> >
> > Hey everyone,
> >
> > I'm packaging up a Java port of Fast Align for Joshua and integrating it into
> > the pipeline.
> > Fast Align does not produce symmetrical alignments -- it relies on a tool
> > that I haven't ported to Java.
> > We package symal (which symmetrizes alignments) with Joshua right now for
> > GIZA++, so I'm attempting to re-use that.
> > However, symal uses the .bal format, which it fails to describe.
> > It gets away with this because files from GIZA++ are piped through
> > giza2bal.pl, which itself is not well documented.
> > I'm attempting to write, say, fastalign2bal.py.
> > With a bit of tinkering, I got at the .bal format:
> >
> > 1
> >
> > 7 jehovah said to moses and aaron :  # 3 2 2 4 5 6 8
> >
> > 8 i řekl hospodin mojžíšovi a aronovi takto :  # 2 2 1 4 5 6 6 7
> >
> > A template for which would be
> >
> > 1
> >
> > NUM_TGT_TOKENS [tgt_token1 tgt_token2 ... tgt_tokenN] # [alignment1
> > alignment2 ... alignmentN]
> > NUM_SRC_TOKENS [src_token1 src_token2 ... src_tokenN] # [alignment1
> > alignment2 ... alignmentN]
> >
> >
> > However, I'm hitting some pretty nasty errors with symal when I pipe in
> > some fastalign2bal.py output.
> > A few hours with gdb yielded some progress (as far as I can tell, the
> > formats are identical), but if anyone has experience with symal, I would
> > greatly appreciate some consultation.
> >
> > -John
>
>


Re: Any symal experts?

2016-11-23 Thread Matt Post
John — I suggest trying to ditch those GIZA++ tools entirely. fast_align indeed 
replaced them with "atools"; how much work would it be to port that?


> On Nov 23, 2016, at 12:11 PM, John Hewitt  wrote:
> 
> Hey everyone,
> 
> I'm packaging up a Java port of Fast Align for Joshua and integrating it into
> the pipeline.
> Fast Align does not produce symmetrical alignments -- it relies on a tool
> that I haven't ported to Java.
> We package symal (which symmetrizes alignments) with Joshua right now for
> GIZA++, so I'm attempting to re-use that.
> However, symal uses the .bal format, which it fails to describe.
> It gets away with this because files from GIZA++ are piped through
> giza2bal.pl, which itself is not well documented.
> I'm attempting to write, say, fastalign2bal.py.
> With a bit of tinkering, I got at the .bal format:
> 
> 1
> 
> 7 jehovah said to moses and aaron :  # 3 2 2 4 5 6 8
> 
> 8 i řekl hospodin mojžíšovi a aronovi takto :  # 2 2 1 4 5 6 6 7
> 
> A template for which would be
> 
> 1
> 
> NUM_TGT_TOKENS [tgt_token1 tgt_token2 ... tgt_tokenN] # [alignment1
> alignment2 ... alignmentN]
> NUM_SRC_TOKENS [src_token1 src_token2 ... src_tokenN] # [alignment1
> alignment2 ... alignmentN]
> 
> 
> However, I'm hitting some pretty nasty errors with symal when I pipe in
> some fastalign2bal.py output.
> A few hours with gdb yielded some progress (as far as I can tell, the
> formats are identical), but if anyone has experience with symal, I would
> greatly appreciate some consultation.
> 
> -John



Any symal experts?

2016-11-23 Thread John Hewitt
Hey everyone,

I'm packaging up a Java port of Fast Align for Joshua and integrating it into
the pipeline.
Fast Align does not produce symmetrical alignments -- it relies on a tool
that I haven't ported to Java.
We package symal (which symmetrizes alignments) with Joshua right now for
GIZA++, so I'm attempting to re-use that.
However, symal uses the .bal format, which it fails to describe.
It gets away with this because files from GIZA++ are piped through
giza2bal.pl, which itself is not well documented.
I'm attempting to write, say, fastalign2bal.py.
With a bit of tinkering, I got at the .bal format:

1

7 jehovah said to moses and aaron :  # 3 2 2 4 5 6 8

8 i řekl hospodin mojžíšovi a aronovi takto :  # 2 2 1 4 5 6 6 7

A template for which would be

1

NUM_TGT_TOKENS [tgt_token1 tgt_token2 ... tgt_tokenN] # [alignment1
alignment2 ... alignmentN]
NUM_SRC_TOKENS [src_token1 src_token2 ... src_tokenN] # [alignment1
alignment2 ... alignmentN]
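
As a rough illustration of the template above, here is a minimal Python
sketch of what a fastalign2bal.py could look like. It is not the script
discussed in this thread. The helper names parse_links and bal_record are
hypothetical, and several points are assumptions rather than documented
symal behavior: that each row is written on a single line, that each
alignment slot holds the 1-based position of the aligned token in the other
row, that 0 would mark an unaligned token, and that the two rows come from
two separate alignment runs (one per direction) given as 0-based
"source-target" index pairs. The link lists in the example were derived by
hand from the alignment columns shown above, not produced by Fast Align.

```python
def parse_links(align_line):
    """Parse "i-j" links into 0-based (source, target) index pairs."""
    return [tuple(int(x) for x in link.split("-")) for link in align_line.split()]


def bal_record(src_tokens, tgt_tokens, fwd_links, rev_links):
    """Format one sentence pair as a three-line .bal record.

    Assumptions (inferred from the example in this message, not from any
    symal documentation):
      - slot k of a row holds the 1-based position, in the other row,
        of the token aligned to token k;
      - 0 marks an unaligned token;
      - fwd_links gives each target token at most one source token,
        rev_links gives each source token at most one target token.
    """
    tgt_align = [0] * len(tgt_tokens)
    src_align = [0] * len(src_tokens)
    for i, j in fwd_links:
        tgt_align[j] = i + 1  # target token j points at source token i
    for i, j in rev_links:
        src_align[i] = j + 1  # source token i points at target token j

    def row(tokens, align):
        # Token count, tokens, two spaces, '#', then one slot per token.
        return "{} {}  # {}".format(
            len(tokens), " ".join(tokens), " ".join(str(a) for a in align)
        )

    # The leading "1" line is copied from the observed format as-is.
    return "\n".join(["1", row(tgt_tokens, tgt_align), row(src_tokens, src_align)])


if __name__ == "__main__":
    src = "i řekl hospodin mojžíšovi a aronovi takto :".split()
    tgt = "jehovah said to moses and aaron :".split()
    fwd = parse_links("2-0 1-1 1-2 3-3 4-4 5-5 7-6")
    rev = parse_links("0-1 1-1 2-0 3-3 4-4 5-5 6-5 7-6")
    print(bal_record(src, tgt, fwd, rev))
```

Comparing the bytes of such output against real giza2bal.pl output (for
example with diff or a hex dump) may help narrow down where symal's parser
and the hand-built records disagree.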


However, I'm hitting some pretty nasty errors with symal when I pipe in
some fastalign2bal.py output.
A few hours with gdb yielded some progress (as far as I can tell, the
formats are identical), but if anyone has experience with symal, I would
greatly appreciate some consultation.

-John


Re: Dockerhub hosted images

2016-11-23 Thread kellen sunderland
Yeah, it should just be 'docker pull kellens/apache-joshua-es-en-2016-10-05'
then 'docker run -it kellens/apache-joshua-es-en-2016-10-05 /bin/bash' or
something similar.  I think the default command should eventually be to run
the http server, so ideally we'd just do 'docker run -p 5674:5674
kellens/apache-joshua-es-en-2016-10-05' and that would start up the http
server on port 5674.

Good point on Perl + Python, I can add them.

-Kellen

On Wed, Nov 23, 2016 at 3:22 PM, Matt Post  wrote:

> Okay, I have this with
>
> docker run -it kellens/apache-joshua-es-en-2016-10-05 bash
>
> It seems we are missing Perl (./prepare.sh fails), and we should replace
> the LanguageModel line with a KenLM instance and build that. I bet we'll
> need Python, too.
>
>
>
>
> > On Nov 23, 2016, at 8:15 AM, Matt Post  wrote:
> >
> > Kellen, can I bother you to post a few first steps? I've successfully
> > pulled this down to my Mac but now do not know how to find it, edit it, or
> > run it. I'm poring over the documentation and will find it eventually,
> > but this would save me a bit of time.
> >
> >
> >> On Nov 23, 2016, at 8:07 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> >>
> >> Yes my next step was going to be getting it hosted officially.
> >>
> >> I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
> >> Apache account until I've done a little more testing though.
> >>
> >> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney" wrote:
> >>
> >>> Hi Kellen,
> >>> Nice :)
> >>> Another option is for us to host these via the Apache account.
> >>> https://hub.docker.com/r/apache/
> >>> We could then add a badge to our README which points to the
> >>> Dockerfile(s).
> >>> Do you want to open a ticket over on the INFRA Jira for this?
> >>>
> >>> On Tue, Nov 22, 2016 at 1:57 PM, <
> >>> dev-digest-h...@joshua.incubator.apache.org> wrote:
> >>>
> >>>> From: kellen sunderland
> >>>> To: "dev@joshua.incubator.apache.org"
> >>>> Cc:
> >>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
> >>>> Subject: Re: Dockerhub hosted images
> >>>> Ok, the first image should be properly uploaded now.
> >>>>
> >>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
> >>>>
> >>>> -Kellen
> >>>>
> >>>>
> >>>
> >
>
>


Re: Dockerhub hosted images

2016-11-23 Thread Matt Post
Okay, I have this with

docker run -it kellens/apache-joshua-es-en-2016-10-05 bash

It seems we are missing Perl (./prepare.sh fails), and we should replace the 
LanguageModel line with a KenLM instance and build that. I bet we'll need 
Python, too.




> On Nov 23, 2016, at 8:15 AM, Matt Post  wrote:
> 
> Kellen, can I bother you to post a few first steps? I've successfully pulled
> this down to my Mac but now do not know how to find it, edit it, or run it.
> I'm poring over the documentation and will find it eventually, but this
> would save me a bit of time.
> 
> 
>> On Nov 23, 2016, at 8:07 AM, kellen sunderland wrote:
>> 
>> Yes my next step was going to be getting it hosted officially.
>> 
>> I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
>> Apache account until I've done a little more testing though.
>> 
>> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney"  wrote:
>> 
>>> Hi Kellen,
>>> Nice :)
>>> Another option is for us to host these via the Apache account.
>>> https://hub.docker.com/r/apache/
>>> We could then add a badge to our README which points to the Dockerfile(s).
>>> Do you want to open a ticket over on the INFRA Jira for this?
>>> 
>>> On Tue, Nov 22, 2016 at 1:57 PM, <
>>> dev-digest-h...@joshua.incubator.apache.org> wrote:
>>> 
>>>> From: kellen sunderland
>>>> To: "dev@joshua.incubator.apache.org"
>>>> Cc:
>>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
>>>> Subject: Re: Dockerhub hosted images
>>>> Ok, the first image should be properly uploaded now.
>>>>
>>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>>>>
>>>> -Kellen
>>>>
>>>>
>>> 
> 



Re: [VOTE] Release Apache Joshua 6.1 RC#2

2016-11-23 Thread Matt Post
+1 Thanks, Lewis!


> On Nov 23, 2016, at 12:15 AM, lewis john mcgibbney  wrote:
> 
> Hello user@ and dev,
> Please VOTE on the Apache Joshua 6.1 Release Candidate #2.
> 
> We solved 50 issues: https://s.apache.org/joshua6.1
> 
> Git source tag (29c8be650d53216f779a340d33f8f61af4d45629):
> https://s.apache.org/pk2t 
> 
> Staging repo:
> https://repository.apache.org/content/repositories/orgapachejoshua-1001/
> 
> 
> Source Release Artifacts:
> https://dist.apache.org/repos/dist/dev/incubator/joshua/
> 
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS
> 
> Vote will be open for 72 hours.
> Thank you to everyone that is able to VOTE as well as everyone that
> contributed to Apache Joshua 6.1.
> 
> [ ] +1, let's get it released!!!
> [ ] +/-0, fine, but consider fixing a few issues first...
> [ ] -1, nope, because... (and please explain why)
> 
> P.S. here is my +1
> 
> -- 
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney



Re: Dockerhub hosted images

2016-11-23 Thread Matt Post
Kellen, can I bother you to post a few first steps? I've successfully pulled
this down to my Mac but now do not know how to find it, edit it, or run it. I'm
poring over the documentation and will find it eventually, but this would
save me a bit of time.


> On Nov 23, 2016, at 8:07 AM, kellen sunderland wrote:
> 
> Yes my next step was going to be getting it hosted officially.
> 
> I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
> Apache account until I've done a little more testing though.
> 
> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney"  wrote:
> 
>> Hi Kellen,
>> Nice :)
>> Another option is for us to host these via the Apache account.
>> https://hub.docker.com/r/apache/
>> We could then add a badge to our README which points to the Dockerfile(s).
>> Do you want to open a ticket over on the INFRA Jira for this?
>> 
>> On Tue, Nov 22, 2016 at 1:57 PM, <
>> dev-digest-h...@joshua.incubator.apache.org> wrote:
>> 
>>> From: kellen sunderland 
>>> To: "dev@joshua.incubator.apache.org" 
>>> Cc:
>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
>>> Subject: Re: Dockerhub hosted images
>>> Ok, the first image should be properly uploaded now.
>>> 
>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>>> 
>>> -Kellen
>>> 
>>> 
>> 



Re: Dockerhub hosted images

2016-11-23 Thread kellen sunderland
Yes my next step was going to be getting it hosted officially.

I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
Apache account until I've done a little more testing though.

On Nov 23, 2016 5:22 AM, "lewis john mcgibbney"  wrote:

> Hi Kellen,
> Nice :)
> Another option is for us to host these via the Apache account.
> https://hub.docker.com/r/apache/
> We could then add a badge to our README which points to the Dockerfile(s).
> Do you want to open a ticket over on the INFRA Jira for this?
>
> On Tue, Nov 22, 2016 at 1:57 PM, <
> dev-digest-h...@joshua.incubator.apache.org> wrote:
>
> > From: kellen sunderland 
> > To: "dev@joshua.incubator.apache.org" 
> > Cc:
> > Date: Tue, 22 Nov 2016 22:56:56 +0100
> > Subject: Re: Dockerhub hosted images
> > Ok, the first image should be properly uploaded now.
> >
> > https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
> >
> > -Kellen
> >
> >
>


test non apache account

2016-11-23 Thread Matt Post


matt (from my phone)