[jira] [Commented] (JOSHUA-331) Address Apache Joshua 6.1 RC#3 Issues

2017-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900200#comment-15900200
 ] 

Hudson commented on JOSHUA-331:
---

SUCCESS: Integrated in Jenkins build joshua_master #195 (See 
[https://builds.apache.org/job/joshua_master/195/])
JOSHUA-331 - added IDEA project files to .gitignore (tommaso: rev 
f198d664de20df94eb9aa1c4fb33c2a94e9a6d73)
* (edit) .gitignore


> Address Apache Joshua 6.1 RC#3 Issues
> -
>
> Key: JOSHUA-331
> URL: https://issues.apache.org/jira/browse/JOSHUA-331
> Project: Joshua
> Issue Type: Task
> Components: release
> Reporter: Tommaso Teofili
> Assignee: Tommaso Teofili
> Fix For: 6.1
>
>
> Address the following issues:
> {quote}
> Every ASF release MUST contain one or more source packages, which MUST be
> sufficient for a user to build and test the release provided they have
> access to the appropriate platform and tools. - NO
> - Not building due to a failing test (BerkeleyLM failure).  I'm digging a
> bit more into this.
> {quote}
> {quote}
> Every artifact distributed to the public through Apache channels MUST be
> accompanied by one file containing an OpenPGP compatible ASCII armored
> detached signature and another file containing an MD5 checksum.
> - .asc - NO
> I get warning:
> "gpg --verify joshua-incubating-6.1-src.tar.gz.asc
> joshua-incubating-6.1-src.tar.gz
> gpg: Signature made Thu Feb 23 09:15:17 2017 CET using RSA key ID
> 891768A5
> gpg: Good signature from "Tommaso Teofili "
> [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:  There is no indication that the signature belongs to the
> owner."
> - .md5 - NO
> My md5 of joshua-incubating-6.1-src.tar.gz is
> 504976876b01294811293aa45b5400f5, the joshua-incubating-6.1-src.tar.gz.md5
> indicates it should be 22b738eeae45757715080702a5bd2789
> - .sha - NO
> My sha of joshua-incubating-6.1-src.tar.gz is
> 4AB5BA24301590F36AE6452DACC3F21CBD8B3FEC, the
> joshua-incubating-6.1-src.tar.gz.md5 indicates it should be
> 2a55b6d341dddc5369b22a4802a86ec40accd0a1
> - KEYS - YES
> {quote}
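
The checksum comparison reported above can be reproduced with a short script. This is only a sketch: the helper names `file_digest` and `digests_match` are made up for illustration, and it assumes the published value has already been reduced to a bare hex string (a real .md5 or .sha file may also carry a file name). The comparison ignores case because the two SHA values quoted above differ in letter case as well as in content.

```python
import hashlib


def file_digest(path, algorithm):
    """Return the hex digest of the file at `path` ("md5", "sha1", ...)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large release tarballs are not loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_match(computed, published):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return computed.strip().lower() == published.strip().lower()
```

For example, `digests_match(file_digest("joshua-incubating-6.1-src.tar.gz", "md5"), published_md5)` would be expected to return False for the artifact described in this issue, where `published_md5` is a hypothetical variable holding the value from the .md5 file.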



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Dockerhub hosted images

2017-03-07 Thread Matt Post
FYI, I stress-tested the Joshua server with the following protocol: for both 
the TCP and HTTP servers, I started a six-thread server, and then sent five 
simultaneous 16k documents at each. The translation times were as follows:

TCP: (times: 8:07 8:06 8:06)

for x in 1 2 3 4; do for num in $(seq 1 5); do cat corpus.es | nc localhost 5674 > t.tcp.$num & done; time wait; done

HTTP: (times: 7:25 7:34 7:20)

for x in 1 2 3 4; do for num in $(seq 1 5); do /home/hltcoe/mpost/code/joshua/scripts/support/query_http.py -s localhost -p 5674 corpus.es > t.out.$num & done; time wait; done

The HTTP query takes 100 lines of the test set at a time, constructs the 
RESTful query string (with 100 url-encoded "q=..." lines), and sends it to the 
server.
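
To make the batching concrete, here is a rough sketch of how such a query string might be assembled. It is not the actual query_http.py; the function name `batched_query_strings` is hypothetical, and only the repeated "q" parameter and the batch size of 100 come from the description above.

```python
from urllib.parse import urlencode


def batched_query_strings(lines, batch_size=100):
    """Yield one URL-encoded query string per batch of input lines."""
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        # A list of (key, value) pairs lets the same key, "q", repeat.
        yield urlencode([("q", line) for line in batch])
```

With a 250-line input this yields three query strings carrying 100, 100, and 50 "q=" parameters respectively.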

So the bottom line is that the HTTP server both has an extended 
Google-translate API (which also supports other things like adding rules) and 
is a bit faster.
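
To put a number on "a bit faster", the reported times can be averaged. This sketch assumes the figures above are minutes:seconds of wall-clock time per round; the helper names are illustrative only.

```python
def to_seconds(mmss):
    """Convert a "minutes:seconds" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def mean_seconds(times):
    """Average a list of "minutes:seconds" strings, in seconds."""
    return sum(to_seconds(t) for t in times) / len(times)


tcp = mean_seconds(["8:07", "8:06", "8:06"])
http = mean_seconds(["7:25", "7:34", "7:20"])
reduction = (tcp - http) / tcp
print(f"TCP mean: {tcp:.1f}s, HTTP mean: {http:.1f}s, reduction: {reduction:.1%}")
# prints: TCP mean: 486.3s, HTTP mean: 446.3s, reduction: 8.2%
```

Under that assumption, the HTTP runs finished roughly 40 seconds (about 8%) sooner per round than the TCP runs.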

I'm documenting the RESTful API here: 
https://cwiki.apache.org/confluence/display/JOSHUA/RESTful+API

matt


> On Mar 3, 2017, at 11:24 AM, Matt Post  wrote:
> 
> Folks,
> 
> I've updated the code with a few changes that will support Dockerized 
> language packs. The nice thing is that this makes it easy to include KenLM.
> 
> Here are some changes that were made:
> 
> - Joshua now notes what directory the config file was found in and loads 
> relative paths found in the config file relative to that directory 
> automatically. This means you don't have to "cd" to the LP (language pack) 
> directory before running Joshua.
> 
> - I fixed the HTTP server to take multiple "q=" lines, just like the Google 
> translate API. Before, it only took one "q=" line. This should mean (I'll 
> test later today) that the HTTP server can handle throughput essentially at 
> the rates of the TCP server.
> 
> - I added (but haven't pushed yet) the KenLM model files to the language 
> packs. In addition, I added a file "joshua.config.kenlm". These are not used 
> except by Docker.
> 
> - I fixed the docker setup. See the new file:
> 
>   
> https://github.com/apache/incubator-joshua/blob/master/distribution/docker/kenlm/Dockerfile
>  
> 
> 
> This docker container builds KenLM. It then expects to be run with docker 
> mounting an existing language pack to /model. It then runs the 
> joshua.config.kenlm file, running it as a server in HTTP mode. See the README 
> file for information:
> 
>   
> https://github.com/apache/incubator-joshua/tree/master/distribution/docker/kenlm
>  
> 
> 
> If anyone wants to test this out, please do. You can grab an updated language 
> pack (version 3) here:
> 
>   
> http://cs.jhu.edu/~post/language-packs/apache-joshua-es-en-2017-03-03.tgz 
> 
> 
> (Warning: 9 GB)
> 
> matt
> 
> 
>> On Nov 23, 2016, at 10:14 AM, kellen sunderland 
>>  wrote:
>> 
>> Yeah it should just be 'docker pull kellens/apache-joshua-es-en-2016-10-05'
>> then 'docker run -it kellens/apache-joshua-es-en-2016-10-05 /bin/bash' or
>> something similar.  I think the default command should eventually be to run
>> the http server, so ideally we'd just do 'docker run -p 5674
>> kellens/apache-joshua-es-en-2016-10-05' and that would start up the http
>> server on port 5674.
>> 
>> Good point on Perl + Python, I can add them.
>> 
>> -Kellen
>> 
>> On Wed, Nov 23, 2016 at 3:22 PM, Matt Post  wrote:
>> 
>>> Okay, I have this with
>>> 
>>>   docker run -it kellens/apache-joshua-es-en-2016-10-05 bash
>>> 
>>> It seems we are missing Perl (./prepare.sh fails), and we should replace
>>> the LanguageModel line with a KenLM instance and build that. I bet we'll
>>> need Python, too.
>>> 
>>> 
>>> 
>>> 
>>>> On Nov 23, 2016, at 8:15 AM, Matt Post  wrote:
>>>> 
>>>> Kellen, can I bother you to post a few first steps? I've successfully
>>>> pulled this down to my mac but now do not know how to find it, edit it, or
>>>> run it. I'm poring through the documentation and will find it eventually
>>>> but this would save me a bit of time.
>>>> 
>>>> 
>>>>> On Nov 23, 2016, at 8:07 AM, kellen sunderland <
>>>>> kellen.sunderl...@gmail.com> wrote:
>>>>> 
>>>>> Yes my next step was going to be getting it hosted officially.
>>>>> 
>>>>> I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
>>>>> Apache account until I've done a little more testing though.
>>>>> 
>>>>> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney" wrote:
>>>>> 
>>>>>> Hi Kellen,
>>>>>> Nice :)
>>>>>> Another option is for us to host these via the Apache account.
>>>>>> https://hub.docker.com/r/apache/
>>>>>> We could then add a badge to our README which points to the
>>>>>> Dockerfile(s).
[jira] [Commented] (JOSHUA-331) Address Apache Joshua 6.1 RC#3 Issues

2017-03-07 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899372#comment-15899372
 ] 

Tommaso Teofili commented on JOSHUA-331:


{quote}
Now, compile_berkeleylm.py is a fairly simple wrapper around a java call. So it 
would not be difficult to modify the code and distribute only the 
human-readable "lm" file.
{quote}

+1, thanks Matt for pointing it out, I'll take care of it.



