[GitHub] incubator-joshua issue #81: Fix: memory overflow in tuning with larger langu...

2018-03-13 Thread thammegowda
Github user thammegowda commented on the issue:

https://github.com/apache/incubator-joshua/pull/81
  
Thanks for the tips, @KellenSunderland. There is plenty of RAM on the
server, so I should let KenLM use more RAM for indexing instead of IO. I will
use Java Flight Recorder to get more details.


---


[GitHub] incubator-joshua issue #81: Fix: memory overflow in tuning with larger langu...

2018-03-06 Thread thammegowda
Github user thammegowda commented on the issue:

https://github.com/apache/incubator-joshua/pull/81
  
@KellenSunderland Yes, KenLM is used to generate a language model from
the parallel corpus.
In addition, I am including two more prebuilt models in ARPA format for
the target language (copied from another MT system).
[This wiki page is helpful](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65871630#TheJoshuaPipeline(6.1)-Languagemodel)



---


[GitHub] incubator-joshua issue #81: Fix: memory overflow in tuning with larger langu...

2018-03-05 Thread thammegowda
Github user thammegowda commented on the issue:

https://github.com/apache/incubator-joshua/pull/81
  
I am trying to run an experiment with a bunch of big language models, but
the tuner is taking forever!

In the code base, I found a few more (possible) bottlenecks:

1. https://github.com/apache/incubator-joshua/blob/000298e555fbc71315b1d8719f5c3918a2102e5b/scripts/training/run_tuner.py#L421
2. https://github.com/apache/incubator-joshua/blob/a30e95563a20e2ccad574f4065654b955fe8fa25/src/main/java/org/apache/joshua/zmert/ZMERT.java#L61

So, a heap of 4000 MB is hardcoded for MERT.

Possible explanation: my language models are huge (a big one is ~90GB) and
definitely don't fit into 4GB, so the JVM is probably spending most of its
time in garbage collection.


---


[GitHub] incubator-joshua pull request #81: Fix: memory overflow in tuning with large...

2018-03-05 Thread thammegowda
GitHub user thammegowda opened a pull request:

https://github.com/apache/incubator-joshua/pull/81

Fix: memory overflow in tuning with larger language models

The tuner needs more memory when larger language models are used.
Even though we allocate more memory in the training pipeline script,
the value is ignored in one place in the pipeline.

This PR fixes it by exporting an environment variable and passing it to
the JVM at the right place.
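
As a rough illustration of the approach described above (not the actual patch), the sketch below reads a heap size from an environment variable and turns it into a `-Xmx` flag, falling back to the 4000 MB mentioned in the discussion. The variable name `TUNER_MEM` and the class `TunerHeap` are hypothetical; the main class name is taken from the ZMERT.java path linked in the thread.

```java
import java.util.ArrayList;
import java.util.List;

class TunerHeap {
    // Hypothetical variable name, chosen only for this sketch.
    static final String ENV_VAR = "TUNER_MEM";
    // Fallback mirrors the 4000 MB value mentioned in the thread.
    static final String DEFAULT_HEAP = "4000m";

    /** Builds the -Xmx flag from the environment value, or the default if unset. */
    static String heapFlag(String envValue) {
        boolean unset = envValue == null || envValue.trim().isEmpty();
        return "-Xmx" + (unset ? DEFAULT_HEAP : envValue.trim());
    }

    /** Assembles a java command line that carries the heap flag. */
    static List<String> command(String envValue, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add(heapFlag(envValue));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command(System.getenv(ENV_VAR), "org.apache.joshua.zmert.ZMERT");
        System.out.println(String.join(" ", cmd));
    }
}
```

The point of the design is that the value chosen in the pipeline script reaches every JVM launch instead of being overridden by a constant.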

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-joshua/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #81


commit cad86aa7512f2829c8a964f2ccc3f69c7cd2f1c7
Author: Thamme Gowda <tg@...>
Date:   2018-03-06T01:13:27Z

Fix: memory overflow in tuning

when larger language models are used, the tuner needs more memory.
Even though we allocate more memory in the training pipeline script
the value is ignored at one place in pipeline.

This patch fixes it by exporting env var, and passing it to the JVM




---


[GitHub] incubator-joshua issue #80: Code Reformatting and output of string length ra...

2017-10-25 Thread thammegowda
Github user thammegowda commented on the issue:

https://github.com/apache/incubator-joshua/pull/80
  
There are two commits.

The first one reformats the code.
The second one computes the length ratios...


---


[GitHub] incubator-joshua pull request #80: Code Reformatting and output of string le...

2017-10-25 Thread thammegowda
GitHub user thammegowda opened a pull request:

https://github.com/apache/incubator-joshua/pull/80

Code Reformatting and output of string length ratios

Adds one new number to the eval output: the ratio of sentence lengths
(number of words) between output and reference.

However, to understand the existing code, I first had to indent it correctly.
There are a lot of edits, but most of them are due to code reformatting,
with no functional change.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua code-clean

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-joshua/pull/80.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #80


commit 36e3a5d39702e2f139afa7f0bd67220beef4cfa5
Author: Thamme Gowda <t...@isi.edu>
Date:   2017-10-25T18:46:28Z

code clean and reformat

commit 7eae36dbae9fe64ef929800cb7160e3739f16f77
Author: Thamme Gowda <t...@isi.edu>
Date:   2017-10-25T19:03:06Z

Output Ratio of Lengths of Strings




---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-27 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-80945
  
@lewismc Thank you very much, Sir.
I have not tried Joshua in Docker yet, but I will definitely try it.

I am very interested in your [PR to
Tika](https://github.com/apache/tika/pull/112/) to integrate this translator
:+1:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-27 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-54428
  
@mjpost 
> Okay, great! That worked. I assume the edits to $JOSHUA/bin/joshua mean 
that eclipse compiled files will override the jar? So I can do fast development 
in Eclipse?

It seems you are having a hard time with the Eclipse Maven integration.
The last time I used Eclipse, there was an Eclipse plugin that recognized
Maven projects and took care of all the complexity under the hood.
With a proper setup, you can just go to "JoshuaDecoder" (or whichever
class you want to run) and choose "Run main" (no need to package, because
that takes time).

P.S. I use the IntelliJ IDEA open-source edition, and its Maven integration
is a breeze.


---


[GitHub] incubator-joshua pull request: Added the licence header

2016-05-27 Thread thammegowda
GitHub user thammegowda opened a pull request:

https://github.com/apache/incubator-joshua/pull/19

Added the licence header

Added Licence header

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua logger-hotfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-joshua/pull/19.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19


commit cd78038d898c0993f77bf3a16f840b7c01fcba10
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-27T21:00:53Z

Added the licence header




---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-27 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-28146
  
@mjpost I am unable to run the `test.sh` script because of this error:


Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken in java.library.path
    at org.apache.joshua.decoder.ff.lm.KenLM.(KenLM.java:52)
    ... 10 more
Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at org.apache.joshua.decoder.ff.lm.KenLM.(KenLM.java:43)
    ... 10 more
Exception in thread "main" java.lang.RuntimeException: * FATAL: could not find a feature 'LanguageModel'


How do I get past this?


---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-27 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-25251
  
Found the issue with the logger config file. It seems that overriding it
with `-Dlog4j.configuration` is no longer supported.
Please review #18 and merge.





---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-27 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-06844
  
@lewismc @mjpost I just looked into this. I will debug this issue and come
back with my findings soon.

+1 for rethinking the log levels. We can lower some frequent INFO
messages to DEBUG level.



---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-26 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-222027825
  
My guess is that the `ConcurrentModificationException` occurs when we
loop (e.g. with `for`) over a collection and the collection is updated
inside the loop (by adding or removing items).
However, based on the exception, I checked
https://github.com/apache/incubator-joshua/blob/JOSHUA-252/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java#L362
and there isn't any such code there. I suspect Rule.toString().

That line executes only if the DEBUG level is enabled! Try disabling DEBUG
(by setting the Log4j level to INFO or higher) until we find a fix.
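
For readers unfamiliar with this exception, here is a generic sketch of the pattern described above. It is an illustration only, not the Joshua code in DotChart.java or Rule.toString(), and the class and method names are made up for the example. The first method modifies a list inside a for-each loop; the second shows the usual single-threaded remedy of removing through the iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class CmeDemo {
    /** Removes an element inside a for-each loop; the iterator detects the change. */
    static boolean failsWhenModifiedInLoop() {
        List<Integer> items = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer item : items) {
                if (item % 2 == 0) {
                    items.remove(item); // structural change during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes elements through the iterator itself, which is permitted. */
    static List<Integer> removeSafely() {
        List<Integer> items = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> it = items.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println("for-each removal throws: " + failsWhenModifiedInLoop());
        System.out.println("iterator removal leaves: " + removeSafely());
    }
}
```

The same exception can also appear when another thread changes the collection while it is being iterated, for instance while a toString() call walks it.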



---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-26 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-222012667
  
@mjpost As you pointed out, you need to edit the log4j settings.

The other way is to edit `src/main/resources/log4j.properties`.


If you don't want to rebuild the jar after editing the file, you can create
an updated log4j.properties and put the directory containing it at the front
of the classpath (so that it takes priority over the one inside the jar).
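
The classpath-ordering behaviour this relies on can be checked with a small standalone sketch. It assumes nothing about Joshua: two temporary directories stand in for the override directory and the jar, the file contents are only markers, and the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ClasspathOrderDemo {
    /** Creates a temp directory holding a log4j.properties with the given content. */
    static Path dirWithConfig(String content) {
        try {
            Path dir = Files.createTempDirectory("cp-entry");
            Files.write(dir.resolve("log4j.properties"),
                    content.getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the content of the first log4j.properties found, searching entries in order. */
    static String firstConfig(Path... entries) {
        try {
            URL[] urls = new URL[entries.length];
            for (int i = 0; i < entries.length; i++) {
                urls[i] = entries[i].toUri().toURL();
            }
            // No application parent: apart from the bootstrap loader,
            // only the given entries are searched, in the given order.
            try (URLClassLoader loader = new URLClassLoader(urls, null);
                 InputStream in = loader.getResourceAsStream("log4j.properties")) {
                if (in == null) {
                    return null;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path bundled = dirWithConfig("log4j.rootLogger=DEBUG, stdout");
        Path override = dirWithConfig("log4j.rootLogger=INFO, stdout");
        // The override directory is listed first, so its file is the one found.
        System.out.println(firstConfig(override, bundled));
    }
}
```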


---


[GitHub] incubator-joshua pull request: Joshua-262: Slf4j - Log4j bridge

2016-05-25 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/15#issuecomment-221782749
  
#17 has been created to make merging easier by targeting the
JOSHUA-252 branch, so I am closing this PR.


---


[GitHub] incubator-joshua pull request: JOSHUA-262 : Replacing System.{out,...

2016-05-25 Thread thammegowda
GitHub user thammegowda opened a pull request:

https://github.com/apache/incubator-joshua/pull/17

JOSHUA-262 : Replacing System.{out,err}.print* and java.util.log with SLF4j

This is a duplicate of #15 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua JOSHUA-262

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-joshua/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17


commit 09fb6a2d363ac78f091b217a88a8712c47edc5f0
Author: Matt Post <p...@cs.jhu.edu>
Date:   2016-05-14T19:27:38Z

don't separately pack the test grammar (done in run bundler)

commit f354c298ff9d1f16b8c034a5d885428d95e43ca3
Author: Matt Post <p...@cs.jhu.edu>
Date:   2016-05-16T22:20:45Z

Merge branch 'JOSHUA-264' of 
https://github.com/thammegowda/incubator-joshua into JOSHUA-264

commit 659e464665254050a8f9ed321dcbdd08eef8a3d7
Author: Matt Post <p...@cs.jhu.edu>
Date:   2016-05-17T00:14:04Z

Merge branch 'jar-with-dependencies' of 
https://github.com/thammegowda/incubator-joshua into JOSHUA-264

commit c21fa9e82db5b1f784b89ea8109735a3645298f2
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-21T07:02:42Z

Log4j - Slf4j bridge 

+ Removed java.util.log statements
+ SLF4j with string format pattern replacement

commit 9114a007ae4a42d97e2218712defafa3a9761560
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-21T07:12:24Z

Read me updated

commit 4d04cc2c01669e3b93399758f54aae27e6e2d0ec
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-21T18:22:28Z

LOG scope is privatized

commit d6efccbc51260028225652c77bfa0f4bdab8061b
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-21T19:06:20Z

Clean LOGs, no redudant if(enabled) checks, no eager toString()s

commit 8652d19d1094cc6329220984d0693e6dcef4
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-21T19:14:29Z

Fix spaces

commit d4ac45193450f1c901f23cc938e5981bb64eb8d6
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-21T19:32:39Z

Fix log issues such as redundant checks and spaces

commit 158685310332be4166164bce28058a90f1d168d7
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-23T04:18:21Z

Replaced System.err.print* with logger api

commit 9d6f84d35754a099123c256b9932a89a2bd316aa
Author: Thamme Gowda <tgow...@gmail.com>
Date:   2016-05-26T05:34:57Z

Rebased with JOSHUA-252 and resolved merge conflicts




---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-25 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-221778184
  
@lewismc Thanks for the reply.

I rebased #13; it is ready to merge!


---


[GitHub] incubator-joshua pull request: Joshua-262: Slf4j - Log4j bridge

2016-05-22 Thread thammegowda
Github user thammegowda commented on a diff in the pull request:

https://github.com/apache/incubator-joshua/pull/15#discussion_r64157970
  
--- Diff: src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java ---
@@ -587,7 +588,7 @@ else if (fds[1].toLowerCase().equals("http"))
           } else if (parameter
               .equals(normalize_key(SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME))) {
             fuzzy_matching = Boolean.parseBoolean(fds[1]);
-            logger.finest(String.format(fuzzy_matching + ": %s", fuzzy_matching));
+            LOG.debug("fuzzy_matching: {}", fuzzy_matching);
--- End diff --

Hi Henry,
I have never used the TRACE level, probably because I read that it is
discouraged (http://slf4j.org/faq.html#trace).

I can remap log.finest -> log.trace if more people +1 this.

cc @KellenSunderland @mjpost 
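
To make the two options in this comment concrete, here is a small sketch of a java.util.logging to SLF4J level mapping with a switch for where FINEST lands. It is only an illustration: the class `LevelMapping`, the method name, and the boolean flag are hypothetical, and SLF4J levels are represented as plain strings so the example needs no dependency.

```java
import java.util.logging.Level;

class LevelMapping {
    /**
     * Maps a java.util.logging level to an SLF4J level name.
     * finestAsTrace=false sends FINEST to DEBUG, as in the diff above;
     * finestAsTrace=true is the alternative remapping to TRACE.
     */
    static String toSlf4jLevel(Level julLevel, boolean finestAsTrace) {
        int v = julLevel.intValue();
        if (v <= Level.FINEST.intValue()) {
            return finestAsTrace ? "TRACE" : "DEBUG";
        }
        if (v <= Level.FINE.intValue()) {
            return "DEBUG"; // FINER and FINE
        }
        if (v <= Level.INFO.intValue()) {
            return "INFO"; // CONFIG and INFO
        }
        if (v <= Level.WARNING.intValue()) {
            return "WARN";
        }
        return "ERROR"; // SEVERE and above
    }

    public static void main(String[] args) {
        System.out.println(toSlf4jLevel(Level.FINEST, false));
        System.out.println(toSlf4jLevel(Level.FINEST, true));
    }
}
```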


---


[GitHub] incubator-joshua pull request: Joshua-262: Slf4j - Log4j bridge

2016-05-21 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/15#issuecomment-220796532
  
@KellenSunderland Thanks for the review and feedback. I fixed the issues.

However, we are still left with 200+ `System.out.print` calls and 200+
`System.err.print` calls.
I will need another iteration to resolve them.


---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-20 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-220753345
  
@mjpost Sorry for my previous incomplete comment.
The problem I see with that line is that we now have the new package
`org.apache.joshua.decoder.ff` instead of the old `joshua.decoder.ff`.
Since Lewis is currently working on this, I didn't dare to raise a new PR
to fix a single line :-)


---


[GitHub] incubator-joshua pull request: JOSHUA-264 System.exit() calls are ...

2016-05-16 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/13#issuecomment-219571959
  
@mjpost Added `maven-assembly-plugin` to make the build cycle easy: 
https://github.com/apache/incubator-joshua/pull/14


---


[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...

2016-05-15 Thread thammegowda
Github user thammegowda commented on the pull request:

https://github.com/apache/incubator-joshua/pull/12#issuecomment-219337253
  
This is indeed a big change, and I think it's better to split it into
multiple small PRs.

Maybe we can have a 'maven' branch in parallel for a short transition time,
resolve the remaining issues on that branch with smaller PRs, and then
merge it into master?



---