[GitHub] incubator-joshua issue #81: Fix: memory overflow in tuning with larger langu...
Github user thammegowda commented on the issue: https://github.com/apache/incubator-joshua/pull/81 Thanks for the tips, @KellenSunderland. There is plenty of RAM on the server, so I should let KenLM use more RAM for indexing instead of IO. I will use Java Flight Recorder to get more details. ---
[GitHub] incubator-joshua issue #81: Fix: memory overflow in tuning with larger langu...
Github user thammegowda commented on the issue: https://github.com/apache/incubator-joshua/pull/81 @KellenSunderland Yes, KenLM is used for generating a language model from the parallel corpus. In addition, I am including two more precooked models in ARPA format for the target language (copied from another MT system). [This wiki page is helpful](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65871630#TheJoshuaPipeline(6.1)-Languagemodel) ---
[GitHub] incubator-joshua issue #81: Fix: memory overflow in tuning with larger langu...
Github user thammegowda commented on the issue: https://github.com/apache/incubator-joshua/pull/81

I am trying to run an experiment with a bunch of big language models, but the tuner is taking forever! In the code base, I found a few more (possible) bottlenecks:

1. https://github.com/apache/incubator-joshua/blob/000298e555fbc71315b1d8719f5c3918a2102e5b/scripts/training/run_tuner.py#L421
2. https://github.com/apache/incubator-joshua/blob/a30e95563a20e2ccad574f4065654b955fe8fa25/src/main/java/org/apache/joshua/zmert/ZMERT.java#L61

So, 4000 MB of heap is hardcoded for MERT. Possible explanation: my language models are huge (a big one is ~90GB) and definitely don't fit into 4GB, so the JVM is spending all its time in garbage collection. ---
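One way to check the garbage-collection hypothesis above from inside a JVM is to compare accumulated GC time with uptime. The following is a minimal sketch using only the standard `java.lang.management` API; the class name `GcOverhead` is made up for illustration and is not part of Joshua. The reported collection time is approximate, so treat the number as a rough indicator rather than a precise measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Joshua: estimates how much of the
// JVM's uptime has been spent in garbage collection so far.
class GcOverhead {
    /** Approximate fraction of JVM uptime spent in GC (0.0 when uptime is unknown). */
    static double gcFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.printf("max heap: %d MB, time in GC: %.1f%%%n",
                maxHeapMb, 100 * gcFraction());
    }
}
```

A value close to 100% late in a long run would be consistent with the heap being too small for the loaded models.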
[GitHub] incubator-joshua pull request #81: Fix: memory overflow in tuning with large...
GitHub user thammegowda opened a pull request: https://github.com/apache/incubator-joshua/pull/81

Fix: memory overflow in tuning with larger language models

The tuner needs more memory when larger language models are used. Even though we allocate more memory in the training pipeline script, the value is ignored at one place in the pipeline. This PR fixes it by exporting an environment variable and passing it to the JVM at the right place.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/81.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #81

commit cad86aa7512f2829c8a964f2ccc3f69c7cd2f1c7
Author: Thamme Gowda <tg@...>
Date: 2018-03-06T01:13:27Z
Fix: memory overflow in tuning when larger language models are used, the tuner needs more memory. Even though we allocate more memory in the training pipeline script the value is ignored at one place in pipeline. This patch fixes it by exporting env var, and passing it to the JVM
---
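The idea described in the PR (read the heap size from an environment variable instead of hardcoding it, then hand it to the JVM) can be sketched as follows. This is only an illustration, not the actual patch: the variable name `TUNER_HEAP`, the class `TunerCommand`, and the way the command is assembled are all assumptions. The 4000 MB fallback mirrors the hardcoded value mentioned in the issue discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the code from PR #81.
class TunerCommand {
    static final String HEAP_VAR = "TUNER_HEAP";   // assumed name, not taken from the PR
    static final String DEFAULT_HEAP = "4000m";    // the previously hardcoded value

    /** Builds a java command line whose -Xmx comes from the environment. */
    static List<String> build(Map<String, String> env, String mainClass) {
        String heap = env.getOrDefault(HEAP_VAR, DEFAULT_HEAP);
        // -Xmx takes a number with an optional k/m/g suffix.
        if (!heap.matches("\\d+[kKmMgG]?")) {
            throw new IllegalArgumentException("bad heap size: " + heap);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + heap);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // The main class here is only an example argument.
        System.out.println(String.join(" ",
                build(System.getenv(), "org.apache.joshua.zmert.ZMERT")));
    }
}
```

Validating the value before building the command keeps a malformed setting from surfacing later as an obscure JVM startup failure.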
[GitHub] incubator-joshua issue #80: Code Reformatting and output of string length ra...
Github user thammegowda commented on the issue: https://github.com/apache/incubator-joshua/pull/80 There are two commits. The first one reformats the code. The second one computes the length ratios... ---
[GitHub] incubator-joshua pull request #80: Code Reformatting and output of string le...
GitHub user thammegowda opened a pull request: https://github.com/apache/incubator-joshua/pull/80

Code Reformatting and output of string length ratios

This adds one new number to the eval output: the sentence length (number of words) ratio between output and reference. To interpret the existing code, I first had to indent it correctly. There are a lot of edits, but most of them are due to code reformatting, with no functional change.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua code-clean

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/80.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #80

commit 36e3a5d39702e2f139afa7f0bd67220beef4cfa5
Author: Thamme Gowda <t...@isi.edu>
Date: 2017-10-25T18:46:28Z
code clean and reformat

commit 7eae36dbae9fe64ef929800cb7160e3739f16f77
Author: Thamme Gowda <t...@isi.edu>
Date: 2017-10-25T19:03:06Z
Output Ratio of Lengths of Strings
---
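For readers unfamiliar with the metric, a word-count length ratio between system output and reference could be computed roughly as below. This is a sketch under assumptions, not the code from the PR: it tokenizes on whitespace and aggregates over the whole corpus, whereas the PR may compute the ratio differently (for example per sentence). The class and method names are invented.

```java
import java.util.List;

// Hypothetical sketch of an output/reference length ratio; not the PR's code.
class LengthRatio {
    /** Number of whitespace-separated tokens in a sentence. */
    static int wordCount(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Total output words divided by total reference words. */
    static double ratio(List<String> outputs, List<String> references) {
        if (outputs.size() != references.size()) {
            throw new IllegalArgumentException("output/reference counts differ");
        }
        long outWords = 0;
        long refWords = 0;
        for (int i = 0; i < outputs.size(); i++) {
            outWords += wordCount(outputs.get(i));
            refWords += wordCount(references.get(i));
        }
        if (refWords == 0) {
            throw new IllegalArgumentException("references contain no words");
        }
        return (double) outWords / refWords;
    }

    public static void main(String[] args) {
        System.out.println(ratio(List.of("a b"), List.of("a b c d")));
    }
}
```

A ratio well below 1 suggests the system output is shorter than the references; a ratio above 1 suggests it is longer.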
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-80945 @lewissmc Thank you very much, Sir. I have not tried Joshua in Docker yet, but I will definitely try it. I am very interested in your [PR to Tika](https://github.com/apache/tika/pull/112/) to integrate this translator :+1: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-54428

@mjpost

> Okay, great! That worked. I assume the edits to $JOSHUA/bin/joshua mean that eclipse compiled files will override the jar? So I can do fast development in Eclipse?

It seems you are having a hard time with the Eclipse Maven integration. The last time I used Eclipse, there was a plugin that recognized Maven projects and took care of all the complexities under the hood. With a proper setup, you can just go to "JoshuaDecoder" (or whichever class you want to run) and "Run main" (no need to package, which takes time).

P.S. I use the IntelliJ IDEA open-source edition, and its Maven integration is a breeze.
---
[GitHub] incubator-joshua pull request: Added the licence header
GitHub user thammegowda opened a pull request: https://github.com/apache/incubator-joshua/pull/19

Added the licence header

Added Licence header

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua logger-hotfix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/19.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19

commit cd78038d898c0993f77bf3a16f840b7c01fcba10
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-27T21:00:53Z
Added the licence header
---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-28146

@mjpost I am unable to run the `test.sh` script because of the following error:

    Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken in java.library.path
        at org.apache.joshua.decoder.ff.lm.KenLM.(KenLM.java:52)
        ... 10 more
    Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.apache.joshua.decoder.ff.lm.KenLM.(KenLM.java:43)
        ... 10 more
    Exception in thread "main" java.lang.RuntimeException: * FATAL: could not find a feature 'LanguageModel'

How do I get past this?
---
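An `UnsatisfiedLinkError` of this form means `System.loadLibrary("ken")` found no matching native library in any directory listed in `java.library.path`. A small diagnostic like the one below can confirm that; it is a generic sketch (the class `FindNativeLib` is invented here) and says nothing about where Joshua's build actually places the KenLM library.

```java
import java.io.File;

// Hypothetical diagnostic, not part of Joshua: looks for the platform-specific
// file that System.loadLibrary(name) would need on a given search path.
class FindNativeLib {
    /** Returns the first matching library file on the path, or null if none exists. */
    static File locate(String libName, String searchPath) {
        // e.g. "ken" maps to libken.so on Linux and ken.dll on Windows
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        File found = locate("ken", path);
        System.out.println(found != null
                ? "found " + found
                : "no " + System.mapLibraryName("ken") + " in " + path);
    }
}
```

If the file is reported missing, the usual remedies are to build the native library and to start the JVM with `-Djava.library.path` pointing at the directory that contains it.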
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-25251 Found the issue with the logger config file. It seems that overriding it with `-Dlog4j.configuration` is no longer supported. Please review #18 and merge. ---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-06844 @lewismc @mjpost Just looked into this. I will debug this issue and come back with my findings soon. +1 for rethinking the log levels. We can lower some frequent INFO messages to DEBUG level. ---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-222027825 My guess is that `ConcurrentModificationException` occurs when we loop (e.g. with `for`) over a collection and the collection is updated inside the loop (by adding or removing items). However, based on the exception, I checked https://github.com/apache/incubator-joshua/blob/JOSHUA-252/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java#L362 and there isn't any such code there. I suspect Rule.toString(). That line executes only if DEBUG level is enabled! Try disabling DEBUG (by setting the Log4j level to INFO or higher) until we find a fix. ---
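The mechanism described in the comment above can be reproduced with plain collections. The sketch below is a generic illustration of how `ConcurrentModificationException` arises and one standard way to avoid it; it is not the Joshua code in question, and the class name `CmeDemo` is made up.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Generic illustration, unrelated to DotChart or Rule.toString().
class CmeDemo {
    /** Removes even numbers inside a for-each loop; returns true if that triggered a CME. */
    static boolean removeInsideForEach(List<Integer> items) {
        try {
            for (Integer x : items) {
                if (x % 2 == 0) {
                    items.remove(x); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Safe variant: removes through the iterator itself. */
    static void removeEvens(List<Integer> items) {
        for (Iterator<Integer> it = items.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(removeInsideForEach(new ArrayList<>(List.of(1, 2, 3, 4))));
        List<Integer> items = new ArrayList<>(List.of(1, 2, 3, 4));
        removeEvens(items);
        System.out.println(items);
    }
}
```

The same exception can also appear when another thread modifies the collection during iteration, which is harder to spot because the modifying code is not inside the loop body.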
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-222012667 @mjpost As you pointed out, you need to edit the log4j settings. The other way is to edit `src/main/resources/log4j.properties`. If you don't want to rebuild the jar after editing the file, you can create an updated log4j.properties and put its directory at the front of the classpath (so that it takes priority over the one inside the jar). ---
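To see which copy of a resource wins under a given classpath order, one can ask the class loader directly. The sketch below uses only the standard `ClassLoader.getSystemResource` call; the class name `WhichResource` is invented for illustration.

```java
import java.net.URL;

// Hypothetical diagnostic: prints where a classpath resource resolves from.
class WhichResource {
    /** Returns the URL of the first classpath entry that provides the resource, or null. */
    static URL resolve(String name) {
        return ClassLoader.getSystemResource(name);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "log4j.properties";
        URL url = resolve(name);
        System.out.println(url != null ? name + " -> " + url : name + " not found on classpath");
    }
}
```

Running it with the directory holding the edited file listed before the jar on `-cp` should print a `file:` URL; with the order reversed it should print a `jar:` URL pointing inside the jar.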
[GitHub] incubator-joshua pull request: Joshua-262: Slf4j - Log4j bridge
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/15#issuecomment-221782749 #17 has been created to make merging easier by targeting JOSHUA-252, so I am closing this PR. ---
[GitHub] incubator-joshua pull request: JOSHUA-262 : Replacing System.{out,...
GitHub user thammegowda opened a pull request: https://github.com/apache/incubator-joshua/pull/17

JOSHUA-262 : Replacing System.{out,err}.print* and java.util.log with SLF4j

This is a duplicate of #15

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/incubator-joshua JOSHUA-262

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/17.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17

commit 09fb6a2d363ac78f091b217a88a8712c47edc5f0
Author: Matt Post <p...@cs.jhu.edu>
Date: 2016-05-14T19:27:38Z
don't separately pack the test grammar (done in run bundler)

commit f354c298ff9d1f16b8c034a5d885428d95e43ca3
Author: Matt Post <p...@cs.jhu.edu>
Date: 2016-05-16T22:20:45Z
Merge branch 'JOSHUA-264' of https://github.com/thammegowda/incubator-joshua into JOSHUA-264

commit 659e464665254050a8f9ed321dcbdd08eef8a3d7
Author: Matt Post <p...@cs.jhu.edu>
Date: 2016-05-17T00:14:04Z
Merge branch 'jar-with-dependencies' of https://github.com/thammegowda/incubator-joshua into JOSHUA-264

commit c21fa9e82db5b1f784b89ea8109735a3645298f2
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-21T07:02:42Z
Log4j - Slf4j bridge + Removed java.util.log statements + SLF4j with string format pattern replacement

commit 9114a007ae4a42d97e2218712defafa3a9761560
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-21T07:12:24Z
Read me updated

commit 4d04cc2c01669e3b93399758f54aae27e6e2d0ec
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-21T18:22:28Z
LOG scope is privatized

commit d6efccbc51260028225652c77bfa0f4bdab8061b
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-21T19:06:20Z
Clean LOGs, no redudant if(enabled) checks, no eager toString()s

commit 8652d19d1094cc6329220984d0693e6dcef4
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-21T19:14:29Z
Fix spaces

commit d4ac45193450f1c901f23cc938e5981bb64eb8d6
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-21T19:32:39Z
Fix log issues such as redundant checks and spaces

commit 158685310332be4166164bce28058a90f1d168d7
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-23T04:18:21Z
Replaced System.err.print* with logger api

commit 9d6f84d35754a099123c256b9932a89a2bd316aa
Author: Thamme Gowda <tgow...@gmail.com>
Date: 2016-05-26T05:34:57Z
Rebased with JOSHUA-252 and resolved merge conflicts
---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-221778184 @lewismc Thanks for the reply. I rebased #13; it is ready for merge! ---
[GitHub] incubator-joshua pull request: Joshua-262: Slf4j - Log4j bridge
Github user thammegowda commented on a diff in the pull request: https://github.com/apache/incubator-joshua/pull/15#discussion_r64157970

--- Diff: src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java ---

    @@ -587,7 +588,7 @@ else if (fds[1].toLowerCase().equals("http"))
     } else if (parameter .equals(normalize_key(SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME))) {
     fuzzy_matching = Boolean.parseBoolean(fds[1]);
    -logger.finest(String.format(fuzzy_matching + ": %s", fuzzy_matching));
    +LOG.debug("fuzzy_matching: {}", fuzzy_matching);

--- End diff --

Hi Henry, I have never used the TRACE level, probably because I read that it is discouraged: http://slf4j.org/faq.html#trace . I can remap log.finest -> log.trace if more people +1 this. cc @KellenSunderland @mjpost
---
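A side observation on the removed line in the diff above: because the variable itself, rather than its name, is concatenated into the format string, the old message would read `true: true` or `false: false`. The sketch below reproduces this with plain `String.format`; the class and method names are invented, and the second method only shows the text the new `LOG.debug` call is presumably meant to produce.

```java
// Illustration only; mirrors the expressions in the diff with invented names.
class FormatBug {
    /** Same expression as the removed logger.finest(...) argument. */
    static String oldMessage(boolean fuzzyMatching) {
        return String.format(fuzzyMatching + ": %s", fuzzyMatching);
    }

    /** The message with the property name spelled out as a literal. */
    static String intendedMessage(boolean fuzzyMatching) {
        return String.format("fuzzy_matching: %s", fuzzyMatching);
    }

    public static void main(String[] args) {
        System.out.println(oldMessage(true));      // prints "true: true"
        System.out.println(intendedMessage(true)); // prints "fuzzy_matching: true"
    }
}
```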
[GitHub] incubator-joshua pull request: Joshua-262: Slf4j - Log4j bridge
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/15#issuecomment-220796532 @KellenSunderland Thanks for the review and feedback. I fixed them. However, 200+ `System.out.print` and 200+ `System.err.print` calls still remain. I will need another iteration to resolve them. ---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-220753345 @mjpost Sorry for my previous incomplete comment. The problem I see with that line is that we now have the new package `org.apache.joshua.decoder.ff` instead of the old `joshua.decoder.ff`. Since Lewis is currently working on this, I didn't dare to raise a new PR to fix a single line :-) ---
[GitHub] incubator-joshua pull request: JOSHUA-264 System.exit() calls are ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/13#issuecomment-219571959 @mjpost Added `maven-assembly-plugin` to make the build cycle easy: https://github.com/apache/incubator-joshua/pull/14 ---
[GitHub] incubator-joshua pull request: JOSHUA-252 Make it possible to use ...
Github user thammegowda commented on the pull request: https://github.com/apache/incubator-joshua/pull/12#issuecomment-219337253 This is indeed a big change, and I think it's better to split it into multiple small PRs. Maybe we can have a 'maven' branch in parallel for a short transition time, resolve the remaining issues on that branch with smaller PRs, and then merge it with master? ---