[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34

Hi @ansell, in my last commit I've pushed a couple of (hopefully) satisfying additions, namely

* removal of the open module from the CLI (meaning that the open extractor is not executed during normal unit test execution)
* addition of some class loading logic which improves the flexibility of extractor detection based upon the presence of the open extractor.

The open tests are now not executed by default... this will dramatically reduce 1) the time of the tests, and 2) the memory required to execute them. Thanks for any final review.

Lewis

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
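The class loading logic mentioned in this comment is not shown in the thread. As a minimal sketch of the general idea, assuming detection simply probes the classpath for the extractor class, something like the following could be used. The class name `org.apache.any23.openie.OpenIEExtractor` and the helper `OptionalExtractorProbe` are illustrative assumptions, not taken from the patch.

```java
/** Minimal sketch: detect an optional extractor by probing the classpath. */
final class OptionalExtractorProbe {

    // Hypothetical class name, used only for illustration.
    static final String OPENIE_EXTRACTOR_CLASS =
            "org.apache.any23.openie.OpenIEExtractor";

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false,
                    OptionalExtractorProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is missing: treat as absent.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("OpenIE extractor present: "
                + isOnClasspath(OPENIE_EXTRACTOR_CLASS));
    }
}
```

Passing `false` as the second argument avoids running static initializers, so a probe like this stays cheap even when the optional module is memory hungry.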
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 Hi @ansell, yes, this is a separate module; however, it currently always builds with the CLI module. I'm going to push an update which disables the module tests by default.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user ansell commented on the issue: https://github.com/apache/any23/pull/34 Is it an optional plugin in the current setup, so that users with minimal memory available do not need to load it? I haven't had time to look through it, but I see there is a new openie module.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user ansell commented on the issue: https://github.com/apache/any23/pull/34 My main objections before were about the larger memory requirements for default use and not being able to run the tests without OOM on my mid-range development machine.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 Will commit within next day or so if there are no objections.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 Hi @ansell, I finally got around to addressing your comments. Just to refresh your memory, use of FileOutputStream (as opposed to ByteArrayOutputStream) within the OpenExtractorTest.java logic is more performant, by around 1/4 second or so. Do you have any further comments on this patch?
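For readers unfamiliar with the two sinks being compared, here is a rough sketch of how the same writer can target either a `FileOutputStream` or a `ByteArrayOutputStream`. The `writeTriples` helper and its output lines are hypothetical stand-ins for extractor output; this does not reproduce the test in the patch and makes no claim about the timing difference reported above.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Sketch: one writer, two interchangeable output sinks. */
final class OutputSinkSketch {

    /** Hypothetical stand-in for extractor output: writes `count` lines. */
    static void writeTriples(OutputStream sink, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            String line = "<urn:s" + i + "> <urn:p> \"o" + i + "\" .\n";
            sink.write(line.getBytes(StandardCharsets.UTF_8));
        }
        sink.flush();
    }

    /** Writes to a temporary file and returns the number of bytes on disk. */
    static long fileBytes(int count) {
        try {
            File tmp = File.createTempFile("extract", ".nt");
            try (OutputStream out = new FileOutputStream(tmp)) {
                writeTriples(out, count);
            }
            long size = tmp.length();
            tmp.delete();
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes to an in-memory buffer and returns the number of bytes held. */
    static long memoryBytes(int count) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            writeTriples(out, count);
            return out.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file bytes:   " + fileBytes(1000));
        System.out.println("memory bytes: " + memoryBytes(1000));
    }
}
```

The file-backed sink keeps the output off the Java heap, which may matter more than raw speed when the extractor itself is already close to the heap limit.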
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user ansell commented on the issue: https://github.com/apache/any23/pull/34

Tests failed for me with OOM:

```
[INFO] Compiling 1 source file to /home/mint/gitrepos/any23/openie/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ apache-any23-openie ---
--- T E S T S ---
Running org.apache.any23.openie.OpenIEExtractorTest
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mint/.m2/repository/ch/qos/logback/logback-classic/1.1.2/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mint/.m2/repository/org/slf4j/slf4j-log4j12/1.7.21/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
Loading feature templates.
Loading models.
Loading lexica.
Loading configuration.
Loading feature templates.
Loading models.
Loading feature templates.
Loading models.
Loading lexica.
Loading feature templates.
Loading models.
Loading feature templates.
Loading models.
Loading lexica.
Loading feature templates.
Loading models.
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.977 sec <<< FAILURE! - in org.apache.any23.openie.OpenIEExtractorTest
testExtractFromHTMLDocument(org.apache.any23.openie.OpenIEExtractorTest) Time elapsed: 20.282 sec <<< ERROR!
java.lang.OutOfMemoryError: Java heap space
at org.apache.any23.openie.OpenIEExtractorTest.extract(OpenIEExtractorTest.java:75)
at org.apache.any23.openie.OpenIEExtractorTest.testExtractFromHTMLDocument(OpenIEExtractorTest.java:65)
```
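One way to keep a memory-hungry test from failing with `OutOfMemoryError` on smaller machines is to check the JVM's maximum heap before running it. The sketch below is only an illustration of that idea: the `HeapGuard` class and the 4 GiB threshold are assumptions, not values from the patch.

```java
/** Sketch: decide whether the JVM has enough heap for a heavy test. */
final class HeapGuard {

    // Hypothetical threshold chosen for illustration only.
    static final long REQUIRED_HEAP_BYTES = 4L * 1024 * 1024 * 1024;

    /** Returns true if the JVM's maximum heap is at least `requiredBytes`. */
    static boolean hasHeap(long requiredBytes) {
        return Runtime.getRuntime().maxMemory() >= requiredBytes;
    }

    public static void main(String[] args) {
        long maxMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap (MiB): " + maxMiB);
        System.out.println("Enough for the heavy test: "
                + hasHeap(REQUIRED_HEAP_BYTES));
    }
}
```

In a JUnit 4 test, a check like this could be passed to `org.junit.Assume.assumeTrue(...)` so that the test is reported as skipped rather than failed when the heap is too small. The maximum heap itself is usually raised with the `-Xmx` JVM option.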
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 PING... anyone that is able to provide a review? Would be very much appreciated.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 Unfortunately... because of the bug that caused the ```META-INF/services``` directories to be filtered out, the plugins for Any23 2.0 are not as useful as they should be: they cannot be dynamically discovered when present on the classpath. We should potentially push Any23 2.1 once this patch is merged into master.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 Hi @ansell, this is now fixed. After a good bit of debugging I discovered that some erroneous `` descriptions in plugin pom.xml files meant that the ```META-INF/services``` directories were being filtered out from the generated .jar artifacts... meaning that the ServiceLoader did not discover the plugins. Anyway... if you could pull the code and let me know how you get on it would be appreciated. This is working well for me. One final thing to note: you will see that for the appassembler plugin definition in ```cli/pom.xml``` we increase the JVM arguments to 6000m... this is because OpenIE is pretty memory intensive.
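To illustrate why a missing `META-INF/services` directory makes plugins invisible, here is a minimal sketch of discovery with `java.util.ServiceLoader`. The `DemoPlugin` interface and `PluginDiscoverySketch` class are hypothetical and stand in for whatever service type Any23 actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical service interface standing in for an extractor plugin type. */
interface DemoPlugin {
    String name();
}

/** Sketch: discover implementations registered under META-INF/services. */
final class PluginDiscoverySketch {

    /**
     * Returns the names of all providers found on the classpath.
     * A provider is only found if some jar or directory on the classpath
     * contains a file META-INF/services/DemoPlugin (the fully qualified
     * name of the interface) listing the implementing class.
     */
    static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (DemoPlugin plugin : ServiceLoader.load(DemoPlugin.class)) {
            names.add(plugin.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no provider-configuration file present, nothing is discovered.
        System.out.println("Discovered plugins: " + discover());
    }
}
```

If the provider-configuration file is dropped while the jar is built, the implementing classes are still packaged but `ServiceLoader` has no way of learning about them, which matches the symptom described in this comment.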
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 Yep, you're right. Bang on the money. I'll update the PR.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user ansell commented on the issue: https://github.com/apache/any23/pull/34 The cli module may need the new module added as a dependency to pull it onto the classpath. Strangely enough, it appears as though none of the other plugins are cli dependencies.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 OK, so implementing ExtractorPlugin is not necessary... none of the other plugins use this logic. I'm trying to get it working via the cli appassembler script, but no joy yet.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user ansell commented on the issue: https://github.com/apache/any23/pull/34 I haven't looked at it recently. The META-INF/services entries should be enough on their own without the explicit plugin support, but I can't recall whether there are any other differences that could affect usage.
[GitHub] any23 issue #34: ANY23-304 Add extractor for OpenIE
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/34 @ansell is it necessary to put this new module into ```plugins``` and have the new extractor implement [ExtractorPlugin](http://any23.apache.org/apidocs/index.html?org/apache/any23/plugin/ExtractorPlugin.html)?