[GitHub] any23 pull request #33: ANY23-304 Add extractor for OpenIE 1st pass

2017-02-23 Thread lewismc
Github user lewismc closed the pull request at:

https://github.com/apache/any23/pull/33


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] any23 pull request #33: ANY23-304 Add extractor for OpenIE 1st pass

2017-02-16 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101650632
  
--- Diff: core/src/main/java/org/apache/any23/rdf/RDFUtils.java ---
@@ -564,6 +569,4 @@ public static boolean isAbsoluteIRI(String href) {
 }
 }
 
-private RDFUtils() {}
--- End diff --

Possibly, I will see how the remainder of the implementation goes.




[GitHub] any23 pull request #33: ANY23-304 Add extractor for OpenIE 1st pass

2017-02-16 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101650651
  
--- Diff: core/src/main/java/org/apache/any23/extractor/openie/OpenIEExtractor.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.any23.extractor.openie;
+
+import java.io.IOException;
+import java.util.List;
+
+import javax.xml.transform.TransformerConfigurationException;
+import javax.xml.transform.TransformerFactoryConfigurationError;
+
+import org.apache.any23.extractor.Extractor;
+import org.apache.any23.extractor.ExtractionContext;
+import org.apache.any23.extractor.ExtractorDescription;
+import org.apache.any23.util.StreamUtils;
+import org.eclipse.rdf4j.model.IRI;
+import org.apache.any23.extractor.ExtractionException;
+import org.apache.any23.extractor.ExtractionParameters;
+import org.apache.any23.extractor.ExtractionResult;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.w3c.dom.Document;
+
+import edu.knowitall.openie.Argument;
+import edu.knowitall.openie.Instance;
+import edu.knowitall.openie.OpenIE;
+import edu.knowitall.tool.parse.ClearParser;
+import edu.knowitall.tool.postag.ClearPostagger;
+import edu.knowitall.tool.srl.ClearSrl;
+import edu.knowitall.tool.tokenize.ClearTokenizer;
+import scala.collection.JavaConversions;
+import scala.collection.Seq;
+
+
+
+/**
+ * An <a href="https://github.com/allenai/openie-standalone">OpenIE</a>
+ * extractor able to generate RDF statements from 
+ * sentences representing relations in the text.
+ */
+public class OpenIEExtractor implements Extractor.TagSoupDOMExtractor {
+
+private final Logger LOG = LoggerFactory.getLogger(getClass());
+
+private IRI documentRoot;
+
+/**
+ * default constructor
+ */
+OpenIEExtractor() {
+// default constructor
+}
+
+/**
+ * @see org.apache.any23.extractor.Extractor#getDescription()
+ */
+@Override
+public ExtractorDescription getDescription() {
+return OpenIEExtractorFactory.getDescriptionInstance();
+}
+
+@Override
+public void run(ExtractionParameters extractionParameters,
+ExtractionContext context, Document in, ExtractionResult out)
+throws IOException, ExtractionException {
+OpenIE openIE = new OpenIE(new ClearParser(new ClearPostagger(new ClearTokenizer())), new ClearSrl(), false, false);
+
+
+Seq<Instance> extractions = null;
+try {
+extractions = openIE.extract(StreamUtils.asString(StreamUtils.documentToInputStream(in)));
+} catch (TransformerConfigurationException | TransformerFactoryConfigurationError e) {
+LOG.error("Error during extraction: {}", e);
+}
+
+List<Instance> listExtractions = JavaConversions.seqAsJavaList(extractions);
+for(Instance instance : listExtractions) {
+StringBuilder sb = new StringBuilder();
+
+sb.append(instance.confidence())
+.append('\t')
+.append(instance.extr().context())
+.append('\t')
+.append(instance.extr().arg1().text())
+.append('\t')
+.append(instance.extr().rel().text())
+.append('\t');
+
+List<Argument> listArg2s = JavaConversions.seqAsJavaList(instance.extr().arg2s());
+for(Argument argument : listArg2s) {
+sb.append(argument.text()).append("; ");
+}
+System.out.println(sb.toString());
--- End diff --

Yes, correct.
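The quoted loop writes one tab-separated line per extraction to standard output. Two details may be worth tightening before this output is wired into the extractor: the `"; "` appended after every `arg2` leaves a trailing separator, and if the `catch` branch runs, `extractions` stays `null`, so the conversion and loop that follow would fail on it. A minimal, dependency-free sketch, assuming a hypothetical `SimpleExtraction` class that mirrors only the fields the loop reads (it is not the OpenIE `Instance` API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the fields the loop reads from an OpenIE Instance. */
final class SimpleExtraction {
    final double confidence;
    final Optional<String> context;
    final String arg1;
    final String rel;
    final List<String> arg2s;

    SimpleExtraction(double confidence, Optional<String> context,
                     String arg1, String rel, List<String> arg2s) {
        this.confidence = confidence;
        this.context = context;
        this.arg1 = arg1;
        this.rel = rel;
        this.arg2s = arg2s;
    }
}

final class ExtractionFormatter {

    /** One tab-separated line; arg2s are joined with "; " and no trailing separator. */
    static String toLine(SimpleExtraction e) {
        return e.confidence + "\t"
                + e.context.orElse("") + "\t"
                + e.arg1 + "\t"
                + e.rel + "\t"
                + String.join("; ", e.arg2s);
    }

    /** Null-safe: a failed extraction step (null result) yields no lines instead of an exception. */
    static List<String> toLines(List<SimpleExtraction> extractions) {
        if (extractions == null) {
            return Collections.emptyList();
        }
        List<String> lines = new ArrayList<>();
        for (SimpleExtraction e : extractions) {
            lines.add(toLine(e));
        }
        return lines;
    }
}
```

For example, an extraction with confidence `0.9`, no context, `arg1` "Any23", relation "extracts" and arg2s "RDF" and "from HTML" formats as `0.9`, an empty context column, then `Any23`, `extracts` and `RDF; from HTML`, separated by tabs.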



[GitHub] any23 pull request #33: ANY23-304 Add extractor for OpenIE 1st pass

2017-02-16 Thread ansell
Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101641510
  
--- Diff: core/src/main/java/org/apache/any23/extractor/openie/OpenIEExtractor.java ---
@@ -0,0 +1,111 @@
+System.out.println(sb.toString());
--- End diff --

I presume this is stub code for now that will eventually send its results to the ExtractionResult.
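One possible way to replace the stub's `System.out.println` is to write each extraction as a small group of triples hanging off a per-extraction node. In the real extractor this would go to Any23's `ExtractionResult` (whose `writeTriple` takes RDF4J terms); to keep the sketch self-contained it uses a hypothetical `TripleSink` interface with plain strings, and a made-up `http://example.org/openie#` namespace. This is one illustrative modelling choice, not the design the PR adopted:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Any23's ExtractionResult: receives one triple at a time. */
interface TripleSink {
    void writeTriple(String subject, String predicate, String object);
}

/** Hypothetical in-memory sink, handy for tests. */
final class ListSink implements TripleSink {
    final List<String[]> triples = new ArrayList<>();

    @Override
    public void writeTriple(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }
}

final class ExtractionEmitter {
    // Hypothetical vocabulary namespace; a real extractor would define its own.
    static final String NS = "http://example.org/openie#";

    /**
     * Emits one node per extraction, named by a fragment under the document IRI
     * (assumes documentIri has no fragment of its own). Objects are plain strings
     * here; a real implementation would distinguish IRIs from typed literals.
     */
    static void emit(String documentIri, int index, double confidence,
                     String arg1, String rel, List<String> arg2s, TripleSink out) {
        String node = documentIri + "#extraction-" + index;
        out.writeTriple(node, NS + "confidence", Double.toString(confidence));
        out.writeTriple(node, NS + "arg1", arg1);
        out.writeTriple(node, NS + "rel", rel);
        for (String arg2 : arg2s) {
            out.writeTriple(node, NS + "arg2", arg2);
        }
    }
}
```

Using a node per extraction keeps the confidence score and multiple `arg2` values attached to the same statement, which a direct `arg1 rel arg2` triple could not carry.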



[GitHub] any23 pull request #33: ANY23-304 Add extractor for OpenIE 1st pass

2017-02-16 Thread ansell
Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101638032
  
--- Diff: core/src/main/java/org/apache/any23/rdf/RDFUtils.java ---
@@ -564,6 +569,4 @@ public static boolean isAbsoluteIRI(String href) {
 }
 }
 
-private RDFUtils() {}
--- End diff --

Is this signalling an intention to change the scope of this class to 
include non-static methods?
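Background for the question above: a class that holds only static helpers, as `RDFUtils` appears to, conventionally declares a private constructor so it cannot be instantiated; if the constructor is removed, the compiler supplies a default no-argument constructor with the class's own access level. A minimal sketch of the idiom, using a hypothetical `StaticHelpers` class rather than the real `RDFUtils` source:

```java
/**
 * Hypothetical static-only helper class (not the real RDFUtils) showing the
 * private-constructor idiom.
 */
final class StaticHelpers {

    /** Private constructor: callers cannot write {@code new StaticHelpers()}. */
    private StaticHelpers() {
        // Makes reflective instantiation fail as well.
        throw new AssertionError("StaticHelpers is not instantiable");
    }

    /** Example static helper (hypothetical): trims, mapping null to "". */
    static String normalize(String s) {
        return s == null ? "" : s.trim();
    }
}
```

Callers use `StaticHelpers.normalize(...)` directly; an attempt to construct an instance is rejected at compile time, and a reflective attempt fails at run time.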




[GitHub] any23 pull request #33: ANY23-304 Add extractor for OpenIE 1st pass

2017-02-15 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/33

ANY23-304 Add extractor for OpenIE 1st pass

This pull request is a first pass at addressing
https://issues.apache.org/jira/browse/ANY23-304. There is more work to be done
on the new extractor. The newly introduced dependency significantly increases
the size of Any23 because it pulls in many new transitive dependencies.
@Yongyao, this is for context.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-304

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #33


commit 1ffd60a68d20e7f313745202c012ac3fd2b49738
Author: Lewis John McGibbney 
Date:   2017-02-16T07:17:40Z

ANY23-304 Add extractor for OpenIE 1st pass



