[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...

2018-11-14 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/132
  
Any objections to merge?


---


[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...

2018-11-13 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/132
  
OK folks, PR updated. Thank you for the review. I've tested locally, and 
including only one dependency now enables deploying Any23 master in Tomcat 9 
running JDK 11.


---


[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...

2018-11-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/132
  
OK, let me rework things on my end and see what I can come up with.


---


[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...

2018-11-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/132
  
First thing: 1) you don't need to access BUILDING.txt; however, 2) see 
https://apache.org/dev/freebsd-jails#opie


---


[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...

2018-11-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/132
  
The issue I was having was when running it in Tomcat. From what I could 
tell, Tomcat does not ship with these libraries either, so they need to be added 
manually. Please try it out yourself and see where you get. I am by no means a 
Tomcat, servlet container or J2EE expert, so I am not 100% sure this fix is 
optimal.
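For context, JDK 11 removed the Java EE modules that earlier JDKs bundled (JAXB, JAX-WS, JavaBeans Activation and others; see JEP 320), so a WAR that implicitly relied on them has to ship them itself. The thread does not say exactly which libraries were missing, so the class names mentioned after the block are assumptions. A minimal Java sketch for checking, from inside the deployed webapp, whether such classes are actually visible:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reports whether given classes are loadable from this class's class loader. */
final class JavaEeClasspathCheck {

    private JavaEeClasspathCheck() {}

    /** True if the class can be found; it is not initialised (second argument false). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, JavaEeClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing entirely, or present but unlinkable (e.g. one of its own dependencies is missing).
            return false;
        }
    }

    /** Maps each class name to its availability, preserving the input order. */
    static Map<String, Boolean> check(List<String> classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }
}
```

For example, `check(Arrays.asList("javax.xml.bind.JAXBContext", "javax.activation.DataHandler"))` run on JDK 11 should report `false` for both unless something on the webapp or container classpath provides the corresponding JARs.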


---


[GitHub] any23 issue #131: ANY23-418 improve TikaEncodingDetector

2018-11-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/131
  
You've brought up an excellent topic for conversation. Tika currently has a 
batch regression job, which essentially enables them to run over loads of 
documents and analyze the output. As a result, they know how changes to the 
source affect Tika's ability to do what it claims to do over time. We do not 
have that in Any23, but I think we should make an effort to build bridges with 
the Tika community in this regard, with the aim of sharing resources (both 
computing capacity to run large batch parse jobs and dataset(s) we can run 
Any23 over).
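As a rough illustration of what such a batch regression check could look like on the Any23 side, the sketch below compares per-document triple counts from a baseline run and a candidate run over the same corpus, and flags documents whose output shrank by more than a tolerance. The input shape (document id mapped to triple count) and the class name `ExtractionRegression` are hypothetical; this is not Tika's actual tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper: flags documents whose triple count dropped between two runs. */
final class ExtractionRegression {

    private ExtractionRegression() {}

    /**
     * @param baseline  document id -> triple count from the reference run
     * @param candidate document id -> triple count from the run under test
     * @param tolerance allowed fractional drop, e.g. 0.1 for 10%
     * @return one "id: before -> after" line per regressed document, sorted by id
     */
    static List<String> regressions(Map<String, Integer> baseline,
                                    Map<String, Integer> candidate,
                                    double tolerance) {
        List<String> flagged = new ArrayList<>();
        // Iterate in sorted order so the report is deterministic.
        for (String doc : new TreeSet<>(baseline.keySet())) {
            int before = baseline.get(doc);
            int after = candidate.getOrDefault(doc, 0); // no output at all counts as zero triples
            if (before > 0 && after < before * (1.0 - tolerance)) {
                flagged.add(doc + ": " + before + " -> " + after);
            }
        }
        return flagged;
    }
}
```

Counts are a coarse signal; a fuller job would also diff the triples themselves, but counts are cheap to collect over a large crawl.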

I have been thinking for the longest time now about implementing a 
```tika.triplify``` API which would encapsulate Any23 and run it on the Tika 
data streams, but I just never got around to it. Maybe now is a better time to 
bring that idea back to life. 

I was thinking we could possibly use Common Crawl, but AFAIK they do not 
publish the raw data; it is the Nutch segments or some alternative, e.g. the 
WebArchive files.


---


[GitHub] any23 pull request #132: ANY23-419 Add J2EE dependencies such that service r...

2018-11-07 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/132

ANY23-419 Add J2EE dependencies such that service runs under JDK11

This issue addresses https://issues.apache.org/jira/browse/ANY23-419

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-419

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #132


commit c57abb13de5d1bf0b81d5ea9cbe28038355842c7
Author: Lewis John McGibbney 
Date:   2018-11-08T03:00:04Z

ANY23-419 Add J2EE dependencies such that service runs under JDK11




---


[GitHub] any23 issue #131: ANY23-418 improve TikaEncodingDetector

2018-11-07 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/131
  
There is a fair bit of code here, but I am not really sure how to test it. 
Unfortunately, I am going to say: please provide a unit test. I've been aware 
of some encoding detection issues with Any23 previously but never got around to 
logging them here. 
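A unit test for encoding detection could take roughly this shape: encode known text behind a known byte-order mark, run the detector, and assert that the expected charset comes back. The `CharsetGuesser` interface and the BOM-only `BomGuesser` are hypothetical stand-ins so the example runs on its own; a real test would exercise the `TikaEncodingDetector` from the PR instead, whose API is not reproduced here.

```java
import java.nio.charset.Charset;

/** Hypothetical detector contract; Any23's actual interface may differ. */
interface CharsetGuesser {
    /** Returns the detected charset name, or null if undecided. */
    String guess(byte[] head);
}

/** Minimal stand-in implementation that only recognises byte-order marks. */
final class BomGuesser implements CharsetGuesser {
    @Override
    public String guess(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }
}

/** Test helpers: build BOM-prefixed input and assert the detector's answer. */
final class EncodingDetectionCheck {

    private EncodingDetectionCheck() {}

    static byte[] withBom(byte[] bom, String text, Charset cs) {
        byte[] body = text.getBytes(cs);
        byte[] out = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, out, 0, bom.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    static void assertDetects(CharsetGuesser g, byte[] input, String expected) {
        String actual = g.guess(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

BOM cases are only the easy end; inputs without a BOM (e.g. windows-1252 versus UTF-8) are where regressions in a detector are more likely to show up, so a real suite would add fixtures for those.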


---


[GitHub] any23 issue #124: ANY23-67 test against online microdata test-suite

2018-10-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/124
  
Wow, OK, yes, lots of work to be done here... do you think a release is in 
order before we work on this?


---


[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover

2018-10-03 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/122
  
> The only reason I've hesitated so far about merging this PR is that once 
new interfaces are introduced as part of the core API, I'd prefer to never 
change them again--so I want to get it right the first time!

Totally understandable.

I really like the idea of improving the interoperability logic here. 

> The reason being: shouldn't all return types...be, preferably, part of 
our own API, rather than RDF4J's?

Yes, that is a very fair statement. What, for example, happens if and when 
the RDF4J project ceases to exist? Then of course we would end up implementing 
exactly what you are suggesting. 

I am certainly +1 to a TripleFormat API which is somewhat analogous to, and 
certainly interoperable with, RDF4J's 
[RDFFormat](http://docs.rdf4j.org/javadoc/latest/index.html?org/eclipse/rdf4j/rio/RDFFormat.html).
Very nice @HansBrende 
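A minimal sketch of what an Any23-owned format descriptor might carry, assuming it mirrors the basic metadata RDF4J's `RDFFormat` exposes (a name, MIME types, a charset and file extensions). `TripleFormatSketch` and its methods are hypothetical names, not Any23's actual API.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical immutable descriptor for an output format owned by Any23's own API. */
final class TripleFormatSketch {
    private final String name;
    private final List<String> mimeTypes;      // first entry is the default
    private final Charset charset;             // null for binary formats
    private final List<String> fileExtensions; // without the leading dot

    TripleFormatSketch(String name, List<String> mimeTypes, Charset charset,
                       List<String> fileExtensions) {
        if (mimeTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one MIME type is required");
        }
        this.name = name;
        this.mimeTypes = Collections.unmodifiableList(new ArrayList<>(mimeTypes));
        this.charset = charset;
        this.fileExtensions = Collections.unmodifiableList(new ArrayList<>(fileExtensions));
    }

    String getName() { return name; }
    String getDefaultMimeType() { return mimeTypes.get(0); }
    List<String> getMimeTypes() { return mimeTypes; }
    Optional<Charset> getCharset() { return Optional.ofNullable(charset); }
    List<String> getFileExtensions() { return fileExtensions; }

    /** Matches ignoring case and any parameters such as "; charset=UTF-8". */
    boolean hasMimeType(String mimeType) {
        int semi = mimeType.indexOf(';');
        String bare = (semi >= 0 ? mimeType.substring(0, semi) : mimeType).trim();
        for (String m : mimeTypes) {
            if (m.equalsIgnoreCase(bare)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the first format in the list that accepts the given MIME type. */
    static Optional<TripleFormatSketch> forMimeType(String mimeType,
                                                    List<TripleFormatSketch> formats) {
        for (TripleFormatSketch f : formats) {
            if (f.hasMimeType(mimeType)) {
                return Optional.of(f);
            }
        }
        return Optional.empty();
    }
}
```

Interoperability with RDF4J could then live in a small adapter that maps such a descriptor to an `RDFFormat` by MIME type, so the core API itself would not need to expose RDF4J types.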




---


[GitHub] any23 pull request #123: ANY23-399 Upgrade Apache parent POM to version 21

2018-09-24 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/123

ANY23-399 Upgrade Apache parent POM to version 21

This issue addresses https://issues.apache.org/jira/browse/ANY23-399

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-399

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #123


commit 12640a953c76900ad0bbc990cb162a7aee8d03c3
Author: Lewis John McGibbney 
Date:   2018-09-25T01:59:46Z

ANY23-399 Upgrade Apache parent POM to version 21




---


[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover

2018-09-18 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/122
  
+1 from me


---


[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover

2018-09-16 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/122
  
I think this looks great. I have no further comments for improvement. I 
think the additions look great.


---


[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover

2018-09-14 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/122
  
I like this.
As you've pointed out, the deprecation is a bit harsh, but it is certainly 
something we can live with. I would, however, urge you to add additional 
documentation, e.g.
```
/**
 * @deprecated
 * explanation of why function was deprecated, if possible include what 
 * should be used.
 */
```
This looks really nice. 
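Filled in, the suggested pattern pairs the Javadoc `@deprecated` tag (the explanation and the replacement) with the `@Deprecated` annotation (so the compiler warns callers). The class, the method names and the stated reason are hypothetical, not part of Rover's API:

```java
/** Hypothetical example class; not part of Any23. */
final class ExtractionOptions {

    /**
     * Returns the name of the output format.
     *
     * @return the output format name
     * @deprecated Hypothetical reason: the name suggests an input file format
     *     rather than an output serialisation. Use {@link #getOutputFormat()} instead.
     */
    @Deprecated
    String getFormat() {
        // Delegate so old and new callers see the same value until removal.
        return getOutputFormat();
    }

    /**
     * Returns the name of the output format; replaces {@link #getFormat()}.
     *
     * @return the output format name
     */
    String getOutputFormat() {
        return "ntriples";
    }
}
```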


---


[GitHub] any23 issue #120: ANY23-393 Any23 master to build under JDK 10.X

2018-09-06 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/120
  
@jgrzebyta good question. 
I previously used JDK 8 and Any23 was building absolutely fine. I 
experienced no issues. 
When I upgraded, the issues started... hence this PR.


---


[GitHub] any23 issue #120: ANY23-393 Any23 master to build under JDK 10.X

2018-08-29 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/120
  
@HansBrende thanks for the review. From what I remember, this is not packaged 
with Tomcat 9 anymore, which is what users should be using to deploy the Any23 
WAR.


---


[GitHub] any23 issue #120: ANY23-393 Any23 master to build under JDK 10.X

2018-08-29 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/120
  
Any comments folks?


---


[GitHub] any23 issue #119: ANY23-392: Add configuration for maven-jetty-plugin

2018-08-29 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/119
  
@jgrzebyta +1


---


[GitHub] any23 issue #118: ANY23-390 implement ICal, JCal, and XCal extractors

2018-08-25 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/118
  
+1


---


[GitHub] any23 issue #119: ANY23-392: Add configuration for maven-jetty-plugin

2018-08-25 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/119
  
+1


---


[GitHub] any23 pull request #120: ANY23-393 Any23 master to build under JDK 10.X

2018-08-25 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/120

ANY23-393 Any23 master to build under JDK 10.X

Issue addresses build under following environment
```
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T11:33:14-07:00)
Maven home: /usr/local/Cellar/maven/3.5.4/libexec
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk-10.0.2.jdk/Contents/Home
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.12.6", arch: "x86_64", family: "Mac"
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/any23 ANY23-393

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #120


commit 537c6e4d2594de8cdcc3cd68862eb2efed9232c0
Author: Lewis John McGibbney 
Date:   2018-08-25T18:07:53Z

ANY23-393 Any23 master to build under JDK 10.X




---


[GitHub] any23 issue #118: ANY23-390 implement ICal, JCal, and XCal extractors

2018-08-21 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/118
  
This is dynamite!
Sorry for not responding on the corresponding JIRA issue. This looks great.


---


[GitHub] any23 issue #116: Any23 388

2018-08-16 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/116
  
@HansBrende feel free to merge once you are satisfied.


---


[GitHub] any23 issue #104: Any23 295: Implement ability to use librdfa

2018-08-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/104
  
> Remember, that not only the libraries that librdfa uses are old but the 
librdfa project itself has not been active in a while.

Yes, but it is compliant with the RDFa test suite, which is exactly what we 
want. It can be assumed to be somewhat complete...

> If we are moving to another branch, could someone create a new branch, so I 
can point the PR to the new branch.

@HansBrende can you please create a new branch called ANY23-295, I do not 
have access to a terminal right now. Thank you in advance, 

> I will be glad to keep contributing after GSoC. Honestly, I like the 
idea of becoming a committer, so I have to keep working to earn it.

👍 


---


[GitHub] any23 issue #104: Any23 295: Implement ability to use librdfa

2018-08-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/104
  
> @lewismc should we not test more thoroughly before merging to master? 
Maybe a separate branch instead?

Yes, absolutely. We should also extend the integration tests so that we can 
make more confident comparisons regarding runtime execution. 
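One way to make such comparisons concrete is to diff the triples two extractors (e.g. librdfa versus the existing RDFa extractor) produce for the same document. The sketch below treats each non-blank, non-comment N-Triples line as one triple; it assumes both sides serialise identically (blank-node labels, literal escaping), which real output may not, so blank nodes would need canonicalising first. `TripleSetDiff` is a hypothetical helper name.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: line-based set comparison of two N-Triples outputs. */
final class TripleSetDiff {

    private TripleSetDiff() {}

    /** Each trimmed, non-blank line that is not a "#" comment counts as one triple. */
    static Set<String> triples(String ntriples) {
        Set<String> out = new TreeSet<>();
        for (String line : ntriples.split("\\R")) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                out.add(t);
            }
        }
        return out;
    }

    /** Triples present in {@code left} but missing from {@code right}. */
    static Set<String> missing(String left, String right) {
        Set<String> diff = triples(left);
        diff.removeAll(triples(right));
        return diff;
    }
}
```

Running `missing` in both directions over each test document gives a quick picture of what one extractor finds that the other does not.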


---


[GitHub] any23 issue #104: Any23 295: Implement ability to use librdfa

2018-08-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/104
  
@JulioCCBUcuenca Please address the issue of making librdfa-rdf4j a module 
within Any23 then push your changes. At that stage I will most likely approve 
the PR and we can test more thoroughly. Thanks. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-08 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208642324
  
--- Diff: test-resources/src/test/resources/html/rdfa/basic.html ---
@@ -14,7 +14,7 @@
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
-http://www.w3.org/1999/xhtml;>
+http://www.w3.org/1999/xhtml;>
--- End diff --

That's fine, thanks for the explanation. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-08 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208642202
  
--- Diff: core/pom.xml ---
@@ -335,7 +343,14 @@
 
 
   
-
+  
+  
+
+  librdfa-rdf4j
+  
https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/
--- End diff --

librdfa-rdf4j should definitely be a separate module of Any23. Please make 
sure that it is, and that the required JAR dependencies are available at build, 
compile and runtime.


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056664
  
--- Diff: librdfa-rdf4j/src/main/c/CMakeLists.txt ---
@@ -0,0 +1,39 @@
+cmake_minimum_required(VERSION 2.8)
--- End diff --

Missing license header. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208055581
  
--- Diff: core/pom.xml ---
@@ -335,7 +343,14 @@
 
 
   
-
+  
+  
+
+  librdfa-rdf4j
+  
https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/
--- End diff --

This is a bit hacky... what is meant to reside at this URL?


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208055205
  
--- Diff: core/pom.xml ---
@@ -62,6 +62,14 @@
 
 
 
+
+
+${project.groupId}
+apache-any23-librdfa
+1.0.0
--- End diff --

Ensure that the versioning is consistent with the rest of the codebase e.g. 
2.3-SNAPSHOT or whatever it is currently. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208055086
  
--- Diff: core/pom.xml ---
@@ -62,6 +62,14 @@
 
 
 
+
+
+${project.groupId}
--- End diff --

Please use 2-space indents, the same as the rest of the file.


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056226
  
--- Diff: librdfa-rdf4j/README.MD ---
@@ -0,0 +1,63 @@
+# Librdfa - RDF4J
+
+RDF4J parser that uses [librdfa](https://github.com/rdfa/librdfa) to parse RDFa to triples.
+
+## Prerequisites
+
+You need to install the [librdfa](https://github.com/rdfa/librdfa) library.
+
+## Install
+
+``` mvn
+
+   org.apache.any23
+   apache-any23-librdfa
+   1.0.0
+
+
+
+   
+   librdfa-rdf4j
+   
https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/
--- End diff --

... again, this looks quite untidy. What resides at this URL?


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057089
  
--- Diff: librdfa-rdf4j/src/main/c/RdfaParser.h ---
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2008 Digital Bazaar, Inc.
+ *
+ * This file is part of librdfa.
+ *
+ * librdfa is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation, either version 3 of the
+ * License, or (at your option) any later version.
+ *
+ * librdfa is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with librdfa. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _RDFA_PARSER_H_
+#define _RDFA_PARSER_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct rdfacontext;
+
+class Callback {
+public:
+
+virtual ~Callback() {
+//std::cout << "Callback::~Callback()" << std::endl;
+}
+
+virtual void default_graph(char* subject, char* predicate, char* object, int object_type, char* datatype, char* language) {
+}
+
+virtual void processor_graph(char* subject, char* predicate, char* object, int object_type, char* datatype, char* language) {
+}
+
+virtual char* fill_data(size_t buffer_length) {
+}
+
+virtual size_t fill_len() {
+}
+};
+
+/**
+ * The RdfaParser class is a wrapper class for Java to provide a
+ * simple API for using librdfa in Java.
+ */
+class RdfaParser {
+private:
+Callback *_callback;
+public:
+/**
+ * The base URI that will be used when resolving relative pathnames
+ * in the document.
+ */
+std::string mBaseUri;
+
+/**
+ * The base RDFa context to use when setting the triple handler callback,
+ * buffer filler callback, and executing the parser call.
+ */
+rdfacontext* mBaseContext;
+
+RdfaParser(const char* baseUri) : _callback(0) {
+mBaseUri = baseUri;
+mBaseContext = rdfa_create_context(baseUri);
+}
+
+/**
+ * Standard destructor.
--- End diff --

```constructor```


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057348
  
--- Diff: librdfa-rdf4j/src/main/c/main.java ---
@@ -0,0 +1,82 @@
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import org.apache.any23.rdf.librdfa.Callback;
+import org.apache.any23.rdf.librdfa.RdfaParser;
+
+public class main {
+
+  public static void main(String argv[]) {
+System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux)
+
+System.out.println("Adding and calling a normal C++ callback");
+System.out.println("");
+String ds = "\n"
++ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\;>\n"
++ "http://www.w3.org/1999/xhtml\"\n;
++ "  xmlns:dc=\"http://purl.org/dc/elements/1.1/\;>\n"
++ "\n"
++ "Test 0001\n"
+ "https://ff.kis.v2.scr.kaspersky-labs.com/89CF2878-190C-6147-88B9-F9B46CAB8AD0/main.js\" charset=\"UTF-8\">\n"
++ "\n"
++ "This photo was taken by Mark Birbeck.\n"
++ "\n"
++ "";
+
+RdfaParser caller = new RdfaParser("http://www.google.com/");
--- End diff --

This should never be hardcoded. It should be read from the ```argv[]``` 
parameter of ```main```... correct?
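A minimal sketch of reading the base URI from the command line instead, assuming the first argument is the base URI and that it must be absolute. `BaseUriArgs` is a hypothetical helper; its result would then be passed on as `new RdfaParser(BaseUriArgs.baseUriFrom(argv).toString())`.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: validates the base URI given as the first program argument. */
final class BaseUriArgs {

    private BaseUriArgs() {}

    static URI baseUriFrom(String[] argv) {
        if (argv.length < 1) {
            throw new IllegalArgumentException("usage: main <base-uri>");
        }
        try {
            URI uri = new URI(argv[0]);
            // A relative base URI cannot be used to resolve other relative references.
            if (!uri.isAbsolute()) {
                throw new IllegalArgumentException("base URI must be absolute: " + argv[0]);
            }
            return uri;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid base URI: " + argv[0], e);
        }
    }
}
```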


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056466
  
--- Diff: librdfa-rdf4j/pom.xml ---
@@ -0,0 +1,139 @@
+
--- End diff --

Add Apache License header.


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057792
  
--- Diff: librdfa-rdf4j/src/main/resources/META-INF/services/org.eclipse.rdf4j.rio.RDFParserFactory ---
@@ -0,0 +1 @@
+org.apache.any23.rdf.rdfa.LibrdfaRDFaParserFactory
--- End diff --

License header if possible. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057284
  
--- Diff: librdfa-rdf4j/src/main/c/main.java ---
@@ -0,0 +1,82 @@
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import org.apache.any23.rdf.librdfa.Callback;
+import org.apache.any23.rdf.librdfa.RdfaParser;
+
+public class main {
+
+  public static void main(String argv[]) {
+System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux)
+
+System.out.println("Adding and calling a normal C++ callback");
+System.out.println("");
+String ds = "\n"
--- End diff --

This is very messy, it is not required please remove. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056890
  
--- Diff: librdfa-rdf4j/src/main/c/RdfaParser.h ---
@@ -0,0 +1,124 @@
+/*
--- End diff --

Is there not an Apache License v2.0 version of this file?


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056821
  
--- Diff: librdfa-rdf4j/src/main/c/RdfaParser.cpp ---
@@ -0,0 +1,19 @@
+/*
--- End diff --

Is there not an ALv2.0 version of this file?


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208055953
  
--- Diff: core/src/main/java/org/apache/any23/extractor/rdfa/LibRdfaExtractorFactory.java ---
@@ -0,0 +1,51 @@
+/*
+ * Copyright 2018 The Apache Software Foundation.
--- End diff --

Please remove the copyright line.
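Several comments in this review concern license headers (missing ASF headers, a stray copyright line). ASF projects typically enforce this with Apache RAT; purely as an illustration, a naive check of a source file's first block comment could look like this. `HeaderCheck` is a hypothetical name, and the marker string is the opening of the standard ASF source header.

```java
/** Naive header check; a sketch, not a replacement for Apache RAT. */
final class HeaderCheck {

    /** Opening words of the standard ASF source header. */
    private static final String ASF_NOTICE = "Licensed to the Apache Software Foundation (ASF)";

    private HeaderCheck() {}

    /** Text of the first block comment in the file, or "" if there is none. */
    static String leadingComment(String source) {
        int start = source.indexOf("/*");
        if (start < 0) {
            return "";
        }
        int end = source.indexOf("*/", start + 2);
        return end < 0 ? "" : source.substring(start, end + 2);
    }

    static boolean hasAsfNotice(String source) {
        return leadingComment(source).contains(ASF_NOTICE);
    }

    /** True if any line of the leading comment starts with "Copyright". */
    static boolean hasCopyrightLine(String source) {
        for (String line : leadingComment(source).split("\\R")) {
            // Strip indentation and the leading "*" of a block-comment line.
            String text = line.trim().replaceFirst("^\\*\\s*", "");
            if (text.startsWith("Copyright")) {
                return true;
            }
        }
        return false;
    }
}
```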


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208055002
  
--- Diff: api/src/main/resources/default-configuration.properties ---
@@ -47,6 +47,14 @@ any23.extraction.metadata.domain.per.entity=off
 # registered the RDFa 1.0 legacy one (org.deri.any23.extractor.rdfa.RDFaExtractor).
 any23.extraction.rdfa.programmatic=on
 
+# Allows to enable Librdfa Extractor.
+# If 'on' will override the extractors with the programmatic option,
+# RDFa 1.1 Extractor (org.deri.any23.extractor.rdfa.RDFa11Extractor) and 
--- End diff --

Replace ```deri``` with ```apache```


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057557
  
--- Diff: librdfa-rdf4j/src/main/c/rdfa.i ---
@@ -0,0 +1,40 @@
+%module(directors="1") rdfa
--- End diff --

Missing license header. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056181
  
--- Diff: librdfa-rdf4j/README.MD ---
@@ -0,0 +1,63 @@
+# Librdfa - RDF4J
+
+RDF4J parser that uses [librdfa](https://github.com/rdfa/librdfa) to parse RDFa to triples.
+
+## Prerequisites
+
+You need to install the [librdfa](https://github.com/rdfa/librdfa) library.
+
+## Install
+
+``` mvn
+
+   org.apache.any23
+   apache-any23-librdfa
+   1.0.0
--- End diff --

Make sure the versioning is in line with existing Any23 versioning.


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057394
  
--- Diff: librdfa-rdf4j/src/main/c/main.java ---
@@ -0,0 +1,82 @@
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import org.apache.any23.rdf.librdfa.Callback;
+import org.apache.any23.rdf.librdfa.RdfaParser;
+
+public class main {
+
+  public static void main(String argv[]) {
+System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux)
+
+System.out.println("Adding and calling a normal C++ callback");
+System.out.println("");
+String ds = "\n"
++ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\;>\n"
++ "http://www.w3.org/1999/xhtml\"\n;
++ "  xmlns:dc=\"http://purl.org/dc/elements/1.1/\;>\n"
++ "\n"
++ "Test 0001\n"
+ "https://ff.kis.v2.scr.kaspersky-labs.com/89CF2878-190C-6147-88B9-F9B46CAB8AD0/main.js\" charset=\"UTF-8\">\n"
++ "\n"
++ "This photo was taken by Mark Birbeck.\n"
++ "\n"
++ "";
+
+RdfaParser caller = new RdfaParser("http://www.google.com/");
+caller.init();
+Callback callback = new JavaCallback(new ByteArrayInputStream(ds.getBytes(StandardCharsets.UTF_8)));
+caller.setCallback(callback);
+
+caller.parse();
+//rdfa.set_rdfa_parser(caller);
--- End diff --

If not in use, please remove. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208058008
  
--- Diff: test-resources/src/test/resources/html/rdfa/basic.html ---
@@ -14,7 +14,7 @@
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
-http://www.w3.org/1999/xhtml;>
+http://www.w3.org/1999/xhtml;>
--- End diff --

Seems like a strange addition. Can you explain?


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208054967
  
--- Diff: api/src/main/resources/default-configuration.properties ---
@@ -47,6 +47,14 @@ any23.extraction.metadata.domain.per.entity=off
 # registered the RDFa 1.0 legacy one (org.deri.any23.extractor.rdfa.RDFaExtractor).
--- End diff --

There should be no mention of ```org.deri.any23``` in this file. It should 
be ```org.apache.any23```, can you please correct this. Thanks


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056541
  
--- Diff: librdfa-rdf4j/pom.xml ---
@@ -0,0 +1,139 @@
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
--- End diff --

Make sure the XML is formatted with 2-space indents to match the existing XML 
in the Any23 codebase. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056434
  
--- Diff: librdfa-rdf4j/README.MD ---
@@ -0,0 +1,63 @@
+# Librdfa - RDF4J
+
+RDF4J parser that uses [librdfa](https://github.com/rdfa/librdfa) to parse RDFa to triples.
+
+## Prerequisites
+
+You need to install the [librdfa](https://github.com/rdfa/librdfa) library.
+
+## Install
+
+``` mvn
+
+   org.apache.any23
+   apache-any23-librdfa
+   1.0.0
+
+
+
+   
+   librdfa-rdf4j
+   
https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/
+   
+
+```
+
+## Compile
+
+`mvn clean install`
+
+## Use
+
+Add the library and you can parse an `InputStream` as you would do with [`Rio`](http://docs.rdf4j.org/javadoc/2.1/org/eclipse/rdf4j/rio/Rio.html).
+
+``` java
+RDFParser rdfParser = Rio.createParser(RDFFormat.RDFA);
+Model model = new LinkedHashModel();
+rdfParser.setRDFHandler(new StatementCollector(model));
+rdfParser.parse(in, "http://www.example.org./");
+```
+
+## Benchmarking
+
+In general librdfa is 2-5 seconds faster than semargl.  
+
+### librdfa-rdf4j
+- round: 0.11 [+- 0.00] 
--- End diff --

What is the unit of these floating points?
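The thread does not state the unit, so as an illustration of making it explicit, here is a small timing sketch that runs warm-up rounds, times the measured rounds with `System.nanoTime()`, and reports the mean and population standard deviation converted to seconds, with the unit printed. The output only imitates the README's `round: X [+- Y]` line; `RoundTimer` is a hypothetical name, and this is not the harness that produced those numbers.

```java
import java.util.Locale;

/** Hypothetical micro-benchmark helper that always labels its unit. */
final class RoundTimer {

    private RoundTimer() {}

    /** Runs the task {@code warmup} times untimed, then returns nanoseconds per measured round. */
    static long[] time(Runnable task, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] nanos = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - t0;
        }
        return nanos;
    }

    /** Mean and population standard deviation, converted from nanoseconds to seconds. */
    static String summary(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("no rounds measured");
        }
        double mean = 0;
        for (long n : nanos) {
            mean += n;
        }
        mean /= nanos.length;
        double var = 0;
        for (long n : nanos) {
            var += (n - mean) * (n - mean);
        }
        var /= nanos.length;
        return String.format(Locale.ROOT, "round: %.2f [+- %.2f] s", mean / 1e9, Math.sqrt(var) / 1e9);
    }
}
```

For example, two rounds of 100 ms and 120 ms summarise as `round: 0.11 [+- 0.01] s`.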


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056935
  
--- Diff: librdfa-rdf4j/src/main/c/RdfaParser.h ---
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2008 Digital Bazaar, Inc.
+ *
+ * This file is part of librdfa.
+ *
+ * librdfa is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation, either version 3 of the
+ * License, or (at your option) any later version.
+ *
+ * librdfa is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with librdfa. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _RDFA_PARSER_H_
+#define _RDFA_PARSER_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct rdfacontext;
+
+class Callback {
+public:
+
+virtual ~Callback() {
+//std::cout << "Callback::~Callback()" << std::endl;
--- End diff --

Please remove if not in use. 


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057524
  
--- Diff: librdfa-rdf4j/src/main/c/main.java ---
@@ -0,0 +1,82 @@
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import org.apache.any23.rdf.librdfa.Callback;
+import org.apache.any23.rdf.librdfa.RdfaParser;
+
+public class main {
+
+  public static void main(String argv[]) {
+System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux)
+
+System.out.println("Adding and calling a normal C++ callback");
+System.out.println("");
+String ds = "\n"
++ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\;>\n"
++ "http://www.w3.org/1999/xhtml\"\n;
++ "  xmlns:dc=\"http://purl.org/dc/elements/1.1/\;>\n"
++ "\n"
++ "Test 0001\n"
+ "https://ff.kis.v2.scr.kaspersky-labs.com/89CF2878-190C-6147-88B9-F9B46CAB8AD0/main.js\" charset=\"UTF-8\">\n"
++ "\n"
++ "This photo was taken by Mark Birbeck.\n"
++ "\n"
++ "";
+
+RdfaParser caller = new RdfaParser("http://www.google.com/");
+caller.init();
+Callback callback = new JavaCallback(new ByteArrayInputStream(ds.getBytes(StandardCharsets.UTF_8)));
+caller.setCallback(callback);
+
+caller.parse();
+//rdfa.set_rdfa_parser(caller);
+
+  }
+}
+
+class JavaCallback extends Callback {
+
+  BufferedReader bis = null;
+  int len = 0;
+
+  public JavaCallback(InputStream is) {
+super();
+bis = new BufferedReader(new InputStreamReader(is));
+  }
+
+  @Override
+  public void default_graph(String subject, String predicate, String object, int object_type, String datatype, String language) {
+System.out.println("default_graph(...)");
--- End diff --

Using ```System``` calls is never particularly safe... are all of these 
calls necessary?
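As a sketch of one answer to the question above: the debugging `System.out.println` calls in `JavaCallback` could go through a logger at debug level, so they stay silent unless explicitly enabled. The base class below is a hypothetical stand-in for the SWIG-generated `Callback` (only the `default_graph` signature is taken from the diff), and `java.util.logging` is used purely to keep the example self-contained; in Any23 the project's existing logging facade would be the natural fit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the generated Callback base class; only the
// default_graph signature is taken from the diff under review.
abstract class TripleCallback {
    public abstract void default_graph(String subject, String predicate, String object,
                                       int object_type, String datatype, String language);
}

class LoggingJavaCallback extends TripleCallback {
    private static final Logger LOG = Logger.getLogger(LoggingJavaCallback.class.getName());

    // Collected triples, so callers get data back instead of console output.
    private final List<String> triples = new ArrayList<>();

    @Override
    public void default_graph(String subject, String predicate, String object,
                              int object_type, String datatype, String language) {
        // FINE is below the default INFO threshold, so this is silent by default.
        LOG.log(Level.FINE, "default_graph({0}, {1}, {2})",
                new Object[] {subject, predicate, object});
        triples.add(subject + " " + predicate + " " + object);
    }

    List<String> triples() {
        return triples;
    }
}
```

Collecting the triples in a list also makes the callback testable without capturing standard output.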


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208056589
  
--- Diff: librdfa-rdf4j/pom.xml ---
@@ -0,0 +1,139 @@
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<modelVersion>4.0.0</modelVersion>
+<groupId>org.apache.any23</groupId>
+<artifactId>apache-any23-librdfa</artifactId>
+<version>0.0.1-SNAPSHOT</version>
--- End diff --

This version needs to match the master versioning for Any23.


---


[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa

2018-08-06 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/104#discussion_r208057153
  
--- Diff: librdfa-rdf4j/src/main/c/main.java ---
@@ -0,0 +1,82 @@
+
--- End diff --

Missing license header.


---


[GitHub] any23 issue #111: Any23 295 librdfa module

2018-08-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/111
  
Which of the PRs are you attempting to use?
I think this PR is not suitable for merging. Please close it and 
resubmit a clean PR. Thank you @JulioCCBUcuenca 


---


[GitHub] any23 issue #102: ANY23-367 update 'latest.stable.released' property

2018-07-31 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/102
  
+1 thank you @HansBrende 


---


[GitHub] any23 issue #91: ANY23-356 update all dependencies

2018-07-05 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/91
  
+1, please merge. Excellent job.

On Thu, Jul 5, 2018 at 14:16 Hans Brende  wrote:

> @lewismc <https://github.com/lewismc> I just ran mvn clean install myself
> after my latest commit, and the project built successfully. Can you 
confirm?
-- 

*Lewis*
Dr. Lewis J. McGibbney Ph.D, B.Sc
*Skype*: lewis.john.mcgibbney



---


[GitHub] any23 issue #91: ANY23-356 update all dependencies

2018-07-05 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/91
  
@HansBrende see the attached file. The exception is present towards the end of 
the file. This output is generated directly on the terminal. Can you reproduce 
this?

```mvn clean install > build.txt```

[build_out.txt](https://github.com/apache/any23/files/2167923/build_out.txt)



---


[GitHub] any23 issue #91: ANY23-356 update all dependencies

2018-07-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/91
  
Hi @HansBrende please see the attachment; there are some NoClassDefFoundError 
issues which I think are probably what is breaking the test suite. 

[build.txt](https://github.com/apache/any23/files/2156812/build.txt)



---


[GitHub] any23 issue #91: ANY23-356 update all dependencies

2018-07-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/91
  
Wow @HansBrende this looks great. I'll scope it out right now. 


---


[GitHub] any23 issue #90: ANY23-355: Deprecate RDFa11Parser since Rio implementations...

2018-07-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/90
  
+1


---


[GitHub] any23 issue #88: ANY23-347 fixed RDFParseExceptions caused by unbound xml na...

2018-06-28 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/88
  
+1


---


[GitHub] any23 issue #89: ANY23-353 fallback on string datatype for langstring with n...

2018-06-28 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/89
  
+1


---


[GitHub] any23 issue #86: ANY23-351 fixed NullPointerException in HCardExtractor

2018-06-27 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/86
  
Pulled the PR and tested it locally. I've inspected it and have no comments for 
improvement. +1 from me.


---


[GitHub] any23 issue #85: ANY23-348 handle malformed microdata types gracefully

2018-05-22 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/85
  
I think this all sounds reasonable. +1


---


[GitHub] any23 issue #83: ANY23-352 updated to rdf4j version 2.3.2

2018-05-16 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/83
  
+1 nice work


---


[GitHub] any23 issue #82: ANY23-231: Use RDF4J writer to return JSON-LD outputs.

2018-05-09 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/82
  
That's fine @JulioCCBUcuenca, I will deal with it all... thank you for the 
patch.
Can you make sure that you are populating the reporting page on the Any23 
wiki and sending me a hyperlink to your weekly report at the end of every week? 
Thank you


---


[GitHub] any23 issue #82: ANY23-231: Use RDF4J writer to return JSON-LD outputs.

2018-05-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/82
  
@JulioCCBUcuenca very nice, very nice indeed. A lot cleaner.

> How can I update the site documentation?


https://cwiki.apache.org/confluence/display/ANY23/Building+Apache+Any23+Website+HOW_TO

The Website source lives at 
https://github.com/apache/any23/tree/master/src/site


---


[GitHub] any23 issue #82: ANY23-231: Use RDF4J writer to return JSON-LD outputs.

2018-05-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/82
  
This looks really nice @JulioCCBUcuenca. I am going to suggest the following:

1. Let's retain the JSONWriter... please add your code additions to a new 
JSONLDWriter.
1. Add a new JSONLDWriterTest.java to test it.


---


[GitHub] any23 pull request #81: ANY23-322 Any23 embedded service is broken

2018-04-18 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/81

ANY23-322 Any23 embedded service is broken

This issue addresses https://issues.apache.org/jira/browse/ANY23-322

@HansBrende 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-322

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #81


commit c271cdb33855f1d9e35c89e39e61df5216449952
Author: Lewis John McGibbney <lewis.mcgibbney@...>
Date:   2018-04-18T20:10:04Z

ANY23-322 Any23 embedded service is broken




---


[GitHub] any23 pull request #71: ANY23-341 Remove dependency on defunct commons-httpc...

2018-04-03 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/71#discussion_r178925133
  
--- Diff: core/src/main/java/org/apache/any23/extractor/html/HTMLDocument.java ---
@@ -375,15 +376,16 @@ public String getDefaultLanguage() {
 
 private java.net.URI getBaseIRI() throws ExtractionException {
 if (baseIRI == null) {
+String uri = (document instanceof Document ? (Document)document : document.getOwnerDocument()).getDocumentURI();
 try {
-if (document.getBaseURI() == null) {
-log.warn("document.getBaseURI() is null, this should not happen");
+if (uri == null) {
+log.warn("document.getBaseURI() is null, this should not happen", new Exception());
--- End diff --

Can the Exception be more specific?
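A minimal, self-contained sketch of the pattern in this diff, using only the JDK's DOM API: resolve the owning `Document` (a `Document` node has no owner document, hence the `instanceof` check) and, when no URI was recorded, log with a specific exception type rather than a bare `Exception`. `IllegalStateException`, the class name `DocumentUris`, and the `parse` helper are illustrative choices, not Any23 code. Note also that the warning text in the diff still names `document.getBaseURI()` although the code now calls `getDocumentURI()`; the sketch's message names the value actually checked.

```java
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DocumentUris {
    private static final Logger LOG = Logger.getLogger(DocumentUris.class.getName());

    private DocumentUris() {}

    /** Returns the URI of the document owning {@code node}, or null if none was recorded. */
    static String documentUri(Node node) {
        // getOwnerDocument() returns null for a Document itself, so handle that case first.
        Document doc = node instanceof Document ? (Document) node : node.getOwnerDocument();
        String uri = doc.getDocumentURI();
        if (uri == null) {
            // A specific exception type carries the stack trace and says what went wrong.
            LOG.log(Level.WARNING, "document URI is null, this should not happen",
                    new IllegalStateException("DOM was built without a document URI"));
        }
        return uri;
    }

    /** Hypothetical helper: parses well-formed XML into a DOM for demonstration. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }
}
```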


---


[GitHub] any23 issue #68: ANY23-340 Removes doctypes to allow extraction of additiona...

2018-04-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/68
  
+1


---


[GitHub] any23 issue #69: ANY23-336 Hacky patch to tide us over until jsonldjava 0.11...

2018-04-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/69
  
I pulled this locally and tested. In terms of the extraction performance, I 
can confirm a significant improvement. I would be +1 to merging into master.


---


[GitHub] any23 issue #67: ANY23-339 fixes itemscope hashcode collision problem, allow...

2018-03-30 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/67
  
I haven't read The Reality Dysfunction... but the PR looks good :)


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-27 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
I logged an issue in JIRA about this. I’ve also been over to check and,
disappointingly, there is no Maven artifact available for OpenIE 5.
I didn’t hear anything back from the OpenIE team either:
https://github.com/dair-iitd/OpenIE-standalone/issues/6
I suppose one of us could go and contribute the module; however, at that
time I believe I had just started working on the Any23 Service + OpenIE
integration. As I said, it was a while ago.

On Tue, Feb 27, 2018 at 06:34 Hans Brende <notificati...@github.com> wrote:

> On a slightly unrelated topic, according to this github page
> <https://github.com/knowitall/openie>, the current version of OpenIE is
> OpenIE 5, located here <https://github.com/dair-iitd/OpenIE-standalone>,
> whereas it looks like we are using OpenIE 4.2.6.
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc



---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-26 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
> My main concern, if this will be deployed to any23.org, would be the RAM 
usage.

We can always request more RAM for the dedicated VM powering any23.org

> One solution might be to add a configuration flag that either enables or 
disables OpenIE processing (the OpenIE toggle would only be displayed if the 
flag were enabled)--and disable the flag by default. 

This is exactly what's provided in the patch. I guessed that it should be 
disabled by default. You can see examples of the toggle functionality in the 
screenshots below:

https://user-images.githubusercontent.com/1165719/36713077-a68e77ea-1b40-11e8-9a7c-68fa7aaa73b9.png

https://user-images.githubusercontent.com/1165719/36713083-aa8dc080-1b40-11e8-8590-4084b5996c0f.png
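The configuration flag discussed above (off by default, with the GUI toggle only shown when it is on) might look roughly like this. The property name `any23.openie.enabled`, the `OpenIEToggle` class, and the use of a plain `Map` in place of servlet request parameters are all hypothetical; the request parameter name `openie` follows the checkbox name used in this thread, and the `"on"` value assumes the checkbox has no explicit `value` attribute (browsers then submit `on`).

```java
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: a deployment-wide flag (off by default) decides whether the
 * OpenIE checkbox is shown, and a ticked checkbox only counts when the flag is on.
 */
final class OpenIEToggle {
    // Hypothetical property name, not an existing Any23 setting.
    static final String FLAG = "any23.openie.enabled";

    private final boolean enabled;

    OpenIEToggle(Properties config) {
        // Missing or non-"true" values parse as false, so OpenIE stays disabled by default.
        this.enabled = Boolean.parseBoolean(config.getProperty(FLAG, "false"));
    }

    /** Whether the GUI should render the OpenIE checkbox at all. */
    boolean showCheckbox() {
        return enabled;
    }

    /** Whether this request should run OpenIE; {@code params} mimics request parameters. */
    boolean runOpenIE(Map<String, String> params) {
        // An unticked checkbox submits nothing, so params.get returns null.
        return enabled && "on".equalsIgnoreCase(params.get("openie"));
    }
}
```

Checking the flag again in `runOpenIE` means a hand-crafted request cannot switch on OpenIE when the deployment has it disabled, which matters given the memory concerns raised in this thread.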



---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-26 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
Any comments here folks?


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
OK folks, so this issue is now available for testing. I can see two 
immediate issues:

1. OpenIE is memory-intensive and can pretty easily exhaust >6GB of memory for 
larger jobs. This is an issue if we are running this as a service via any23.org.
1. There is a considerable amount of bloat in the resulting service 
artifacts, e.g. WAR, tarballs, zip files, etc. See below:
```
lmcgibbn@LMC-056430 /usr/local/any23/service/target(ANY23-321) $ ls -lh
total 13499344
drwxr-xr-x  5 lmcgibbn  wheel   170B Feb 23 17:47 apache-any23-service-2.2-SNAPSHOT
-rw-r--r--  1 lmcgibbn  wheel   851M Feb 23 17:50 apache-any23-service-2.2-SNAPSHOT-server-embedded.tar.gz
-rw-r--r--  1 lmcgibbn  wheel   851M Feb 23 17:50 apache-any23-service-2.2-SNAPSHOT-server-embedded.zip
-rw-r--r--  1 lmcgibbn  wheel   846M Feb 23 17:48 apache-any23-service-2.2-SNAPSHOT-with-deps.tar.gz
-rw-r--r--  1 lmcgibbn  wheel   846M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-with-deps.zip
-rw-r--r--  1 lmcgibbn  wheel   784M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-without-deps.tar.gz
-rw-r--r--  1 lmcgibbn  wheel   783M Feb 23 17:48 apache-any23-service-2.2-SNAPSHOT-without-deps.war
-rw-r--r--  1 lmcgibbn  wheel   784M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-without-deps.zip
-rw-r--r--  1 lmcgibbn  wheel   846M Feb 23 17:47 apache-any23-service-2.2-SNAPSHOT.war
drwxr-xr-x  2 lmcgibbn  wheel    68B Feb 23 17:47 archive-tmp
drwxr-xr-x  4 lmcgibbn  wheel   136B Feb 23 17:47 classes
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 generated-sources
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 generated-test-sources
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 maven-archiver
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 maven-status
drwxr-xr-x  4 lmcgibbn  wheel   136B Feb 23 17:47 test-classes
drwxr-xr-x  4 lmcgibbn  wheel   136B Feb 23 17:47 war-legals
```


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
I've messed this PR up now so I will close and reopen another once I have 
the dependency issue resolved. I'll also provide an update to the documentation 
as to how dynamic classloading is done per example.  


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
Hi @HansBrende, your suggestion fixed the code, so I have pushed it for further 
review.
@ansell thank you for the suggestions; I think you are absolutely right... 
I've posted a 
[question](https://lists.apache.org/thread.html/3dfa2a8fbe170efc88274b639ce0b1032d838046c1c20a361195c330@%3Cusers.maven.apache.org%3E)
on users@maven along with my 
[followup](https://lists.apache.org/thread.html/4ce2d66fe7fff8794f2a08b0c8b3450a1bcbdcb1e40214ab1d1c5059@%3Cusers.maven.apache.org%3E),
so hopefully I can keep the service pom clean by obtaining the correct 
solution.
I'll post here once I know. Thanks all.



---


[GitHub] any23 issue #64: ANY23-329 fixed pom.xml version to 2.2-SNAPSHOT

2018-02-13 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/64
  
I've pushed these changes remotely @ferrerod 

> I thought master should always be buildable.

Yes, it should always be buildable.


---


[GitHub] any23 issue #64: ANY23-329 fixed pom.xml version to 2.2-SNAPSHOT

2018-02-12 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/64
  
This will be fixed when I run a new release candidate, folks. When we 
release 2.2 we will increment the version to 2.3-SNAPSHOT.


---


[GitHub] any23 issue #65: ANY23-328 Strip comments from json-ld to make parsing more ...

2018-02-12 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/65
  
+1, tested locally. Anyone else have comments?


---


[GitHub] any23 issue #60: ANY23-291 Allow JSONLD scripts to be located anywhere in do...

2018-02-09 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/60
  
1. Look at the call to 
[extractJSONLDScript](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractor.java#L81-L82);
we use it further down. 
2. Yes, this could probably be cleaned up; it is a bit messy right now.
3. Right now, yes, it is correct, but I am not sure it is the intended 
behavior. When I originally added the logic, it was what I wrote... if it does 
not make sense then please send a PR and we can clean it up.

For reference on extraction results, please see the following test files: 
[html-embedded-jsonld-extractor-multiple.html](https://github.com/apache/any23/blob/master/test-resources/src/test/resources/html/html-embedded-jsonld-extractor-multiple.html)
and [html-embedded-jsonld-extractor.html](https://github.com/apache/any23/blob/master/test-resources/src/test/resources/html/html-embedded-jsonld-extractor.html).
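For illustration only: once a DOM is available, collecting every `application/ld+json` script regardless of where it appears comes down to a whole-tree search such as `getElementsByTagName`, rather than looking only under `<head>`. This sketch uses the JDK's XML parser, so it only accepts well-formed XHTML; Any23's actual `EmbeddedJSONLDExtractor` works on its own HTML-parsed DOM and is not reproduced here. The class name `JsonLdScripts` and its helpers are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JsonLdScripts {
    private JsonLdScripts() {}

    /** Returns the text of every application/ld+json script, in document order. */
    static List<String> find(Document doc) {
        List<String> out = new ArrayList<>();
        // getElementsByTagName searches all descendants, so <body> scripts are found too.
        NodeList scripts = doc.getElementsByTagName("script");
        for (int i = 0; i < scripts.getLength(); i++) {
            Element script = (Element) scripts.item(i);
            // getAttribute returns "" when the attribute is absent, never null.
            if ("application/ld+json".equalsIgnoreCase(script.getAttribute("type").trim())) {
                out.add(script.getTextContent().trim());
            }
        }
        return out;
    }

    /** Parses well-formed XHTML only; real-world HTML needs a lenient HTML parser. */
    static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```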



---


[GitHub] any23 issue #60: ANY23-291 Allow JSONLD scripts to be located anywhere in do...

2018-02-09 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/60
  
```
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
```


---


[GitHub] any23 issue #60: ANY23-291 Allow JSONLD scripts to be located anywhere in do...

2018-02-09 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/60
  
Hi @ferrerod which Java version are you running?
Master build is stable
https://builds.apache.org/view/A/view/Any23/job/Any23-trunk/


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
@HansBrende not yet sorry, I have been overloaded :( I will try tonight.


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-03 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
Thanks @HansBrende :)
Once you have built the web service and deployed it to Tomcat, make 
sure that you check the **openie** checkbox in the GUI.
You can set a breakpoint at the [openie conditional toggle in 
Servlet.java](https://github.com/lewismc/any23/blob/ANY23-321/service/src/main/java/org/apache/any23/servlet/Servlet.java#L93),
which will let you see that the parameter is being received on the server 
side. From here, you need to look into how the Any23PluginManager does dynamic 
classloading.
Let me know how you get on and I can provide a bit more context as to how 
the openie directory structure works for dynamic loading.
Thanks in advance, 


---


[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service

2018-02-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/56
  
@HansBrende @jgrzebyta can you both please have a look at dynamic plugin 
loading in this patch? I am at a loss, having tried to debug this with no 
progress. Thanks in advance, 


---


[GitHub] any23 issue #62: ANY23-327 Change log level to debug for RDFUtils.isAbsolute...

2018-02-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/62
  
@jgrzebyta can you merge to master?


---


[GitHub] any23 pull request #59: ANY23-326 fixed rdfa issue with unclosed input & met...

2018-01-24 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/59#discussion_r163680939
  
--- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java ---
@@ -105,7 +109,24 @@ public void run(
 
parser.getParserConfig().addNonFatalError(BasicParserSettings.NORMALIZE_DATATYPE_VALUES);
 //ByteBuffer seems to represent incorrect content. Need to make sure it is the content
 //of the 

[GitHub] any23 pull request #59: ANY23-326 fixed rdfa issue with unclosed input & met...

2018-01-24 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/59#discussion_r163679147
  
--- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java ---
@@ -105,7 +109,24 @@ public void run(
 
parser.getParserConfig().addNonFatalError(BasicParserSettings.NORMALIZE_DATATYPE_VALUES);
 //ByteBuffer seems to represent incorrect content. Need to make sure it is the content
 //of the 

[GitHub] any23 pull request #59: ANY23-326 fixed rdfa issue with unclosed input & met...

2018-01-24 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/59#discussion_r163678949
  
--- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java ---
@@ -22,16 +22,20 @@
 import org.apache.any23.extractor.ExtractionParameters;
 import org.apache.any23.extractor.ExtractionResult;
 import org.apache.any23.extractor.Extractor;
-import org.eclipse.rdf4j.rio.RDFHandlerException;
-import org.eclipse.rdf4j.rio.RDFParseException;
-import org.eclipse.rdf4j.rio.RDFParser;
-import org.eclipse.rdf4j.rio.RioSetting;
+import org.eclipse.rdf4j.rio.*;
--- End diff --

Can you please make explicit imports rather than wildcard.


---


[GitHub] any23 issue #58: ANY23-324 Changed default html parser from NekoHTML to Jsou...

2018-01-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/58
  
@HansBrende thank you very much for providing the above tests; I also ran 
them locally and can confirm a runtime performance improvement as well. I've 
merged this into master; however, I've noted that this issue does not fix 
ANY23-317 and ANY23-326. We still have work to do there. Thank you very much 
for the PR; this is a significant improvement for Any23.


---


[GitHub] any23 pull request #58: ANY23-324 Changed default html parser from NekoHTML ...

2018-01-23 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/58#discussion_r163327423
  
--- Diff: core/src/main/java/org/apache/any23/extractor/html/TagSoupParsingConfiguration.java ---
@@ -0,0 +1,224 @@
+package org.apache.any23.extractor.html;
--- End diff --

Please add license header


---


[GitHub] any23 issue #58: ANY23-324 Changed default html parser from NekoHTML to Jsou...

2018-01-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/58
  
This is great @HansBrende, I will review when I get a chance.


---


[GitHub] any23 issue #57: Fix HTTP repository link: change to HTTPS

2018-01-08 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/57
  
Thank you @frankier for your PR. 
Next time, could I please ask you to open a JIRA issue so that GitHub 
development is linked to our project management? This should be included within 
the contributing guidelines.
Thank you again,


---


[GitHub] any23 pull request #56: ANY23-321 Add openie toggle functionality to service

2018-01-02 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/56

ANY23-321 Add openie toggle functionality to service

This issue is a large step towards addressing 
https://issues.apache.org/jira/browse/ANY23-321
However, I am currently not able to register the OpenIEExtractor within the 
ExtractorGroup/Factory when running an extraction. I've debugged this down 
to the URLClassLoader.addURL call in 
[Any23PluginManager](https://github.com/apache/any23/blob/master/api/src/main/java/org/apache/any23/plugin/Any23PluginManager.java#L453).
The OpenIE JARs are dynamically loaded; however, the Extractor implementation 
does not seem to be registered when the extraction is executed.
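One common cause worth ruling out (not confirmed to be the problem here): classes in JARs made visible through a `URLClassLoader` at runtime are only discoverable by `java.util.ServiceLoader` if that loader is passed explicitly, because the one-argument `ServiceLoader.load(Class)` uses the current thread's context class loader. The sketch below, with the hypothetical helper `PluginDiscovery.discover`, shows the explicit form; each plugin JAR would also need a `META-INF/services/<interface FQN>` entry for discovery to find it.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class PluginDiscovery {
    private PluginDiscovery() {}

    /**
     * Hypothetical helper: discovers implementations of {@code service} packaged in
     * {@code jars}. ServiceLoader is handed the loader that can actually see the JARs.
     */
    static <S> List<S> discover(Class<S> service, URL[] jars) {
        // Parent is the service's own loader, so plugins implement the same interface
        // type the caller sees (a null parent means the bootstrap loader).
        URLClassLoader loader = new URLClassLoader(jars, service.getClassLoader());
        List<S> found = new ArrayList<>();
        // The two-argument load() searches META-INF/services entries visible to loader.
        for (S impl : ServiceLoader.load(service, loader)) {
            found.add(impl);
        }
        return found;
    }
}
```

The loader is deliberately left open, since discovered plugin instances may still need it to load further classes.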

@ansell if you are able to pull this code and debug it would be greatly 
appreciated. I have specific lines for debugging if this would be helpful. 
Thank you in advance for any assistance here.

P.S. you will also see that I've been making attempts to update the [plugin 
documentation](http://any23.apache.org/any23-plugins.html). 

P.P.S. I actually remember encountering a similar issue previously when 
attempting to register a plugin via the command line... I think dynamic 
classloading is broken in Any23 right now. I am keen to fix it, so any help here 
is appreciated, folks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-321

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/56.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #56


commit 706e891cf582736f90cfbe83bc1ef5d629e6dfd7
Author: Lewis John McGibbney <lewis.mcgibbney@...>
Date:   2018-01-03T00:05:39Z

ANY23-321 Add openie toggle functionality to service




---


[GitHub] any23 pull request #55: ANY23-320 Address @Ignore tests in Any23 and ANY23-1...

2017-12-31 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/55

ANY23-320 Address @Ignore tests in Any23 and ANY23-131 Nested Microdata are 
not extracted

Hi Folks, this PR addresses both 
https://issues.apache.org/jira/projects/ANY23/issues/ANY23-320 and 
https://issues.apache.org/jira/projects/ANY23/issues/ANY23-131

Additionally, as usual, this patch contains a significant volume of 
formatting and code convention improvements that I came across whilst attempting to 
fix the bugs associated with the above JIRA issues.

It would be greatly appreciated if folks could pull this PR and test it out.
Thanks

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/55.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #55


commit 60e93a76748e53c413529409fb545e2245013639
Author: Lewis John McGibbney <lewis.mcgibbney@...>
Date:   2018-01-01T02:58:36Z

ANY23-320 Address @Ignore tests in Any23 and ANY23-131 Nested Microdata are 
not extracted




---


[GitHub] any23 pull request #54: ANY23-319 Upgrade jsonld-java dependency to 0.11.1

2017-12-29 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/54

ANY23-319 Upgrade jsonld-java dependency to 0.11.1



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-319

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit 0adafd17501e0c22d8ff546f105303b0d1b53e65
Author: Lewis John McGibbney <lewis.mcgibbney@...>
Date:   2017-12-30T02:13:14Z

ANY23-319 Upgrade jsonld-java dependency to 0.11.1




---


[GitHub] any23 pull request #53: ANY23-140 - Revise Any23 tests to remove fetching of...

2017-12-29 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/53

ANY23-140 - Revise Any23 tests to remove fetching of web content

This PR re-enables the 'online' tests, since the JSON-LD tests essentially 
require fetching Web content anyway.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-140

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/53.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #53


commit 6b262ad3f89ed45f2d731076d1dd865e1e95bcaa
Author: Lewis John McGibbney <lewis.mcgibbney@...>
Date:   2017-12-27T21:41:39Z

ANY23-140 - Revise Any23 tests to remove fetching of web content




---

