[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/132 Any objections to merge? ---
[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/132 OK folks, PR updated. Thank you for the review. I've tested locally and inclusion of only one dependency now enables deploying Any23 master in Tomcat 9 running JDK11. ---
[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/132 OK, let me rework things on my end and see what I can come up with. ---
[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/132 First thing is 1) you don't need to access BUILDING.txt however 2) see https://apache.org/dev/freebsd-jails#opie ---
[GitHub] any23 issue #132: ANY23-419 Add J2EE dependencies such that service runs und...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/132 The issue I was having was when running it in Tomcat. From what I could tell, Tomcat does not ship with these libraries either, so they need to be added manually. Please try it out yourself and see where you get. I am by no means a Tomcat, Servlet Container or J2EE expert, so I am not 100% sure this fix is optimal. ---
[GitHub] any23 issue #131: ANY23-418 improve TikaEncodingDetector
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/131 You've brought up an excellent topic for conversation. Tika currently has a batch regression job which essentially enables them to run over loads of documents and analyze the output. The result is that they know how changes to the source are affecting Tika's ability to do what it claims it is doing over time. We do not have that in Any23, but I think we should make an effort to build bridges with the Tika community in this regard, with the aim of sharing resources (both available computing to run large batch parse jobs, as well as dataset(s) we can run Any23 over). I have been thinking for the longest time now about implementing a ```tika.triplify``` API which would encapsulate Any23 and run it on the Tika data streams, but I just never got around to it. Maybe now is a better time to bring that idea back to life. I was thinking we could possibly use Common Crawl, but AFAIK they do not publish the raw data; it is the Nutch segments or some alternative, e.g. the WebArchive files. ---
[GitHub] any23 pull request #132: ANY23-419 Add J2EE dependencies such that service r...
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/132 ANY23-419 Add J2EE dependencies such that service runs under JDK11 This issue addresses https://issues.apache.org/jira/browse/ANY23-419 You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-419 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/132.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #132 commit c57abb13de5d1bf0b81d5ea9cbe28038355842c7 Author: Lewis John McGibbney Date: 2018-11-08T03:00:04Z ANY23-419 Add J2EE dependencies such that service runs under JDK11 ---
[GitHub] any23 issue #131: ANY23-418 improve TikaEncodingDetector
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/131 There is a fair bit of code here but I am not really sure how to test it. Unfortunately I am going to say, please provide unit tests. I've been aware of some encoding detection issues previously with Any23 but never got around to logging them here. ---
[GitHub] any23 issue #124: ANY23-67 test against online microdata test-suite
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/124 wow OK yes lots of work to be done here... do you think a release is in order first before we work on this? ---
[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/122 > The only reason I've hesitated so far about merging this PR is that once new interfaces are introduced as part of the core API, I'd prefer to never change them again--so I want to get it right the first time! Totally understandable. I really like the idea of improving the interoperability logic here. > The reason being: shouldn't all return types...be, preferably, part of our own API, rather than RDF4J's? Yes, that is a very fair statement. What, for example, happens if and when the RDF4J project ceases to exist? Then of course we would end up implementing exactly what you are suggesting. I am certainly +1 to a TripleFormat API which is somewhat analogous to, and certainly interoperable with, RDF4J's [RDFFormat](http://docs.rdf4j.org/javadoc/latest/index.html?org/eclipse/rdf4j/rio/RDFFormat.html). Very nice @HansBrende ---
[GitHub] any23 pull request #123: ANY23-399 Upgrade Apache parent POM to version 21
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/123 ANY23-399 Upgrade Apache parent POM to version 21 This issue addresses https://issues.apache.org/jira/browse/ANY23-399 You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-399 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/123.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #123 commit 12640a953c76900ad0bbc990cb162a7aee8d03c3 Author: Lewis John McGibbney Date: 2018-09-25T01:59:46Z ANY23-399 Upgrade Apache parent POM to version 21 ---
[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/122 +1 from me ---
[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/122 I think this looks great. I have no further comments for improvement. I think the additions look great. ---
[GitHub] any23 issue #122: ANY23-396 allow mapping/filtering TripleHandlers in Rover
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/122 I like this. As you've pointed out the deprecation is a bit harsh but it is certainly something we can live with. I would however urge you to add additional documentation e.g. ``` /** * @deprecated * explanation of why function was deprecated, if possible include what * should be used. */ ``` This looks really nice. ---
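The `@deprecated` suggestion above can be sketched as follows. This is only an illustrative example: the class and method names (`DeprecationExample`, `clean`, `filterNonEmpty`) are hypothetical and are not taken from the Any23 codebase. It shows the Javadoc tag stating why the method is deprecated and what to use instead, paired with the `@Deprecated` annotation so the compiler warns callers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class, used only to illustrate deprecation documentation.
public class DeprecationExample {

  /**
   * Returns the non-empty entries of the given list.
   *
   * @deprecated The name does not describe what the method does;
   *     use {@link #filterNonEmpty(List)} instead.
   */
  @Deprecated
  public static List<String> clean(List<String> values) {
    // Delegate to the replacement so both methods behave identically.
    return filterNonEmpty(values);
  }

  /** Returns the non-empty entries of the given list, preserving order. */
  public static List<String> filterNonEmpty(List<String> values) {
    return values.stream()
        .filter(v -> !v.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(filterNonEmpty(Arrays.asList("a", "", "b")));
  }
}
```

Delegating from the deprecated method to its replacement keeps a single implementation while existing callers migrate.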
[GitHub] any23 issue #120: ANY23-393 Any23 master to build under JDK 10.X
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/120 @jgrzebyta good question. I previously used JDK 8 and Any23 was building absolutely fine. I experienced no issues. When I upgraded, the issues started... hence this PR. ---
[GitHub] any23 issue #120: ANY23-393 Any23 master to build under JDK 10.X
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/120 @HansBrende thanks for review, from what I remember, this is not packaged with Tomcat9 anymore, which is what users should be using to deploy the Any23 WAR. ---
[GitHub] any23 issue #120: ANY23-393 Any23 master to build under JDK 10.X
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/120 Any comments folks? ---
[GitHub] any23 issue #119: ANY23-392: Add configuration for maven-jetty-plugin
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/119 @jgrzebyta +1 ---
[GitHub] any23 issue #118: ANY23-390 implement ICal, JCal, and XCal extractors
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/118 +1 ---
[GitHub] any23 issue #119: ANY23-392: Add configuration for maven-jetty-plugin
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/119 +1 ---
[GitHub] any23 pull request #120: ANY23-393 Any23 master to build under JDK 10.X
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/120 ANY23-393 Any23 master to build under JDK 10.X Issue addresses build under following environment ``` Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T11:33:14-07:00) Maven home: /usr/local/Cellar/maven/3.5.4/libexec Java version: 10.0.2, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk-10.0.2.jdk/Contents/Home Default locale: en_US, platform encoding: UTF-8 OS name: "mac os x", version: "10.12.6", arch: "x86_64", family: "Mac" ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-393 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/120.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #120 commit 537c6e4d2594de8cdcc3cd68862eb2efed9232c0 Author: Lewis John McGibbney Date: 2018-08-25T18:07:53Z ANY23-393 Any23 master to build under JDK 10.X ---
[GitHub] any23 issue #118: ANY23-390 implement ICal, JCal, and XCal extractors
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/118 This is dynamite! Sorry for not responding on corresponding JIRA issue. This looks great. ---
[GitHub] any23 issue #116: Any23 388
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/116 @HansBrende feel free to merge once you are satisfied. ---
[GitHub] any23 issue #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/104 > Remember, that not only the libraries that librdfa uses are old but the librdfa project itself has not being active in a while. Yes but it is compliant with the RDFa test suite which is exactly what we want. It can be assumed to be somewhat complete... > If we are moving to other branch, could someone create a new branch, so I can point the PR to the new branch. @HansBrende can you please create a new branch called ANY23-295, I do not have access to a terminal right now. Thank you in advance, > I will be glad to keep contributing after GSoC. Honestly, I'd like the idea of becoming a commiter, so I have to keep working to earn it. ---
[GitHub] any23 issue #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/104 > @lewismc should we not test more thoroughly before merging to master? Maybe a separate branch instead? Yes absolutely. We should also definitely extend the integration tests as well such that we can make more confident comparisons regarding runtime execution. ---
[GitHub] any23 issue #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/104 @JulioCCBUcuenca Please address the issue of making librdfa-rdf4j a module within Any23 then push your changes. At that stage I will most likely approve the PR and we can test more thoroughly. Thanks. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208642324 --- Diff: test-resources/src/test/resources/html/rdfa/basic.html --- @@ -14,7 +14,7 @@ See the License for the specific language governing permissions and limitations under the License. --> -http://www.w3.org/1999/xhtml;> +http://www.w3.org/1999/xhtml;> --- End diff -- That's fine, thanks for the explanation. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208642202 --- Diff: core/pom.xml --- @@ -335,7 +343,14 @@ - + + + + librdfa-rdf4j + https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/ --- End diff -- librdfa-rdf4j should definitely be a separate module of Any23. Please make sure that it is, and that the required JAR dependencies can be located at build, compile and run time. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056664 --- Diff: librdfa-rdf4j/src/main/c/CMakeLists.txt --- @@ -0,0 +1,39 @@ +cmake_minimum_required(VERSION 2.8) --- End diff -- Missing license header. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208055581 --- Diff: core/pom.xml --- @@ -335,7 +343,14 @@ - + + + + librdfa-rdf4j + https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/ --- End diff -- This is a bit hacky... what is meant to reside at this URL? ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208055205 --- Diff: core/pom.xml --- @@ -62,6 +62,14 @@ + + +${project.groupId} +apache-any23-librdfa +1.0.0 --- End diff -- Ensure that the versioning is consistent with the rest of the codebase e.g. 2.3-SNAPSHOT or whatever it is currently. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208055086 --- Diff: core/pom.xml --- @@ -62,6 +62,14 @@ + + +${project.groupId} --- End diff -- Please use 2 space indents the same as the rest of the file. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056226 --- Diff: librdfa-rdf4j/README.MD --- @@ -0,0 +1,63 @@ +# Librdfa - RDF4J + +RDF4J parser that uses [librdfa](https://github.com/rdfa/librdfa) to parse RDFa to triples. + +## Prerequisites + +You need to install the [librdfa](https://github.com/rdfa/librdfa) library. + +## Install + +``` mvn + + org.apache.any23 + apache-any23-librdfa + 1.0.0 + + + + + librdfa-rdf4j + https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/ --- End diff -- ... again, this looks quite untidy. What resides at this URL? ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057089 --- Diff: librdfa-rdf4j/src/main/c/RdfaParser.h --- @@ -0,0 +1,124 @@ +/* + * Copyright (c) 2008 Digital Bazaar, Inc. + * + * This file is part of librdfa. + * + * librdfa is free software: you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation, either version 3 of the + * License, or (at your option) any later version. + * + * librdfa is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with librdfa. If not, see <http://www.gnu.org/licenses/>. + */ +#ifndef _RDFA_PARSER_H_ +#define _RDFA_PARSER_H_ + +#include +#include +#include +#include +#include +#include + +struct rdfacontext; + +class Callback { +public: + +virtual ~Callback() { +//std::cout << "Callback::~Callback()" << std::endl; +} + +virtual void default_graph(char* subject, char* predicate, char* object, int object_type, char* datatype, char* language) { +} + +virtual void processor_graph(char* subject, char* predicate, char* object, int object_type, char* datatype, char* language) { +} + +virtual char* fill_data(size_t buffer_length) { +} + +virtual size_t fill_len() { +} +}; + +/** + * The RdfaParser class is a wrapper class for Java to provide a + * simple API for using librdfa in Java. + */ +class RdfaParser { +private: +Callback *_callback; +public: +/** + * The base URI that will be used when resolving relative pathnames + * in the document. + */ +std::string mBaseUri; + +/** + * The base RDFa context to use when setting the triple handler callback, + * buffer filler callback, and executing the parser call. 
+ */ +rdfacontext* mBaseContext; + +RdfaParser(const char* baseUri) : _callback(0) { +mBaseUri = baseUri; +mBaseContext = rdfa_create_context(baseUri); +} + +/** + * Standard destructor. --- End diff -- ```constructor``` ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057348 --- Diff: librdfa-rdf4j/src/main/c/main.java --- @@ -0,0 +1,82 @@ + +import java.io.BufferedReader; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.nio.charset.StandardCharsets; +import org.apache.any23.rdf.librdfa.Callback; +import org.apache.any23.rdf.librdfa.RdfaParser; + +public class main { + + public static void main(String argv[]) { +System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux) + +System.out.println("Adding and calling a normal C++ callback"); +System.out.println(""); +String ds = "\n" ++ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\;>\n" ++ "http://www.w3.org/1999/xhtml\"\n; ++ " xmlns:dc=\"http://purl.org/dc/elements/1.1/\;>\n" ++ "\n" ++ "Test 0001\n" ++ "https://ff.kis.v2.scr.kaspersky-labs.com/89CF2878-190C-6147-88B9-F9B46CAB8AD0/main.js\"</a>; charset=\"UTF-8\">\n" ++ "\n" ++ "This photo was taken by Mark Birbeck.\n" ++ "\n" ++ ""; + +RdfaParser caller = new RdfaParser("http://www.google.com/;); --- End diff -- This should never be hardcoded. It should be read from main arg[]... correct? ---
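One way to address the hardcoded base URI raised above is to read it from the program arguments and validate it before use. The following is a minimal sketch under stated assumptions: the class `BaseUriArgs` and the helper `resolveBaseUri` are hypothetical names introduced for illustration, and the sketch deliberately does not call the `RdfaParser` class from the PR, since its API is only partly visible in the quoted diff.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical class, used only to illustrate reading the base URI from args.
public class BaseUriArgs {

  /** Returns the base URI passed as the first argument, or fails with a clear message. */
  public static String resolveBaseUri(String[] args) {
    if (args.length < 1) {
      throw new IllegalArgumentException("Usage: BaseUriArgs <base-uri>");
    }
    try {
      URI uri = new URI(args[0]);
      // A base URI must carry a scheme (e.g. http) to resolve relative references.
      if (!uri.isAbsolute()) {
        throw new IllegalArgumentException("Base URI must be absolute: " + args[0]);
      }
      return uri.toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Invalid base URI: " + args[0], e);
    }
  }

  public static void main(String[] args) {
    String baseUri = resolveBaseUri(args);
    // In the reviewed class, this value would replace the string literal
    // currently passed to the parser constructor.
    System.out.println("Base URI: " + baseUri);
  }
}
```

Failing fast on a missing or relative argument avoids silently resolving relative links against an unrelated host.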
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056466 --- Diff: librdfa-rdf4j/pom.xml --- @@ -0,0 +1,139 @@ + --- End diff -- Add Apache License header. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057792 --- Diff: librdfa-rdf4j/src/main/resources/META-INF/services/org.eclipse.rdf4j.rio.RDFParserFactory --- @@ -0,0 +1 @@ +org.apache.any23.rdf.rdfa.LibrdfaRDFaParserFactory --- End diff -- License header if possible. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057284 --- Diff: librdfa-rdf4j/src/main/c/main.java --- @@ -0,0 +1,82 @@ + +import java.io.BufferedReader; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.nio.charset.StandardCharsets; +import org.apache.any23.rdf.librdfa.Callback; +import org.apache.any23.rdf.librdfa.RdfaParser; + +public class main { + + public static void main(String argv[]) { +System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux) + +System.out.println("Adding and calling a normal C++ callback"); +System.out.println(""); +String ds = "\n" --- End diff -- This is very messy, it is not required please remove. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056890 --- Diff: librdfa-rdf4j/src/main/c/RdfaParser.h --- @@ -0,0 +1,124 @@ +/* --- End diff -- Is there not an Apache License v2.0 version of this file? ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056821 --- Diff: librdfa-rdf4j/src/main/c/RdfaParser.cpp --- @@ -0,0 +1,19 @@ +/* --- End diff -- Is there not an ALv2.0 version of this file? ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208055953 --- Diff: core/src/main/java/org/apache/any23/extractor/rdfa/LibRdfaExtractorFactory.java --- @@ -0,0 +1,51 @@ +/* + * Copyright 2018 The Apache Software Foundation. --- End diff -- Please remove Copyright line ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208055002 --- Diff: api/src/main/resources/default-configuration.properties --- @@ -47,6 +47,14 @@ any23.extraction.metadata.domain.per.entity=off # registered the RDFa 1.0 legacy one (org.deri.any23.extractor.rdfa.RDFaExtractor). any23.extraction.rdfa.programmatic=on +# Allows to enable Librdfa Extractor. +# If 'on' will override the extractors with the programmatic option, +# RDFa 1.1 Extractor (org.deri.any23.extractor.rdfa.RDFa11Extractor) and --- End diff -- Replace ```deri``` with ```apache``` ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057557 --- Diff: librdfa-rdf4j/src/main/c/rdfa.i --- @@ -0,0 +1,40 @@ +%module(directors="1") rdfa --- End diff -- Missing license header. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056181 --- Diff: librdfa-rdf4j/README.MD --- @@ -0,0 +1,63 @@ +# Librdfa - RDF4J + +RDF4J parser that uses [librdfa](https://github.com/rdfa/librdfa) to parse RDFa to triples. + +## Prerequisites + +You need to install the [librdfa](https://github.com/rdfa/librdfa) library. + +## Install + +``` mvn + + org.apache.any23 + apache-any23-librdfa + 1.0.0 --- End diff -- Make sure the versioning is in line with existing Any23 versioning. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057394 --- Diff: librdfa-rdf4j/src/main/c/main.java --- @@ -0,0 +1,82 @@ + +import java.io.BufferedReader; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.nio.charset.StandardCharsets; +import org.apache.any23.rdf.librdfa.Callback; +import org.apache.any23.rdf.librdfa.RdfaParser; + +public class main { + + public static void main(String argv[]) { +System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux) + +System.out.println("Adding and calling a normal C++ callback"); +System.out.println(""); +String ds = "\n" ++ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\;>\n" ++ "http://www.w3.org/1999/xhtml\"\n; ++ " xmlns:dc=\"http://purl.org/dc/elements/1.1/\;>\n" ++ "\n" ++ "Test 0001\n" ++ "https://ff.kis.v2.scr.kaspersky-labs.com/89CF2878-190C-6147-88B9-F9B46CAB8AD0/main.js\"</a>; charset=\"UTF-8\">\n" ++ "\n" ++ "This photo was taken by Mark Birbeck.\n" ++ "\n" ++ ""; + +RdfaParser caller = new RdfaParser("http://www.google.com/;); +caller.init(); +Callback callback = new JavaCallback(new ByteArrayInputStream(ds.getBytes(StandardCharsets.UTF_8))); +caller.setCallback(callback); + +caller.parse(); +//rdfa.set_rdfa_parser(caller); --- End diff -- If not in use, please remove. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208058008 --- Diff: test-resources/src/test/resources/html/rdfa/basic.html --- @@ -14,7 +14,7 @@ See the License for the specific language governing permissions and limitations under the License. --> -http://www.w3.org/1999/xhtml;> +http://www.w3.org/1999/xhtml;> --- End diff -- Seems like a strange addition. Can you explain? ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208054967 --- Diff: api/src/main/resources/default-configuration.properties --- @@ -47,6 +47,14 @@ any23.extraction.metadata.domain.per.entity=off # registered the RDFa 1.0 legacy one (org.deri.any23.extractor.rdfa.RDFaExtractor). --- End diff -- There should be no mention of ```org.deri.any23``` in this file. It should be ```org.apache.any23```, can you please correct this. Thanks ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056541 --- Diff: librdfa-rdf4j/pom.xml --- @@ -0,0 +1,139 @@ + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> --- End diff -- Make sure XML is formatted at 2 space indents to match existing XML for Any23 codebase. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056434 --- Diff: librdfa-rdf4j/README.MD --- @@ -0,0 +1,63 @@ +# Librdfa - RDF4J + +RDF4J parser that uses [librdfa](https://github.com/rdfa/librdfa) to parse RDFa to triples. + +## Prerequisites + +You need to install the [librdfa](https://github.com/rdfa/librdfa) library. + +## Install + +``` mvn + + org.apache.any23 + apache-any23-librdfa + 1.0.0 + + + + + librdfa-rdf4j + https://raw.github.com/JulioCCBUcuenca/librdfa-java/repository/ + + +``` + +## Compile + +`mvn clean install` + +## Use + +Add the library and you can parse an `InputStream` as you would do with [`Rio`](http://docs.rdf4j.org/javadoc/2.1/org/eclipse/rdf4j/rio/Rio.html). + +``` java +RDFParser rdfParser = Rio.createParser(RDFFormat.RDFA); +Model model = new LinkedHashModel(); +rdfParser.setRDFHandler(new StatementCollector(model)); +rdfParser.parse(in, "http://www.example.org./;); +``` + +## Benchmarking + +In general librdfa is 2-5 seconds faster than semargl. + +### librdfa-rdf4j +- round: 0.11 [+- 0.00] --- End diff -- What is the unit of these floating points? ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056935 --- Diff: librdfa-rdf4j/src/main/c/RdfaParser.h --- @@ -0,0 +1,124 @@ +/* + * Copyright (c) 2008 Digital Bazaar, Inc. + * + * This file is part of librdfa. + * + * librdfa is free software: you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation, either version 3 of the + * License, or (at your option) any later version. + * + * librdfa is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with librdfa. If not, see <http://www.gnu.org/licenses/>. + */ +#ifndef _RDFA_PARSER_H_ +#define _RDFA_PARSER_H_ + +#include +#include +#include +#include +#include +#include + +struct rdfacontext; + +class Callback { +public: + +virtual ~Callback() { +//std::cout << "Callback::~Callback()" << std::endl; --- End diff -- Please remove if not in use. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057524 --- Diff: librdfa-rdf4j/src/main/c/main.java --- @@ -0,0 +1,82 @@ + +import java.io.BufferedReader; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.nio.charset.StandardCharsets; +import org.apache.any23.rdf.librdfa.Callback; +import org.apache.any23.rdf.librdfa.RdfaParser; + +public class main { + + public static void main(String argv[]) { +System.loadLibrary("rdfaJava"); // Attempts to load example.dll (on Windows) or libexample.so (on Linux) + +System.out.println("Adding and calling a normal C++ callback"); +System.out.println(""); +String ds = "\n" ++ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\;>\n" ++ "http://www.w3.org/1999/xhtml\"\n; ++ " xmlns:dc=\"http://purl.org/dc/elements/1.1/\;>\n" ++ "\n" ++ "Test 0001\n" ++ "https://ff.kis.v2.scr.kaspersky-labs.com/89CF2878-190C-6147-88B9-F9B46CAB8AD0/main.js\"</a>; charset=\"UTF-8\">\n" ++ "\n" ++ "This photo was taken by Mark Birbeck.\n" ++ "\n" ++ ""; + +RdfaParser caller = new RdfaParser("http://www.google.com/;); +caller.init(); +Callback callback = new JavaCallback(new ByteArrayInputStream(ds.getBytes(StandardCharsets.UTF_8))); +caller.setCallback(callback); + +caller.parse(); +//rdfa.set_rdfa_parser(caller); + + } +} + +class JavaCallback extends Callback { + + BufferedReader bis = null; + int len = 0; + + public JavaCallback(InputStream is) { +super(); +bis = new BufferedReader(new InputStreamReader(is)); + } + + @Override + public void default_graph(String subject, String predicate, String object, int object_type, String datatype, String language) { +System.out.println("default_graph(...)"); --- End diff -- Using ```System``` calls is never particularly safe... are all of these calls necessary? ---
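The concern about `System.out` calls in the callback could be addressed by routing the messages through a logger, so that output can be filtered by level or switched off. The sketch below uses `java.util.logging` purely to stay self-contained; the project may well prefer a different logging facade, and that choice is an assumption here. The class `LoggingCallback` and its method signature are hypothetical simplifications and do not extend the `Callback` class from the PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the JavaCallback class in the reviewed diff.
public class LoggingCallback {

  private static final Logger LOG = Logger.getLogger(LoggingCallback.class.getName());

  // Number of triples reported to this callback so far.
  private int tripleCount = 0;

  /** Records one triple and logs it at FINE level instead of printing to stdout. */
  public void defaultGraph(String subject, String predicate, String object) {
    tripleCount++;
    // Guard the formatting cost: the message is only built when FINE is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(String.format("default_graph: %s %s %s", subject, predicate, object));
    }
  }

  public int getTripleCount() {
    return tripleCount;
  }

  public static void main(String[] args) {
    LoggingCallback callback = new LoggingCallback();
    callback.defaultGraph("http://example.org/s", "http://example.org/p", "o");
    LOG.info("Triples seen: " + callback.getTripleCount());
  }
}
```

With the default `java.util.logging` configuration, FINE messages are suppressed, so per-triple output stays out of the console unless explicitly enabled.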
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208056589 --- Diff: librdfa-rdf4j/pom.xml --- @@ -0,0 +1,139 @@ + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +4.0.0 +org.apache.any23 +apache-any23-librdfa +0.0.1-SNAPSHOT --- End diff -- This version needs to match the master versioning for Any23. ---
[GitHub] any23 pull request #104: Any23 295: Implement ability to use librdfa
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/104#discussion_r208057153 --- Diff: librdfa-rdf4j/src/main/c/main.java --- @@ -0,0 +1,82 @@ + --- End diff -- Missing license header. ---
[GitHub] any23 issue #111: Any23 295 librdfa module
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/111 Which of the PRs are you attempting to use? I think this PR is not suitable for merging. Please close it off and resubmit a clean PR, thank you @JulioCCBUcuenca ---
[GitHub] any23 issue #102: ANY23-367 update 'latest.stable.released' property
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/102 +1 thank you @HansBrende ---
[GitHub] any23 issue #91: ANY23-356 update all dependencies
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/91 +1 please merge excellent job On Thu, Jul 5, 2018 at 14:16 Hans Brende wrote: > @lewismc <https://github.com/lewismc> I just ran mvn clean install myself > after my latest commit, and the project built successfully. Can you confirm? -- *Lewis* Dr. Lewis J. McGibbney Ph.D, B.Sc *Skype*: lewis.john.mcgibbney ---
[GitHub] any23 issue #91: ANY23-356 update all dependencies
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/91 @HansBrende see attached file. The exception is present towards the end of the file. This output is generated directly on the terminal. Can you reproduce this with ```mvn clean install > build.txt```? [build_out.txt](https://github.com/apache/any23/files/2167923/build_out.txt) ---
[GitHub] any23 issue #91: ANY23-356 update all dependencies
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/91 Hi @HansBrende please see the attachment, there are some NoClassDefFoundError issues which I think are probably leading to the broken test suite. [build.txt](https://github.com/apache/any23/files/2156812/build.txt) ---
[GitHub] any23 issue #91: ANY23-356 update all dependencies
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/91 Wow @HansBrende this looks great. I'll scope it out right now. ---
[GitHub] any23 issue #90: ANY23-355: Deprecate RDFa11Parser since Rio implementations...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/90 +1 ---
[GitHub] any23 issue #88: ANY23-347 fixed RDFParseExceptions caused by unbound xml na...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/88 +1 ---
[GitHub] any23 issue #89: ANY23-353 fallback on string datatype for langstring with n...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/89 +1 ---
[GitHub] any23 issue #86: ANY23-351 fixed NullPointerException in HCardExtractor
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/86 Pulled the PR and tested locally. I've inspected it and have no comments for improvement. +1 from me. ---
[GitHub] any23 issue #85: ANY23-348 handle malformed microdata types gracefully
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/85 I think this all sounds reasonable. +1 ---
[GitHub] any23 issue #83: ANY23-352 updated to rdf4j version 2.3.2
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/83 +1 nice work ---
[GitHub] any23 issue #82: ANY23-231: Use RDF4J writer to return JSON-LD outputs.
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/82 That's fine @JulioCCBUcuenca I will deal with it all... thank you for the patch. Can you make sure that you are populating the reporting page on the Any23 wiki and sending me a hyperlink to your weekly report at the end of every week? Thank you ---
[GitHub] any23 issue #82: ANY23-231: Use RDF4J writer to return JSON-LD outputs.
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/82 @JulioCCBUcuenca very nice very nice indeed. A lot cleaner. > How can I update the site documentation? https://cwiki.apache.org/confluence/display/ANY23/Building+Apache+Any23+Website+HOW_TO The Website source lives at https://github.com/apache/any23/tree/master/src/site ---
[GitHub] any23 issue #82: ANY23-231: Use RDF4J writer to return JSON-LD outputs.
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/82 This looks really nice @JulioCCBUcuenca I am going to suggest the following 1. Let's retain the JSONWriter... please add your code additions to a new JSONLDWriter. 2. Add a new JSONLDWriterTest.java for the same ---
[GitHub] any23 pull request #81: ANY23-322 Any23 embedded service is broken
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/81 ANY23-322 Any23 embedded service is broken This issue addresses https://issues.apache.org/jira/browse/ANY23-322 @HansBrende You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-322 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/81.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #81 commit c271cdb33855f1d9e35c89e39e61df5216449952 Author: Lewis John McGibbney <lewis.mcgibbney@...> Date: 2018-04-18T20:10:04Z ANY23-322 Any23 embedded service is broken ---
[GitHub] any23 pull request #71: ANY23-341 Remove dependency on defunct commons-httpc...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/71#discussion_r178925133 --- Diff: core/src/main/java/org/apache/any23/extractor/html/HTMLDocument.java --- @@ -375,15 +376,16 @@ public String getDefaultLanguage() { private java.net.URI getBaseIRI() throws ExtractionException { if (baseIRI == null) { +String uri = (document instanceof Document ? (Document)document : document.getOwnerDocument()).getDocumentURI(); try { -if (document.getBaseURI() == null) { -log.warn("document.getBaseURI() is null, this should not happen"); +if (uri == null) { +log.warn("document.getBaseURI() is null, this should not happen", new Exception()); --- End diff -- Can the Exception be more specific? ---
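One way to read the request above is to attach a narrower exception type than a bare `Exception` to the warning, so the logged stack trace also names the failure. The following is only a minimal sketch: it uses `java.util.logging` to stay self-contained (the project logger in the diff is not reproduced here), and the class name, method names, and the choice of `IllegalStateException` are assumptions for illustration, not what the PR ended up doing.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SpecificExceptionSketch {

    private static final Logger LOG = Logger.getLogger(SpecificExceptionSketch.class.getName());

    /** Builds the throwable attached to the warning; IllegalStateException is an assumed choice. */
    public static RuntimeException missingDocumentUri() {
        return new IllegalStateException("document URI is null");
    }

    /** Returns the URI unchanged, logging a warning with a stack trace when it is null. */
    public static String checkUri(String uri) {
        if (uri == null) {
            // The throwable is only logged, not thrown, mirroring the warn-and-continue style of the diff.
            LOG.log(Level.WARNING, "document URI is null, this should not happen", missingDocumentUri());
        }
        return uri;
    }

    public static void main(String[] args) {
        checkUri(null);
    }
}
```

A dedicated exception subclass would work the same way if the project prefers its own types.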
[GitHub] any23 issue #68: ANY23-340 Removes doctypes to allow extraction of additiona...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/68 +1 ---
[GitHub] any23 issue #69: ANY23-336 Hacky patch to tide us over until jsonldjava 0.11...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/69 I pulled this locally and tested. In terms of the extraction performance, I can confirm a significant improvement. I would be +1 to merging into master. ---
[GitHub] any23 issue #67: ANY23-339 fixes itemscope hashcode collision problem, allow...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/67 I haven't read The Reality Dysfunction... but the PR looks good :) ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 I logged an issue in JIRA about this. I've also been over to see that disappointingly there is no Maven artifact available for OpenIE 5. I didn't hear anything back from the OpenIE team either https://github.com/dair-iitd/OpenIE-standalone/issues/6 I suppose one of us could go and contribute the module, however at that time, I believe I had just started working on the Any23 Service + OpenIE integration. As I said it was a while ago. On Tue, Feb 27, 2018 at 06:34 Hans Brende <notificati...@github.com> wrote: > On a slightly unrelated topic, according to this github page > <https://github.com/knowitall/openie>, the current version of OpenIE is > OpenIE 5, located here <https://github.com/dair-iitd/OpenIE-standalone>, > whereas it looks like we are using OpenIE 4.2.6. -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 > My main concern, if this will be deployed to any23.org, would be the RAM usage. We can always request more RAM for the dedicated VM powering any23.org > One solution might be to add a configuration flag that either enables or disables OpenIE processing (the OpenIE toggle would only be displayed if the flag were enabled)--and disable the flag by default. This is exactly what's provided in the patch. I guessed that it should be disabled by default. You can see examples of the toggle functionality in the snapshots below https://user-images.githubusercontent.com/1165719/36713077-a68e77ea-1b40-11e8-9a7c-68fa7aaa73b9.png https://user-images.githubusercontent.com/1165719/36713083-aa8dc080-1b40-11e8-8590-4084b5996c0f.png ---
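The "disabled by default" flag discussed above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in the patch: the property key `any23.service.openie.enabled` and the class name are hypothetical, and plain `java.util.Properties` stands in for whatever configuration mechanism the service actually uses.

```java
import java.util.Properties;

public class OpenIeToggleSketch {

    /** Hypothetical property key; not taken from the Any23 configuration. */
    public static final String KEY = "any23.service.openie.enabled";

    /** Returns true only when the flag is explicitly set to "true"; a missing flag means disabled. */
    public static boolean isEnabled(Properties config) {
        return Boolean.parseBoolean(config.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        System.out.println(isEnabled(config)); // flag absent, so OpenIE stays off
        config.setProperty(KEY, "true");
        System.out.println(isEnabled(config)); // flag set, so the toggle could be displayed
    }
}
```

Defaulting to off keeps the memory cost opt-in, which matches the RAM concern quoted in the comment.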
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 Any comments here folks? ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 OK folks, so this issue is now available for testing. I can see two immediate issues 1. OpenIE is memory-intensive and can pretty easily exhaust >6GB memory for larger jobs. This is an issue if we are running this as a service via any23.org, 2. There is a considerable amount of bloat on the resulting service artifacts e.g. WAR, tarballs, zip files, etc. See below
```
lmcgibbn@LMC-056430 /usr/local/any23/service/target(ANY23-321) $ ls -lh
total 13499344
drwxr-xr-x 5 lmcgibbn wheel 170B Feb 23 17:47 apache-any23-service-2.2-SNAPSHOT
-rw-r--r-- 1 lmcgibbn wheel 851M Feb 23 17:50 apache-any23-service-2.2-SNAPSHOT-server-embedded.tar.gz
-rw-r--r-- 1 lmcgibbn wheel 851M Feb 23 17:50 apache-any23-service-2.2-SNAPSHOT-server-embedded.zip
-rw-r--r-- 1 lmcgibbn wheel 846M Feb 23 17:48 apache-any23-service-2.2-SNAPSHOT-with-deps.tar.gz
-rw-r--r-- 1 lmcgibbn wheel 846M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-with-deps.zip
-rw-r--r-- 1 lmcgibbn wheel 784M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-without-deps.tar.gz
-rw-r--r-- 1 lmcgibbn wheel 783M Feb 23 17:48 apache-any23-service-2.2-SNAPSHOT-without-deps.war
-rw-r--r-- 1 lmcgibbn wheel 784M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-without-deps.zip
-rw-r--r-- 1 lmcgibbn wheel 846M Feb 23 17:47 apache-any23-service-2.2-SNAPSHOT.war
drwxr-xr-x 2 lmcgibbn wheel 68B Feb 23 17:47 archive-tmp
drwxr-xr-x 4 lmcgibbn wheel 136B Feb 23 17:47 classes
drwxr-xr-x 3 lmcgibbn wheel 102B Feb 23 17:47 generated-sources
drwxr-xr-x 3 lmcgibbn wheel 102B Feb 23 17:47 generated-test-sources
drwxr-xr-x 3 lmcgibbn wheel 102B Feb 23 17:47 maven-archiver
drwxr-xr-x 3 lmcgibbn wheel 102B Feb 23 17:47 maven-status
drwxr-xr-x 4 lmcgibbn wheel 136B Feb 23 17:47 test-classes
drwxr-xr-x 4 lmcgibbn wheel 136B Feb 23 17:47 war-legals
```
---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 I've messed this PR up now so I will close and reopen another once I have the dependency issue resolved. I'll also provide an update to the documentation as to how dynamic classloading is done per example. ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 Hi @HansBrende your suggestion fixed the code so I have pushed for further review. @ansell thank you for the suggestions I think you are absolutely right... I've posted a [question](https://lists.apache.org/thread.html/3dfa2a8fbe170efc88274b639ce0b1032d838046c1c20a361195c330@%3Cusers.maven.apache.org%3E) on users@maven along with my [followup](https://lists.apache.org/thread.html/4ce2d66fe7fff8794f2a08b0c8b3450a1bcbdcb1e40214ab1d1c5059@%3Cusers.maven.apache.org%3E) so hopefully I can keep the service pom clean by obtaining the correct solution. I'll post here once I know. Thanks all. ---
[GitHub] any23 issue #64: ANY23-329 fixed pom.xml version to 2.2-SNAPSHOT
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/64 I've pushed these changes remotely @ferrerod > I thought master should always be buildable. Yes, it always should be. ---
[GitHub] any23 issue #64: ANY23-329 fixed pom.xml version to 2.2-SNAPSHOT
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/64 This will be fixed when I run a new release candidate folks. When we release 2.2 we will increment version to 2.3-SNAPSHOT ---
[GitHub] any23 issue #65: ANY23-328 Strip comments from json-ld to make parsing more ...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/65 +1, tested locally. Anyone else have comments? ---
[GitHub] any23 issue #60: ANY23-291 Allow JSONLD scripts to be located anywhere in do...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/60 1. Look at the call to [extractJSONLDScript](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractor.java#L81-L82); we use it further down. 2. Yes, this could probably be cleaned up, it is a bit messy right now. 3. Right now yes, it is correct, but I am not sure if it is the intended behavior. When I originally added the logic, it was what I wrote... if it does not make sense then please send a PR and we can clean it up. For reference on extraction results, please see the following test files [html-embedded-jsonld-extractor-multiple.html](https://github.com/apache/any23/blob/master/test-resources/src/test/resources/html/html-embedded-jsonld-extractor-multiple.html) and [html-embedded-jsonld-extractor.html](https://github.com/apache/any23/blob/master/test-resources/src/test/resources/html/html-embedded-jsonld-extractor.html) ---
[GitHub] any23 issue #60: ANY23-291 Allow JSONLD scripts to be located anywhere in do...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/60 ``` java version "1.8.0_131" Java(TM) SE Runtime Environment (build 1.8.0_131-b11) Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode) ``` ---
[GitHub] any23 issue #60: ANY23-291 Allow JSONLD scripts to be located anywhere in do...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/60 Hi @ferrerod which Java version are you running? Master build is stable https://builds.apache.org/view/A/view/Any23/job/Any23-trunk/ ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 @HansBrende not yet sorry, I have been overloaded :( I will try tonight. ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 Thanks @HansBrende :) Once you have built the Webservice and deployed to Tomcat, you should make sure that you check the **openie** checkbox in the GUI. You can set a breakpoint at the [openie conditional toggle in Servlet.java](https://github.com/lewismc/any23/blob/ANY23-321/service/src/main/java/org/apache/any23/servlet/Servlet.java#L93) which will enable you to see that the parameter is being received on the server side. From here, you need to look into how the Any23PluginManager does dynamic classloading. Let me know how you get on and I can provide a bit more context as to how the openie directory structure works for dynamic loading. Thanks in advance, ---
[GitHub] any23 issue #56: ANY23-321 Add openie toggle functionality to service
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/56 @HansBrende @jgrzebyta can you both please have a look at dynamic plugin loading in this patch? I am at a loss, having tried to debug this with no progress. Thanks in advance, ---
[GitHub] any23 issue #62: ANY23-327 Change log level to debug for RDFUtils.isAbsolute...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/62 @jgrzebyta can you merge to master? ---
[GitHub] any23 pull request #59: ANY23-326 fixed rdfa issue with unclosed input & met...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/59#discussion_r163680939 --- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java --- @@ -105,7 +109,24 @@ public void run( parser.getParserConfig().addNonFatalError(BasicParserSettings.NORMALIZE_DATATYPE_VALUES); //ByteBuffer seems to represent incorrect content. Need to make sure it is the content //of the
[GitHub] any23 pull request #59: ANY23-326 fixed rdfa issue with unclosed input & met...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/59#discussion_r163679147 --- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java --- @@ -105,7 +109,24 @@ public void run( parser.getParserConfig().addNonFatalError(BasicParserSettings.NORMALIZE_DATATYPE_VALUES); //ByteBuffer seems to represent incorrect content. Need to make sure it is the content //of the
[GitHub] any23 pull request #59: ANY23-326 fixed rdfa issue with unclosed input & met...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/59#discussion_r163678949 --- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java --- @@ -22,16 +22,20 @@ import org.apache.any23.extractor.ExtractionParameters; import org.apache.any23.extractor.ExtractionResult; import org.apache.any23.extractor.Extractor; -import org.eclipse.rdf4j.rio.RDFHandlerException; -import org.eclipse.rdf4j.rio.RDFParseException; -import org.eclipse.rdf4j.rio.RDFParser; -import org.eclipse.rdf4j.rio.RioSetting; +import org.eclipse.rdf4j.rio.*; --- End diff -- Can you please make explicit imports rather than wildcard. ---
[GitHub] any23 issue #58: ANY23-324 Changed default html parser from NekoHTML to Jsou...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/58 @HansBrende thank you very much for providing the above tests, I also ran them locally and can confirm a runtime performance improvement as well. I've merged this into master, however I've noted that this PR does not fix ANY23-317 and ANY23-326. We still have work to do there. Thank you very much for the PR, this is a significant improvement for Any23. ---
[GitHub] any23 pull request #58: ANY23-324 Changed default html parser from NekoHTML ...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/any23/pull/58#discussion_r163327423 --- Diff: core/src/main/java/org/apache/any23/extractor/html/TagSoupParsingConfiguration.java --- @@ -0,0 +1,224 @@ +package org.apache.any23.extractor.html; --- End diff -- Please add license header ---
[GitHub] any23 issue #58: ANY23-324 Changed default html parser from NekoHTML to Jsou...
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/58 This is great @HansBrende I will review when I get a chance. ---
[GitHub] any23 issue #57: Fix HTTP repository link: change to HTTPS
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/57 Thank you @frankier for your PR. Can I please ask you to open a JIRA issue the next time such that Github development is linked to our project management. This should be included within the contributing guidelines. Thank you again, ---
[GitHub] any23 pull request #56: ANY23-321 Add openie toggle functionality to service
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/56 ANY23-321 Add openie toggle functionality to service This issue is a large step towards addressing https://issues.apache.org/jira/browse/ANY23-321 I am however not able to currently register the OpenIEExtractor within the ExtractorGroup/Factory when undertaking an extraction. I've debugged this down to the code in the URLClassLoader.addURL of [Any23PluginManager](https://github.com/apache/any23/blob/master/api/src/main/java/org/apache/any23/plugin/Any23PluginManager.java#L453). The OpenIE JARs are dynamically loaded however the Extractor implementation does not seem to be registered when the extraction is executed. @ansell if you are able to pull this code and debug it would be greatly appreciated. I have specific lines for debugging if this would be helpful. Thank you in advance for any assistance here. P.S. you will also see that I've been making attempts to update the [plugin documentation](http://any23.apache.org/any23-plugins.html). P.P.S I actually remember encountering a similar issue previously when attempting to register a plugin via the command line... I think dynamic ClassLoading is broken in Any23 right now. I am keen to fix it so any help here is appreciated folks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-321 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/56.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #56 commit 706e891cf582736f90cfbe83bc1ef5d629e6dfd7 Author: Lewis John McGibbney <lewis.mcgibbney@...> Date: 2018-01-03T00:05:39Z ANY23-321 Add openie toggle functionality to service ---
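For readers following the classloading discussion above, here is a generic sketch of how JARs added at runtime are usually made discoverable with the JDK alone: a `URLClassLoader` over the JAR URLs, and a `ServiceLoader` lookup that is handed that same loader. It does not reproduce Any23PluginManager; the `Plugin` interface and the class name are hypothetical stand-ins. One common cause of the "loaded but not registered" symptom in setups like this is performing the lookup against a different class loader than the one that received the JARs, though whether that applies to this PR is not established here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluginLoadingSketch {

    /** Hypothetical plugin contract; stands in for an extractor factory interface. */
    public interface Plugin {
        String name();
    }

    /**
     * Looks up Plugin implementations declared under META-INF/services in the given JARs.
     * The lookup is given the child loader explicitly, so providers in those JARs are visible.
     */
    public static List<String> discover(URL[] jarUrls) {
        ClassLoader parent = PluginLoadingSketch.class.getClassLoader();
        List<String> names = new ArrayList<>();
        try (URLClassLoader loader = new URLClassLoader(jarUrls, parent)) {
            for (Plugin plugin : ServiceLoader.load(Plugin.class, loader)) {
                names.add(plugin.name());
            }
        } catch (IOException e) {
            // Closing the loader failed; surface it without a checked exception.
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // With no JARs supplied there are no providers, so the list is empty.
        System.out.println(discover(new URL[0]));
    }
}
```

Calling `ServiceLoader.load(Plugin.class)` without the loader argument would use the thread context class loader instead, which may not see JARs added to a separate `URLClassLoader`.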
[GitHub] any23 pull request #55: ANY23-320 Address @Ignore tests in Any23 and ANY23-1...
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/55 ANY23-320 Address @Ignore tests in Any23 and ANY23-131 Nested Microdata are not extracted Hi Folks, this PR addresses both https://issues.apache.org/jira/projects/ANY23/issues/ANY23-320 and https://issues.apache.org/jira/projects/ANY23/issues/ANY23-131 Additionally, as usual this patch contains a significant volume of formatting and code convention improvements I encountered whilst attempting to fix the bugs associated with the above JIRA issues. It would be greatly appreciated if folks could pull this PR and test it out. Thanks You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-320 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/55.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #55 commit 60e93a76748e53c413529409fb545e2245013639 Author: Lewis John McGibbney <lewis.mcgibbney@...> Date: 2018-01-01T02:58:36Z ANY23-320 Address @Ignore tests in Any23 and ANY23-131 Nested Microdata are not extracted ---
[GitHub] any23 pull request #54: ANY23-319 Upgrade jsonld-java dependency to 0.11.1
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/54 ANY23-319 Upgrade jsonld-java dependency to 0.11.1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-319 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/54.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #54 commit 0adafd17501e0c22d8ff546f105303b0d1b53e65 Author: Lewis John McGibbney <lewis.mcgibbney@...> Date: 2017-12-30T02:13:14Z ANY23-319 Upgrade jsonld-java dependency to 0.11.1 ---
[GitHub] any23 pull request #53: ANY23-140 - Revise Any23 tests to remove fetching of...
GitHub user lewismc opened a pull request: https://github.com/apache/any23/pull/53 ANY23-140 - Revise Any23 tests to remove fetching of web content This issue re-enables the 'online' tests as essentially the jsonld tests require fetching of Web content anyways. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lewismc/any23 ANY23-140 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/53.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #53 commit 6b262ad3f89ed45f2d731076d1dd865e1e95bcaa Author: Lewis John McGibbney <lewis.mcgibbney@...> Date: 2017-12-27T21:41:39Z ANY23-140 - Revise Any23 tests to remove fetching of web content ---