[jira] [Commented] (ANY23-404) Make MicrodataExtractor compliant with default registry

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661590#comment-16661590
 ] 

Hudson commented on ANY23-404:
--

SUCCESS: Integrated in Jenkins build Any23-trunk #1628 (See 
[https://builds.apache.org/job/Any23-trunk/1628/])
ANY23-404 hardcode default microdata registry (hans: rev 
6b1469152ccd30f93b0686a73bd1ba02955d6411)
* (edit) 
core/src/test/java/org/apache/any23/extractor/microdata/MicrodataExtractorTest.java
* (add) test-resources/src/test/resources/microdata/example2.html
* (add) test-resources/src/test/resources/microdata/example5.html
* (edit) 
core/src/main/java/org/apache/any23/extractor/microdata/MicrodataExtractor.java


> Make MicrodataExtractor compliant with default registry
> ---
>
> Key: ANY23-404
> URL: https://issues.apache.org/jira/browse/ANY23-404
> Project: Apache Any23
>  Issue Type: Sub-task
>  Components: microdata
>Affects Versions: 2.3
>Reporter: Hans Brende
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> Default registry located here: 
> http://w3c.github.io/microdata-rdf/#default-registry



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ANY23-404) Make MicrodataExtractor compliant with default registry

2018-10-23 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-404.
---
Resolution: Fixed



[jira] [Commented] (ANY23-67) Microdata extraction using obsolete RDF conversion scheme

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661541#comment-16661541
 ] 

ASF GitHub Bot commented on ANY23-67:
-

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/124
  
## Update 
Merging ANY23-404 into master resulted in a reduction of failed tests from 
30 to 28.

Failed tests are now as follows:
Test 0002: Item with no itemtype and 2 elements with equivalent itemprop
Test 0003: Item with itemprop having two properties
Test 0046: Use of time with `@datetime` xsd:time
Test 0047: Use of time with `@datetime` xsd:dateTime
Test 0048: Use of time with `@datetime` xsd:duration
Test 0049: Use of time with `@datetime` invalid
Test 0051: relative URL as itemid
Test 0052: token property no `@itemtype`
Test 0053: token property empty `@itemtype`
Test 0054: token property and relative `@itemtype`
Test 0056: token property and multiple `@itemtype`s from different 
vocabularies
Test 0062: `@itemref` to single id
Test 0063: `@itemref` generates property values
Test 0064: `@itemref` to single id with different types
Test 0065: `@itemref` to multiple ids
Test 0066: `@itemref` with chaining
Test 0067: Shared `@itemref`
Test 0073: Vocabulary Expansion test with rdfs:subPropertyOf
Test 0074: Vocabulary Expansion test with owl:equivalentProperty
Test 0075: Use of data and xsd:float
Test 0076: Use of data and xsd:integer
Test 0077: Use of data and string
Test 0078: Use of meter and xsd:double
Test 0079: Use of meter and xsd:integer
Test 0080: Use of meter and xsd:string
Test 0081: Simple `@itemprop-reverse` (experimental)
Test 0082: `@itemprop-reverse` with `@itemscope` value (experimental)
Test 0084: `@itemprop-reverse` with `@itemprop` (experimental)
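Tests 0046 to 0049 concern typing the value of a `time` element's `@datetime` attribute. As a rough illustration, assuming the Microdata to RDF rule is "give the literal the first XSD datatype whose lexical form matches, otherwise leave it untyped", a simplified Java sketch covering only the four datatypes named in those test titles might look like this. `DatetimeTyping` is a hypothetical name, and the regular expressions only approximate the XSD grammars (no range checks on months, hours, and so on).

```java
import java.util.regex.Pattern;

/**
 * Simplified sketch: pick an XSD datatype for a time element's datetime value
 * by lexical matching. The patterns approximate the XSD grammars and do not
 * range-check fields (e.g. month 13 would still match).
 */
final class DatetimeTyping {
    private static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    // Optional timezone: Z or +hh:mm / -hh:mm
    private static final String TZ = "(Z|[+-]\\d{2}:\\d{2})?";
    private static final String DATE_PART = "-?\\d{4,}-\\d{2}-\\d{2}";
    private static final String TIME_PART = "\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?";

    private static final Pattern DATE = Pattern.compile(DATE_PART + TZ);
    private static final Pattern TIME = Pattern.compile(TIME_PART + TZ);
    private static final Pattern DATE_TIME = Pattern.compile(DATE_PART + "T" + TIME_PART + TZ);
    // P, then at least one Y/M/D component or a T followed by at least one H/M/S component
    private static final Pattern DURATION = Pattern.compile(
            "-?P(?=\\d|T\\d)(\\d+Y)?(\\d+M)?(\\d+D)?(T(?=\\d)(\\d+H)?(\\d+M)?(\\d+(\\.\\d+)?S)?)?");

    /** Returns the datatype IRI, or null if the value should stay an untyped literal. */
    static String datatypeFor(String value) {
        if (DATE.matcher(value).matches()) return XSD + "date";
        if (TIME.matcher(value).matches()) return XSD + "time";
        if (DATE_TIME.matcher(value).matches()) return XSD + "dateTime";
        if (DURATION.matcher(value).matches()) return XSD + "duration";
        return null;
    }
}
```

In this sketch an invalid value, as in test 0049, simply falls through to an untyped literal.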


> Microdata extraction using obsolete RDF conversion scheme
> -
>
> Key: ANY23-67
> URL: https://issues.apache.org/jira/browse/ANY23-67
> Project: Apache Any23
>  Issue Type: Bug
>  Components: microdata
>Affects Versions: 0.7.0
>Reporter: Hannes Mühleisen
>Priority: Major
> Fix For: 2.3
>
>
> There is now a more-or-less final Microdata to RDF algorithm published[1] 
> which is different than the one in the current, official HTML5 draft [2] 
> (that Ian Hickson has publicly revoked). However, Any23's extractor uses the 
> old scheme according to a comment in its source code, which refers to [2]. 
> However, this is exactly the algorithm that Ian Hickson rescinded at some 
> point. Unfortunately, the official working drafts have not been updated for a 
> very long time, but if you look at the editor's draft [3], you will see that 
> that section has been entirely removed. Instead, there was a Semantic Web 
> Interest group task force that discussed the issues, and [1] is the result of 
> this discussion. It would be nice if this would be reflected in Any23 in the 
> future.
> [Condensed from an E-Mail conversation with Ivan Herman]
> [1] http://www.w3.org/TR/microdata-rdf/
> [2] http://www.w3.org/TR/microdata/#rdf
> [3] http://dev.w3.org/html5/md/Overview.html





[GitHub] any23 issue #124: ANY23-67 test against online microdata test-suite

2018-10-23 Thread HansBrende
Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/124
  
## Update 
Merging ANY23-404 into master resulted in a reduction of failed tests from 
30 to 28.

Failed tests are now as follows:
Test 0002: Item with no itemtype and 2 elements with equivalent itemprop
Test 0003: Item with itemprop having two properties
Test 0046: Use of time with `@datetime` xsd:time
Test 0047: Use of time with `@datetime` xsd:dateTime
Test 0048: Use of time with `@datetime` xsd:duration
Test 0049: Use of time with `@datetime` invalid
Test 0051: relative URL as itemid
Test 0052: token property no `@itemtype`
Test 0053: token property empty `@itemtype`
Test 0054: token property and relative `@itemtype`
Test 0056: token property and multiple `@itemtype`s from different 
vocabularies
Test 0062: `@itemref` to single id
Test 0063: `@itemref` generates property values
Test 0064: `@itemref` to single id with different types
Test 0065: `@itemref` to multiple ids
Test 0066: `@itemref` with chaining
Test 0067: Shared `@itemref`
Test 0073: Vocabulary Expansion test with rdfs:subPropertyOf
Test 0074: Vocabulary Expansion test with owl:equivalentProperty
Test 0075: Use of data and xsd:float
Test 0076: Use of data and xsd:integer
Test 0077: Use of data and string
Test 0078: Use of meter and xsd:double
Test 0079: Use of meter and xsd:integer
Test 0080: Use of meter and xsd:string
Test 0081: Simple `@itemprop-reverse` (experimental)
Test 0082: `@itemprop-reverse` with `@itemscope` value (experimental)
Test 0084: `@itemprop-reverse` with `@itemprop` (experimental)


---


[jira] [Commented] (ANY23-404) Make MicrodataExtractor compliant with default registry

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661530#comment-16661530
 ] 

ASF GitHub Bot commented on ANY23-404:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/125




[GitHub] any23 pull request #125: ANY23-404 hardcode default microdata registry

2018-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/125


---


[jira] [Commented] (ANY23-404) Make MicrodataExtractor compliant with default registry

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661526#comment-16661526
 ] 

ASF GitHub Bot commented on ANY23-404:
--

GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/125

ANY23-404 hardcode default microdata registry

This PR should ensure that our microdata extractor is compliant with the 
standard default microdata registry in terms of vocabulary expansion and 
property URI generation.
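To make "vocabulary expansion and property URI generation" concrete, here is a hypothetical Java sketch; it is not the code in commit 6b1469152ccd30f93b0686a73bd1ba02955d6411. It assumes a registry keyed by vocabulary URI prefix, a property URI formed by appending the `itemprop` name to the item's vocabulary (absolute names pass through unchanged), and expansion that emits an extra predicate for any registry `subPropertyOf` entry. The single schema.org entry mapping `additionalType` to `rdf:type` is illustrative; the default registry linked in ANY23-404 is the authority on the real contents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of registry-driven property URI generation and
 * subPropertyOf expansion. Not Any23's MicrodataExtractor implementation.
 */
final class RegistrySketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Vocabulary prefix -> (property name -> super-property IRI). Illustrative entry only.
    static final Map<String, Map<String, String>> REGISTRY = new HashMap<>();
    static {
        Map<String, String> schemaOrg = new HashMap<>();
        schemaOrg.put("additionalType", RDF_TYPE);
        REGISTRY.put("http://schema.org/", schemaOrg);
    }

    /** Longest registry prefix of the item type; otherwise everything up to the last '#' or '/'. */
    static String vocabularyFor(String itemType) {
        String best = null;
        for (String prefix : REGISTRY.keySet()) {
            if (itemType.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best != null) {
            return best;
        }
        int cut = Math.max(itemType.lastIndexOf('#'), itemType.lastIndexOf('/'));
        return itemType.substring(0, cut + 1);
    }

    /** Names containing ':' are treated as absolute IRIs (a crude check); others are appended to the vocabulary. */
    static String propertyUri(String itemType, String name) {
        return name.indexOf(':') >= 0 ? name : vocabularyFor(itemType) + name;
    }

    /** Predicates to emit for one itemprop: the generated URI plus any registered super-property. */
    static List<String> expandedPredicates(String itemType, String name) {
        List<String> predicates = new ArrayList<>();
        predicates.add(propertyUri(itemType, name));
        Map<String, String> props = REGISTRY.get(vocabularyFor(itemType));
        String superProperty = props == null ? null : props.get(name);
        if (superProperty != null) {
            predicates.add(superProperty);
        }
        return predicates;
    }
}
```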

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-404

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #125


commit 6b1469152ccd30f93b0686a73bd1ba02955d6411
Author: Hans 
Date:   2018-10-24T00:37:37Z

ANY23-404 hardcode default microdata registry






[GitHub] any23 pull request #125: ANY23-404 hardcode default microdata registry

2018-10-23 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/125

ANY23-404 hardcode default microdata registry

This PR should ensure that our microdata extractor is compliant with the 
standard default microdata registry in terms of vocabulary expansion and 
property URI generation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-404

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #125


commit 6b1469152ccd30f93b0686a73bd1ba02955d6411
Author: Hans 
Date:   2018-10-24T00:37:37Z

ANY23-404 hardcode default microdata registry




---


[jira] [Updated] (ANY23-404) Make MicrodataExtractor compliant with default registry

2018-10-23 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende updated ANY23-404:
--
Issue Type: Sub-task  (was: Improvement)
Parent: ANY23-67



[jira] [Created] (ANY23-404) Make MicrodataExtractor compliant with default registry

2018-10-23 Thread Hans Brende (JIRA)
Hans Brende created ANY23-404:
-

 Summary: Make MicrodataExtractor compliant with default registry
 Key: ANY23-404
 URL: https://issues.apache.org/jira/browse/ANY23-404
 Project: Apache Any23
  Issue Type: Improvement
  Components: microdata
Affects Versions: 2.3
Reporter: Hans Brende
Assignee: Hans Brende
 Fix For: 2.3


Default registry located here: 

http://w3c.github.io/microdata-rdf/#default-registry





Jenkins build is back to normal : Any23-trunk #1627

2018-10-23 Thread Apache Jenkins Server
See 




[jira] [Commented] (ANY23-67) Microdata extraction using obsolete RDF conversion scheme

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661321#comment-16661321
 ] 

ASF GitHub Bot commented on ANY23-67:
-

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/124
  
@lewismc Yes and no. While we certainly don't need to address all of these 
test failures before the next release, I want to make sure that property URI 
generation works as expected for all namespaces in the default registry, at 
least. That should be a quick fix.




[GitHub] any23 issue #124: ANY23-67 test against online microdata test-suite

2018-10-23 Thread HansBrende
Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/124
  
@lewismc Yes and no. While we certainly don't need to address all of these 
test failures before the next release, I want to make sure that property URI 
generation works as expected for all namespaces in the default registry, at 
least. That should be a quick fix.


---


[jira] [Resolved] (ANY23-373) Web page /install.html: software version variable was not decoded.

2018-10-23 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-373.
---
Resolution: Fixed

> Web page /install.html: software version variable was not decoded.
> --
>
> Key: ANY23-373
> URL: https://issues.apache.org/jira/browse/ANY23-373
> Project: Apache Any23
>  Issue Type: Bug
>  Components: documentation, site
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Minor
> Fix For: 2.3
>
>
> Web page {{/install.html}} contains an unresolved {{$project.version}} variable in 
> its text.
> For example:
> {code}
> Unzip the distribution archive, i.e. apache-any23-$project.version-bin.zip to 
> the directory you ...
> {code}





[jira] [Commented] (ANY23-373) Web page /install.html: software version variable was not decoded.

2018-10-23 Thread Hans Brende (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661306#comment-16661306
 ] 

Hans Brende commented on ANY23-373:
---

[~grzebyta.dev] I was able to verify that the updated site docs in the project 
work as expected. However, I wasn't able to redeploy the site because of errors 
with certain plugins. So, this issue is fixed, but we just need to redeploy the 
site to synchronize it with the current version of the project.



[jira] [Commented] (ANY23-397) Add ModelWriter

2018-10-23 Thread Hans Brende (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661297#comment-16661297
 ] 

Hans Brende commented on ANY23-397:
---

[~jgrzebyta] no, I haven't implemented this.

> Add ModelWriter
> ---
>
> Key: ANY23-397
> URL: https://issues.apache.org/jira/browse/ANY23-397
> Project: Apache Any23
>  Issue Type: New Feature
>  Components: api, core
>Reporter: Jacek Grzebyta
>Priority: Trivial
> Fix For: 2.3
>
>
> It would be useful if there were an RDFHandler able to write data into an RDF4J 
> Model.
>  
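The request is for an `RDFHandler` whose output ends up in an in-memory RDF4J `Model`. RDF4J's `org.eclipse.rdf4j.rio.helpers.StatementCollector` already adds every received statement to a `Collection<Statement>`, and a `Model` such as `LinkedHashModel` is a `Set<Statement>`, so `new StatementCollector(model)` may cover much of this. To keep the example dependency-free, the Java sketch uses a hypothetical `Triple` class and `TripleSink` callback as stand-ins for RDF4J's `Statement` and `RDFHandler`; `ModelWriter` here is likewise hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical triple value type (stand-in for RDF4J's Statement). */
final class Triple {
    final String s, p, o;

    Triple(String s, String p, String o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }

    @Override public boolean equals(Object x) {
        if (!(x instanceof Triple)) return false;
        Triple t = (Triple) x;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }

    @Override public int hashCode() {
        return Objects.hash(s, p, o);
    }
}

/** Hypothetical streaming callback (stand-in for RDF4J's RDFHandler). */
interface TripleSink {
    void receive(Triple t);
}

/** Collects every received triple into an insertion-ordered, set-based "model". */
final class ModelWriter implements TripleSink {
    private final Set<Triple> model = new LinkedHashSet<>();

    @Override public void receive(Triple t) {
        model.add(t); // duplicates collapse, as they would in a Set-based model
    }

    Set<Triple> model() {
        return Collections.unmodifiableSet(model);
    }
}
```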





[jira] [Commented] (ANY23-397) Add ModelWriter

2018-10-23 Thread Jacek Grzebyta (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661263#comment-16661263
 ] 

Jacek Grzebyta commented on ANY23-397:
--

[~HansBrende] Did you solve that issue as well? I am not sure.



[GitHub] any23 pull request #121: ANY23-396 Add ability to run extractors in flow

2018-10-23 Thread jgrzebyta
Github user jgrzebyta closed the pull request at:

https://github.com/apache/any23/pull/121


---


Build failed in Jenkins: Any23-trunk #1626

2018-10-23 Thread Apache Jenkins Server
See 


Changes:

[Hans] ANY23-396 Overhaul WriterFactory API

--
[...truncated 590.69 KB...]
Generating 

Generating 

Generating 

Generating 

Generating 

Generating 

Building index for all the packages and classes...
Generating 

Generating 

Generating 

Building index for all classes...
Generating 

Generating 

Generating 

Generating 

Generating 

2 errors
41 warnings
[JENKINS] Archiving aggregated javadoc
[WARNING] Attempt to (de-)serialize anonymous class hudson.FilePath$32; see: 
https://jenkins.io/redirect/serialization-of-anonymous-classes/
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Any23 2.3-SNAPSHOT .. FAILURE [ 48.014 s]
[INFO] Apache Any23 :: Base API ... SUCCESS [01:09 min]
[INFO] Apache Any23 :: Test Resources . SUCCESS [ 12.782 s]
[INFO] Apache Any23 :: CSV Utilities .. SUCCESS [  9.696 s]
[INFO] Apache Any23 :: Mime Type Detection  SUCCESS [ 20.610 s]
[INFO] Apache Any23 :: Encoding Detection . SUCCESS [ 11.882 s]
[INFO] Apache Any23 :: Core ... SUCCESS [ 47.740 s]
[INFO] Apache Any23 :: Plugins :: Office Scraper .. SUCCESS [ 23.784 s]
[INFO] Apache Any23 :: Plugins :: HTML Scraper  SUCCESS [ 17.467 s]
[INFO] Apache Any23 :: CLI  SUCCESS [ 53.919 s]
[INFO] Apache Any23 :: Plugins :: Basic Crawler ... SUCCESS [01:03 min]
[INFO] Apache Any23 :: Plugins :: OpenIE .. SUCCESS [ 24.410 s]
[INFO] Apache Any23 :: Plugins :: Integration Test  SUCCESS [02:05 min]
[INFO] Apache Any23 :: Service 2.3-SNAPSHOT ... SUCCESS [14:33 min]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 26:33 min
[INFO] Finished at: 2018-10-23T19:48:14Z
[INFO] 
Waiting for Jenkins to finish collecting data
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) on 
project apache-any23: An error has occurred in Javadoc report generation:
[ERROR] Exit code: 1 - 
:37:
 warning: no @return
[ERROR] RDFFormat getRdfFormat();
[ERROR] ^
[ERROR] 
:45:
 warning: no @return
[ERROR] String getMimeType();
[ERROR] ^
[ERROR] 
:51:
 warning: no @param for os
[ERROR] FormatWriter getRdfWriter(OutputStream os);
[ERROR] ^
[ERROR] 
:51:
 warning: no @return
[ERROR] FormatWriter getRdfWriter(OutputStream os);
[ERROR] ^
[ERROR] 
:46:
 warning: no @param for 
[ERROR] public static  Key newKey(String identifier, Class valueType) {
[ERROR] ^
[ERROR] 
:227:
 

[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661209#comment-16661209
 ] 

Hudson commented on ANY23-396:
--

FAILURE: Integrated in Jenkins build Any23-trunk #1626 (See 
[https://builds.apache.org/job/Any23-trunk/1626/])
ANY23-396 Overhaul WriterFactory API (hans: rev 
692c583f848c5b7ae5a7940c857bfb0a9542c0d5)
* (edit) core/src/main/java/org/apache/any23/writer/TurtleWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/URIListWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/URIListWriterFactory.java
* (edit) cli/src/main/java/org/apache/any23/cli/Rover.java
* (add) test-resources/src/test/resources/cli/basic-with-stylesheet.html
* (edit) core/src/main/java/org/apache/any23/writer/JSONLDWriterFactory.java
* (add) api/src/test/java/org/apache/any23/configuration/SettingsTest.java
* (edit) core/src/main/java/org/apache/any23/writer/package-info.java
* (edit) core/src/main/java/org/apache/any23/writer/JSONLDWriter.java
* (add) api/src/main/java/org/apache/any23/writer/DecoratingWriterFactory.java
* (add) 
cli/src/test/resources/META-INF/services/org.apache.any23.writer.WriterFactory
* (edit) cli/src/test/java/org/apache/any23/cli/RoverTest.java
* (add) core/src/main/java/org/apache/any23/writer/WriterSettings.java
* (add) cli/src/test/java/org/apache/any23/cli/flows/PeopleExtractor.java
* (add) api/src/main/java/org/apache/any23/writer/TripleFormat.java
* (edit) core/src/main/java/org/apache/any23/writer/RDFWriterTripleHandler.java
* (edit) api/src/main/java/org/apache/any23/writer/WriterFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/JSONWriterFactory.java
* (add) cli/src/test/java/org/apache/any23/cli/flows/PeopleExtractorFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/NTriplesWriterFactory.java
* (add) api/src/main/java/org/apache/any23/writer/TripleWriterFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/NQuadsWriter.java
* (edit) core/src/test/java/org/apache/any23/writer/JSONWriterTest.java
* (add) api/src/test/java/org/apache/any23/writer/TripleFormatTest.java
* (edit) service/src/main/java/org/apache/any23/servlet/WebResponder.java
* (add) core/src/main/java/org/apache/any23/writer/TripleWriterHandler.java
* (edit) core/src/test/java/org/apache/any23/writer/WriterRegistryTest.java
* (edit) api/src/main/java/org/apache/any23/writer/WriterFactoryRegistry.java
* (edit) core/src/main/java/org/apache/any23/writer/NTriplesWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/RDFXMLWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/TriXWriter.java
* (add) api/src/main/java/org/apache/any23/configuration/Setting.java
* (add) api/src/main/java/org/apache/any23/writer/TripleWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/RDFXMLWriterFactory.java
* (add) cli/src/test/java/org/apache/any23/cli/ExtractorsFlowTest.java
* (edit) core/src/main/java/org/apache/any23/writer/TriXWriterFactory.java
* (edit) api/pom.xml
* (edit) core/src/main/java/org/apache/any23/writer/TurtleWriterFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/NQuadsWriterFactory.java
* (add) api/src/main/java/org/apache/any23/configuration/Settings.java
* (edit) core/src/main/java/org/apache/any23/writer/JSONWriter.java
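This commit adds `Setting.java` and `Settings.java` to the `api` module, which the ANY23-396 description motivates as an extensible replacement for hard-wired writer options such as `prettyprint=true`. A minimal Java sketch of a typed, immutable settings map, assuming each key carries an identifier, a value class and a default; `SettingKey` and `SettingsSketch` are hypothetical names, not Any23's actual classes or signatures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical typed key: identifier, value class and default. Compared by identity. */
final class SettingKey<V> {
    final String identifier;
    final Class<V> valueType;
    final V defaultValue;

    SettingKey(String identifier, Class<V> valueType, V defaultValue) {
        this.identifier = identifier;
        this.valueType = valueType;
        this.defaultValue = defaultValue;
    }
}

/** Hypothetical immutable settings: with() returns a modified copy, get() falls back to the key's default. */
final class SettingsSketch {
    private final Map<SettingKey<?>, Object> values;

    private SettingsSketch(Map<SettingKey<?>, Object> values) {
        this.values = values;
    }

    static SettingsSketch empty() {
        return new SettingsSketch(Collections.<SettingKey<?>, Object>emptyMap());
    }

    <V> SettingsSketch with(SettingKey<V> key, V value) {
        Map<SettingKey<?>, Object> copy = new HashMap<>(values);
        copy.put(key, key.valueType.cast(value)); // runtime check guards against raw-type misuse
        return new SettingsSketch(Collections.unmodifiableMap(copy));
    }

    <V> V get(SettingKey<V> key) {
        Object value = values.get(key);
        return value == null ? key.defaultValue : key.valueType.cast(value);
    }
}
```

Because new options are just new keys, a writer can accept additional settings later without changing any method signature, which is the backwards-compatibility property the issue is after.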



[jira] [Assigned] (ANY23-388) It should be possible to configure the NTriplesWriter to use unicode points

2018-10-23 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende reassigned ANY23-388:
-

Assignee: Hans Brende

> It should be possible to configure the NTriplesWriter to use unicode points
> ---
>
> Key: ANY23-388
> URL: https://issues.apache.org/jira/browse/ANY23-388
> Project: Apache Any23
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 2.2
>Reporter: Lars G. Svensson
>Assignee: Hans Brende
>Priority: Minor
> Fix For: 2.3
>
>
> When using the NTriplesWriter, I wanted to configure it to write Unicode code 
> points as escape sequences. I tried to subclass 
> org.apache.any23.writer.TripleHandler and override the access to the 
> org.eclipse.rdf4j.rio.ntriples.NTriplesWriter but couldn't do that since the 
> access to the NTriplesWriter is package-private. I ended up copying the 
> code, which seems a bit clunky...
> I was [asked to create a pull 
> request|https://mail-archives.apache.org/mod_mbox/any23-user/201808.mbox/%3CCAGaRif3KAQbnK6XSKnVUZAJOn8aR_iCmJmGv_L05yg6nBkB%3DwA%40mail.gmail.com%3E],
>  this issue is there to track that.
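N-Triples lets any character in a string literal be written as a `\uXXXX` or `\UXXXXXXXX` escape, while `"`, `\`, LF and CR must always be escaped. A minimal Java sketch of the requested output; `NTriplesEscaper` is hypothetical and illustrates the format only, not the configuration mechanism Any23 adopted (ANY23-396 later replaced the subclassing approach with `Settings`).

```java
/**
 * Sketch: escape text for an N-Triples string literal. Quote, backslash, LF and CR
 * are always escaped; with escapeUnicode=true every code point above U+007F is
 * written as a 4-digit (small u) or 8-digit (capital U) hex escape.
 */
final class NTriplesEscaper {
    static String escape(String s, boolean escapeUnicode) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one code point
            switch (cp) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:
                    if (escapeUnicode && cp > 0x7F) {
                        // BMP code points use the short form, supplementary ones the long form
                        sb.append(cp <= 0xFFFF ? String.format("\\u%04X", cp) : String.format("\\U%08X", cp));
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }
}
```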





[jira] [Resolved] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-396.
---
Resolution: Fixed

> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> output stream writer. Lacking this ability, we had to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial CSS triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  
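The description argues that delegating writer factories would let users choose the order of transformations, e.g. logging before or after the "notrivial" filter. A minimal Java sketch of that decorator-chaining idea, assuming a much simplified handler interface; `SimpleHandler`, `ListSink`, `NoTrivial`, `Logging` and `Pipeline` are hypothetical names and do not reproduce the `DecoratingWriterFactory` API added in the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical minimal triple handler; far simpler than Any23's TripleHandler. */
interface SimpleHandler {
    void triple(String s, String p, String o);
}

/** Terminal handler standing in for an OutputStream-backed writer. */
final class ListSink implements SimpleHandler {
    final List<String> lines = new ArrayList<>();
    @Override public void triple(String s, String p, String o) { lines.add(s + " " + p + " " + o); }
}

/** Decorator that drops "trivial" triples; here, any predicate in a made-up "css:" namespace. */
final class NoTrivial implements SimpleHandler {
    private final SimpleHandler next;
    NoTrivial(SimpleHandler next) { this.next = next; }
    @Override public void triple(String s, String p, String o) {
        if (!p.startsWith("css:")) {
            next.triple(s, p, o);
        }
    }
}

/** Decorator that records each predicate it sees, then forwards the triple unchanged. */
final class Logging implements SimpleHandler {
    private final SimpleHandler next;
    private final List<String> log;
    Logging(SimpleHandler next, List<String> log) { this.next = next; this.log = log; }
    @Override public void triple(String s, String p, String o) {
        log.add(p);
        next.triple(s, p, o);
    }
}

final class Pipeline {
    /** Wraps the sink so that decorators[0] sees each triple first and the sink sees it last. */
    @SafeVarargs
    static SimpleHandler build(SimpleHandler sink, UnaryOperator<SimpleHandler>... decorators) {
        SimpleHandler handler = sink;
        for (int i = decorators.length - 1; i >= 0; i--) {
            handler = decorators[i].apply(handler);
        }
        return handler;
    }
}
```

Passing the logging decorator before the filter logs trivial triples too, while passing it after logs only what survives the filter; the argument order is the user-chosen order the issue asks for.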





[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661152#comment-16661152
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/121
  
Now that ANY23-396 has been implemented in #122 and merged into master, can 
we close this PR, @lewismc or @jgrzebyta? I don't have the required permissions 
to close issues myself.


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>



[GitHub] any23 issue #121: ANY23-396 Add ability to run extractors in flow

2018-10-23 Thread HansBrende
Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/121
  
Now that ANY23-396 has been implemented in #122 and merged into master, can 
we close this PR, @lewismc or @jgrzebyta? I don't have the required permissions 
to close issues myself.


---


[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661141#comment-16661141
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/122


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>



[GitHub] any23 pull request #122: ANY23-396 Overhaul WriterFactory API

2018-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/122


---


[GitHub] any23 issue #124: ANY23-67 test against online microdata test-suite

2018-10-23 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/124
  
Wow, OK, yes, lots of work to be done here... Do you think a release is in 
order before we work on this?


---