[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661209#comment-16661209
 ] 

Hudson commented on ANY23-396:
--

FAILURE: Integrated in Jenkins build Any23-trunk #1626 (See 
[https://builds.apache.org/job/Any23-trunk/1626/])
ANY23-396 Overhaul WriterFactory API (hans: rev 
692c583f848c5b7ae5a7940c857bfb0a9542c0d5)
* (edit) core/src/main/java/org/apache/any23/writer/TurtleWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/URIListWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/URIListWriterFactory.java
* (edit) cli/src/main/java/org/apache/any23/cli/Rover.java
* (add) test-resources/src/test/resources/cli/basic-with-stylesheet.html
* (edit) core/src/main/java/org/apache/any23/writer/JSONLDWriterFactory.java
* (add) api/src/test/java/org/apache/any23/configuration/SettingsTest.java
* (edit) core/src/main/java/org/apache/any23/writer/package-info.java
* (edit) core/src/main/java/org/apache/any23/writer/JSONLDWriter.java
* (add) api/src/main/java/org/apache/any23/writer/DecoratingWriterFactory.java
* (add) 
cli/src/test/resources/META-INF/services/org.apache.any23.writer.WriterFactory
* (edit) cli/src/test/java/org/apache/any23/cli/RoverTest.java
* (add) core/src/main/java/org/apache/any23/writer/WriterSettings.java
* (add) cli/src/test/java/org/apache/any23/cli/flows/PeopleExtractor.java
* (add) api/src/main/java/org/apache/any23/writer/TripleFormat.java
* (edit) core/src/main/java/org/apache/any23/writer/RDFWriterTripleHandler.java
* (edit) api/src/main/java/org/apache/any23/writer/WriterFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/JSONWriterFactory.java
* (add) cli/src/test/java/org/apache/any23/cli/flows/PeopleExtractorFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/NTriplesWriterFactory.java
* (add) api/src/main/java/org/apache/any23/writer/TripleWriterFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/NQuadsWriter.java
* (edit) core/src/test/java/org/apache/any23/writer/JSONWriterTest.java
* (add) api/src/test/java/org/apache/any23/writer/TripleFormatTest.java
* (edit) service/src/main/java/org/apache/any23/servlet/WebResponder.java
* (add) core/src/main/java/org/apache/any23/writer/TripleWriterHandler.java
* (edit) core/src/test/java/org/apache/any23/writer/WriterRegistryTest.java
* (edit) api/src/main/java/org/apache/any23/writer/WriterFactoryRegistry.java
* (edit) core/src/main/java/org/apache/any23/writer/NTriplesWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/RDFXMLWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/TriXWriter.java
* (add) api/src/main/java/org/apache/any23/configuration/Setting.java
* (add) api/src/main/java/org/apache/any23/writer/TripleWriter.java
* (edit) core/src/main/java/org/apache/any23/writer/RDFXMLWriterFactory.java
* (add) cli/src/test/java/org/apache/any23/cli/ExtractorsFlowTest.java
* (edit) core/src/main/java/org/apache/any23/writer/TriXWriterFactory.java
* (edit) api/pom.xml
* (edit) core/src/main/java/org/apache/any23/writer/TurtleWriterFactory.java
* (edit) core/src/main/java/org/apache/any23/writer/NQuadsWriterFactory.java
* (add) api/src/main/java/org/apache/any23/configuration/Settings.java
* (edit) core/src/main/java/org/apache/any23/writer/JSONWriter.java


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing 

[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661152#comment-16661152
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/121
  
Now that ANY23-396 has been implemented in #122 and merged into master, can 
we close this PR? @lewismc ? @jgrzebyta ? I don't have the required permissions 
to close issues myself.


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661141#comment-16661141
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/122


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659444#comment-16659444
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/122
  
@lewismc If you don't have any more comments, I'm ready to merge this in.

One question: would you prefer I squash the commits before merging, or not?


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658444#comment-16658444
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/122
  
@lewismc the reason I ask is that my **cleanup item 4** allows me to 
specify a new method in `TripleWriter` which accepts a group as a `Resource` 
rather than as an `IRI`. That's what I've done in my latest cleanup commit. 

I'll leave this PR open for at least another day before merging to master 
just in case anyone comes up with any further comments or concerns.


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This 

[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657268#comment-16657268
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/122
  
I've reviewed this patch a few times now and I think it is looking great. 
The above thread is very helpful to see how these API's changes cam about so 
thank you for the in-depth thought process you've engaged in. It makes a huge 
difference.
Pull locally, tested, evaluated again I am +1 to merge.


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657011#comment-16657011
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user HansBrende commented on a diff in the pull request:

https://github.com/apache/any23/pull/122#discussion_r226702275
  
--- Diff: api/src/main/java/org/apache/any23/configuration/Setting.java ---
@@ -118,7 +128,22 @@ private Type getValueType() {
 }
 }
 
-protected abstract V checkedValue(Setting original, V newValue) 
throws Exception;
+/**
+ * Subclasses may override this method to check that new settings 
for this key are valid,
+ * and/or to decorate new setting values, using, for example, 
{@link Collections#unmodifiableList(List)}.
+ * The default implementation of this method throws a {@link 
NullPointerException} if the new value is null and the initial value was 
non-null.
+ *
+ * @param initial the setting containing the initial value for 
this key, or null if the setting has not yet been initialized
+ * @param newValue the new value for this setting
+ * @return the new value for this setting
+ * @throws Exception if the new value for this setting was invalid
+ */
+protected V checkedValue(Setting initial, V newValue) throws 
Exception {
+if (newValue == null && initial != null && initial.value != 
null) {
+throw new NullPointerException();
+}
+return newValue;
+}
--- End diff --

Actually, we should not allow keys to decorate values. Consider the 
following scenario: user copies the value from one setting into another 
setting. Now the key is decorating a value that has *already been decorated*. 
This could lead to an unfortunate chain of, e.g., 
```

Collections.unmodifiableList(Collections.unmodifiableList(Collections.unmodifiableList(...
 )))
```

Therefore, any decorating should happen *before* the setting is created, 
and if the value is not appropriately decorated, the key should throw an 
exception in the value check. 

Also we should change this method's signature to:
```
protected void checkValue(Setting initial, V newValue) throws Exception
```


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} 

[jira] [Commented] (ANY23-396) Overhaul WriterFactory API

2018-10-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655969#comment-16655969
 ] 

ASF GitHub Bot commented on ANY23-396:
--

Github user HansBrende commented on the issue:

https://github.com/apache/any23/pull/122
  
@lewismc I just added some unit tests & javadoc for the `Settings` API. Let 
me know your thoughts!


> Overhaul WriterFactory API
> --
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2
>Reporter: Jacek Grzebyta
>Assignee: Hans Brende
>Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)