[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user stain commented on the issue: https://github.com/apache/commons-rdf/pull/43 You may have better ideas on how to do something like [ParserConfigImpl](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/ParserConfigImpl.java) so it is serializable. For instance the [ParserSource](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/ParserSource.java) beans might or might not be serializable depending on the implementation. (Surely one that is just connected to an open InputStream is not serializable, but one that has just got an IRI should be serializable. I made the implementations package private: [IRIParserSource](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/IRIParserSource.java) which meant I had to make a new [IRIImpl](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/IRIImpl.java) to avoid Simple dependency) --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user stain commented on the issue: https://github.com/apache/commons-rdf/pull/43 Thanks, @ajs6f ! I found some old code where I had tried to make a fluent interface.. I managed to update it to the current master and sorted some issues. I pushed it to the `fluent-parser` branch See https://github.com/apache/commons-rdf/compare/fluent-parser I haven't implemented the `Parser` yet or written any tests. Basically the idea is this: ```java Parsedp = rdf.parserBuilder() .syntax(RDFSyntax.JSONLD) .source("http://example.com/data.jsonld;) .parse(); ``` or: ```java rdf.parserBuilder() .syntax(RDFSyntax.TURTLE) .target(quad -> System.out.println(quad.getSubject())) .source(Paths.get("/tmp/file.ttl"). .async().parseAsync(); ``` Now there is a set of interfaces, one for each step along the way, e.g. `NeedTarget`, and some internal `_` package interfaces to ensure consistency (but this can be flattened). Note that it is easier in this code to explore this in Eclipse with auto-complete as the interfaces have not been flattened yet. It is implemented by a single `AbstractParserBuilder` which keeps all its state (except async executor) in a `ParserConfig` bean. The builder can be made immutable using `.build()` after which any change will make it mutable again. While it's mutable it will mutate the bean without any copies. There is also a more low-level `Parser` which takes a `ParserConfig` - this is basically how the RDF implementations can be invoked. I have not moved over the preflight checks in AbstractRDFParser there. Feel free to use it as a starting ground or inspiration! It's quite hard to do fluent interfaces.. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 Okay, I'll get to work on factoring out a (serializable) factory. Thanks for the discussion everyone! I think the parser types will be much stronger for it. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 Hm... if @afs and @stain are both feeling reluctant to go this way... I would still be happy to merge it as-is and then work forward to the fluent-ish factory design (since @ansell has given a LGTM) and there is agreement that having fields not be `Optional`s is good in any case. @stain, would you like to see some checks against `null`-valued fields? I can certainly add those easily enough. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user stain commented on the issue: https://github.com/apache/commons-rdf/pull/43 Picking up this - @ajs6f do you still think we should proceed along those lines? I am also reluctant to making the abstract factory (builder) serializable, but I can see the reasoning, particularly if you want to use this in Hadoop or something where you have a pre-made parser builder and then tell a different node to run it. One thing I feel I need to check more is that there is no reading of the now-usually-null fields beyond the getters - I might rename them to make that clear. I have put this PR into upstream branch COMMONSRDF-49 we can contribute to. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 @wikier My sense is that the right move here immediately is to merge this PR and then open a ticket to factor config (and a builder therefor) out of `AbstractRDFParser`. (A ticket I will be happy to work!) --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ansell commented on the issue: https://github.com/apache/commons-rdf/pull/43 Having ``Optional`` fields isn't impossible to serialise (as I said erroneously in the referenced comment), as you could always write custom serialise/deserialise code to support it, but it isn't supported by default, intentionally: http://mail.openjdk.java.net/pipermail/jdk8-dev/2013-September/003274.html I haven't looked at the code/framework for a long time, but my recollection was that the parser was based on a single-threaded model after it came out of the multi-thread callable factory. If you are only expecting to run on a single thread, you may not have the support structures in place to run over multiple machines in a typical fashion, which is what ``Serializable`` has as one of its implied meanings. That would be a broader discussion though and shouldn't stop users from benefiting from serialisation in cases where they know it is safe/possible. Overall, changing the instance variables to nullable raw types, and then using their accessors to see them as ``Optional`` is a compromise that allows serialisation without custom coding for it, and hence this PR looks good to me. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 ð --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user wikier commented on the issue: https://github.com/apache/commons-rdf/pull/43 Having a little understanding on the concrete use case behind this PR, generally speaking I normally prefer having the factories/builders implementing `Serializable`. So in this concrete case I probably agree that the parser doesn't need that configuration. But again, I lack the background that motivated this request. Maybe having @ansell clarifying [`COMMONSRDF-49`](https://issues.apache.org/jira/browse/COMMONSRDF-49) may help to lead to a conclusion. So, regarding the ongoing `0.5.0` release, I think it'd be better to keep this out of `RC2`, refine the requirements, and then reach an agreement that could be shipped in a future release. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 I'd like to get @wikier an answer to his question. It sounds like we are _not_ comfortable merging this for `RC2` and he should go ahead without it, correct? If the consensus is that serializable config should be factored out of `AbstractRDFParser` (and I agree with that approach) than I think we should close `COMMONSRDF-49` and open a new ticket that is more exact (or edit `COMMONSRDF-49`) so that @ansell understands what to expect. I don't have the mojo in Commons RDF's Jira to do that. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user stain commented on the issue: https://github.com/apache/commons-rdf/pull/43 Agree with @afs that only the builder-factory bit should be serialized - obviously the actual parser which may be in progress of parsing is tricky to serialize. So we would need to perhaps clean up the class names/method to separate those conserns, without going too much towards a `AbstractBuilderFactorySingletonFactory`. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 So the question is whether what is here now is a step forward. I think it is, and I would want to do what is here whether or not we go onto a builder factoring. I intentionally did not add the marker interface to the type, and I think it's okay for the moment (especially as this is in the `experimental` package). I can and will do a builder refactoring, but I don't want to slow anyone down waiting for that. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user afs commented on the issue: https://github.com/apache/commons-rdf/pull/43 All I did was point out that it isn't logically serializable at all points in its usage. My suggestion is that there is design with a builder (serializable) and instance (not serializable) which separates these two concerns. YMMV. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 My impression is that @afs objects to the current design, but I haven't heard feedback on [my suggestion](https://github.com/apache/commons-rdf/pull/43#issuecomment-341721718). --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 Well, it's hard for me to say, because I'm trying to predict the architecture of a system I am only beginning to design. :( @afs, @sebbASF, how about I break out a separate type for serializable config and a builder for it and we take that forward? It might be enough and if it isn't we can revisit the question then. I could leave `AbstractRDFParser` as-is, although I'd rather alter it, both for the reasons @kinow gave in Jira and if I break out a config-builder to indicate the preferred usage. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user afs commented on the issue: https://github.com/apache/commons-rdf/pull/43 AbstractRDFParser is stateful at some points in its lifecycle - input stream and the threadpool/futures. Is this mixing a builder for the configuration, where serialization makes sense, with parser instantiation and binding to local resources, where serialization (of the clone) doesn't make sense? --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user ajs6f commented on the issue: https://github.com/apache/commons-rdf/pull/43 From my POV, serializing a parser is useful for very long or very large ETL when you may need to either pause and resume parsing (persisting the state of the task in between) or move parsing tasks from one portion of the execution environment to another. I'm not saying it's a no-brainer. :) It may be better to have a subtype that is serializable but leave this type alone? --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable
Github user afs commented on the issue: https://github.com/apache/commons-rdf/pull/43 Relaying a question from sebb on the JIRA: https://issues.apache.org/jira/browse/COMMONSRDF-49#comment-16236065 including pointing out the long term restriction that comes with it. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org