[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2018-02-14 Thread stain
Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
You may have better ideas on how to do something like 
[ParserConfigImpl](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/ParserConfigImpl.java)
 so it is serializable.  For instance the 
[ParserSource](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/ParserSource.java)
 beans might or might not be serializable depending on the implementation.

(Surely one that is just connected to an open InputStream is not 
serializable, but one that has just got an IRI should be serializable. I made 
the implementations package private: 
[IRIParserSource](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/IRIParserSource.java)
 which meant I had to make a new 
[IRIImpl](https://github.com/apache/commons-rdf/blob/fluent-parser/commons-rdf-api/src/main/java/org/apache/commons/rdf/api/io/IRIImpl.java)
 to avoid Simple dependency)


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2018-02-14 Thread stain
Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Thanks, @ajs6f !

I found some old code where I had tried to make a fluent interface.. I 
managed to update it to the current master and sorted some issues.

I pushed it to the `fluent-parser` branch

See https://github.com/apache/commons-rdf/compare/fluent-parser 

I haven't implemented the `Parser` yet or written any tests.

Basically the idea is this:

```java
  Parsed p = rdf.parserBuilder()
  .syntax(RDFSyntax.JSONLD)
  .source("http://example.com/data.jsonld;)
  .parse();
```

or:

```java
rdf.parserBuilder()
  .syntax(RDFSyntax.TURTLE)
  .target(quad -> System.out.println(quad.getSubject()))  
  .source(Paths.get("/tmp/file.ttl").
  .async().parseAsync();
```

Now there is a set of interfaces, one for each step along the way, e.g. 
`NeedTarget`, and 
some internal `_` package interfaces to ensure consistency (but this can be 
flattened). Note that it is easier in this code to explore this in Eclipse with 
auto-complete as the interfaces have not been flattened yet.

It is implemented by a single `AbstractParserBuilder` which keeps all its 
state (except async executor) in a `ParserConfig` bean. The builder can be made 
immutable using `.build()` after which any change will make it mutable again.  
While it's mutable it will mutate the bean without any copies.

There is also a more low-level `Parser` which takes a `ParserConfig` - this 
is basically how the RDF implementations can be invoked.

I have not moved over the preflight checks in AbstractRDFParser there.

Feel free to use it as a starting ground or inspiration! It's quite hard to 
do fluent interfaces..


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2018-02-14 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Okay, I'll get to work on factoring out a (serializable) factory. Thanks 
for the discussion everyone! I think the parser types will be much stronger for 
it.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2018-02-14 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Hm... if @afs and @stain are both feeling reluctant to go this way... I 
would still be happy to merge it as-is and then work forward to the fluent-ish 
factory design (since @ansell has given a LGTM) and there is agreement that 
having fields not be `Optional`s is good in any case. @stain, would you like to 
see some checks against `null`-valued fields? I can certainly add those easily 
enough.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2018-02-12 Thread stain
Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Picking up this - @ajs6f do you still think we should proceed along those 
lines? I am also reluctant to making the abstract factory (builder) 
serializable, but I can see the reasoning, particularly if you want to use this 
in Hadoop or something where you have a pre-made parser builder and then tell a 
different node to run it.

One thing I feel I need to check more is that there is no reading of the 
now-usually-null fields beyond the getters - I might rename them to make that 
clear.

I have put this PR into upstream branch COMMONSRDF-49 we can contribute to.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-19 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
@wikier My sense is that the right move here immediately is to merge this 
PR and then open a ticket to factor config (and a builder therefor) out of 
`AbstractRDFParser`. (A ticket I will be happy to work!)


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-15 Thread ansell
Github user ansell commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Having ``Optional`` fields isn't impossible to serialise (as I said 
erroneously in the referenced comment), as you could always write custom 
serialise/deserialise code to support it, but it isn't supported by default, 
intentionally:

http://mail.openjdk.java.net/pipermail/jdk8-dev/2013-September/003274.html 

I haven't looked at the code/framework for a long time, but my recollection 
was that the parser was based on a single-threaded model after it came out of 
the multi-thread callable factory. If you are only expecting to run on a single 
thread, you may not have the support structures in place to run over multiple 
machines in a typical fashion, which is what ``Serializable`` has as one of its 
implied meanings. That would be a broader discussion though and shouldn't stop 
users from benefiting from serialisation in cases where they know it is 
safe/possible.

Overall, changing the instance variables to nullable raw types, and then 
using their accessors to see them as ``Optional`` is a compromise that allows 
serialisation without custom coding for it, and hence this PR looks good to me.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-15 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
👍 


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-15 Thread wikier
Github user wikier commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Having a little understanding on the concrete use case behind this PR, 
generally speaking I normally prefer having the factories/builders implementing 
`Serializable`. So in this concrete case I probably agree that the parser 
doesn't need that configuration. But again, I lack the background that 
motivated this request. Maybe having @ansell clarifying 
[`COMMONSRDF-49`](https://issues.apache.org/jira/browse/COMMONSRDF-49) may help 
to lead to a conclusion.

So, regarding the ongoing `0.5.0` release, I think it'd be better to keep 
this out of `RC2`, refine the requirements, and then reach an agreement that 
could be shipped in a future release.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-15 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
I'd like to get @wikier an answer to his question. It sounds like we are 
_not_ comfortable merging this for `RC2` and he should go ahead without it, 
correct? 

If the consensus is that serializable config should be factored out of 
`AbstractRDFParser` (and I agree with that approach) than I think we should 
close `COMMONSRDF-49` and open a new ticket that is more exact (or edit 
`COMMONSRDF-49`) so that @ansell understands what to expect. I don't have the 
mojo in Commons RDF's Jira to do that.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-15 Thread stain
Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Agree with @afs that only the builder-factory bit should be serialized - 
obviously the actual parser which may be in progress of parsing is tricky to 
serialize. So we would need to perhaps clean up the class names/method to 
separate those conserns, without going too much towards a 
`AbstractBuilderFactorySingletonFactory`.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-14 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
So the question is whether what is here now is a step forward.

I think it is, and I would want to do what is here whether or not we go 
onto a builder factoring. I intentionally did not add the marker interface to 
the type, and I think it's okay for the moment (especially as this is in the 
`experimental` package). I can and will do a builder refactoring, but I don't 
want to slow anyone down waiting for that.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-13 Thread afs
Github user afs commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
All I did was point out that it isn't logically serializable at all points 
in its usage. 

My suggestion is that there is design with a builder (serializable) and 
instance (not serializable) which separates these two concerns. YMMV.



---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-13 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
My impression is that @afs objects to the current design, but I haven't 
heard feedback on [my 
suggestion](https://github.com/apache/commons-rdf/pull/43#issuecomment-341721718).


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-03 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Well, it's hard for me to say, because I'm trying to predict the 
architecture of a system I am only beginning to design.  :(

@afs, @sebbASF, how about I break out a separate type for serializable 
config and a builder for it and we take that forward? It might be enough and if 
it isn't we can revisit the question then. I could leave `AbstractRDFParser` 
as-is, although I'd rather alter it, both for the reasons @kinow gave in Jira 
and if I break out a config-builder to indicate the preferred usage. 


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-03 Thread afs
Github user afs commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
AbstractRDFParser is stateful at some points in its lifecycle - input 
stream and the threadpool/futures.

Is this mixing a builder for the configuration, where serialization makes 
sense, with parser instantiation and binding to local resources, where 
serialization (of the clone) doesn't make sense?



---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-03 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
From my POV, serializing a parser is useful for very long or very large ETL 
when you may need to either pause and resume parsing (persisting the state of 
the task in between) or move parsing tasks from one portion of the execution 
environment to another. I'm not saying it's a no-brainer. :)

It may be better to have a subtype that is serializable but leave this type 
alone?


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-rdf issue #43: COMMONSRDF-49: Make AbstractRDFParser serializable

2017-11-03 Thread afs
Github user afs commented on the issue:

https://github.com/apache/commons-rdf/pull/43
  
Relaying a question from sebb on the JIRA: 
https://issues.apache.org/jira/browse/COMMONSRDF-49#comment-16236065 including 
pointing out the long term restriction that comes with it.


---

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org