Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-26 Thread Steve Lawrence
Sounds good to me. +1

On 3/26/20 11:52 AM, Beckerle, Mike wrote:
> You definitely cannot share a dp across threads this way.
> 
> I propose just deprecating the setExternalVariable API (along with the other 
> setters) and that pattern generally, and a new more functional style API  for 
> establishing tunables, external variables, and validation modes, that will 
> keep us out of trouble.
> 
> 
> 
> From: Steve Lawrence 
> Sent: Thursday, March 26, 2020 7:39 AM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> Even if we did add the locking, it's still possible to use these
> incorrectly, right? For example, say we had two threads like this:
> 
>   Thread {
> dp.setExternalVariable("foo", 1)
> dp.parse(foo1)
>   }
> 
>   Thread {
> dp.setExternalVariable("foo", 2)
> dp.parse(foo2)
>   }
> 
> The order of evaluation could be:
> 
>   dp.setExternalVariable("foo", 1)
>   dp.setExternalVariable("foo", 2)
>   dp.parse(foo1)
>   dp.parse(foo2)
> 
> So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
> use of setExternalVariable within threads can cause unexpected behavior.
> Locking needs to happen outside of Daffodil's control to make it work as
> expected. And if someone adds their own locking around everything, then
> our internal locking isn't necessary.
> 
> It feels to me like setExternalVariable is just fundamentally broken,
> and so trying to fix it is just going to add complication that still
> doesn't solve the fundamental problems.
> 
> 
> On 3/25/20 3:50 PM, Beckerle, Mike wrote:
>> It is possible to modify in a way that is 100% compatible with existing 
>> "correct" practice, which is to lock and unlock the object. Lock the state 
>> when any parse/unparse call is in flight, unlock when none are in flight.
>>
>> setters can then block waiting for the lock to be freed.
>>
>> That's guaranteed to be compatible with any correct current practice.
>>
>> I just hate to put all that complexity in there.
>> 
>> From: Steve Lawrence 
>> Sent: Wednesday, March 25, 2020 3:44 PM
>> To: dev@daffodil.apache.org 
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> My one minor concern about the locking after parse/unparse is that it
>> changes behavior, and could potentially break things (I assume use after
>> lock is an assertion of some kind?).
>>
>> It's definitely possible to use the DataProcessor in a safe manner while
>> mutating the state currently if someone is careful, so to not allow
>> mutating it after calling parse/unparse could potentially break someones
>> existing use of Daffodil.
>>
>> What are your thoughts on @deprecating those functions, the deprecation
>> message can be something like "This is not thread-safe, be very careful
>> if you do not use it". And rather than locking and asserting, we could
>> just create a warning and say "You're using this in a potentially
>> dangerous manner, you should use the new API"?
>>
>> - Steve
>>
>> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>>> A fully functional API is probably good to have, and I'd like to define 
>>> that also and cut over to using that within Daffodil.
>>>
>>> We would need a dataProcessor.reset() method that creates a new dp which 
>>> has had all external vars, tunables, basically any of those formerly 
>>> settable things, forgotten so you can start clean.
>>>
>>> (Would clean() be a better name than reset() ?)
>>>
>>> This reset() subsumes my clone() method idea.
>>>
>>> I want to deprecate (literally with "@deprecated" annotations) the setters 
>>> generally, but my compromise is once you parse/unparse the object becomes 
>>> immutable permanently. So the setters are just a different way to 
>>> parameterize the object.
>>>
>>> The other thing that is impacted by this change it the 
>>> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
>>> with them.  That's ok for now. I can retrofit that later. It will still 
>>> work, just get deprecation warnings.
>>>
>>> 
>>> From: Steve Lawrence 
>>> Sent: Wednesday, March 25, 2020 12:05 PM
>>> To: dev@daffodil.apache.org 
>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-26 Thread Beckerle, Mike
You definitely cannot share a dp across threads this way.

I propose just deprecating the setExternalVariable API (along with the other 
setters) and that pattern generally, and a new more functional style API  for 
establishing tunables, external variables, and validation modes, that will keep 
us out of trouble.



From: Steve Lawrence 
Sent: Thursday, March 26, 2020 7:39 AM
To: dev@daffodil.apache.org 
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

Even if we did add the locking, it's still possible to use these
incorrectly, right? For example, say we had two threads like this:

  Thread {
dp.setExternalVariable("foo", 1)
dp.parse(foo1)
  }

  Thread {
dp.setExternalVariable("foo", 2)
dp.parse(foo2)
  }

The order of evaluation could be:

  dp.setExternalVariable("foo", 1)
  dp.setExternalVariable("foo", 2)
  dp.parse(foo1)
  dp.parse(foo2)

So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
use of setExternalVariable within threads can cause unexpected behavior.
Locking needs to happen outside of Daffodil's control to make it work as
expected. And if someone adds their own locking around everything, then
our internal locking isn't necessary.

It feels to me like setExternalVariable is just fundamentally broken,
and so trying to fix it is just going to add complication that still
doesn't solve the fundamental problems.


On 3/25/20 3:50 PM, Beckerle, Mike wrote:
> It is possible to modify in a way that is 100% compatible with existing 
> "correct" practice, which is to lock and unlock the object. Lock the state 
> when any parse/unparse call is in flight, unlock when none are in flight.
>
> setters can then block waiting for the lock to be freed.
>
> That's guaranteed to be compatible with any correct current practice.
>
> I just hate to put all that complexity in there.
> 
> From: Steve Lawrence 
> Sent: Wednesday, March 25, 2020 3:44 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> My one minor concern about the locking after parse/unparse is that it
> changes behavior, and could potentially break things (I assume use after
> lock is an assertion of some kind?).
>
> It's definitely possible to use the DataProcessor in a safe manner while
> mutating the state currently if someone is careful, so to not allow
> mutating it after calling parse/unparse could potentially break someones
> existing use of Daffodil.
>
> What are your thoughts on @deprecating those functions, the deprecation
> message can be something like "This is not thread-safe, be very careful
> if you do not use it". And rather than locking and asserting, we could
> just create a warning and say "You're using this in a potentially
> dangerous manner, you should use the new API"?
>
> - Steve
>
> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>> A fully functional API is probably good to have, and I'd like to define that 
>> also and cut over to using that within Daffodil.
>>
>> We would need a dataProcessor.reset() method that creates a new dp which has 
>> had all external vars, tunables, basically any of those formerly settable 
>> things, forgotten so you can start clean.
>>
>> (Would clean() be a better name than reset() ?)
>>
>> This reset() subsumes my clone() method idea.
>>
>> I want to deprecate (literally with "@deprecated" annotations) the setters 
>> generally, but my compromise is once you parse/unparse the object becomes 
>> immutable permanently. So the setters are just a different way to 
>> parameterize the object.
>>
>> The other thing that is impacted by this change it the 
>> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
>> with them.  That's ok for now. I can retrofit that later. It will still 
>> work, just get deprecation warnings.
>>
>> 
>> From: Steve Lawrence 
>> Sent: Wednesday, March 25, 2020 12:05 PM
>> To: dev@daffodil.apache.org 
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> This feels like the right approach. Thoughts on instead of having to
>> manually clone() and having some sort of locking mechanism when parse is
>> called, we just make a DataProcessor completely immutable, and setting a
>> variable will implicitly create a copy with modified state, something like:
>>
>> val dp = dataProcessor
>>.withTunable("tunableFoo", "1")
>>.withVariable("variableFoo", 2")
>>.withVar

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-26 Thread Steve Lawrence
Even if we did add the locking, it's still possible to use these
incorrectly, right? For example, say we had two threads like this:

  Thread {
dp.setExternalVariable("foo", 1)
dp.parse(foo1)
  }

  Thread {
dp.setExternalVariable("foo", 2)
dp.parse(foo2)
  }

The order of evaluation could be:

  dp.setExternalVariable("foo", 1)
  dp.setExternalVariable("foo", 2)
  dp.parse(foo1)
  dp.parse(foo2)

So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
use of setExternalVariable within threads can cause unexpected behavior.
Locking needs to happen outside of Daffodil's control to make it work as
expected. And if someone adds their own locking around everything, then
our internal locking isn't necessary.

It feels to me like setExternalVariable is just fundamentally broken,
and so trying to fix it is just going to add complication that still
doesn't solve the fundamental problems.


On 3/25/20 3:50 PM, Beckerle, Mike wrote:
> It is possible to modify in a way that is 100% compatible with existing 
> "correct" practice, which is to lock and unlock the object. Lock the state 
> when any parse/unparse call is in flight, unlock when none are in flight.
> 
> setters can then block waiting for the lock to be freed.
> 
> That's guaranteed to be compatible with any correct current practice.
> 
> I just hate to put all that complexity in there.
> 
> From: Steve Lawrence 
> Sent: Wednesday, March 25, 2020 3:44 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> My one minor concern about the locking after parse/unparse is that it
> changes behavior, and could potentially break things (I assume use after
> lock is an assertion of some kind?).
> 
> It's definitely possible to use the DataProcessor in a safe manner while
> mutating the state currently if someone is careful, so to not allow
> mutating it after calling parse/unparse could potentially break someones
> existing use of Daffodil.
> 
> What are your thoughts on @deprecating those functions, the deprecation
> message can be something like "This is not thread-safe, be very careful
> if you do not use it". And rather than locking and asserting, we could
> just create a warning and say "You're using this in a potentially
> dangerous manner, you should use the new API"?
> 
> - Steve
> 
> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>> A fully functional API is probably good to have, and I'd like to define that 
>> also and cut over to using that within Daffodil.
>>
>> We would need a dataProcessor.reset() method that creates a new dp which has 
>> had all external vars, tunables, basically any of those formerly settable 
>> things, forgotten so you can start clean.
>>
>> (Would clean() be a better name than reset() ?)
>>
>> This reset() subsumes my clone() method idea.
>>
>> I want to deprecate (literally with "@deprecated" annotations) the setters 
>> generally, but my compromise is once you parse/unparse the object becomes 
>> immutable permanently. So the setters are just a different way to 
>> parameterize the object.
>>
>> The other thing that is impacted by this change it the 
>> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
>> with them.  That's ok for now. I can retrofit that later. It will still 
>> work, just get deprecation warnings.
>>
>> 
>> From: Steve Lawrence 
>> Sent: Wednesday, March 25, 2020 12:05 PM
>> To: dev@daffodil.apache.org 
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> This feels like the right approach. Thoughts on instead of having to
>> manually clone() and having some sort of locking mechanism when parse is
>> called, we just make a DataProcessor completely immutable, and setting a
>> variable will implicitly create a copy with modified state, something like:
>>
>> val dp = dataProcessor
>>.withTunable("tunableFoo", "1")
>>.withVariable("variableFoo", 2")
>>.withVariable("variableBar", 2")
>>
>> It's not as java-y, but feels safer. I can imagine a case where someone
>> sets a bunch of state, calls parse and locks things, and then way down
>> the line they get a "data processor is locked" error because they tried
>> to change some state.
>>
>>
>> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>>> Here's my concrete proposal to fix DataProcesor's API.
>>>
>>> We add 

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Beckerle, Mike
It is possible to modify in a way that is 100% compatible with existing 
"correct" practice, which is to lock and unlock the object. Lock the state when 
any parse/unparse call is in flight, unlock when none are in flight.

setters can then block waiting for the lock to be freed.

That's guaranteed to be compatible with any correct current practice.

I just hate to put all that complexity in there.

From: Steve Lawrence 
Sent: Wednesday, March 25, 2020 3:44 PM
To: dev@daffodil.apache.org 
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

My one minor concern about the locking after parse/unparse is that it
changes behavior, and could potentially break things (I assume use after
lock is an assertion of some kind?).

It's definitely possible to use the DataProcessor in a safe manner while
mutating the state currently if someone is careful, so to not allow
mutating it after calling parse/unparse could potentially break someones
existing use of Daffodil.

What are your thoughts on @deprecating those functions, the deprecation
message can be something like "This is not thread-safe, be very careful
if you do not use it". And rather than locking and asserting, we could
just create a warning and say "You're using this in a potentially
dangerous manner, you should use the new API"?

- Steve

On 3/25/20 3:35 PM, Beckerle, Mike wrote:
> A fully functional API is probably good to have, and I'd like to define that 
> also and cut over to using that within Daffodil.
>
> We would need a dataProcessor.reset() method that creates a new dp which has 
> had all external vars, tunables, basically any of those formerly settable 
> things, forgotten so you can start clean.
>
> (Would clean() be a better name than reset() ?)
>
> This reset() subsumes my clone() method idea.
>
> I want to deprecate (literally with "@deprecated" annotations) the setters 
> generally, but my compromise is once you parse/unparse the object becomes 
> immutable permanently. So the setters are just a different way to 
> parameterize the object.
>
> The other thing that is impacted by this change it the 
> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
> with them.  That's ok for now. I can retrofit that later. It will still work, 
> just get deprecation warnings.
>
> 
> From: Steve Lawrence 
> Sent: Wednesday, March 25, 2020 12:05 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> This feels like the right approach. Thoughts on instead of having to
> manually clone() and having some sort of locking mechanism when parse is
> called, we just make a DataProcessor completely immutable, and setting a
> variable will implicitly create a copy with modified state, something like:
>
> val dp = dataProcessor
>.withTunable("tunableFoo", "1")
>.withVariable("variableFoo", 2")
>.withVariable("variableBar", 2")
>
> It's not as java-y, but feels safer. I can imagine a case where someone
> sets a bunch of state, calls parse and locks things, and then way down
> the line they get a "data processor is locked" error because they tried
> to change some state.
>
>
> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>> Here's my concrete proposal to fix DataProcesor's API.
>>
>> We add a clone() method. It copies the object sharing the big/expensive 
>> stuff like the compiled schema but gives you a separate copy of all the 
>> state that is settable via setValidationMode, setExternalDFDLVariable, and 
>> setTunable.
>>
>> So you get a DataProcessor either from a ProcessorFactory, or from 
>> Compiler.reload().
>>
>> You set state.
>> You can call parse/unparse on various threads
>>
>> If you want to change state settings for another parse call, you must call 
>> clone() to get another instance of the DataProcessor. This will share the 
>> expensive/big stuff like the compiled schema, but give you a separate copy 
>> of all the state that is settable via setValidationMode, 
>> setExternalDFDLVariable, and setTunable.
>>
>> Then you set state
>> You can call parse/unparse on various threads.
>>
>> I hate to get into locking and such, but once you call parse/unparse, that 
>> should lock the object state, and any further setter calls should 
>> fail/throw. You should have to clone it to "change" the settings.
>>
>> This makes the use of the setters a java-ish way of achieving the same 
>> reentrancy one would get if there were no setters and simply mor

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Steve Lawrence
My one minor concern about the locking after parse/unparse is that it
changes behavior, and could potentially break things (I assume use after
lock is an assertion of some kind?).

It's definitely possible to use the DataProcessor in a safe manner while
mutating the state currently if someone is careful, so to not allow
mutating it after calling parse/unparse could potentially break someones
existing use of Daffodil.

What are your thoughts on @deprecating those functions, the deprecation
message can be something like "This is not thread-safe, be very careful
if you do not use it". And rather than locking and asserting, we could
just create a warning and say "You're using this in a potentially
dangerous manner, you should use the new API"?

- Steve

On 3/25/20 3:35 PM, Beckerle, Mike wrote:
> A fully functional API is probably good to have, and I'd like to define that 
> also and cut over to using that within Daffodil.
> 
> We would need a dataProcessor.reset() method that creates a new dp which has 
> had all external vars, tunables, basically any of those formerly settable 
> things, forgotten so you can start clean.
> 
> (Would clean() be a better name than reset() ?)
> 
> This reset() subsumes my clone() method idea.
> 
> I want to deprecate (literally with "@deprecated" annotations) the setters 
> generally, but my compromise is once you parse/unparse the object becomes 
> immutable permanently. So the setters are just a different way to 
> parameterize the object.
> 
> The other thing that is impacted by this change it the 
> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
> with them.  That's ok for now. I can retrofit that later. It will still work, 
> just get deprecation warnings.
> 
> 
> From: Steve Lawrence 
> Sent: Wednesday, March 25, 2020 12:05 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> This feels like the right approach. Thoughts on instead of having to
> manually clone() and having some sort of locking mechanism when parse is
> called, we just make a DataProcessor completely immutable, and setting a
> variable will implicitly create a copy with modified state, something like:
> 
> val dp = dataProcessor
>.withTunable("tunableFoo", "1")
>.withVariable("variableFoo", 2")
>.withVariable("variableBar", 2")
> 
> It's not as java-y, but feels safer. I can imagine a case where someone
> sets a bunch of state, calls parse and locks things, and then way down
> the line they get a "data processor is locked" error because they tried
> to change some state.
> 
> 
> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>> Here's my concrete proposal to fix DataProcesor's API.
>>
>> We add a clone() method. It copies the object sharing the big/expensive 
>> stuff like the compiled schema but gives you a separate copy of all the 
>> state that is settable via setValidationMode, setExternalDFDLVariable, and 
>> setTunable.
>>
>> So you get a DataProcessor either from a ProcessorFactory, or from 
>> Compiler.reload().
>>
>> You set state.
>> You can call parse/unparse on various threads
>>
>> If you want to change state settings for another parse call, you must call 
>> clone() to get another instance of the DataProcessor. This will share the 
>> expensive/big stuff like the compiled schema, but give you a separate copy 
>> of all the state that is settable via setValidationMode, 
>> setExternalDFDLVariable, and setTunable.
>>
>> Then you set state
>> You can call parse/unparse on various threads.
>>
>> I hate to get into locking and such, but once you call parse/unparse, that 
>> should lock the object state, and any further setter calls should 
>> fail/throw. You should have to clone it to "change" the settings.
>>
>> This makes the use of the setters a java-ish way of achieving the same 
>> reentrancy one would get if there were no setters and simply more arguments 
>> to the parse/unparse calls.
>>
>> 
>> From: Beckerle, Mike 
>> Sent: Wednesday, March 25, 2020 11:29 AM
>> To: dev@daffodil.apache.org 
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> To respond to my own thread here, I think given that we allow multiple 
>> simultaneous calls to parse/unparse from different threads, we must say that 
>> the DataProcessor object is immutable once parse or unparse are called.
>>
>> I suppose we could say that it is mu

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Beckerle, Mike
A fully functional API is probably good to have, and I'd like to define that 
also and cut over to using that within Daffodil.

We would need a dataProcessor.reset() method that creates a new dp which has 
had all external vars, tunables, basically any of those formerly settable 
things, forgotten so you can start clean.

(Would clean() be a better name than reset() ?)

This reset() subsumes my clone() method idea.

I want to deprecate (literally with "@deprecated" annotations) the setters 
generally, but my compromise is once you parse/unparse the object becomes 
immutable permanently. So the setters are just a different way to parameterize 
the object.

The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester 
that implements these same APIs but drives IBM DFDL with them.  That's ok for 
now. I can retrofit that later. It will still work, just get deprecation 
warnings.


From: Steve Lawrence 
Sent: Wednesday, March 25, 2020 12:05 PM
To: dev@daffodil.apache.org 
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

This feels like the right approach. Thoughts on instead of having to
manually clone() and having some sort of locking mechanism when parse is
called, we just make a DataProcessor completely immutable, and setting a
variable will implicitly create a copy with modified state, something like:

val dp = dataProcessor
   .withTunable("tunableFoo", "1")
   .withVariable("variableFoo", 2")
   .withVariable("variableBar", 2")

It's not as java-y, but feels safer. I can imagine a case where someone
sets a bunch of state, calls parse and locks things, and then way down
the line they get a "data processor is locked" error because they tried
to change some state.


On 3/25/20 11:50 AM, Beckerle, Mike wrote:
> Here's my concrete proposal to fix DataProcesor's API.
>
> We add a clone() method. It copies the object sharing the big/expensive stuff 
> like the compiled schema but gives you a separate copy of all the state that 
> is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>
> So you get a DataProcessor either from a ProcessorFactory, or from 
> Compiler.reload().
>
> You set state.
> You can call parse/unparse on various threads
>
> If you want to change state settings for another parse call, you must call 
> clone() to get another instance of the DataProcessor. This will share the 
> expensive/big stuff like the compiled schema, but give you a separate copy of 
> all the state that is settable via setValidationMode, 
> setExternalDFDLVariable, and setTunable.
>
> Then you set state
> You can call parse/unparse on various threads.
>
> I hate to get into locking and such, but once you call parse/unparse, that 
> should lock the object state, and any further setter calls should fail/throw. 
> You should have to clone it to "change" the settings.
>
> This makes the use of the setters a java-ish way of achieving the same 
> reentrancy one would get if there were no setters and simply more arguments 
> to the parse/unparse calls.
>
> ________
> From: Beckerle, Mike 
> Sent: Wednesday, March 25, 2020 11:29 AM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> To respond to my own thread here, I think given that we allow multiple 
> simultaneous calls to parse/unparse from different threads, we must say that 
> the DataProcessor object is immutable once parse or unparse are called.
>
> I suppose we could say that it is mutable, but behavior is undetermined if 
> any parse or unparse calls are active on any thread.
>
> But this is just asking for trouble IMHO.
>
> I think we started out with a stateful, non-thread capable API. The idea is 
> that one thread would be invoking a data processor at a time. A data procesor 
> was the state-block of an execution.
>
> The need to share compiled processor reloads, because the compile schemas 
> were expensive to create, tempted us to allow multiple parse/unparse calls on 
> different threads.
>
> Fact is, I think we should have said no to this, provided a 
> DataProcessor.clone() to create instances that shared the reloaded compiled 
> schema binary, but otherwise had separate state, and said that parse/unparse 
> were synchronized methods on their DP instance.
>
> Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
> due to mutable state in what turns out to be a cross-thread shared object.
>
> 
> From: Beckerle, Mike 
> Sent: Wednesday, March 25, 2020 10:55 AM
> To: dev@daffodil.apache.org 
> Subject: Compiler.setExternalDFDLVariab

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Steve Lawrence
This feels like the right approach. Thoughts on instead of having to
manually clone() and having some sort of locking mechanism when parse is
called, we just make a DataProcessor completely immutable, and setting a
variable will implicitly create a copy with modified state, something like:

val dp = dataProcessor
   .withTunable("tunableFoo", "1")
   .withVariable("variableFoo", 2")
   .withVariable("variableBar", 2")

It's not as java-y, but feels safer. I can imagine a case where someone
sets a bunch of state, calls parse and locks things, and then way down
the line they get a "data processor is locked" error because they tried
to change some state.


On 3/25/20 11:50 AM, Beckerle, Mike wrote:
> Here's my concrete proposal to fix DataProcesor's API.
> 
> We add a clone() method. It copies the object sharing the big/expensive stuff 
> like the compiled schema but gives you a separate copy of all the state that 
> is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
> 
> So you get a DataProcessor either from a ProcessorFactory, or from 
> Compiler.reload().
> 
> You set state.
> You can call parse/unparse on various threads
> 
> If you want to change state settings for another parse call, you must call 
> clone() to get another instance of the DataProcessor. This will share the 
> expensive/big stuff like the compiled schema, but give you a separate copy of 
> all the state that is settable via setValidationMode, 
> setExternalDFDLVariable, and setTunable.
> 
> Then you set state
> You can call parse/unparse on various threads.
> 
> I hate to get into locking and such, but once you call parse/unparse, that 
> should lock the object state, and any further setter calls should fail/throw. 
> You should have to clone it to "change" the settings.
> 
> This makes the use of the setters a java-ish way of achieving the same 
> reentrancy one would get if there were no setters and simply more arguments 
> to the parse/unparse calls.
> 
> ________
> From: Beckerle, Mike 
> Sent: Wednesday, March 25, 2020 11:29 AM
> To: dev@daffodil.apache.org 
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> To respond to my own thread here, I think given that we allow multiple 
> simultaneous calls to parse/unparse from different threads, we must say that 
> the DataProcessor object is immutable once parse or unparse are called.
> 
> I suppose we could say that it is mutable, but behavior is undetermined if 
> any parse or unparse calls are active on any thread.
> 
> But this is just asking for trouble IMHO.
> 
> I think we started out with a stateful, non-thread capable API. The idea is 
> that one thread would be invoking a data processor at a time. A data procesor 
> was the state-block of an execution.
> 
> The need to share compiled processor reloads, because the compile schemas 
> were expensive to create, tempted us to allow multiple parse/unparse calls on 
> different threads.
> 
> Fact is, I think we should have said no to this, provided a 
> DataProcessor.clone() to create instances that shared the reloaded compiled 
> schema binary, but otherwise had separate state, and said that parse/unparse 
> were synchronized methods on their DP instance.
> 
> Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
> due to mutable state in what turns out to be a cross-thread shared object.
> 
> 
> From: Beckerle, Mike 
> Sent: Wednesday, March 25, 2020 10:55 AM
> To: dev@daffodil.apache.org 
> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> Why does the API for Daffodil have
> 
> Compiler.setExternalDFDLVariable(...)
> and
> Compiler.setExternalDFDLVariables(...)
> 
> on it.
> 
> I believe we should deprecate this.
> 
> Compilers are parameterized by some of the tunables I understand.
> 
> But the external DFDL variables? These cannot affect compilation. The schema 
> compiler needs to know statically the information about variables found in 
> the schema itself in the dfdl:defineVariable statement annotations.
> 
> But the compiler doesn't need external variable bindings. In fact if it did 
> know and use them, it would be building assumptions into the compiled schema 
> that it shouldn't be building in.
> 
> Setting external var bindings on the Compiler just causes problems when that 
> compiler instance is reused in a context where those settings aren't 
> appropriate. (JIRA DAFFODIL-2302 is one such problem)
> 
> I believe setExternalDFDLVariable(s) methods should be deprecated, and 
> external variables bindi

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Steve Lawrence
Agreed on deprecating setting variables in the Compiler. It really just
doesn't make sense to be doing that.

Definitely seems reasonable to add the ability to set variables when
calling parse/unparse as that's the most threadsafe way to handle it,
and feels natural to me. Note that we still have other mutable state in
a DataProcessor like tuanbles, validationeMode, debbuging, that we might
want to think about as well. Though those do feel a little different
than variables.

The idea of having to call a clone() method to change a DataProcessors
variables seems a little unintuitive to me. But if we take a more
functional approach and make it entirely non-mutable, it maybe seems
better, something like:

Any thoughts on what the API might like going the parse route vs the
DataProcessor route?


On 3/25/20 10:55 AM, Beckerle, Mike wrote:
> Why does the API for Daffodil have
> 
> Compiler.setExternalDFDLVariable(...)
> and
> Compiler.setExternalDFDLVariables(...)
> 
> on it.
> 
> I believe we should deprecate this.
> 
> Compilers are parameterized by some of the tunables I understand.
> 
> But the external DFDL variables? These cannot affect compilation. The schema 
> compiler needs to know statically the information about variables found in 
> the 
> schema itself in the dfdl:defineVariable statement annotations.
> 
> But the compiler doesn't need external variable bindings. In fact if it did 
> know 
> and use them, it would be building assumptions into the compiled schema that 
> it 
> shouldn't be building in.
> 
> Setting external var bindings on the Compiler just causes problems when that 
> compiler instance is reused in a context where those settings aren't 
> appropriate. (JIRA DAFFODIL-2302 is one such problem)
> 
> I believe setExternalDFDLVariable(s) methods should be deprecated, and 
> external 
> variables bindings should be an optional argument to the parse/unparse 
> methods.
> 
> The setters cause thread safety issues because the DP is stateful, even 
> though 
> we want multiple calls to parse/unparse to be executable on different threads.
> 
> Consider: if we allow ordinary setExternalDFDLVariables and add a 
> resetExternalDFDLVariables to clear them, then imagine one wants to make two 
> parse calls on separate threads with different external variables bindings:
> 
> so on main thread.
>   dp.setExternalVariables(...bindings 1...)
>   spawn the thread 1
> on the thread 1
>   dp.parse()
> back on main thread
>   dp.resetExternalVariables() // race condition. Did the parse call read 
> the 
> external variables before this reset or not?
>   dp.setExternalVariables(...bindings 2)
>  .
> 
> However, if we make the external variable bindings an argument to parse, we 
> avoid all of this.
> 
> Alternatively, since DataProcessor has setExternalDFDLVariable, we can 
> prohibit 
> multiple calls on the same DataProcessor object simultaneously. We can 
> provide a 
> clone() method that preserves the loaded/reloaded processor, but constructs 
> another DataProcessor object, thereby allowing separate external variables 
> state 
> per DataProcessor instance.
> 
> 
> Comments?
> 
> 
> 
> 
> 
> Mike Beckerle | Principal Engineer
> Owl Cyber Defense 
> /is now a part of Owl 
> /
> P +1-781-330-0412
> W owlcyberdefense.com 
> 



Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Beckerle, Mike
Here's my concrete proposal to fix DataProcesor's API.

We add a clone() method. It copies the object sharing the big/expensive stuff 
like the compiled schema but gives you a separate copy of all the state that is 
settable via setValidationMode, setExternalDFDLVariable, and setTunable.

So you get a DataProcessor either from a ProcessorFactory, or from 
Compiler.reload().

You set state.
You can call parse/unparse on various threads

If you want to change state settings for another parse call, you must call 
clone() to get another instance of the DataProcessor. This will share the 
expensive/big stuff like the compiled schema, but give you a separate copy of 
all the state that is settable via setValidationMode, setExternalDFDLVariable, 
and setTunable.

Then you set state
You can call parse/unparse on various threads.

I hate to get into locking and such, but once you call parse/unparse, that 
should lock the object state, and any further setter calls should fail/throw. 
You should have to clone it to "change" the settings.

This makes the use of the setters a java-ish way of achieving the same 
reentrancy one would get if there were no setters and simply more arguments to 
the parse/unparse calls.


From: Beckerle, Mike 
Sent: Wednesday, March 25, 2020 11:29 AM
To: dev@daffodil.apache.org 
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

To respond to my own thread here, I think given that we allow multiple 
simultaneous calls to parse/unparse from different threads, we must say that 
the DataProcessor object is immutable once parse or unparse are called.

I suppose we could say that it is mutable, but behavior is undetermined if any 
parse or unparse calls are active on any thread.

But this is just asking for trouble IMHO.

I think we started out with a stateful, non-thread capable API. The idea is 
that one thread would be invoking a data processor at a time. A data procesor 
was the state-block of an execution.

The need to share compiled processor reloads, because the compile schemas were 
expensive to create, tempted us to allow multiple parse/unparse calls on 
different threads.

Fact is, I think we should have said no to this, provided a 
DataProcessor.clone() to create instances that shared the reloaded compiled 
schema binary, but otherwise had separate state, and said that parse/unparse 
were synchronized methods on their DP instance.

Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
due to mutable state in what turns out to be a cross-thread shared object.


From: Beckerle, Mike 
Sent: Wednesday, March 25, 2020 10:55 AM
To: dev@daffodil.apache.org 
Subject: Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have

Compiler.setExternalDFDLVariable(...)
and
Compiler.setExternalDFDLVariables(...)

on it.

I believe we should deprecate this.

Compilers are parameterized by some of the tunables I understand.

But the external DFDL variables? These cannot affect compilation. The schema 
compiler needs to know statically the information about variables found in the 
schema itself in the dfdl:defineVariable statement annotations.

But the compiler doesn't need external variable bindings. In fact if it did 
know and use them, it would be building assumptions into the compiled schema 
that it shouldn't be building in.

Setting external var bindings on the Compiler just causes problems when that 
compiler instance is reused in a context where those settings aren't 
appropriate. (JIRA DAFFODIL-2302 is one such problem)

I believe setExternalDFDLVariable(s) methods should be deprecated, and external 
variables bindings should be an optional argument to the parse/unparse methods.

The setters cause thread safety issues because the DP is stateful, even though 
we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a 
resetExternalDFDLVariables to clear them, then imagine one wants to make two 
parse calls on separate threads with different external variables bindings:

so on main thread.
 dp.setExternalVariables(...bindings 1...)
 spawn the thread 1
on the thread 1
 dp.parse()
back on main thread
 dp.resetExternalVariables() // race condition. Did the parse call read the 
external variables before this reset or not?
 dp.setExternalVariables(...bindings 2)
.

However, if we make the external variable bindings an argument to parse, we 
avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit 
multiple calls on the same DataProcessor object simultaneously. We can provide 
a clone() method that preserves the loaded/reloaded processor, but constructs 
another DataProcessor object, thereby allowing separate external variables 
state per DataProcessor instance.


Comments?





Mik

Re: Compiler.setExternalDFDLVariable(s) considered challenged

2020-03-25 Thread Beckerle, Mike
To respond to my own thread here, I think given that we allow multiple 
simultaneous calls to parse/unparse from different threads, we must say that 
the DataProcessor object is immutable once parse or unparse are called.

I suppose we could say that it is mutable, but behavior is undetermined if any 
parse or unparse calls are active on any thread.

But this is just asking for trouble IMHO.

I think we started out with a stateful, non-thread capable API. The idea is 
that one thread would be invoking a data processor at a time. A data procesor 
was the state-block of an execution.

The need to share compiled processor reloads, because the compile schemas were 
expensive to create, tempted us to allow multiple parse/unparse calls on 
different threads.

Fact is, I think we should have said no to this, provided a 
DataProcessor.clone() to create instances that shared the reloaded compiled 
schema binary, but otherwise had separate state, and said that parse/unparse 
were synchronized methods on their DP instance.

Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
due to mutable state in what turns out to be a cross-thread shared object.


From: Beckerle, Mike 
Sent: Wednesday, March 25, 2020 10:55 AM
To: dev@daffodil.apache.org 
Subject: Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have

Compiler.setExternalDFDLVariable(...)
and
Compiler.setExternalDFDLVariables(...)

on it.

I believe we should deprecate this.

Compilers are parameterized by some of the tunables I understand.

But the external DFDL variables? These cannot affect compilation. The schema 
compiler needs to know statically the information about variables found in the 
schema itself in the dfdl:defineVariable statement annotations.

But the compiler doesn't need external variable bindings. In fact if it did 
know and use them, it would be building assumptions into the compiled schema 
that it shouldn't be building in.

Setting external var bindings on the Compiler just causes problems when that 
compiler instance is reused in a context where those settings aren't 
appropriate. (JIRA DAFFODIL-2302 is one such problem)

I believe setExternalDFDLVariable(s) methods should be deprecated, and external 
variables bindings should be an optional argument to the parse/unparse methods.

The setters cause thread safety issues because the DP is stateful, even though 
we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a 
resetExternalDFDLVariables to clear them, then imagine one wants to make two 
parse calls on separate threads with different external variables bindings:

so on main thread.
 dp.setExternalVariables(...bindings 1...)
 spawn the thread 1
on the thread 1
 dp.parse()
back on main thread
 dp.resetExternalVariables() // race condition. Did the parse call read the 
external variables before this reset or not?
 dp.setExternalVariables(...bindings 2)
.

However, if we make the external variable bindings an argument to parse, we 
avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit 
multiple calls on the same DataProcessor object simultaneously. We can provide 
a clone() method that preserves the loaded/reloaded processor, but constructs 
another DataProcessor object, thereby allowing separate external variables 
state per DataProcessor instance.


Comments?





Mike Beckerle | Principal Engineer
[Owl Cyber Defense]
[cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of 
Owl
P +1-781-330-0412
W owlcyberdefense.com