Thank you both!
> On Aug 6, 2020, at 10:13 AM, Beckerle, Mike <mbecke...@tresys.com> wrote:
>
> I would go further than Steve on this.
>
> There is only one thread-safe thing in Daffodil. This is by design/intention.
> Given a DataProcessor object, one may call its parse and unparse methods from
> multiple threads.
>
> These are thread safe because all the shared state of DataProcessor (the
> compiled schema) is read-only, and all structures allocated by a
> parse/unparse call are private to the one thread running that call.
>
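The reasoning above (read-only shared state plus per-call private state) can be sketched in plain Java. This is only an illustration, not Daffodil code: `CompiledSchema`, its `parse` method, and `SharedReadOnlyDemo` are hypothetical stand-ins for a DataProcessor and its callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

// Hypothetical stand-in for a compiled schema: every field is final and never mutated.
final class CompiledSchema {
    private final String delimiter;

    CompiledSchema(String delimiter) {
        this.delimiter = delimiter;
    }

    // Callable from many threads at once: it only reads immutable state, and the
    // list it builds is allocated per call, so no other thread can reach it.
    List<String> parse(String data) {
        List<String> fields = new ArrayList<>(); // deliberately a non-thread-safe collection
        for (String field : data.split(Pattern.quote(delimiter))) {
            fields.add(field);
        }
        return fields;
    }
}

final class SharedReadOnlyDemo {
    // Runs one parse per input on a small thread pool, sharing a single schema object.
    static List<List<String>> parseInParallel(CompiledSchema schema, List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String input : inputs) {
                Callable<List<String>> task = () -> schema.parse(input);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // results come back in input order
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

No locks are needed here because nothing mutable is ever visible to more than one thread.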
> btw: There is currently one known thread-safety bug in Daffodil:
> https://issues.apache.org/jira/browse/DAFFODIL-2216
>
> Everywhere in Daffodil, developers are expected to avoid state, or where
> required use local state and *not* protect it from multi-thread access
> because only one thread should ever be accessing it. Code is expected to use
> the faster, lower-overhead, non-thread-safe collection classes rather than
> worry about state sharing, and we look for this in code review.
>
> The Daffodil compiler has a single global synchronized method lock. So I
> believe you can't compile schemas in parallel unless you run more than one
> JVM instance to do it. The compilation is all serialized on purpose so
> that we don't have to worry about use of singleton objects.
>
>
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, August 6, 2020 9:04 AM
> To: users@daffodil.apache.org
> Subject: Re: Caching, thread safety, optimizations
>
> I'm not 100% sure if the Compiler and ProcessorFactory are thread safe.
> We fix issues as they come up and try our best, but I'm not sure we
> guarantee thread-safety. For example, there are definitely known issues
> if you use the set*() functions. The newer with*() functions were added
> to deal with these potential issues and should be used instead.
>
> The DataProcessor is thread-safe, and we work hard to make sure it stays
> that way, since this is the thing that does most of the work. So every
> DataProcessor parse() or unparse() call can definitely be made in
> different threads without a problem.
>
> The ScalaXMLInfosetOutputter (like most of the other InfosetOutputters)
> is stateful, and so should not be shared among different threads, but it
> can be reused by calling the reset() function. I would recommend one
> InfosetOutputter per thread and calling reset() between uses. Or just
> create a new one each time parse/unparse is needed; these should be
> pretty lightweight to allocate.
>
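One way to get "one outputter per thread, reset between uses" is a ThreadLocal. The sketch below is plain Java under stated assumptions: `BufferingOutputter` is a hypothetical stand-in for a stateful InfosetOutputter (only the existence of reset() comes from the thread), and `PerThreadOutputter.render` is an invented helper.

```java
// Hypothetical stand-in for a stateful outputter; this is not Daffodil's API.
final class BufferingOutputter {
    private final StringBuilder buffer = new StringBuilder();

    void append(String event) {
        buffer.append(event);
    }

    String result() {
        return buffer.toString();
    }

    // Clears accumulated state so the instance can be reused.
    void reset() {
        buffer.setLength(0);
    }
}

final class PerThreadOutputter {
    // Each thread lazily gets its own instance, so instances are never shared.
    private static final ThreadLocal<BufferingOutputter> LOCAL =
            ThreadLocal.withInitial(BufferingOutputter::new);

    static String render(String... events) {
        BufferingOutputter out = LOCAL.get();
        out.reset(); // drop whatever the previous use on this thread left behind
        for (String event : events) {
            out.append(event);
        }
        return out.result();
    }
}
```

If allocation really is cheap, creating a fresh outputter per call is simpler and avoids the ThreadLocal entirely.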
> In general, I would recommend a workflow of creating a unique
> Compiler/ProcessorFactory/DataProcessor for each unique schema that you
> want to parse/unparse data with. Once you have the DataProcessor, throw
> away the Compiler/ProcessorFactory and cache and reuse that
> DataProcessor anytime you need to parse/unparse data using that schema.
> And then create/reset the InfosetOutputter as mentioned above.
>
> - Steve
>
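The "compile once, cache the DataProcessor, reuse it" workflow could look roughly like the sketch below. It is plain Java with no Daffodil dependency: `ProcessorCache` is a hypothetical helper, the type parameter `P` stands in for DataProcessor, and the `compile` function stands in for the whole Compiler/ProcessorFactory step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache keyed by schema identifier (for example, a schema path).
final class ProcessorCache<P> {
    private final ConcurrentHashMap<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> compile;
    private final AtomicInteger compileCount = new AtomicInteger();

    ProcessorCache(Function<String, P> compile) {
        this.compile = compile;
    }

    // Returns the cached processor, compiling only on the first request for a schema.
    // ConcurrentHashMap.computeIfAbsent runs the function at most once per key.
    P get(String schemaId) {
        return cache.computeIfAbsent(schemaId, id -> {
            compileCount.incrementAndGet();
            return compile.apply(id);
        });
    }

    // Number of compilations performed so far (useful for checking cache hits).
    int compileCount() {
        return compileCount.get();
    }
}
```

Only the finished processor is retained; whatever the `compile` function creates along the way becomes garbage once it returns.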
> On 8/5/20 2:26 PM, Patrick Grandjean wrote:
> > Hi,
> >
> > I am looking to optimize applications that use Apache Daffodil and would
> > like to know which classes or functions are thread-safe, reusable, can be
> > cached in a singleton, etc. For instance, I believe that
> > ScalaXMLInfosetOutputter is reusable since it has a reset() function. Here
> > is a list of classes/functions/instances I am currently using:
> > - Daffodil.compiler()
> > - ProcessorFactory
> > - ProcessorFactory.onPath(String)
> > - DataProcessor
> > - ScalaXMLInfosetOutputter
> >
> > I would like to avoid having to instantiate each class at every call.
> > Otherwise, what are the common optimizations that can be done when using
> > Apache Daffodil's Java/Scala API?
> >
> > Patrick.
> >