I would go further than Steve on this. There is only one thread-safe thing in Daffodil. This is by design/intention. Given a DataProcessor object, one may call its parse and unparse methods from multiple threads.
These are thread safe because all the shared state of DataProcessor (the compiled schema) is read-only, and all structures allocated by a parse/unparse call are not shared at all; they are private to the one thread running that call.

By the way, there is one currently known thread-safety bug in Daffodil: https://issues.apache.org/jira/browse/DAFFODIL-2216

Everywhere in Daffodil, developers are expected to avoid state, or where state is required, to use local state and *not* protect it from multi-thread access, because only one thread should ever be accessing it. Code is expected to use the faster, lower-overhead, non-thread-safe collection classes rather than worry about state sharing, and we look for this in code review.

The Daffodil compiler has a single global synchronized method lock. So I believe you can't compile schemas in parallel unless you run more than one JVM instance to do it. The compilation is all sequentialized on purpose so that we don't have to worry about use of singleton objects.

________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, August 6, 2020 9:04 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: Caching, thread safety, optimizations

I'm not 100% sure if the Compiler and ProcessorFactory are thread safe. We fix issues as they come up and try our best, but I'm not sure we guarantee thread-safety. For example, there are definitely known issues if you use the set*() functions. The newer with*() functions were added to deal with these potential issues and should be used instead.

The DataProcessor is thread-safe, and we work hard to make sure it stays that way, since this is the thing that does most of the work. So every DataProcessor parse() or unparse() call can definitely be made in different threads without a problem.

The ScalaXMLInfosetOutputter (as well as most of the other InfosetOutputters) is stateful, and so should not be shared among different threads, but it can be reused by calling the reset() function. I would recommend one InfosetOutputter per thread, calling reset() in between uses. Or just create a new one each time parse/unparse is needed--these should be pretty lightweight to allocate.

In general, I would recommend a workflow of creating a unique Compiler/ProcessorFactory/DataProcessor for each unique schema that you want to parse/unparse data with. Once you have the DataProcessor, throw away the Compiler/ProcessorFactory and cache and reuse that DataProcessor anytime you need to parse/unparse data using that schema. And then create/reset the InfosetOutputter as mentioned above.

- Steve

On 8/5/20 2:26 PM, Patrick Grandjean wrote:
> Hi,
>
> I am looking to optimize applications that use Apache Daffodil and would like to
> know which classes or functions are thread-safe, reusable, can be cached in a
> singleton, etc. For instance, I believe that ScalaXMLInfosetOutputter is
> reusable since it has a reset() function. Here is a list of
> classes/functions/instances I am currently using:
> - Daffodil.compiler()
> - ProcessorFactory
> - ProcessorFactory.onPath(String)
> - DataProcessor
> - ScalaXMLInfosetOutputter
>
> I would like to avoid having to instantiate each class at every call. Otherwise,
> what are the common optimizations that can be done when using Apache Daffodil's
> Java/Scala API?
>
> Patrick.
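
The workflow recommended in this thread (compile once per schema, cache and share the resulting processor, keep one stateful outputter per thread and reset it between uses) could be sketched in Java roughly as follows. This is only an illustration of the pattern and does not use the Daffodil API: SketchProcessor, SketchOutputter and ProcessorCache are hypothetical stand-ins for a DataProcessor, an InfosetOutputter and an application-level cache, and the compile function is a placeholder for whatever builds a processor from a schema.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a compiled processor (NOT the Daffodil API).
// Its only field is immutable, so one instance can be shared by many threads.
final class SketchProcessor {
    private final String schemaName;

    SketchProcessor(String schemaName) {
        this.schemaName = schemaName;
    }

    // All mutable state lives in the caller-supplied outputter.
    void parse(String data, SketchOutputter out) {
        out.append(schemaName + ":" + data);
    }
}

// Hypothetical stand-in for a stateful, reusable outputter.
// Deliberately not thread safe: each thread gets its own instance.
final class SketchOutputter {
    private final StringBuilder buf = new StringBuilder();

    void reset() {
        buf.setLength(0);
    }

    void append(String s) {
        buf.append(s);
    }

    String result() {
        return buf.toString();
    }
}

// Hypothetical application-level cache: one processor per schema,
// one outputter per thread.
final class ProcessorCache {
    private final Map<String, SketchProcessor> cache = new ConcurrentHashMap<>();
    private final Function<String, SketchProcessor> compile;
    private final ThreadLocal<SketchOutputter> outputter =
            ThreadLocal.withInitial(SketchOutputter::new);

    ProcessorCache(Function<String, SketchProcessor> compile) {
        this.compile = compile;
    }

    // Compiles on first use of a schema, then returns the cached instance.
    SketchProcessor get(String schema) {
        return cache.computeIfAbsent(schema, compile);
    }

    // Safe to call from any thread: shared processor, thread-private outputter.
    String parse(String schema, String data) {
        SketchOutputter out = outputter.get();
        out.reset(); // clear whatever the previous call on this thread left behind
        get(schema).parse(data, out);
        return out.result();
    }
}
```

ConcurrentHashMap.computeIfAbsent runs the compile function at most once per key, so threads that ask for the same schema at the same time still end up sharing a single processor. In a real application the stand-ins would be replaced by the corresponding Daffodil objects, and creating a fresh outputter per call instead of using the ThreadLocal would work just as well.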