Re: XMLBeans performance and source code status [Re: Proposal: XMLBeans]
Eric,

What's the relationship between XmlCursor and the JSR-173 Streaming API for XML?

Ted

Eric Vasilik wrote:

When working with XMLBeans in a strongly typed way (with a Schema), individual objects are created for each piece of information, usually instances of simple and complex Schema types. However, you can also access and manipulate the XML in a typeless manner. What we've done with XMLBeans is provide access to the full XML Infoset via the XmlCursor interface. XmlCursor provides functionality very similar to the DOM, but takes a very different tack. Instead of creating a DOM Node for each element, attribute, text, etc., one may create a single XmlCursor and navigate that cursor about the XML instance, interrogating the XML: element/attr names, child/parent elements, text, comments, etc. Also, one may modify the XML by, for example, removing elements and attrs or inserting text. All of this can be done by either not creating objects or reusing objects so that the number of objects needed to operate on the XML is constant, not on the order of the size of the XML like a DOM would require.

This kind of interface allows an implementer of an in-memory XML store more freedom to implement the internal structure which represents the XML in memory. One could, for example, simply store the XML as it was read in from disk and implement a cursor as an index into that string, parsing or modifying the parts of the string as necessary to satisfy the requests. We don't go to quite this extreme. In principle, we create one object for every leaf element or attribute and two objects for every interior element. All text for attribute values, comments, processing instructions, and text between element markup is stored in a single character array. We have found that creating fewer objects and batching text leads to loading the XML into memory faster as well as having a similar, if not slightly smaller, memory footprint when compared to the DOM.
Also, working with cursors seems to be an easier programming model than the DOM as it does not have text nodes and is more intuitive. With respect to the synchronized access, the strongly typed schema XMLBeans objects cache values so that conversion to text does not occur until it is needed. Likewise, when modifications are made to the XML Infoset, the strongly typed data (ints, for example) are not parsed from the text until requested. In general the impact of synchronization is quite low because of the lazy approach we have taken along with the caching. As I read your question again, I realize that you may have interpreted synchronized to mean managing data among several threads. The synchronization described refers to the fact that one may manipulate the XML via the XmlCursor or the strongly typed XMLBean classes generated from the schema, each mechanism capable of seeing the changes from the other in a tightly integrated way. With respect to building XMLBeans, we plan to remove any dependency upon the jars you mentioned. Indeed, there exists very little dependence on these. Mostly just interfaces, not any classes needed for the implementation. - Eric Vasilik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
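The cursor-over-a-store idea Eric describes can be illustrated with a toy model. The sketch below is not XMLBeans code and does not use the XmlCursor API: the class names (`FlatStore`, `FlatCursor`, `FlatCursorDemo`) and all method names are hypothetical, and the layout (parallel arrays plus one shared character buffer) is only an assumption about what "batching text" might look like in miniature. It shows how a single cursor object can walk a document without allocating one node object per item.

```java
import java.util.Arrays;

/** Hypothetical flat token store: tokens live in parallel arrays, not per-node objects. */
final class FlatStore {
    static final int START = 0, END = 1, TEXT = 2;

    int[] kind = new int[8];          // token kind at each position
    String[] name = new String[8];    // element name for START/END tokens, null for TEXT
    int[] off = new int[8];           // TEXT tokens: offset into the shared buffer
    int[] len = new int[8];           // TEXT tokens: length within the shared buffer
    int size = 0;
    final StringBuilder chars = new StringBuilder(); // all text runs batched together

    private void add(int k, String n, int o, int l) {
        if (size == kind.length) {    // grow all parallel arrays together
            int cap = size * 2;
            kind = Arrays.copyOf(kind, cap);
            name = Arrays.copyOf(name, cap);
            off = Arrays.copyOf(off, cap);
            len = Arrays.copyOf(len, cap);
        }
        kind[size] = k; name[size] = n; off[size] = o; len[size] = l;
        size++;
    }

    FlatStore start(String n) { add(START, n, 0, 0); return this; }
    FlatStore end(String n)   { add(END, n, 0, 0); return this; }
    FlatStore text(String s)  { add(TEXT, null, chars.length(), s.length()); chars.append(s); return this; }

    FlatCursor newCursor() { return new FlatCursor(this); }
}

/** Hypothetical cursor: just an index into the store, reused for the whole traversal. */
final class FlatCursor {
    private final FlatStore store;
    private int pos = 0;

    FlatCursor(FlatStore store) { this.store = store; }

    /** Moves to the next token; returns false when already at the last token. */
    boolean toNextToken() {
        if (pos + 1 >= store.size) return false;
        pos++;
        return true;
    }

    boolean isStart() { return store.kind[pos] == FlatStore.START; }
    boolean isText()  { return store.kind[pos] == FlatStore.TEXT; }
    String getName()  { return store.name[pos]; }

    /** Returns the text at the current position, or "" if the token is not text. */
    String getText() {
        if (!isText()) return "";
        int o = store.off[pos];
        return store.chars.substring(o, o + store.len[pos]);
    }
}

final class FlatCursorDemo {
    /** Counts start-element tokens using a single cursor object. */
    static int countStarts(FlatStore s) {
        if (s.size == 0) return 0;
        FlatCursor c = s.newCursor();
        int n = 0;
        do {
            if (c.isStart()) n++;
        } while (c.toNextToken());
        return n;
    }

    public static void main(String[] args) {
        FlatStore doc = new FlatStore()
            .start("order").start("item").text("widget").end("item")
            .start("qty").text("3").end("qty").end("order");
        System.out.println(countStarts(doc)); // three start tokens: order, item, qty
    }
}
```

The traversal allocates one cursor regardless of document size, which is the property the message contrasts with a DOM tree. A real store would also need attributes, namespaces, and in-place modification, none of which this sketch attempts.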
RE: XMLBeans performance and source code status [Re: Proposal: XMLBeans]
Adding a few links and other info -

Aleksander Slominski wrote:

http://dev2dev.bea.com/articles/hitesh_seth.jsp that is good overview but has not enough technical details and other docs): as far as i can understand actual objects

Above you've linked to an XML Journal review reprint. Here is a page that points to other information: http://dev2dev.bea.com/technologies/xmlbeans/index.jsp

One of the links is a very brief summary of some brutally transparent and upfront performance and test compliance numbers: http://workshop.bea.com/xmlbeans/schemaandperf.jsp

BTW, despite the fact that we posted the numbers on pretty marketing pages on bea.com, the numbers above are not marketing-varnished numbers - they are the actual measurements that we developers track day-to-day. Those are numbers we measure to help us focus on use-cases that we're working on making faster.

The XML cursor access _without_ strong-type conversion is between 10% and 58% faster than Xerces2 DOM access, going to about 35% for large (1 MB) XML documents. Xerces2, btw, is extremely speedy, so we're proud to be on par with it in any scenario! Adding strong-type conversion (for example parsing xs:int to java int and dates to Calendars) adds enough cost that reading the data out of a document is between 0% and 48% slower than reading out using (untyped) Xerces2 DOM. Apples-to-apples, we measure ourselves significantly faster than JAXB RI and Castor (140% to 282% and 66% to 800%, respectively). Please don't sue me - those are our real numbers, but if performance is important to your application, you should measure it for yourself.

We do fault-in object allocations when demanded, and you can see in our memory test that when we fault-in all the objects for a whole document, we take up more memory than Xerces2 DOM. One current project is to take steps to reduce that number. When we use XmlCursor and don't fault-in all the objects, you will find the memory number to be much slimmer.
(I don't have a measurement because our measurements focus on problem areas we're actually working on.)

Eric Vasilik writes:

The synchronization described refers to the fact that one may manipulate the XML via the XmlCursor or the strongly typed XMLBean classes generated from the schema

As Eric says, we don't want to confuse the two uses of the word synchronize. But since Aleksander brought it up - here's some information on thread-synchronization too. We examined both with- and without-thread-synchronized access, and found that without thread-sync, programmers fall into traps like working with XML config files on multiple threads in thread-unsafe ways without being aware of it. We found that it costs between 1% (strongly-typed access) and 10% (XmlCursor access) to synchronize. So we're currently synchronizing access to the data, paying for more [app] stability with a little bit of perf. We'd like to give single-threaded (or savvy) users the option of not synchronizing to get the 1-10% back. That's future work.

As Eric pointed out, the key I think is not in what our current numbers are, but the fact that we've isolated our implementation from our interface so that we have the flexibility of reducing allocations, deferring work, and otherwise improving performance further in the future. Abstracting the primary store behind a cursor rather than a tree of objects with identity gives us some leeway in shuffling our implementation strategy in the future without restructuring the APIs.

David Bau
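The option David mentions (synchronize by default, let single-threaded users opt out) might look roughly like the following. This is a minimal sketch under stated assumptions, not how XMLBeans implements or exposes it: `TextStore`, its `threadSafe` constructor flag, and `SyncDemo.runTwoWriters` are all hypothetical names, and the store is reduced to a single shared character buffer.

```java
/** Hypothetical store whose locking is chosen once, at construction time. */
final class TextStore {
    private final StringBuilder chars = new StringBuilder();
    private final boolean threadSafe;          // assumption: a per-store opt-out flag
    private final Object monitor = new Object();

    TextStore(boolean threadSafe) { this.threadSafe = threadSafe; }

    void append(String s) {
        if (threadSafe) {
            synchronized (monitor) { chars.append(s); }
        } else {
            chars.append(s);                   // caller promises single-threaded use
        }
    }

    int length() {
        if (threadSafe) {
            synchronized (monitor) { return chars.length(); }
        }
        return chars.length();
    }
}

final class SyncDemo {
    /** Appends perThread characters from each of two threads and returns the final length. */
    static int runTwoWriters(TextStore store, int perThread) {
        Runnable writer = () -> {
            for (int i = 0; i < perThread; i++) store.append("x");
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return store.length();
    }

    public static void main(String[] args) {
        // With locking enabled, no appends are lost.
        System.out.println(runTwoWriters(new TextStore(true), 10000));
    }
}
```

Passing `false` to the constructor skips the lock and is only safe when a single thread touches the store. The 1% to 10% figures quoted above are the authors' measurements of their own implementation and say nothing about what this toy would cost.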
Re: XMLBeans performance and source code status [Re: Proposal: XMLBeans]
David Bau wrote:

Adding a few links and other info -

Eric Vasilik writes:

The synchronization described refers to the fact that one may manipulate the XML via the XmlCursor or the strongly typed XMLBean classes generated from the schema

As Eric says, we don't want to confuse the two uses of the word synchronize. But since Aleksander brought it up - here's some information on thread-synchronization too. We examined both with- and without-thread-synchronized access, and found that without thread-sync, programmers fall into traps like working with XML config files on multiple threads in thread-unsafe ways without being aware of it. We found that it costs between 1% (strongly-typed access) and 10% (XmlCursor access) to synchronize. So we're currently synchronizing access to the data, paying for more [app] stability with a little bit of perf. We'd like to give single-threaded (or savvy) users the option of not synchronizing to get the 1-10% back. That's future work.

hi, did you consider the fail-fast approach that is used in Java collections (so for example an Iterator can detect if it is used from more than one thread and fails if that happens)? the other possibility would be to allow making some objects (such as configuration) immutable so they can be safely shared between multiple threads.

As Eric pointed out, the key I think is not in what our current numbers are, but the fact that we've isolated our implementation from our interface so that we have the flexibility of reducing allocations, deferring work, and otherwise improving performance further in the future. Abstracting the primary store behind a cursor rather than a tree of objects with identity gives us some leeway in shuffling our implementation strategy in the future without restructuring the APIs.

that sounds like a very good strategy! however i wonder what the current state really is.
when i looked at the source code i could not see how layering could work (or is it working already?): what parts are API and interfaces, how is the implementation separated, and can it be switched - is it possible in the current version to choose a different implementation (for example by using a factory pattern)? i can see it working for com.bea.xml (XSD types) and com.bea.xbean.values (implementation) - this is a very valuable set of Java classes providing XSD validation (even more so if they were more abstract so they could be used with any XML databinding).

thanks,

alek

--
If everything seems under control, you're just not going fast enough. Mario Andretti
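Aleksander's fail-fast suggestion can be sketched with a modification counter. Note that the fail-fast iterators in java.util detect structural modification of the collection behind the iterator on a best-effort basis; they do not check which thread is calling. The code below applies the same counter idea to a toy store. `VersionedStore` and `FailFastCursor` are hypothetical names, and nothing here is taken from the XMLBeans source.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical store that counts structural changes. */
final class VersionedStore {
    private final List<String> tokens = new ArrayList<>();
    private int modCount = 0;                  // bumped on every structural change

    void add(String token) { tokens.add(token); modCount++; }
    int size() { return tokens.size(); }
    String get(int i) { return tokens.get(i); }
    int modCount() { return modCount; }

    FailFastCursor newCursor() { return new FailFastCursor(this); }
}

/** Hypothetical cursor that refuses to continue once the store has changed under it. */
final class FailFastCursor {
    private final VersionedStore store;
    private final int expectedModCount;        // snapshot taken when the cursor is created
    private int pos = -1;

    FailFastCursor(VersionedStore store) {
        this.store = store;
        this.expectedModCount = store.modCount();
    }

    boolean hasNext() { return pos + 1 < store.size(); }

    String next() {
        if (store.modCount() != expectedModCount) {
            throw new ConcurrentModificationException("store changed under cursor");
        }
        if (!hasNext()) throw new NoSuchElementException();
        return store.get(++pos);
    }
}

final class FailFastDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.add("a");
        store.add("b");
        FailFastCursor cursor = store.newCursor();
        System.out.println(cursor.next());
        store.add("c");                        // structural change after cursor creation
        try {
            cursor.next();
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e.getMessage());
        }
    }
}
```

Because the counter is a plain int with no memory barriers, this check is best-effort under real multithreading, as it is in java.util. It catches misuse cheaply but is not a substitute for locking or for the immutable-object option also raised in the message.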
RE: XMLBeans performance and source code status [Re: Proposal: XMLBeans]
When working with XMLBeans in a strongly typed way (with a Schema), individual objects are created for each piece of information, usually instances of simple and complex Schema types. However, you can also access and manipulate the XML in a typeless manner. What we've done with XMLBeans is provide access to the full XML Infoset via the XmlCursor interface. XmlCursor provides functionality very similar to the DOM, but takes a very different tack. Instead of creating a DOM Node for each element, attribute, text, etc., one may create a single XmlCursor and navigate that cursor about the XML instance, interrogating the XML: element/attr names, child/parent elements, text, comments, etc. Also, one may modify the XML by, for example, removing elements and attrs or inserting text. All of this can be done by either not creating objects or reusing objects so that the number of objects needed to operate on the XML is constant, not on the order of the size of the XML like a DOM would require.

This kind of interface allows an implementer of an in-memory XML store more freedom to implement the internal structure which represents the XML in memory. One could, for example, simply store the XML as it was read in from disk and implement a cursor as an index into that string, parsing or modifying the parts of the string as necessary to satisfy the requests. We don't go to quite this extreme. In principle, we create one object for every leaf element or attribute and two objects for every interior element. All text for attribute values, comments, processing instructions, and text between element markup is stored in a single character array. We have found that creating fewer objects and batching text leads to loading the XML into memory faster as well as having a similar, if not slightly smaller, memory footprint when compared to the DOM. Also, working with cursors seems to be an easier programming model than the DOM as it does not have text nodes and is more intuitive.
With respect to the synchronized access, the strongly typed schema XMLBeans objects cache values so that conversion to text does not occur until it is needed. Likewise, when modifications are made to the XML Infoset, the strongly typed data (ints, for example) are not parsed from the text until requested. In general the impact of synchronization is quite low because of the lazy approach we have taken along with the caching. As I read your question again, I realize that you may have interpreted synchronized to mean managing data among several threads. The synchronization described refers to the fact that one may manipulate the XML via the XmlCursor or the strongly typed XMLBean classes generated from the schema, each mechanism capable of seeing the changes from the other in a tightly integrated way. With respect to building XMLBeans, we plan to remove any dependency upon the jars you mentioned. Indeed, there exists very little dependence on these. Mostly just interfaces, not any classes needed for the implementation. - Eric Vasilik -Original Message- From: Aleksander Slominski [mailto:[EMAIL PROTECTED] Sent: Friday, July 04, 2003 8:31 PM To: [EMAIL PROTECTED] Cc: Jakarta General List; [EMAIL PROTECTED] Subject: XMLBeans performance and source code status [Re: Proposal: XMLBeans] Cliff Schmidt wrote: What's compelling about XMLBeans compared to some of the other front runners, such as JDOM and XOM, Castor and JAXB? The main difference between XMLBeans and JDOM or XOM is that XMLBeans does not create objects for each XML information item. Instead, it provides cursor-based access to each item in the XML Infoset. It has an architecture where, if an actual object is needed for a node, it can be created on-demand. We found this provided great performance benefit. hi, i am interested to find if you have some more details on performance benefits - it seems to be very intriguing and distinguishing feature of XMLBeans. 
i may be missing something but i tried to find this information online without any luck (i checked http://dev2dev.bea.com/articles/hitesh_seth.jsp that is good overview but has not enough technical details and other docs): as far as i can understand actual objects are created for every XML information item? so as objects are in memory the same way as objects in DOM what performance benefits do you have in mind? do you refer to faster creation time or lower memory footprint? did you check for example on the same machine how big XML document can be loaded with XMLBeans and DOM (for example Xerces2) before running out of memory?

The biggest differences between XMLBeans and Castor or JAXB are: 1) the goal of 100% Schema support (currently supports everything in Schema other than redefine and substitution groups, and those features are nearly ready), and 2) the integrated and synchronized access of the underlying XML content with strongly typed Java classes.

did you
XMLBeans performance and source code status [Re: Proposal: XMLBeans]
Cliff Schmidt wrote:

What's compelling about XMLBeans compared to some of the other front runners, such as JDOM and XOM, Castor and JAXB?

The main difference between XMLBeans and JDOM or XOM is that XMLBeans does not create objects for each XML information item. Instead, it provides cursor-based access to each item in the XML Infoset. It has an architecture where, if an actual object is needed for a node, it can be created on-demand. We found this provided great performance benefit.

hi, i am interested to find if you have some more details on performance benefits - it seems to be very intriguing and distinguishing feature of XMLBeans. i may be missing something but i tried to find this information online without any luck (i checked http://dev2dev.bea.com/articles/hitesh_seth.jsp that is good overview but has not enough technical details and other docs): as far as i can understand actual objects are created for every XML information item? so as objects are in memory the same way as objects in DOM what performance benefits do you have in mind? do you refer to faster creation time or lower memory footprint? did you check for example on the same machine how big XML document can be loaded with XMLBeans and DOM (for example Xerces2) before running out of memory?

The biggest differences between XMLBeans and Castor or JAXB are: 1) the goal of 100% Schema support (currently supports everything in Schema other than redefine and substitution groups, and those features are nearly ready), and 2) the integrated and synchronized access of the underlying XML content with strongly typed Java classes.

did you estimate what is the impact of requiring synchronized access? i am really curious why it is required: i can see the need to share XML schemas but why require synchronizing access to XML content? i would think that the approach from java.util, where collections are not thread-safe until specifically made synchronized, could work here as well?
I'd say you'd want to do as much setup before incubation as possible. This includes normalizing your code layout (something that didn't materialize for Tapestry, unfortunately) to match the other Jakarta projects (this will ease things if and when you transition to Maven builds). You probably want to check out a bit about Gump as well ... I can think of one person who will probably veto you until you are integrated into Gump. It's *exceptionally* painful to work with Gump at the moment, but ultimately worth it.

i have a question concerning Gump but, more generally, what is on the Wiki page http://nagoya.apache.org/wiki/apachewiki.cgi?XmlBeansProposal:

(...) '''(2) identify the initial source from which the subproject is to be populated''' *http://workshop.bea.com/xmlbeans/XsdUpload.jsp (...)

i looked at the source code and it seems that it is not possible to rebuild xbean.jar just from source, and it is not clear what the dependencies are? i noticed there are parts of the code that depend on outside packages (like weblogic.xml.stream.XMLInputStream or com.bea.xquery) and some subpackages in com.bea.xml* that are in xbean.jar but not in the src directory? what are the plans for those pieces of code - are they also open source, or would XMLBeans depend on BEA implementation classes being on the CLASSPATH to compile it?

i hope XmlBeans will be actively developed as open source (in Apache or outside) so it continues to grow as it really looks like an interesting project.

thanks,

alek

--
If everything seems under control, you're just not going fast enough. Mario Andretti
RE: Proposal: XMLBeans
(copying the other lists, by request)

Thanks for all the good questions and advice, Howard. Please let me know if the following leaves you with other questions or concerns.

On Wednesday, July 02, 2003 7:26 PM, Howard M. Lewis Ship wrote:

See [http://workshop.bea.com/xmlbeans/quickStart.jsp BEA's quick start page] for more information.

There's a commons module, Betwixt, that very much overlaps what you describe here.

The main difference is the XML Schema support. However, there are areas of XMLBeans that overlap with other projects (especially commons modules like Betwixt). This is one of the reasons we are interested in Apache -- we would like to integrate/reuse as much as possible of other projects, especially if it makes the final product better. We also believe that the other projects could benefit from pieces of XMLBeans.

What's compelling about XMLBeans compared to some of the other front runners, such as JDOM and XOM, Castor and JAXB?

The main difference between XMLBeans and JDOM or XOM is that XMLBeans does not create objects for each XML information item. Instead, it provides cursor-based access to each item in the XML Infoset. It has an architecture where, if an actual object is needed for a node, it can be created on-demand. We found this provided great performance benefit. The biggest differences between XMLBeans and Castor or JAXB are: 1) the goal of 100% Schema support (currently supports everything in Schema other than redefine and substitution groups, and those features are nearly ready), and 2) the integrated and synchronized access of the underlying XML content with strongly typed Java classes.

''Meritocracy: '' We would very much like to see XMLBeans evolve under the meritocracy model that is used within Apache.

The advice I got is still good ... get your meritocracy working RIGHT NOW. I personally found big benefits to reorganizing along these lines; it would have been worth the effort even if Tapestry hadn't made it all the way in.
The meritocracy really works to get people motivated and contributing.

I agree. We spent a lot of time thinking about the best community structure for XMLBeans and decided that the meritocracy by Apache was the model that we wanted to follow. We've made the first steps in this direction, but we are now trying to go all the way with it. Whether Apache is the right place for XMLBeans or not, we will be following the meritocracy model. In fact, we are looking at starting open source projects around several other technologies we've recently developed, and even though many of these won't be appropriate for Apache, we will still be following a meritocracy model since we are also convinced that it really does work.

''Core Developers:'' In addition to key members of the XMLBeans development team, the initial committers include developers from outside BEA who have spent several months using XMLBeans to solve their particular development needs.

Be prepared to document a bit more about outside developers' contributions.

To be clear, the outside developers have only recently had the opportunity to submit patches, but they have been working closely with the BEA development team for six months by submitting bugs and feature requests based on the problems that they would run into while developing against XMLBeans within each of their applications.

I've learned the hard way to get rid of all [L]GPL dependencies before you attempt to move to Jakarta.

It's not even a question for us. We are in the process of dealing with this now. I just wanted to let you all know that we were handling it.

Do you have a roadmap of where you would like this project to be in 6 months? A year? Two years?

Yes - and I will post it shortly on the Wiki site.

What licensing do you currently use? LGPL is a problem, BSD or ASL is the way to go. (Pardon me if this is on the pages you've linked to ... I haven't clicked through yet).

We currently use an Apache-style license.
See http://workshop.bea.com/xmlbeans/XsdUpload.jsp.

'''(5) identify apache sponsoring individual ''' * Steven Noels ([EMAIL PROTECTED])

That's a good step!

Yes - Steven has been incredibly helpful by allowing me to bounce ideas off him about how BEA can get involved with the open source community, and specifically if and how XMLBeans would complement Apache.

I'd be interested to know why you feel the project will benefit from hosting at Jakarta? My personal experience with Tapestry is that the move to Jakarta was good for exposure ... but Tapestry, regrettably, did not have a major player (such as BEA) backing it. Eclipse, for example, self-hosts, yet is taken very seriously as an open source project.

We are looking at open sourcing other BEA technologies, some of which would probably make the most sense in a self-hosted community, especially ones that we don't think Apache would be interested in. One reason we are
RE: Proposal: XMLBeans
On Thursday, July 03, 2003 12:02 PM, Cliff Schmidt wrote: On Wednesday, July 02, 2003 7:26 PM, Howard M. Lewis Ship wrote: Do you have a roadmap of where you would like this project to be in 6 months? A year? Two years? Yes - and I will post it shortly on the Wiki site. I've just posted a description of what we have in mind at: http://nagoya.apache.org/wiki/apachewiki.cgi?XmlBeansRoadMap Cliff
RE: Proposal: XMLBeans
-Original Message-
From: Cliff Schmidt [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 02, 2003 9:42 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Proposal: XMLBeans

Hi,

I would like to propose a new subproject to further develop XMLBeans, which is a Java-XML binding tool that also allows low-level access to the full XML instance Infoset. The technology was developed by BEA, but we believe this work could potentially be a good fit for either the Jakarta or the XML communities. Please see http://nagoya.apache.org/wiki/apachewiki.cgi?XmlBeansProposal for a detailed proposal, which I've also copied below.

Thanks,
Cliff

---

'''Proposal for an XMLBeans subproject in Apache XML or Jakarta'''

''2 July 2003, Cliff Schmidt ([EMAIL PROTECTED])''

'''(0) rationale'''

XMLBeans is an XML-Java binding tool that uses XML Schema as a basis for generating Java classes to be used to easily access XML instance data. It was designed to provide both easy access to XML information via convenient Java classes as well as complete access to the underlying XML, combining the best of low-level, full access APIs like SAX and DOM with the convenience of Java binding. See [http://workshop.bea.com/xmlbeans/quickStart.jsp BEA's quick start page] for more information.

There's a commons module, Betwixt, that very much overlaps what you describe here. I would like to look over both APIs before noodling further. Please take these thoughts as cursory and initial until I've had a better look-see. What's compelling about XMLBeans compared to some of the other front runners, such as JDOM and XOM, Castor and JAXB?

'''(0.1) criteria'''

''Meritocracy: '' We would very much like to see XMLBeans evolve under the meritocracy model that is used within Apache.

The advice I got is still good ... get your meritocracy working RIGHT NOW. I personally found big benefits to reorganizing along these lines; it would have been worth the effort even if Tapestry hadn't made it all the way in.
The meritocracy really works to get people motivated and contributing. ''Community: '' Over the last six months, we have developed a thriving developer community to provide feedback directly to the development team through a discussion forum. We have invited three of the members of the community to join us as committers, based on their contributions to the development of the product. We've also had thousands of users experiment with the technology. Part of the reason for this success has been the availability of [http://workshop.bea.com/xmlbeans/docindex.html sample code and thorough documentation]. ''Core Developers:'' In addition to key members of the XMLBeans development team, the initial committers include developers from outside BEA who have spent several months using XMLBeans to solve their particular development needs. Be prepared to document a bit more about outside developers' contributions. ''Alignment:'' While XMLBeans does not currently have major dependencies on other Apache Jakarta or XML products, we are very interested in pursuing integration. For instance, we are looking into enabling Xerces to be used as our XML parser. One reason for doing this is due to the fact that we currently use the [http://piccolo.sourceforge.net/ Piccolo parser], and I realize that the LGPL terms of this parser would be problematic for an Apache project. Therefore, if it is eventually agreed that an XMLBeans subproject in Apache would be a Good Thing, we will replace our current parser with an alternative that uses an Apache license. I've learned the hard way to get rid of all [L]GPL dependencies before you attempt to move to Jakarta. We are also aware that there is some overlap of functionality with JAXB, specifically regarding object binding. We would very much like to see the convergence of these two technologies. In fact, one of the committers has recently joined the JAXB v.2 Expert Group, in order to facilitate this. 
Finally, we are aware that the proposed WS-Commons subproject might include a JAXB proposal. If this subproject is approved, we would want to work closely together on any future JAXB-XMLBeans convergence. '''(0.2) warning signs''' ''Orphaned products: '' BEA has been receiving very positive press and customer feedback about XMLBeans and only wishes to invest further in the development and support of this technology. Do you have a roadmap of where you would like this project to be in 6 months? A year? Two years? ''Inexperience with open source:'' While we do not yet have any committers who are currently active in the XML or Jakarta projects, several of them have previous experience working with open source communities. For example, the architect behind XMLBeans, David Bau, has built a strong community