On Fri, Mar 22, 2013 at 10:34 AM, Svante Schubert <[email protected]> wrote:
> This time I will pass as mentor, but would like to comment on the SAX
> approach.
>
> Currently we are already using SAX (AFAIK DOM in general) to build up
> our own typed DOM tree, see
> http://svn.apache.org/viewvc/incubator/odf/trunk/odfdom/src/main/java/org/odftoolkit/odfdom/pkg/OdfFileSaxHandler.java?view=markup
>
> The DOM of the XML files will only be created when elements of the
> desired files are accessed - not when you load the overall document.
> If there is a very huge document (let's say presentation) we still have
> to parse the complete document, even if we only desire a certain
> slide, as it is all in one content.xml.
> If there is a very huge document (let's say spreadsheet) we still have
> to parse the complete document, even if we only desire a certain
> spreadsheet or range from it, as it is all in one content.xml.
> If there is a very huge document (let's say text) we still have to parse
> the complete document, even if we only desire a certain chapter or
> content table, as it is all in one content.xml.
> Did you have a special scenario in mind, Rob?
>
There is a difference between parsing the entire document and building
a DOM for everything in content.xml. For example, I might just be
looking to extract all hyperlinks from a document. Or I might want to
replace all instances of "Sun Microsystems" with "Oracle Corp.".
Instantiating the entire content.xml DOM for tasks like this is
overkill.

-Rob

> PS: For instance, operations for real-time collaboration could be
> created by the above SAX Interface.
>
> - Svante
>
> On 22.03.2013 14:23, Rob Weir wrote:
>> If anyone wants to mentor a GSoC student you need to get your idea
>> entered into JIRA now. It looks like the deadline is this weekend.
>>
>> I wonder whether a streaming/scanning parser could be done in this
>> time frame? We've seen some cases where our DOM-based solution takes
>> up too much memory and a more SAX-like approach would be better.
>>
>> Any other ideas?
>>
>> -Rob
>>
>>
>> ---------- Forwarded message ----------
>> From: Ulrich Stärk <[email protected]>
>> Date: Fri, Mar 22, 2013 at 5:01 AM
>> Subject: Re: Google Summer of Code 2013
>> To: [email protected]
>>
>>
>> Dear PMCs,
>>
>> I'm going to submit our application to Google this weekend but our
>> ideas list only shows 34 ideas until now. That's a shame considering
>> that we have over a hundred projects and were able to offer potential
>> students 142 project ideas to choose from last year and it might also
>> hinder our chances of being accepted.
>>
>> Remember, GSoC is a great way to attract fresh blood to your projects
>> and to get work done that might otherwise go undone. It is in your own
>> interest to participate.
>>
>> Incubator mentors, please also talk to your respective podlings.
>>
>> If there is anything keeping you from participating, or anything that
>> needs clarification, don't hesitate to contact the community
>> development project at [email protected] or, if you want to keep the
>> discussion private, [email protected].
>>
>> Cheers,
>>
>> Uli
>>
>> On 05.03.2013 16:26, Ulrich Stärk wrote:
>>> Hello PMCs,
>>>
>>> Google Summer of Code [1] is the ideal opportunity for you to attract
>>> new contributors to your projects.
>>>
>>> The ASF will apply as a participating organization meaning individual
>>> projects don't have to apply separately.
>>>
>>> If you want to participate with your project you NOW need to
>>>
>>> - understand what it means to be a mentor [2].
>>>
>>> - record your project ideas. Just create issues in JIRA, label them
>>> with gsoc2013, and they will show up at [3]. Please be as specific as
>>> possible when describing your idea. Include the programming language,
>>> the tools and skills required, but try not to scare potential students
>>> away. They are supposed to learn what's required before the program
>>> starts. Use labels, e.g. for the programming language (java, c, c++,
>>> erlang, python, brainfuck, ...) or technology area (cloud, xml, web,
>>> foo, bar, ...) and record them at [5]. Please use the COMDEV JIRA
>>> project for recording your ideas if your project doesn't use JIRA
>>> (e.g. httpd, ooo). Contact [email protected] if you need assistance.
>>>
>>> - subscribe to [email protected] (restricted to potential mentors,
>>> meant to be used as a private list - general discussions on the public
>>> [email protected] list as much as possible please). Use a recognized
>>> address when subscribing (@apache.org or one of your alias addresses
>>> on record).
>>>
>>> Note that the ASF isn't accepted yet, nevertheless you *really* should
>>> start recording your ideas now.
>>>
>>> Over the years we were able to complete hundreds of projects
>>> successfully. Some of our prior students are active contributors now!
>>> Let's make this a success again this year!
>>>
>>>
>>> Uli
>>>
>>> P.S.: Except for the private parts (label spreadsheet mostly), this
>>> email is free to be shared publicly if you want to.
>>>
>>> [1] http://www.google-melange.com/gsoc/homepage/google/gsoc2013
>>> [2] http://community.apache.org/guide-to-being-a-mentor.html
>>> [3] http://s.apache.org/gsoc2013ideas
>>> [4] http://community.apache.org/gsoc.html
>>> [5] http://s.apache.org/gsoclabels
>>>
>
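
Note on the hyperlink example in the reply at the top of this thread: the difference between parsing content.xml and building a DOM for it can be shown with a plain SAX scan. What follows is only a minimal sketch using the JDK's own SAX parser, not ODFDOM code. The class name HyperlinkScanner and the method extractLinks are made up for illustration, and it assumes content.xml has already been pulled out of the ODF package (for example with java.util.zip). It relies on ODF hyperlinks being text:a elements with an xlink:href attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the ODF Toolkit API.
class HyperlinkScanner {
    // Namespace URIs used by ODF text content and by XLink attributes.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Streams through content.xml once and collects the xlink:href of every
    // text:a element. No tree is built; only the list of links is kept.
    static List<String> extractLinks(InputStream contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URI + local name
        SAXParser parser = factory.newSAXParser();
        final List<String> links = new ArrayList<>();
        parser.parse(contentXml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (TEXT_NS.equals(uri) && "a".equals(localName)) {
                    String href = attrs.getValue(XLINK_NS, "href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a real content.xml, just enough to exercise the scan.
        String xml =
            "<office:document-content"
            + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
            + " xmlns:text=\"" + TEXT_NS + "\""
            + " xmlns:xlink=\"" + XLINK_NS + "\">"
            + "<office:body><office:text>"
            + "<text:p>See <text:a xlink:type=\"simple\""
            + " xlink:href=\"http://www.apache.org/\">the ASF</text:a>"
            + " and <text:span>plain text</text:span>.</text:p>"
            + "<text:p><text:a xlink:type=\"simple\""
            + " xlink:href=\"http://incubator.apache.org/odftoolkit/\">ODF Toolkit</text:a></text:p>"
            + "</office:text></office:body></office:document-content>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // prints [http://www.apache.org/, http://incubator.apache.org/odftoolkit/]
        System.out.println(extractLinks(in));
    }
}
```

The whole file is still read from start to finish, as Svante points out, but memory use stays proportional to the number of links found rather than to the size of the document. A search-and-replace such as the "Sun Microsystems" case would need a handler that also writes the events back out, which this sketch does not attempt.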
