Yes, Saxon loads all of the documents into memory and then processes them. VXQuery loads enough documents to fill a frame and then passes them on to the next operator.
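To make the difference concrete, here is a rough sketch of the two strategies. It is not actual Saxon or VXQuery code: the function names (`load_all`, `frames`, `count_docs`) and the byte-based `frame_bytes` budget are made up for illustration, and the "next operator" is just a toy document counter.

```python
from typing import Iterable, Iterator, List, Tuple


def load_all(docs: Iterable[bytes]) -> List[bytes]:
    """Load-everything strategy: hold every document before processing starts."""
    return list(docs)


def frames(docs: Iterable[bytes], frame_bytes: int) -> Iterator[List[bytes]]:
    """Yield batches of documents whose combined size fits in frame_bytes.

    A document larger than frame_bytes is yielded alone in its own batch.
    (frame_bytes is a hypothetical budget, not a real VXQuery setting.)
    """
    batch: List[bytes] = []
    used = 0
    for doc in docs:
        # Hand the current batch on before it would overflow the budget.
        if batch and used + len(doc) > frame_bytes:
            yield batch
            batch, used = [], 0
        batch.append(doc)
        used += len(doc)
    if batch:
        yield batch


def count_docs(docs: Iterable[bytes], frame_bytes: int) -> Tuple[int, int]:
    """Toy 'next operator': count documents and track the peak bytes held at once."""
    total = 0
    peak = 0
    for frame in frames(docs, frame_bytes):
        peak = max(peak, sum(len(d) for d in frame))
        total += len(frame)
    return total, peak
```

With the batched version, the amount held at any one time is bounded by the frame budget (or by the largest single document) rather than by the size of the whole collection.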
On Thu, Feb 13, 2014 at 10:27 AM, Till Westmann <[email protected]> wrote:

> One more question about this: We're querying a collection of documents,
> right? So if Saxon run out of memory, does that mean that it first loads
> all the documents in the collection into memory and keeps them there?
>
> Thanks,
> Till
>
> On Feb 12, 2014, at 9:31 PM, Till Westmann <[email protected]> wrote:
>
> > Right, I forgot about that.
> >
> > Thanks,
> > Till
> >
> > On Feb 12, 2014, at 12:23 PM, Eldon Carman <[email protected]> wrote:
> >
> >> They have a version that supports streams to handle larger files. Its just
> >> not the free version.
> >>
> >>
> >> On Tue, Feb 11, 2014 at 11:59 PM, Till Westmann <[email protected]> wrote:
> >>
> >>> Hi Preston,
> >>>
> >>> do you have indications that this is a limitation of just the free version?
> >>> I think that it wouldn't be completely surprising to see a big memory blow
> >>> up.
> >>> Assuming that the XML file is in single-byte UTF-8 (which I think it is)
> >>> and that the text is stored in 2-byte UTF-16 characters in the JVM, we
> >>> already have a factor of 2. And then there are probably a number of objects
> >>> and references that take up additional memory. So it might be that all
> >>> versions of Saxon take up a lot of space in memory. But of course it is
> >>> also possible that the commercial version uses a more memory efficient
> >>> representation.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:
> >>>
> >>>> In testing larger datasets sizes, saxon has run into a memory limitation. A
> >>>> data set size of 2.21 GB was not able to be queried by saxon. Even with
> >>>> setting the java heap size be larger than the data set, the application
> >>>> throws an error: "Exception in thread "main" java.lang.OutOfMemoryError: GC
> >>>> overhead limit exceeded". Just to confirm, I used the following settings:
> >>>> JAVA_OPTS="-Xmx12g -Xms12g"
> >>>>
> >>>> Several internet posts comment on allocating 5 times as much memory as the
> >>>> xml data size as a rule of thumb. Not guaranteed to work. Some of my
> >>>> testing have worked with datasets up to 460MB (happens to the be the my
> >>>> tiny dataset size). Guess we now have confirmed the memory limitation of
> >>>> the free version of saxon.
> >>>
> >>>
> >
> >
