ok, that makes sense. yes, I do definitely want versionability, and yes, I am
happy to pay that performance cost (it is small in reality). The only issue I
had was memory use when trying to load up the system to do some measurements,
but I got there in the end. Thanks for everyone's help!
Michael.

On 9/6/06, Stefan Guggisberg <[EMAIL PROTECTED]> wrote:
On 9/4/06, Stefan Guggisberg <[EMAIL PROTECTED]> wrote:
> On 9/4/06, Michael Neale <[EMAIL PROTECTED]> wrote:
> > Hi Stefan.
> >
> > Node types attached, and the example code that rips through it and
> > saves stuff. Let me know if there is anything obvious I am doing wrong!
> >
> > Anyone interested can download the loop code and node types from this zip:
> > http://www.users.on.net/~michaelneale/work/jackrabbit_perf.zip
>
> thanks, michael. i'll have a look at it, hopefully sometime this week,
> and i'll get back to you with my findings.

i solved the mystery ;)

your node type extends from mix:versionable, which explains why using
nt:unstructured provides better performance. if you add mix:versionable
to your nt:unstructured nodes you'll get about the same performance
figures as when using your own node types.

versionability of a node inevitably incurs additional overhead (such as
allocating resources in the version store). unless you really need
versionability i'd suggest avoiding mix:versionable in your node type
model.

cheers
stefan

> cheers
> stefan
>
> > On 9/4/06, Stefan Guggisberg <[EMAIL PROTECTED]> wrote:
> > > hi michael,
> > >
> > > On 9/4/06, Michael Neale <[EMAIL PROTECTED]> wrote:
> > > > hi Stefan.
> > > >
> > > > Yes I was able to make it rip through saving lots of simple nodes
> > > > like that, no problem. When I add more properties, it degrades a
> > > > fair bit (probably not surprising if I guess at how the data is
> > > > being stored for each property).
> > > >
> > > > Interestingly, when I use my own specific node type it slows down
> > > > quite a lot (and memory consumption goes up) than with
> > > > nt:unstructured, yet with all other properties being set in the
> > > > same way. I had to bump up the memory quite a lot to avoid
> > > > OutOfMemoryExceptions.
> > >
> > > that's indeed very interesting and comes as a surprise. would you
> > > mind sharing with us your node type definitions and some sample
> > > code? i'd like to investigate this further.
> > >
> > > cheers
> > > stefan
> > >
> > > > In the end, when I batched things up, I was able to ramp up the
> > > > number of nodes to what I wanted to test. Performance was
> > > > acceptable once it was loaded up - it is definitely the save()
> > > > operations that are the most expensive. It was just very, very
> > > > difficult to build up my test data without killing memory.
> > > >
> > > > Thanks everyone for your help, I have learned a lot about
> > > > jackrabbit in the meantime.
> > > >
> > > > On 9/1/06, Stefan Guggisberg <[EMAIL PROTECTED]> wrote:
> > > > > hi michael
> > > > >
> > > > > i quickly ran a test which successfully added 20k child nodes
> > > > > to the same parent (whether that's a useful content model is a
> > > > > different story...).
> > > > >
> > > > > here's the code i used to test:
> > > > >
> > > > >     Node parent = root.addNode("foo", "nt:unstructured");
> > > > >     for (int i = 1; i <= 20000; i++) {
> > > > >         parent.addNode("bar");
> > > > >         if (i % 1000 == 0) {
> > > > >             root.save();
> > > > >             System.out.println("added 1000 child nodes; total=" + i);
> > > > >         }
> > > > >     }
> > > > >
> > > > > note that save() is a relatively expensive operation; it
> > > > > therefore makes sense to batch multiple addNode etc. calls
> > > > > (which are relatively inexpensive).
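To make stefan's explanation concrete, here is a minimal sketch in the same
spirit as his test above: it times the same batched bulk add with and
without mix:versionable. It is untested, and the setup is assumed - the
TransientRepository, the credentials, the node counts and the
"plain"/"versioned" names are all placeholders for illustration.

    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import org.apache.jackrabbit.core.TransientRepository;

    // sketch: time the same bulk add with and without mix:versionable
    public class VersionableOverhead {

        static long timeBulkAdd(Session session, String parentName,
                                boolean versionable) throws Exception {
            Node parent = session.getRootNode().addNode(parentName, "nt:unstructured");
            session.save();
            long start = System.currentTimeMillis();
            for (int i = 1; i <= 5000; i++) {
                Node n = parent.addNode("node" + i, "nt:unstructured");
                if (versionable) {
                    // the only difference between the two runs: this mixin
                    // makes the repository allocate a version history per node
                    n.addMixin("mix:versionable");
                }
                if (i % 1000 == 0) {
                    session.save(); // batch the saves, as suggested above
                }
            }
            session.save();
            return System.currentTimeMillis() - start;
        }

        public static void main(String[] args) throws Exception {
            Repository repository = new TransientRepository();
            Session session = repository.login(
                    new SimpleCredentials("user", "pwd".toCharArray()));
            try {
                System.out.println("plain:     " + timeBulkAdd(session, "plain", false) + " ms");
                System.out.println("versioned: " + timeBulkAdd(session, "versioned", true) + " ms");
            } finally {
                session.logout();
            }
        }
    }

If the overhead really comes from the mixin, the "versioned" run should be
noticeably slower, which would match the figures discussed above.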
> > > > > please provide a simple self-contained test case that
> > > > > reproduces the behaviour you're describing.
> > > > >
> > > > > cheers
> > > > > stefan
> > > > >
> > > > > On 9/1/06, Michael Neale <[EMAIL PROTECTED]> wrote:
> > > > > > 1:
> > > > > > yeah I use JProfiler - top of the charts with a bullet was:
> > > > > > org.apache.jackrabbit.util.WeakIdentityCollection$WeakRef
> > > > > > (a ha! that would explain the performance slug when GC has to
> > > > > > kick in late in the piece). followed by:
> > > > > > org.apache.derby.impl.store.raw.data.StoredRecordHeader, and
> > > > > > of course a whole lot of byte[].
> > > > > >
> > > > > > I am using default everything (which means Derby) and no blobs
> > > > > > whatsoever (so all in the database).
> > > > > >
> > > > > > 2:
> > > > > > If I logout, and use fresh everything, it seems to continue
> > > > > > fine (ie at a fast enough pace), but I haven't really pushed
> > > > > > it to where I wanted to get it (10000 child nodes).
> > > > > >
> > > > > > Responding to Alexandru's email (hi alex, nice work on InfoQ
> > > > > > if I remember correctly! I am a fan), it would seem that the
> > > > > > Session keeps most in memory, which I can understand.
> > > > > >
> > > > > > I guess my problem is that I am trying to load up the system
> > > > > > to test, really basically, that it scales to the numbers that
> > > > > > I know I need to scale to, but I am having trouble getting the
> > > > > > data in - bulk load wise. If I bump up the memory, it
> > > > > > certainly seems to hum along better, but if Session is keeping
> > > > > > a lot around, then this will have limits - is there no way to
> > > > > > "clear" the session?
> > > > > >
> > > > > > Perhaps I will explain what I am using JCR for (feel free to
> > > > > > smack me down if this is not what JCR and Jackrabbit are ever
> > > > > > intended for): I am storing "atomic business rules" (which
> > > > > > means each node is a small single business rule). The data on
> > > > > > each node is very small. These nodes are stored flat as child
> > > > > > nodes under a top level node. To give structure
> > > > > > (categorisation) for the users, I have references to these
> > > > > > nodes all over the place so people can navigate them in all
> > > > > > sorts of different ways (as there is no one clear hierarchy at
> > > > > > the time the rules are created). JCR gives me most of what I
> > > > > > need, but as these rule nodes can number in the thousands
> > > > > > (4000 is not uncommon for a reasonably complex business unit),
> > > > > > I am worried that this just can't work.
> > > > > >
> > > > > > I have seen from past posts that people put nodes under
> > > > > > different parents (so there is no great number of child
> > > > > > nodes), so that is one option, but my gut feel is that it's
> > > > > > the WeakIdentityCollection: this well-meaning code means that
> > > > > > the GC has to do a huge amount of work at the worst possible
> > > > > > time (when under stress). I am sure most of the time this is
> > > > > > not an issue.
> > > > > >
> > > > > > Any ideas/tips/gotchas for a newbie? I would really like to be
> > > > > > confident that I can scale up enough (it's modest) with JCR
> > > > > > for this purpose.
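On the "clear the session" question above, one workaround consistent with
this thread (and with Nicolas's suggestion below) is to save and log out
periodically during the bulk load, so the session's item cache can be
reclaimed between batches. A rough sketch only, assuming a pre-existing
"rules" parent node and made-up credentials:

    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    // sketch: bulk loader that bounds memory by recycling the session
    public class BulkLoader {

        public static void load(Repository repository, int total) throws Exception {
            Session session = repository.login(
                    new SimpleCredentials("user", "pwd".toCharArray()));
            try {
                for (int i = 1; i <= total; i++) {
                    Node parent = session.getRootNode().getNode("rules");
                    Node rule = parent.addNode("rule" + i, "nt:unstructured");
                    rule.setProperty("content", "small rule body " + i);
                    if (i % 1000 == 0) {
                        session.save();
                        // recycle the session so the cached items (the
                        // WeakIdentityCollection entries seen in the
                        // profiler) can be collected before the next batch
                        session.logout();
                        session = repository.login(
                                new SimpleCredentials("user", "pwd".toCharArray()));
                    }
                }
                session.save();
            } finally {
                if (session.isLive()) {
                    session.logout();
                }
            }
        }
    }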
> > > > > > On 8/31/06, Nicolas <[EMAIL PROTECTED]> wrote:
> > > > > > > 2 more ideas:
> > > > > > >
> > > > > > > 1/ Did you try using a memory profiler so we can know what
> > > > > > > is wrong?
> > > > > > >
> > > > > > > 2/ What happens if you logout after say 100 updates?
> > > > > > >
> > > > > > > a+
> > > > > > > Nico
> > > > > > > my blog! http://www.deviant-abstraction.net !!
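And picking up the "nodes under different parents" idea from further up the
thread: a common way to keep any one flat child list small is to hash each
rule name into a fixed set of bucket parents. A hypothetical sketch - the
bucket count, node names and node types are arbitrary choices for
illustration, not anything prescribed by JCR:

    import javax.jcr.Node;
    import javax.jcr.Session;

    // sketch: spread rule nodes across hashed buckets so no single parent
    // accumulates thousands of children
    public class BucketedStore {

        private static final int BUCKETS = 64; // arbitrary; tune to your data

        static Node addRule(Session session, String ruleName, String body)
                throws Exception {
            Node root = session.getRootNode();
            Node rules = root.hasNode("rules")
                    ? root.getNode("rules")
                    : root.addNode("rules", "nt:unstructured");

            // derive a stable bucket index from the rule name
            int index = (ruleName.hashCode() & 0x7fffffff) % BUCKETS;
            String bucketName = "bucket" + index;
            Node bucket = rules.hasNode(bucketName)
                    ? rules.getNode(bucketName)
                    : rules.addNode(bucketName, "nt:unstructured");

            Node rule = bucket.addNode(ruleName, "nt:unstructured");
            rule.setProperty("content", body);
            return rule; // the caller batches session.save(), as discussed above
        }
    }

The references (or a query) can still present the rules in whatever
navigation structure the users need, so the bucket layout stays an
implementation detail.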
