Re: [basex-talk] Help with a Query/Performance
Hi Christian,

thanks for the help! I have a working version of this now that performs well. Initially I didn't want to reconstruct parts of the message, because we have a couple of different versions of these containers, and there are usually multiple namespaces and prefixes involved that should be preserved. But it turned out to be easier than I thought.

Thanks & greetings from Salzburg,
Tom

From: Christian Grün
Sent: Monday, 20 January 2020 19:06
To: Tom Rauchenwald (UNIFITS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Help with a Query/Performance

I missed the obvious next step. The following query is evaluated in a few milliseconds:

declare variable $OFFSET1 := 3;
declare variable $OFFSET2 := 2;

let $container := db:open('tr-test')/Container
let $message := $container/*:MessageA[$OFFSET1]
let $detail := $message/MessageADetail[$OFFSET2]
return element { name($container) } {
  $container/*[contains(name(), 'MetaData')],
  element { name($message) } {
    $message/MessageAMetaData,
    element { name($detail) } {
      $detail/*
    }
  }
}

On Mon, Jan 20, 2020 at 6:54 PM Christian Grün wrote:
>
> Dear Tom,
>
> If you have large elements, it will usually be faster to create new
> elements. Here’s one way to do it:
>
> let $offset1 := 3
> let $offset2 := 2
> let $container := db:open('tr-test')/Container
> return element Container {
>   (: add meta data elements :)
>   $container/*[starts-with(name(), 'ContainerMetaData')],
>   (: alternative: add everything except Message elements
>   $container/(* except (MessageA, MessageB, MessageC)), :)
>   $container/MessageA[$offset1] update {
>     delete node MessageADetail[position() != $offset2]
>   }
> }
>
> There are probably ways to get this even faster; I may have a look at
> this tomorrow.
>
> All the best from Konstanz,
> Christian
>
> On Mon, Jan 20, 2020 at 10:01 AM Tom Rauchenwald (UNIFITS) wrote:
> >
> > Hi list,
> >
> > I'm struggling with a query.
> >
> > We have XML documents with a structure similar to this:
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > FOO
> > FOO
> >
> > Messages are bundled in a container (up to n times for each message), and
> > each message has details (also up to n times). Container, Message contain
> > data that is the same for all details (it's basically a grouping).
> > I'd like to retrieve a Detail with all corresponding data associated with
> > it, so basically a MessageADetail, MessageA (without all the other
> > MessageADetails), Container (without all the other Messages).
> > I know the position of the message (i.e., I know that I want the second
> > MessageA for example), and I know the position of the Detail (i.e., I know
> > that I want the 3rd Detail).
> > The use case is to show the detail in context in a UI.
> >
> > The query to do this I came up with is (here I want to get the 2nd detail
> > from the third MessageA):
> >
> > let $fh := (copy $x := /*:Container
> >             modify ( delete node $x/*:MessageA[position() != 3]
> >                    , delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2]
> >                    , delete node $x/*:MessageB
> >                    , delete node $x/*:MessageC
> >                    )
> >             return $x)
> > return $fh
> >
> > This works well for small documents. For large documents it can take a
> > couple of seconds to evaluate the query (our real-life documents do have
> > more data/elements in Details and Message).
> > I'm wondering if there's a better/more efficient way to do this. I tried
> > formulating a query that doesn't do deletes, but I couldn't come up with a
> > solution that performs better and is correct.
> >
> > Any pointers would be very much appreciated.
> >
> > Here's a function to generate sufficiently large test data:
> >
> > declare function local:sample($numberOfMessages, $numberOfDetails) {
> >   FOO
> >   FOO
> >   {for $i in 1 to $numberOfMessages
> >    return
> >      FOO {$i}
> >      FOO {$i}
> >      {for $j in 1 to $numberOfDetails
> >       return
> >         FOO {$j}
> >         FOO {$j}
> >      }
> >   }
> >   FOO
> >   FOO
> >   FOO
> >   FOO
> >   FOO
> >   FOO
> >   FOO
> >   FOO
> > };
> >
> > db:create('tr-test', local:sample(20, 10), 'test.xml')
> >
> > Thanks,
> > Tom Rauchenwald
[basex-talk] Help with a Query/Performance
Hi list,

I'm struggling with a query.

We have XML documents with a structure similar to this:

FOO FOO
FOO FOO
FOO FOO
FOO FOO
FOO FOO
FOO FOO
FOO FOO
FOO FOO

Messages are bundled in a container (up to n times for each message), and each message has details (also up to n times). Container, Message contain data that is the same for all details (it's basically a grouping).
I'd like to retrieve a Detail with all corresponding data associated with it, so basically a MessageADetail, MessageA (without all the other MessageADetails), Container (without all the other Messages).
I know the position of the message (i.e., I know that I want the second MessageA for example), and I know the position of the Detail (i.e., I know that I want the 3rd Detail).
The use case is to show the detail in context in a UI.

The query to do this I came up with is (here I want to get the 2nd detail from the third MessageA):

let $fh := (copy $x := /*:Container
            modify ( delete node $x/*:MessageA[position() != 3]
                   , delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2]
                   , delete node $x/*:MessageB
                   , delete node $x/*:MessageC
                   )
            return $x)
return $fh

This works well for small documents. For large documents it can take a couple of seconds to evaluate the query (our real-life documents do have more data/elements in Details and Message).
I'm wondering if there's a better/more efficient way to do this. I tried formulating a query that doesn't do deletes, but I couldn't come up with a solution that performs better and is correct.

Any pointers would be very much appreciated.

Here's a function to generate sufficiently large test data:

declare function local:sample($numberOfMessages, $numberOfDetails) {
  FOO FOO
  {for $i in 1 to $numberOfMessages
   return
     FOO {$i} FOO {$i}
     {for $j in 1 to $numberOfDetails
      return
        FOO {$j} FOO {$j}
     }
  }
  FOO FOO
  FOO FOO
  FOO FOO
  FOO FOO
};

db:create('tr-test', local:sample(20, 10), 'test.xml')

Thanks,
Tom Rauchenwald
Re: [basex-talk] Reflect.forName() / Performance
Hi Christian,

> Your hint was helpful: If a function signature is not registered yet
> when it is parsed, all kinds of lookups are performed to locate the
> function. This is e.g. the case if the function is recursive, or if
> the invoked function occurs after the currently parsed code.
>
> I have revised the code which is responsible for locating invoked
> functions: In the JavaFunction class [1], the lookup will be avoided
> if we can tell in advance that no Java class will be found.

I did some non-scientific testing, and the results are quite awesome. This is with one of our regression test suites: we create 600 databases with exactly one (small) file each, run a few queries against each (typically 3), and our business logic interprets the results (the use case is to check whether some business rules are violated in the XML).

Here are the results; the numbers are seconds to execute the full test suite:

| Run | BaseX 8.6.1 | BaseX 9.0.2 | BaseX 9.1 Snapshot |
|-----+-------------+-------------+--------------------|
|   1 |        42.5 |        44.5 |               20.3 |
|   2 |        42.5 |        44.4 |               18.1 |
|   3 |        42.9 |        44.6 |               18.1 |
|   4 |        42.5 |        44.8 |               18.2 |
|   5 |        42.5 |        43.8 |               18.6 |
|-----+-------------+-------------+--------------------|
| Avg |        42.6 |        44.4 |               18.7 |

So our case is now more than twice as fast as before :)

I did some profiling/sampling with JProfiler as well: the Class.forName call is now almost completely gone, while it was the most-hit method before. I guess that means that improving the caching of classes is not needed, since you fixed the core problem.

Thanks a lot, this really helps us!

-tom
Re: [basex-talk] Reflect.forName() / Performance
Hi Christian,

thanks for your feedback, I hope you're doing well!

> I have some concerns that the caching of non-existing classes could be
> exploited and bloat the cache. Maybe we’d need to use WeakHashMap
> (and/or soft references) instead?

I didn't think about that, I'll try to come up with a better solution.

>> I'm not sure why BaseX tries to load our xqm as Java Modules, but
>> what I noticed is that Reflect.forName caches the positive case
>> (i.e., the class is found), but not the negative case (i.e., the class is
>> not found).
>
> Sounds like an interesting finding; maybe there’s something we can
> optimize here. Could you possibly provide us a little self-contained
> example that demonstrates the behavior?

Sure. This is what I'm doing:

Given the following module (installed with module install foo.xqm):

module namespace uc = 'http://unifits.com/common';

declare function uc:remove-elements($input as element(), $remove-names as xs:string*) as element() {
  element { node-name($input) } {
    $input/@*,
    for $child in $input/node()[not(local-name() = $remove-names)]
    return
      if ($child instance of element())
      then uc:remove-elements($child, $remove-names)
      else $child
  }
};

If I start BaseXGui in debug mode and set a breakpoint in Reflect.forName(), then every time I execute a query such as

import module namespace uc = 'http://unifits.com/common';
/

the breakpoint is hit, i.e. Class.forName() is called. I *think* this might have to do with the fact that the function above is recursive, but I have to admit that I don't really grasp the code that does the module loading/parsing.

Thanks,
-tom
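
The concern about an unbounded negative cache can be illustrated with a small sketch. It does not use WeakHashMap or soft references as suggested in the quoted reply; it swaps in a simpler, deterministic alternative: a size-bounded, least-recently-used set of class names that failed to resolve. The class name BoundedMissCache and its methods are hypothetical and not part of BaseX.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded store of class names known not to resolve (not BaseX code). */
final class BoundedMissCache {
  /** Access-ordered map; the value is a dummy marker. */
  private final Map<String, Boolean> misses;

  BoundedMissCache(final int capacity) {
    // accessOrder = true: iteration order runs from least to most recently used
    misses = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(final Map.Entry<String, Boolean> eldest) {
        // evict the least recently used name once the bound is exceeded
        return size() > capacity;
      }
    };
  }

  /** Remembers a failed lookup. */
  synchronized void add(final String name) {
    misses.put(name, Boolean.TRUE);
  }

  /** Checks for a remembered failure; a hit also marks the name as recently used. */
  synchronized boolean contains(final String name) {
    return misses.get(name) != null;
  }

  synchronized int size() {
    return misses.size();
  }
}
```

With a bound in place, a flood of distinct non-existing names can only push older names out of the set; it cannot grow the cache beyond the chosen capacity. The price is that an evicted name triggers a real lookup again.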
[basex-talk] Reflect.forName() / Performance
Hi BaseX-Team,

when profiling some of our tests I found that we spend some time in Reflect.forName(). We have 2 XQuery modules in the repo (we don't call Java code directly).

I'm not sure why BaseX tries to load our xqm as Java Modules, but what I noticed is that Reflect.forName caches the positive case (i.e., the class is found), but not the negative case (i.e., the class is not found).

I've changed the code to cache the negative case as well (see below), and noticed an improvement of about 5 percent. Our tests create and query loads of small databases, so this may be quite an artificial speedup.

I could provide a PR if this is a worthwhile improvement in your opinion (and if I'm not missing something obvious).

We're still on BaseX 8.7.6 in case that matters; as far as I could see, the code didn't change in BaseX 9.

Thanks,
Tom

Code:

public static Class<?> forName(final String name) throws ClassNotFoundException {
  Class<?> c = CLASSES.get(name);
  if(c == null) {
    // a key that is present but mapped to null marks an earlier failed lookup
    if(CLASSES.containsKey(name)) throw new ClassNotFoundException(name);
    try {
      c = Class.forName(name);
    } catch(final ClassNotFoundException e) {
      // remember the failure (requires a map that permits null values)
      CLASSES.put(name, null);
      throw e;
    }
    if(!Modifier.isPublic(c.getModifiers())) throw new ClassNotFoundException(name);
    CLASSES.put(name, c);
  }
  return c;
}
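
The patch above records a failed lookup by storing null as the map value, which only works if the map permits null values (a java.util.HashMap does, a ConcurrentHashMap does not). A minimal, self-contained sketch of the same idea with an explicit sentinel instead of null could look as follows. The class ClassLookupCache, the Missing marker, and the LOOKUPS counter are hypothetical names for illustration and are not taken from the BaseX sources. Unlike the patch, the sketch also remembers non-public classes as missing.

```java
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lookup cache that remembers hits and misses (not BaseX code). */
final class ClassLookupCache {
  /** Private marker type; its Class object stands for "not found". */
  private static final class Missing { }
  private static final Class<?> MISSING = Missing.class;

  /** Cached results; ConcurrentHashMap rejects null values, hence the sentinel. */
  private static final Map<String, Class<?>> CLASSES = new ConcurrentHashMap<>();
  /** Counts real Class.forName calls; for demonstration only. */
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  private ClassLookupCache() { }

  static Class<?> forName(final String name) throws ClassNotFoundException {
    Class<?> c = CLASSES.get(name);
    if (c == null) {
      // first request for this name: resolve once and cache the outcome
      c = resolve(name);
      CLASSES.put(name, c);
    }
    if (c == MISSING) throw new ClassNotFoundException(name);
    return c;
  }

  /** Returns the public class with the given name, or the sentinel. */
  private static Class<?> resolve(final String name) {
    LOOKUPS.incrementAndGet();
    try {
      final Class<?> c = Class.forName(name);
      return Modifier.isPublic(c.getModifiers()) ? c : MISSING;
    } catch (final ClassNotFoundException e) {
      return MISSING;
    }
  }
}
```

Repeated calls with the same unknown name then trigger only one real Class.forName call; every later call fails directly from the cache.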
Re: [basex-talk] Location of users.xml
Hi Christian,

hope you're well!

>> is it possible to override the expected location of users.xml?
>
> Currently no, but it’s possible to have the data directory inside the
> war file. Do you use a custom location?
>
> If you use RESTXQ, you could use XQuery to copy a users.xml file to
> the database directory.

Thanks for the quick response. We use the Java client API, so the users.xml needs to be in place or we won't be able to connect. We usually use a custom location for the database directory.

I'll have to think about this a bit more, but I guess we could adapt the war file to our needs and copy the users.xml to the database directory when BaseX starts up.

> Christian

Thanks,
-tom
[basex-talk] Location of users.xml
Hi,

is it possible to override the expected location of users.xml?

From the wiki (http://docs.basex.org/wiki/User_Management):

> The permission file has been moved from the home directory to the
> database directory. It was renamed from .basexperm to users.xml

We're using the war distribution, and would prefer to either specify the location of users.xml via a parameter or have it reside in the war. Is this currently possible?

Thanks,
-tom