Re: [basex-talk] Help with a Query/Performance

2020-01-21 Thread Tom Rauchenwald (UNIFITS)
Hi Christian,

thanks for the help!

I have a working version of this now that performs well.
Initially I didn't want to reconstruct parts of the message, because we have a 
couple of different versions of these containers and usually there are multiple 
namespaces and prefixes involved that should be preserved. But it turns out 
this was easier than I thought.

Thanks & greetings from Salzburg,
Tom



From: Christian Grün 
Sent: Monday, 20 January 2020 19:06
To: Tom Rauchenwald (UNIFITS) 
Cc: basex-talk@mailman.uni-konstanz.de 
Subject: Re: [basex-talk] Help with a Query/Performance

I missed the obvious next step. The following query is evaluated
in a few milliseconds:

  declare variable $OFFSET1 := 3;
  declare variable $OFFSET2 := 2;

  let $container := db:open('tr-test')/Container
  let $message := $container/*:MessageA[$OFFSET1]
  let $detail := $message/MessageADetail[$OFFSET2]
  return element { name($container) } {
    $container/*[contains(name(), 'MetaData')],
    element { name($message) } {
      $message/MessageAMetaData,
      element { name($detail) } {
        $detail/*
      }
    }
  }
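The query above builds a new, small tree from the selected parts instead of copying the whole container and deleting most of it. For anyone post-processing such documents on the Java side, the same idea can be sketched with the JDK's DOM API. This is a hypothetical illustration, not BaseX code: the class `DetailExtractor` and its methods are made up for this example, elements are matched by local name (ignoring namespaces), metadata elements are assumed to be direct children of the root whose local name contains `MetaData`, and positions are 1-based as in XQuery.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DetailExtractor {

  // Parses XML namespace-aware, so getLocalName() ignores prefixes.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
      f.setNamespaceAware(true);
      return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    } catch (Exception ex) {
      throw new IllegalArgumentException("cannot parse XML", ex);
    }
  }

  // Element children of 'parent' whose local name equals 'name' (any namespace).
  static List<Element> children(Element parent, String name) {
    List<Element> result = new ArrayList<>();
    NodeList nodes = parent.getChildNodes();
    for (int i = 0; i < nodes.getLength(); i++) {
      Node n = nodes.item(i);
      if (n instanceof Element && name.equals(n.getLocalName())) result.add((Element) n);
    }
    return result;
  }

  // Builds a new document from the selected parts only: a shallow copy of the root
  // (attributes, no children), its *MetaData children, a shallow copy of the
  // messagePos-th MessageA with its MessageAMetaData, and a deep copy of the
  // detailPos-th MessageADetail of that message.
  static Document extract(Document src, int messagePos, int detailPos) {
    Element container = src.getDocumentElement();
    Element message = children(container, "MessageA").get(messagePos - 1);
    Element detail = children(message, "MessageADetail").get(detailPos - 1);

    Document out;
    try {
      out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException(ex);
    }
    Element newContainer = (Element) out.importNode(container, false);
    out.appendChild(newContainer);
    NodeList kids = container.getChildNodes();
    for (int i = 0; i < kids.getLength(); i++) {
      Node k = kids.item(i);
      if (k instanceof Element && k.getLocalName().contains("MetaData")) {
        newContainer.appendChild(out.importNode(k, true));
      }
    }
    Element newMessage = (Element) out.importNode(message, false);
    newContainer.appendChild(newMessage);
    for (Element md : children(message, "MessageAMetaData")) {
      newMessage.appendChild(out.importNode(md, true));
    }
    newMessage.appendChild(out.importNode(detail, true));
    return out;
  }

  public static void main(String[] args) {
    String xml = "<Container><ContainerMetaData>FOO</ContainerMetaData>"
        + "<MessageA><MessageAMetaData>M1</MessageAMetaData>"
        + "<MessageADetail>D1</MessageADetail><MessageADetail>D2</MessageADetail></MessageA>"
        + "<MessageA><MessageAMetaData>M2</MessageAMetaData>"
        + "<MessageADetail>D3</MessageADetail><MessageADetail>D4</MessageADetail></MessageA>"
        + "<MessageB/></Container>";
    Document out = extract(parse(xml), 2, 1);
    System.out.println(out.getDocumentElement().getTextContent()); // FOOM2D3
  }
}
```

As in the XQuery version, the work is proportional to the parts that are copied, not to the size of the whole container.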


On Mon, Jan 20, 2020 at 6:54 PM Christian Grün
 wrote:
>
> Dear Tom,
>
> If you have large elements, it will usually be faster to create new
> elements. Here’s one way to do it:
>
>   let $offset1 := 3
>   let $offset2 := 2
>   let $container := db:open('tr-test')/Container
>   return element Container {
>     (: add meta data elements :)
>     $container/*[starts-with(name(), 'ContainerMetaData')],
>     (: alternative: add everything except Message elements
>     $container/(* except (MessageA, MessageB, MessageC)), :)
>     $container/MessageA[$offset1] update {
>       delete node MessageADetail[position() != $offset2]
>     }
>   }
>
> There are probably ways to get this even faster; I may have a look at
> this tomorrow.
>
> All the best from Konstanz,
> Christian
>
>
>
> On Mon, Jan 20, 2020 at 10:01 AM Tom Rauchenwald (UNIFITS)
>  wrote:


[basex-talk] Help with a Query/Performance

2020-01-20 Thread Tom Rauchenwald (UNIFITS)
Hi list,

I'm struggling with a query.

We have XML documents with a structure similar to this:


  FOO
  FOO
  

  FOO
  FOO


  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  


Messages are bundled in a container (each message type can occur up to n 
times), and each message has details (also up to n times). Container and 
Message contain data that is the same for all details (it's basically a 
grouping).
I'd like to retrieve a Detail with all the data associated with it, i.e., a 
MessageADetail, its MessageA (without the other MessageADetails), and the 
Container (without the other Messages).
I know the position of the message (i.e., I know that I want the second 
MessageA for example), and I know the position of the Detail (i.e., I know that 
I want the 3rd Detail).
The use case is to show the detail in context in a UI.

The query to do this I came up with is (here I want to get the 2nd detail from 
the third MessageA):

  let $fh := (copy $x := /*:Container
              modify ( delete node $x/*:MessageA[position() != 3]
                     , delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2]
                     , delete node $x/*:MessageB
                     , delete node $x/*:MessageC
                     )
              return $x)
  return $fh

This works well for small documents. For large documents it can take a couple 
of seconds to evaluate the query (our real-life documents do have more 
data/elements in Details and Message).
I'm wondering if there's a better/more efficient way to do this. I tried 
formulating a query that doesn't do deletes, but I couldn't come up with a 
solution that performs better and is correct.

Any pointers would be very much appreciated.

Here's a function to generate sufficiently large test data:

declare function local:sample($numberOfMessages, $numberOfDetails) {

  FOO
  FOO
  {for $i in 1 to $numberOfMessages
return
  

  FOO {$i}
  FOO {$i}

{for $j in 1 to $numberOfDetails
 return
 
   FOO {$j}
   FOO {$j}
 
}
  
  }
  

  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  

};

db:create('tr-test', local:sample(20, 10), 'test.xml')

Thanks,
Tom Rauchenwald




Re: [basex-talk] Reflect.forName() / Performance

2018-10-23 Thread Tom Rauchenwald
Hi Christian,

> Your hint was helpful: If a function signature is not registered yet
> when it is parsed, all kinds of lookups are performed to locate the
> function. This is e.g. the case if the function is recursive, or if
> the invoked function occurs after the currently parsed code.
>
> I have revised the code which is responsible for locating invoked
> functions: In the JavaFunction class [1], the lookup will be avoided
> if we can tell in advance that no Java class will be found.
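To illustrate the general idea of skipping a reflective lookup when a cheap test already rules out a match, here is a sketch. It is hypothetical and not the actual change to JavaFunction: the class `LookupGuard`, the method `mightBeClassName` and its rule (a candidate must be a dot-separated sequence of Java identifiers, so a URI containing ':' or '/' is rejected without calling `Class.forName`) are assumptions made for this example.

```java
import java.util.Optional;

public class LookupGuard {

  // Cheap syntactic test: is 'name' a dot-separated sequence of Java identifiers?
  // Names failing it cannot denote a class compiled from Java source
  // (array descriptors such as "[I" are deliberately out of scope here).
  static boolean mightBeClassName(String name) {
    if (name == null || name.isEmpty()) return false;
    for (String part : name.split("\\.", -1)) {
      if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) return false;
      for (int i = 1; i < part.length(); i++) {
        if (!Character.isJavaIdentifierPart(part.charAt(i))) return false;
      }
    }
    return true;
  }

  // Performs the comparatively expensive reflective lookup only if the guard passes.
  static Optional<Class<?>> find(String name) {
    if (!mightBeClassName(name)) return Optional.empty();
    try {
      return Optional.of(Class.forName(name));
    } catch (ClassNotFoundException ex) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(find("java.lang.Math").isPresent());            // true
    System.out.println(find("http://unifits.com/common").isPresent()); // false, no lookup
  }
}
```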

I did some non-scientific testing; the results are quite awesome.
This is with one of our regression test suites: we create 600 databases,
each with exactly one (small) file, run a few queries (typically 3) against
each of them, and our business logic interprets the results (the use case
is to check whether some business rules are violated in the XML).

Here are the results; the numbers are the seconds needed to execute the full test suite:

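| Run | BaseX 8.6.1 | BaseX 9.0.2 | BaseX 9.1 Snapshot |
|-----+-------------+-------------+--------------------|
|   1 |        42.5 |        44.5 |               20.3 |
|   2 |        42.5 |        44.4 |               18.1 |
|   3 |        42.9 |        44.6 |               18.1 |
|   4 |        42.5 |        44.8 |               18.2 |
|   5 |        42.5 |        43.8 |               18.6 |
|-----+-------------+-------------+--------------------|
| Avg |        42.6 |        44.4 |               18.7 |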

So our case is now more than twice as fast as before :)
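The averages and the "more than twice as fast" figure can be recomputed from the five runs in the table. This small sketch (the class name `SpeedupCheck` is only for illustration) does the arithmetic.

```java
import java.util.Arrays;
import java.util.Locale;

public class SpeedupCheck {

  // Arithmetic mean of the measured run times (seconds).
  static double mean(double[] runs) {
    return Arrays.stream(runs).average().getAsDouble();
  }

  public static void main(String[] args) {
    double[] v861 = {42.5, 42.5, 42.9, 42.5, 42.5};
    double[] v902 = {44.5, 44.4, 44.6, 44.8, 43.8};
    double[] v91 = {20.3, 18.1, 18.1, 18.2, 18.6};
    // Locale.ROOT keeps '.' as the decimal separator regardless of system settings.
    System.out.printf(Locale.ROOT, "averages: %.1f / %.1f / %.1f%n",
        mean(v861), mean(v902), mean(v91));
    System.out.printf(Locale.ROOT, "9.0.2 -> 9.1 snapshot: %.2fx faster%n", mean(v902) / mean(v91));
    System.out.printf(Locale.ROOT, "8.6.1 -> 9.1 snapshot: %.2fx faster%n", mean(v861) / mean(v91));
  }
}
```

It prints averages of 42.6, 44.4 and 18.7 seconds, i.e. a speedup of about 2.38x over 9.0.2 and 2.28x over 8.6.1.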

I did some profiling/sampling with JProfiler as well: the Class.forName
call is now almost completely gone, while it was the most-hit method
before.

I guess that means that improving the caching of classes is not needed,
since you fixed the core problem.

Thanks a lot, this really helps us!
-tom



Re: [basex-talk] Reflect.forName() / Performance

2018-10-17 Thread Tom Rauchenwald
Hi Christian,

thanks for your feedback, I hope you're doing well!

> I have some concerns that the caching of non-existing classes could be
> exploited and bloat the cache. Maybe we’d need to use WeakHashMap
> (and/or soft references) instead?

I didn't think about that, I'll try to come up with a better solution.
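One possible direction, sketched under assumptions: instead of the WeakHashMap/soft-reference idea, this swaps in a different technique, a size-bounded LRU cache of misses built on `LinkedHashMap`'s access order and `removeEldestEntry`. A flood of bogus names can then evict older entries, but the cache never grows beyond a fixed limit. The class `BoundedMissCache` and its methods are hypothetical; this is not BaseX code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedMissCache {

  // Access-ordered map; removeEldestEntry evicts the least recently used
  // class name once more than maxEntries names have been recorded.
  private final Map<String, Boolean> lru;

  public BoundedMissCache(final int maxEntries) {
    lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(final Map.Entry<String, Boolean> eldest) {
        return size() > maxEntries;
      }
    };
  }

  // Records a class name that could not be loaded.
  public synchronized void recordMiss(final String name) {
    lru.put(name, Boolean.TRUE);
  }

  // True if 'name' is known to be missing; get() also refreshes its LRU position.
  public synchronized boolean isKnownMiss(final String name) {
    return lru.get(name) != null;
  }

  public synchronized int size() {
    return lru.size();
  }

  // How a forName-style lookup could consult the cache before reflecting.
  public Class<?> forName(final String name) throws ClassNotFoundException {
    if (isKnownMiss(name)) throw new ClassNotFoundException(name);
    try {
      return Class.forName(name);
    } catch (ClassNotFoundException ex) {
      recordMiss(name);
      throw ex;
    }
  }

  public static void main(String[] args) {
    BoundedMissCache cache = new BoundedMissCache(2);
    cache.recordMiss("a.A");
    cache.recordMiss("b.B");
    cache.isKnownMiss("a.A"); // touch a.A, so b.B becomes the eldest entry
    cache.recordMiss("c.C");  // exceeds the limit and evicts b.B
    System.out.println(cache.isKnownMiss("b.B") + " " + cache.size()); // false 2
  }
}
```

The limit trades memory for the occasional repeated `Class.forName` call on names that were evicted.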

>> I'm not sure why BaseX tries to load our xqm as Java Modules, but
>> what I noticed is that Reflect.forName caches the positive case
>> (i.e., the class is found), but not the negative case (i.e., the class is 
>> not found).
>
> Sounds like an interesting finding; maybe there’s something we can
> optimize here. Could you possibly provide us a little self-contained
> example that demonstrates the behavior?

Sure. This is what I'm doing:

Given the following module (installed with module install foo.xqm):

module namespace uc = 'http://unifits.com/common';

declare function uc:remove-elements($input as element(),
    $remove-names as xs:string*) as element() {
  element { node-name($input) } {
    $input/@*,
    for $child in $input/node()[not(local-name() = $remove-names)]
    return
      if ($child instance of element())
      then uc:remove-elements($child, $remove-names)
      else $child
  }
};


If I start BaseXGui in debug mode and set a breakpoint in
Reflect.forName(), every time I execute a query such as

import module namespace uc = 'http://unifits.com/common';
/

the breakpoint is hit, i.e. Class.forName() is called. 
I *think* this might have to do with the fact that the function above is
recursive, but I have to admit that I don't really grasp the code that
does the module loading/parsing.

Thanks,
-tom



[basex-talk] Reflect.forName() / Performance

2018-10-17 Thread Tom Rauchenwald (UNIFITS)
Hi BaseX-Team,


When profiling some of our tests, I found that we spend some time in 
Reflect.forName().

We have two XQuery modules in the repo (we don't call Java code directly).


I'm not sure why BaseX tries to load our xqm as Java Modules, but what I 
noticed is that Reflect.forName caches the positive case (i.e., the class is 
found), but not the negative case (i.e., the class is not found).

I've changed the code to cache the negative case as well (see below), and 
noticed an improvement of about 5 percent.

Our tests create and query loads of small databases, so this speedup may be 
somewhat artificial.


I could provide a PR if this is a worthwhile improvement in your opinion (and 
if I'm not missing something obvious).


We're still on BaseX 8.7.6, in case that matters; as far as I can see, the code 
didn't change in BaseX 9.


Thanks,

Tom


Code:


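// CLASSES is the existing class cache. It must permit null values (e.g., a HashMap),
// since a null value stored for a name marks a class that was not found.
public static Class<?> forName(final String name) throws ClassNotFoundException {
  Class<?> c = CLASSES.get(name);

  if (c == null) {
    // looked up before and not found: fail without calling Class.forName again
    if (CLASSES.containsKey(name)) throw new ClassNotFoundException(name);
    try {
      c = Class.forName(name);
    } catch (final ClassNotFoundException ex) {
      // remember the negative result as well
      CLASSES.put(name, null);
      throw ex;
    }
    // non-public classes are rejected (and, as before, not cached)
    if (!Modifier.isPublic(c.getModifiers())) throw new ClassNotFoundException(name);
    CLASSES.put(name, c);
  }
  return c;
}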
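For experimenting outside BaseX, here is a self-contained sketch of the same negative-caching pattern. Like the snippet above, it relies on a HashMap, which permits null values, so a null entry can mark a missing class. The class name `CachingReflect` and the `lookups` counter are hypothetical additions; the counter only shows that repeated misses reach `Class.forName` once.

```java
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

public final class CachingReflect {
  // name -> class; a present key with a null value records "not found"
  private static final Map<String, Class<?>> CLASSES = new HashMap<>();
  // number of real Class.forName calls (for demonstration only)
  static int lookups;

  public static synchronized Class<?> forName(final String name)
      throws ClassNotFoundException {
    if (CLASSES.containsKey(name)) {
      final Class<?> cached = CLASSES.get(name);
      if (cached == null) throw new ClassNotFoundException(name);
      return cached;
    }
    lookups++;
    final Class<?> c;
    try {
      c = Class.forName(name);
    } catch (final ClassNotFoundException ex) {
      CLASSES.put(name, null); // cache the miss
      throw ex;
    }
    // as in the original patch, non-public classes are rejected but not cached
    if (!Modifier.isPublic(c.getModifiers())) throw new ClassNotFoundException(name);
    CLASSES.put(name, c);
    return c;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 3; i++) {
      try {
        forName("com.example.DoesNotExist");
      } catch (ClassNotFoundException ex) {
        // expected: the class does not exist
      }
    }
    System.out.println("Class.forName calls: " + lookups); // 1
  }
}
```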



Re: [basex-talk] Location of users.xml

2015-11-27 Thread Tom Rauchenwald
Hi Christian,

hope you're well!

>> is it possible to override the expected location of users.xml?
>
> Currently no, but it’s possible to have the data directory inside the
> war file. Do you use a custom location?
>
> If you use RESTXQ, you could use XQuery to copy a users.xml file to
> the database directory.

Thanks for the quick response.
We use the Java client API, so users.xml needs to be in place or we
won't be able to connect.

We usually use a custom location for the database directory.

I'll have to think about this a bit more, but I guess we could adapt
the war file to our needs and copy the users.xml to the database
directory when basex starts up.
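A minimal sketch of such a startup step with java.nio.file, assuming the web application knows both the location of a bundled users.xml and the configured database directory. The class and method names (`UsersFileInstaller`, `installIfMissing`) and all paths are hypothetical; whether an existing file should be overwritten is a policy choice, and this sketch leaves an existing users.xml alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class UsersFileInstaller {

  // Copies 'source' to <dbDir>/users.xml unless that file already exists.
  // Returns true if the file was copied, false if one was already present.
  public static boolean installIfMissing(final Path source, final Path dbDir)
      throws IOException {
    Files.createDirectories(dbDir);
    final Path target = dbDir.resolve("users.xml");
    try {
      // without REPLACE_EXISTING, copy fails if the target exists
      Files.copy(source, target);
      return true;
    } catch (FileAlreadyExistsException ex) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    // demo with temporary files; a web app would use its configured locations
    Path tmp = Files.createTempDirectory("basex-demo");
    Path source = Files.write(tmp.resolve("bundled-users.xml"),
        "<users/>".getBytes(StandardCharsets.UTF_8));
    Path dbDir = tmp.resolve("data");
    System.out.println(installIfMissing(source, dbDir)); // true: copied
    System.out.println(installIfMissing(source, dbDir)); // false: already present
  }
}
```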

> Christian

Thanks,
-tom



[basex-talk] Location of users.xml

2015-11-27 Thread Tom Rauchenwald
Hi,

is it possible to override the expected location of users.xml?

From the wiki (http://docs.basex.org/wiki/User_Management):

> The permission file has been moved from the home directory to the
> database directory. It was renamed from .basexperm to users.xml

We're using the war distribution, and would prefer to either specify
the location of users.xml via a parameter or have it reside in the
war.

Is this currently possible?

Thanks,
-tom