Re: [basex-talk] Help with a Query/Performance

2020-01-21 Thread Tom Rauchenwald (UNIFITS)
Hi Christian,

thanks for the help!

I have a working version of this now that performs well.
Initially I didn't want to reconstruct parts of the message, because we have a 
couple of different versions of these containers, and usually multiple 
namespaces and prefixes are involved that should be preserved. But it turned out 
to be easier than I thought.

Thanks & greetings from Salzburg,
Tom



From: Christian Grün
Sent: Monday, 20 January 2020 19:06
To: Tom Rauchenwald (UNIFITS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Help with a Query/Performance

I missed the obvious next step. The following query is evaluated
in a few milliseconds:

  declare variable $OFFSET1 := 3;
  declare variable $OFFSET2 := 2;

  let $container := db:open('tr-test')/Container
  let $message := $container/*:MessageA[$OFFSET1]
  let $detail := $message/MessageADetail[$OFFSET2]
  return element { name($container) } {
    $container/*[contains(name(), 'MetaData')],
    element { name($message) } {
      $message/MessageAMetaData,
      element { name($detail) } {
        $detail/*
      }
    }
  }
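Outside of BaseX, the same "construct only what you need" idea can be sketched with the JDK's DOM API. This is a minimal, hypothetical illustration (the names `DetailSlicer`, `slice`, `nthChild` and `copyMetaData` are made up here), assuming the element names used in this thread and 1-based positions as in XQuery. It creates a fresh `Container`, copies the children whose names contain `MetaData` (mirroring `contains(name(), 'MetaData')` above), and copies in full only the selected message's metadata and the selected `MessageADetail`; nothing else in the source document is copied or deleted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: builds a new Container holding one MessageA with one MessageADetail. */
final class DetailSlicer {

  /** Local name if the node was parsed namespace-aware, otherwise its qualified name. */
  static String localName(Node n) {
    return n.getLocalName() != null ? n.getLocalName() : n.getNodeName();
  }

  /** Returns the n-th (1-based) child element with the given local name, or null. */
  static Element nthChild(Element parent, String name, int n) {
    int count = 0;
    for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
      if (c.getNodeType() == Node.ELEMENT_NODE && name.equals(localName(c)) && ++count == n) {
        return (Element) c;
      }
    }
    return null;
  }

  /** Deep-copies every child element whose local name contains "MetaData" into target. */
  static void copyMetaData(Element from, Element target) {
    Document out = target.getOwnerDocument();
    for (Node c = from.getFirstChild(); c != null; c = c.getNextSibling()) {
      if (c.getNodeType() == Node.ELEMENT_NODE && localName(c).contains("MetaData")) {
        target.appendChild(out.importNode(c, true));
      }
    }
  }

  /** Returns a new document with only the requested message and detail (1-based positions). */
  static Document slice(Document src, int messagePos, int detailPos) {
    Element container = src.getDocumentElement();
    Element message = nthChild(container, "MessageA", messagePos);
    Element detail = message == null ? null : nthChild(message, "MessageADetail", detailPos);
    if (detail == null) {
      throw new IllegalArgumentException(
          "no MessageA[" + messagePos + "]/MessageADetail[" + detailPos + "]");
    }
    Document out;
    try {
      out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    // Shallow imports keep the element name, namespace and attributes, but no children.
    Element newContainer = (Element) out.importNode(container, false);
    out.appendChild(newContainer);
    copyMetaData(container, newContainer);
    Element newMessage = (Element) out.importNode(message, false);
    newContainer.appendChild(newMessage);
    copyMetaData(message, newMessage);
    // Only the selected detail is copied with all of its descendants.
    newMessage.appendChild(out.importNode(detail, true));
    return out;
  }

  /** Parses a string namespace-aware, so that getLocalName() is populated. */
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
      f.setNamespaceAware(true);
      return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

For the case discussed here, `DetailSlicer.slice(doc, 3, 2)` would return the 2nd detail of the 3rd MessageA. The shallow `importNode(node, false)` calls keep the original element names, namespace URIs and attributes without pulling in the other messages.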


On Mon, Jan 20, 2020 at 6:54 PM Christian Grün wrote:
>
> Dear Tom,
>
> If you have large elements, it will usually be faster to create new
> elements. Here’s one way to do it:
>
>   let $offset1 := 3
>   let $offset2 := 2
>   let $container := db:open('tr-test')/Container
>   return element Container {
>     (: add meta data elements :)
>     $container/*[starts-with(name(), 'ContainerMetaData')],
>     (: alternative: add everything except Message elements
>     $container/(* except (MessageA, MessageB, MessageC)), :)
>     $container/MessageA[$offset1] update {
>       delete node MessageADetail[position() != $offset2]
>     }
>   }
>
> There are probably ways to get this even faster; I may have a look at
> this tomorrow.
>
> All the best from Konstanz,
> Christian
>
>
>
> On Mon, Jan 20, 2020 at 10:01 AM Tom Rauchenwald (UNIFITS) wrote:
> > [...]


[basex-talk] Help with a Query/Performance

2020-01-20 Thread Tom Rauchenwald (UNIFITS)
Hi list,

I'm struggling with a query.

We have XML documents with a structure similar to this:


  FOO
  FOO
  

  FOO
  FOO


  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  


Messages are bundled in a container (each kind of message can occur up to n 
times), and each message has details (also up to n times). The Container and 
Message elements contain data that is the same for all details (it's basically 
a grouping).
I'd like to retrieve a Detail with all the data associated with it: a 
MessageADetail, its MessageA (without the other MessageADetails), and the 
Container (without the other Messages).
I know the position of the message (e.g., I know that I want the second 
MessageA), and I know the position of the Detail (e.g., I know that I want the 
3rd Detail).
The use case is to show the detail in context in a UI.

The query I came up with is the following (here I want the 2nd detail of the 
3rd MessageA):

  let $fh := (copy $x := /*:Container
    modify (
      delete node $x/*:MessageA[position() != 3],
      delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
      delete node $x/*:MessageB,
      delete node $x/*:MessageC
    )
    return $x)
  return $fh

This works well for small documents, but for large documents the query can take 
a couple of seconds to evaluate (our real-life documents have more 
data/elements in Details and Messages).
I'm wondering whether there's a better/more efficient way to do this. I tried 
formulating a query that doesn't use deletes, but I couldn't come up with a 
solution that is both correct and faster.

Any pointers would be very much appreciated.

Here's a function to generate sufficiently large test data:

declare function local:sample($numberOfMessages, $numberOfDetails) {

  FOO
  FOO
  {for $i in 1 to $numberOfMessages
return
  

  FOO {$i}
  FOO {$i}

{for $j in 1 to $numberOfDetails
 return
 
   FOO {$j}
   FOO {$j}
 
}
  
  }
  

  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  

};

db:create('tr-test', local:sample(20, 10), 'test.xml')

Thanks,
Tom Rauchenwald




[basex-talk] Reflect.forName() / Performance

2018-10-17 Thread Tom Rauchenwald (UNIFITS)
Hi BaseX-Team,


When profiling some of our tests, I found that we spend some time in 
Reflect.forName().

We have two XQuery modules in the repository (we don't call Java code directly).


I'm not sure why BaseX tries to load our .xqm modules as Java modules, but I 
noticed that Reflect.forName caches the positive case (the class is found) but 
not the negative case (the class is not found).

I changed the code to cache the negative case as well (see below) and noticed 
an improvement of about 5 percent.

Our tests create and query lots of small databases, so this speedup may be 
rather artificial.


I could provide a PR if you think this is a worthwhile improvement (and if I'm 
not missing something obvious).


We're still on BaseX 8.7.6, in case that matters; as far as I can see, the code 
didn't change in BaseX 9.


Thanks,

Tom


Code:


public static Class<?> forName(final String name) throws ClassNotFoundException {
  Class<?> c = CLASSES.get(name);

  if(c == null) {
    if(CLASSES.containsKey(name)) {
      // negative cache hit: this name was looked up before and not found
      throw new ClassNotFoundException(name);
    } else {
      try {
        c = Class.forName(name);
      } catch(ClassNotFoundException e) {
        // remember the miss (CLASSES must permit null values, e.g. a HashMap)
        CLASSES.put(name, null);
        throw e;
      }
      // non-public classes are rejected, but not cached
      if(!Modifier.isPublic(c.getModifiers())) throw new ClassNotFoundException(name);
      CLASSES.put(name, c);
    }
  }
  return c;
}
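The same negative-caching idea as a standalone sketch, not BaseX's actual `Reflect` class: `ClassCache`, `lookup` and `lookups` are hypothetical names. It assumes a thread-safe map is wanted, so misses are stored as `Optional.empty()` rather than `null` (`ConcurrentHashMap` rejects null values). Unlike the patch above, it also caches non-public classes as misses.

```java
import java.lang.reflect.Modifier;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical class cache that remembers misses as well as hits. */
final class ClassCache {
  /** Optional.empty() marks a name known not to resolve to a public class. */
  private static final ConcurrentHashMap<String, Optional<Class<?>>> CACHE =
      new ConcurrentHashMap<>();
  /** Counts real Class.forName calls, so the effect of caching can be observed. */
  private static final AtomicInteger LOOKUPS = new AtomicInteger();

  static Class<?> forName(final String name) throws ClassNotFoundException {
    Optional<Class<?>> cached = CACHE.get(name);
    if (cached == null) {
      // Two threads may both miss here and look up the same name; that is harmless.
      cached = lookup(name);
      CACHE.putIfAbsent(name, cached);
    }
    return cached.orElseThrow(() -> new ClassNotFoundException(name));
  }

  private static Optional<Class<?>> lookup(final String name) {
    LOOKUPS.incrementAndGet();
    try {
      Class<?> c = Class.forName(name);
      return Modifier.isPublic(c.getModifiers()) ? Optional.of(c) : Optional.empty();
    } catch (ClassNotFoundException e) {
      return Optional.empty();
    }
  }

  static int lookups() {
    return LOOKUPS.get();
  }
}
```

Repeated calls with an unknown name, such as the module names BaseX probes for, then cost a map lookup instead of a full class-path search.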