Hi Johannes, Charles and Michael,
first of all, thanks for your immediate readiness to help.
I will briefly present the structure of the database:
<Dataset>
  <Structure>
    <Institute Name="Physik">
      <Degree Abbr="ABC" Name="ABC">
        <Module Abbr="HIJ" Name="HIJ">
          <!-- the Module nodes are arbitrarily nested in themselves -->
        </Module>
        <!-- more Module nodes -->
      </Degree>
      <!-- more Degree nodes -->
    </Institute>
    <!-- more Institute nodes -->
  </Structure>
  <!-- other information -->
  <Lessons>
    <Lesson ID="12345">
      <Name lang="de">Name of a Lesson</Name>
      <AssociatedModules>
        <Module Abbr="HIJ"/>
        <Module Abbr="ABC"/>
        <!-- there are 1..unbounded Modules per Lesson; only modules
             containing no modules are referenced -->
      </AssociatedModules>
      <!-- other information -->
    </Lesson>
  </Lessons>
</Dataset>
The task is now to create a list like this one:
http://vlvz1.physik.hu-berlin.de/ss2012/physik/verzeichnis/en/, that is,
the whole structure, but only with the Modules that actually have
associated lessons.
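To make the goal concrete, here is a minimal sketch of the pruning step on its own. It is not the production query: the inline $sample data, the Abbr values and the function name local:prune are made up for illustration, and it follows the simplified structure shown above (Module directly below Degree).

```xquery
xquery version "3.0";

(: Hypothetical sample data, shaped like the structure described above. :)
declare variable $sample :=
  <Dataset>
    <Structure>
      <Institute Name="Physik">
        <Degree Abbr="BSC" Name="Bachelor">
          <Module Abbr="P1" Name="Group with lessons">
            <Module Abbr="P1a" Name="Leaf with a lesson"/>
            <Module Abbr="P1b" Name="Leaf without a lesson"/>
          </Module>
          <Module Abbr="P2" Name="Group without lessons">
            <Module Abbr="P2a" Name="Leaf without a lesson"/>
          </Module>
        </Degree>
      </Institute>
    </Structure>
    <Lessons>
      <Lesson ID="12345">
        <Name lang="de">Name of a Lesson</Name>
        <AssociatedModules><Module Abbr="P1a"/></AssociatedModules>
      </Lesson>
    </Lessons>
  </Dataset>;

(: Keep a leaf Module only if a lesson references it; keep an inner
   Module only if at least one of its descendants is kept. :)
declare function local:prune(
  $module as element(Module),
  $used   as xs:string*
) as element(Module)? {
  let $kept := for $child in $module/Module
               return local:prune($child, $used)
  return
    if (exists($kept) or (empty($module/Module) and $module/@Abbr = $used))
    then <Module>{ $module/@*, $kept }</Module>
    else ()
};

(: Abbreviations of all modules referenced by at least one lesson. :)
let $used := distinct-values($sample//AssociatedModules/Module/@Abbr/string())
for $degree in $sample/Structure/Institute/Degree
let $modules := for $m in $degree/Module
                return local:prune($m, $used)
where exists($modules)
return <Degree>{ $degree/@*, $modules }</Degree>
```

On this sample the intent is that P1 is kept with only P1a beneath it, while P2 is dropped entirely because none of its leaves has a lesson.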
The current query looks like this:
let $lang := data($ses/lang)
let $sem := data($ses/sem)
let $inst := data($ses/inst)
let $semxml := db:open("vlvz",concat($sem,'.xml'))
let $moduleswithlvs := distinct-values($semxml//AssociatedModules/Module/@Abbr)
return
<span>
  <div class="struc">
  {
    for $degree in $semxml//Institute[@Name=$ses//inst]/Degree[Modules//Module/@Abbr=$moduleswithlvs]
    return <div class="indent">
      <span class="degree">{data($degree/@Abbr)}&#x20;{data($degree/@Name)}<br/></span>
      {
        for $module in $degree/Modules//Module[(* and */@Abbr=$moduleswithlvs) or @Abbr=$moduleswithlvs]
        let $leaf := not($module/*)
        let $depth := functx:depth-of-node($module)-7
        return
        <div class="indent depth{$depth}">
          {data($module/@Abbr)}&#x20;{data($module/@Name)}&#x20;<br/>
          {
            if ($leaf)
            then
              <div class="indent">
              {
                for $lesson in vlvz:getlvs($semxml,data($module/@Abbr))
                return <div class="lesson"><span class="lessonid">{$lesson/@ID}</span><span class="lessonname">{$lesson/Name[@lang=$ses//lang]}</span><span class="lessonmodules">{string-join($lesson/AssociatedModules/Module/@Abbr,', ')}</span></div>
                (: note [1] :)
              }
              </div>
            else ()
          }
        </div>
      }
    </div>
  }
  </div>
</span>
I have already noticed that [1] is crucial: returning this node makes
the query run about 10 times longer than returning an empty sequence.
It makes no difference whether I return just <div></div> instead; that
is as slow as returning the node with its content.
I should also mention the function vlvz:getlvs:
declare function vlvz:getlvs($semxml as node()*, $modabbr as xs:string) as node()*
{
  for $l in $semxml//Lesson
  where $l[AssociatedModules/Module/@Abbr=$modabbr]
  order by data($l/@ID)
  return $l
};
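As written, this function is called once per leaf module and evaluates $semxml//Lesson each time. One possible alternative, given only as a sketch, is to group the module references a single time and read the lessons per module from the groups. It assumes an XQuery 3.0 processor with group by; the inline $sample data is hypothetical and not taken from the real database.

```xquery
xquery version "3.0";

(: Hypothetical sample data with two lessons sharing one module. :)
declare variable $sample :=
  <Dataset>
    <Lessons>
      <Lesson ID="200">
        <Name lang="de">Lesson B</Name>
        <AssociatedModules><Module Abbr="HIJ"/></AssociatedModules>
      </Lesson>
      <Lesson ID="100">
        <Name lang="de">Lesson A</Name>
        <AssociatedModules>
          <Module Abbr="HIJ"/>
          <Module Abbr="ABC"/>
        </AssociatedModules>
      </Lesson>
    </Lessons>
  </Dataset>;

(: One pass over all module references, grouped by abbreviation.
   After "group by", $ref holds every reference in the current group. :)
for $ref in $sample//Lesson/AssociatedModules/Module
let $abbr := string($ref/@Abbr)
group by $abbr
order by $abbr
return
  <module abbr="{$abbr}">{
    (: Same ordering as vlvz:getlvs: lessons sorted by their ID. :)
    for $lesson in $ref/ancestor::Lesson
    order by string($lesson/@ID)
    return <lesson id="{$lesson/@ID}"/>
  }</module>
```

The idea is that each module abbreviation ends up with its lessons listed once, so the per-module lookup no longer has to scan all lessons again. Whether this is actually faster in BaseX than an index-backed lookup would need to be measured.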
It is probably true that the queries are badly designed with respect to
performance: basically everything I have done with XQuery so far was
learning by doing.
Best regards from the capital,
Ronny
On 03/29/2012 11:00 AM, Michael Seiferle wrote:
Hi Ronny,
Hi Johannes, Charles, thanks for joining the conversation.
In my opinion, and speaking officially for BaseX, I would expect XML
processing with BaseX databases to be faster almost always[1] than
processing the XML sequentially via lxml.
However, performance may vary depending on the actual queries and/or the
python glue code.
I think Charles' approach of having as much logic in XQuery as possible
will be the best option to pick here.
Maybe some of your Python code could be rewritten in XQuery as well; on
the other hand, this might not even be necessary thanks to XQuery rewrites,
as Johannes suggested.
@Ronny, maybe you could provide us with some sample code? In case it is
not intended for the general public feel free to send it to
supp...@basex.org.
Looking forward to seeing your code!
Best regards from Lake Constance
Michael
[1] I can surely think of examples that prove me wrong ;-)
On 28.03.2012 at 23:19, Johannes.Lichtenberger wrote:
Thus I suppose it
would be best to post the queries in a reply, so that the BaseX
team can suggest similar queries that make better use of the
index structures and the query processor's optimizations.
___
BaseX-Talk mailing list