Keeping it in header has the advantage that it remains generic and can be used 
from anywhere and with any type of parameters (e.g. not only for sorting 
xmlNodePtrs). If in .c file, there can only be one sort function. Although 
since the sort is used from only one place, it does not matter :-)
Another thing would be the need to move XP_OPTIMIZED_NON_ELEM_COMPARISON to a 
header included both from the sort.c and xpath.c. But that would probably be 
for better.

I have done more performance tests. Timsort behaves better than I thought (or 
rather Shellsort worse than I thought). For sorted nodesets of size n like 
'/item[true()]' Timsort does only n-1 comparisons, unlike current Shellsort. 
('/item' does not need sorting.) It does not matter whether the data are small 
or big, Timsort wins. For partially unsorted sequences like '/item[(position() 
mod 10) = 0] | /item[(position() mod 10) = 1] | ..' it wins too. It wins both 
at number of comparisons or valgrind instructions (of the whole sort).

/Vojtech

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stefan 
Behnel
Sent: 25. Ĩervence 2012 18:26
To: xml
Subject: Re: [xml] xmlXPathNodeSetSort performance

Vojtech Fried, 25.07.2012 17:45:
> Second version of Timsort patch, slightly more polished. It builds on 
> my gcc, I have fixed some warnings and merged the two headers into 
> one. I did not move the code to .c file though, because the sort 
> implementation uses some macro magic, i.e. the functions you see in 
> the code are really function "templates" and they are "instantiated" 
> with the name and type you choose with the macros (basically it is a 
> poor man's C++ template system :-). I could remove the macros and 
> specialize the functions for libxml xmlNodePtr, but that seems quite ugly to 
> me.

What about moving it all into a .c file, then adding a new entry point at the 
end that specialises the preceding code for the exact use case in xpath.c?

Thanks for doing this, BTW, the node set sorting performance is a huge problem. 
Not only for very large lists of nodes, but also for multi-step XPath 
expressions, which currently result in multiple sorting and re-sorting steps.

Note that node comparison can be very costly, so another thing to investigate 
would be to add a flag to the node set struct that remembers if it's sorted 
already. That would allow the sort algorithm to skip to the merging step 
directly, instead of traversing the whole node list first.

Stefan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/ [email protected] 
https://mail.gnome.org/mailman/listinfo/xml
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
https://mail.gnome.org/mailman/listinfo/xml

Reply via email to