Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
On 7/15/06, Tom Lane [EMAIL PROTECTED] wrote: Anyway, Qingqing's question still needs to be answered: how can a sort of under 30k items take so long? It happens because (as previously suggested by Tom) the dataset for the 'short' (~10k rows, .3 sec) sort has no rows whose leftmost fields evaluate to 'equal' when passed to the qsort compare function. The 'long' sort, (~30k rows, 78 sec) has plenty of rows whose first 6 columns all evaluate as 'equal' when the rows are compared. For the 'long' data, the compare moves on rightward until it encounters 'flato', which is a TEXT column with an average length of 7.5k characters (with some rows up to 400k). The first 6 columns are mostly INTEGER, so compares on them are relatively inexpensive. All the expensive compares on 'flato' account for the disproportionate difference in sort times, relative to the number of rows in each set. As for the potential for memory leaks - thinking about it. Thanks, Charles Duffy. Peter Eisentraut [EMAIL PROTECTED] writes: The merge sort is here: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21content-type=text/x-cvsweb-markupcvsroot=glibc It uses alloca, so we're good here. Uh ... but it also uses malloc, and potentially a honkin' big malloc at that (up to a quarter of physical RAM). So I'm worried again. Anyway, Qingqing's question still needs to be answered: how can a sort of under 30k items take so long? regards, tom lane Column | Type | Modifiers ---+-+--- record| integer | commr1| integer | envr1 | oid | docin | integer | creat | integer | flati | text| flato | text| doc | text| docst | integer | vlord | integer | vl0 | integer | vl1 | date| vl2 | text| vl3 | text| vl4 | text| vl5 | text| vl6 | text| vl7 | date| vl8 | text| vl9 | integer | ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Charles Duffy [EMAIL PROTECTED] writes: ... For the 'long' data, the compare moves on rightward until it encounters 'flato', which is a TEXT column with an average length of 7.5k characters (with some rows up to 400k). The first 6 columns are mostly INTEGER, so compares on them are relatively inexpensive. All the expensive compares on 'flato' account for the disproportionate difference in sort times, relative to the number of rows in each set. Yeah, and it's not just that it's text either. At those sizes, all the values will be toasted, which means each compare is paying the price of fetching multiple rows from the toast table. And decompressing them too, no doubt. These costs are most likely swamping the actual strcoll() (not that that's not bad enough compared to int4cmp). We could probably tweak the sorting code to forcibly detoast sort keys before beginning the sort, but I'm not entirely convinced that would be a win: reading and writing enormous sort keys won't be cheap either. Meanwhile, for a cheap solution: do you really need to sort on flato at all? Maybe sorting on substr(flato,1,100) would be good enough? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Charles Duffy [EMAIL PROTECTED] writes: On 7/12/06, Qingqing Zhou [EMAIL PROTECTED] wrote: the problem here is that 29247 doesn't look like a big number so I can't see why your patch solved the problem, unless the qsort_comparetup() function of the data type eats too many circles or the cpu is too slow. Well...when exhibiting the problem, execution stays inside qsort() for the entire 'delay period' - I don't think its off in some other recess which is also lacking in interrupt flag checking. As for the CPU - the machine is a 4 way Opteron, and otherwise performs well. It does seem odd that the in-memory sort takes as long as it does - the plan may suggest a reason. Well, there's something awfully fishy here. Compare the two lower-level sorts in your EXPLAIN output: - Sort (cost=2909.44..2944.94 rows=14201 width=320) (actual time=78196.698..78239.799 rows=29247 loops=1) Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5, vl6, vl7, vl8, vl9 - Append (cost=0.00..1930.02 rows=14201 width=320) (actual time=0.052..396.577 rows=29247 loops=1) - Sort (cost=403.42..403.59 rows=71 width=320) (actual time=318.727..339.305 rows=10932 loops=1) Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5, vl6, vl7, vl8, vl9 - Append (cost=78.88..401.23 rows=71 width=320) (actual time=5.197..192.868 rows=10932 loops=1) The first one took 77843.222 msec to sort 29247 rows, the second took 146.437 msec to sort 10932 rows with what appears to be the same sort key. One of these things is not like the other ... What are the data types of the sort columns? Is there anything much different in the statistics of the two subqueries --- for example, maybe one produces a lot of output that only differs in the rightmost sort columns, where the other has keys that always differ in a leading column? I forget if you are running 8.1 or not, but if you are, turning on trace_sort might produce useful information. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Charles Duffy [EMAIL PROTECTED] writes: [ CHECK_FOR_INTERRUPTS inside qsort comparison routine ] It occurs to me that there's a nonzero risk of problems here, because if the interrupt occurs qsort() will lose control. I'm wondering whether there are any implementations of qsort() that allocate memory and then release it before returning. If so, interrupting the sort would lead to memory leaked permanently (till backend exit anyway). We might have to just tolerate this, but if it occurs on a lot of platforms I'd have second thoughts about applying the patch. Anyone familiar with the internals of glibc's qsort, in particular? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Tom Lane wrote: We might have to just tolerate this, but if it occurs on a lot of platforms I'd have second thoughts about applying the patch. Anyone familiar with the internals of glibc's qsort, in particular? Doesn't look like it's allocating any nonlocal memory: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12content-type=text/x-cvsweb-markupcvsroot=glibc -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Peter Eisentraut [EMAIL PROTECTED] writes: Tom Lane wrote: We might have to just tolerate this, but if it occurs on a lot of platforms I'd have second thoughts about applying the patch. Anyone familiar with the internals of glibc's qsort, in particular? Doesn't look like it's allocating any nonlocal memory: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12content-type=text/x-cvsweb-markupcvsroot=glibc But this file defines _quicksort() not qsort(). I was under the impression that the latter is actually a mergesort in glibc ... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Tom Lane wrote: Doesn't look like it's allocating any nonlocal memory: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1. 12content-type=text/x-cvsweb-markupcvsroot=glibc But this file defines _quicksort() not qsort(). I was under the impression that the latter is actually a mergesort in glibc ... The merge sort is here: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21content-type=text/x-cvsweb-markupcvsroot=glibc It uses alloca, so we're good here. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Peter Eisentraut [EMAIL PROTECTED] writes: The merge sort is here: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21content-type=text/x-cvsweb-markupcvsroot=glibc It uses alloca, so we're good here. Uh ... but it also uses malloc, and potentially a honkin' big malloc at that (up to a quarter of physical RAM). So I'm worried again. Anyway, Qingqing's question still needs to be answered: how can a sort of under 30k items take so long? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Charles Duffy [EMAIL PROTECTED] writes: Their work_mem setting was rather large (100). We determined that when it received SIGINT, the backend was always inside qsort(), so it wouldn't call ProcessInterrupts() again until it finished this large in-memory sort. Upon entering tuplesort_performsort(), state-memtupcount was 29247. It occurs to me that this kind of thing is something dtrace could help with. It might even be able to do something clever like time between consecutive CHECK_FOR_INTERRUPT calls grouped by the function that postgres spent the most time in between those points. If not that then something like grouped by the first function call in the intervening period is probably pretty straightforward. Of course this is complicated by CHECK_FOR_INTERRUPTS being a macro... perhaps a probe could be added in that macro. In fact I suspect many of the locations we'll need manually added probes will be macros. -- greg ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly