Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-28 Thread Charles Duffy

On 7/15/06, Tom Lane [EMAIL PROTECTED] wrote:

Anyway, Qingqing's question still needs to be answered: how can a sort
of under 30k items take so long?



It happens because (as previously suggested by Tom) the dataset for
the 'short' (~10k rows, .3 sec) sort has no rows whose leftmost fields
evaluate to 'equal' when passed to the qsort compare function. The
'long' sort, (~30k rows, 78 sec) has plenty of rows whose first 6
columns all evaluate as 'equal' when the rows are compared.

For the 'long' data, the compare moves on rightward until it
encounters 'flato', which is a TEXT column with an average length of
7.5k characters (with some rows up to 400k). The first 6 columns are
mostly INTEGER, so compares on them are relatively inexpensive. All
the expensive compares on 'flato' account for the disproportionate
difference in sort times, relative to the number of rows in each set.

As for the potential for memory leaks - thinking about it.

Thanks,

Charles Duffy.


Peter Eisentraut [EMAIL PROTECTED] writes:
 The merge sort is here:

 
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21content-type=text/x-cvsweb-markupcvsroot=glibc

 It uses alloca, so we're good here.

Uh ... but it also uses malloc, and potentially a honkin' big malloc at
that (up to a quarter of physical RAM).  So I'm worried again.

Anyway, Qingqing's question still needs to be answered: how can a sort
of under 30k items take so long?

regards, tom lane

  Column   |  Type   | Modifiers
---+-+---
 record| integer |
 commr1| integer |
 envr1 | oid |
 docin | integer |
 creat | integer |
 flati | text|
 flato | text|
 doc   | text|
 docst | integer |
 vlord | integer |
 vl0   | integer |
 vl1   | date|
 vl2   | text|
 vl3   | text|
 vl4   | text|
 vl5   | text|
 vl6   | text|
 vl7   | date|
 vl8   | text|
 vl9   | integer |

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-28 Thread Tom Lane
Charles Duffy [EMAIL PROTECTED] writes:
 ... For the 'long' data, the compare moves on rightward until it
 encounters 'flato', which is a TEXT column with an average length of
 7.5k characters (with some rows up to 400k). The first 6 columns are
 mostly INTEGER, so compares on them are relatively inexpensive. All
 the expensive compares on 'flato' account for the disproportionate
 difference in sort times, relative to the number of rows in each set.

Yeah, and it's not just that it's text either.  At those sizes, all
the values will be toasted, which means each compare is paying the
price of fetching multiple rows from the toast table.  And decompressing
them too, no doubt.  These costs are most likely swamping the actual
strcoll() (not that that's not bad enough compared to int4cmp).

We could probably tweak the sorting code to forcibly detoast sort keys
before beginning the sort, but I'm not entirely convinced that would be
a win: reading and writing enormous sort keys won't be cheap either.

Meanwhile, for a cheap solution: do you really need to sort on flato
at all?  Maybe sorting on substr(flato,1,100) would be good enough?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-14 Thread Tom Lane
Charles Duffy [EMAIL PROTECTED] writes:
 On 7/12/06, Qingqing Zhou [EMAIL PROTECTED] wrote:
 the problem here is that 29247 doesn't look like a big number so I can't see
 why your patch solved the problem, unless the qsort_comparetup() function of
 the data type eats too many circles or the cpu is too slow.

 Well...when exhibiting the problem, execution stays inside qsort() for
 the entire 'delay period' - I don't think its off in some other recess
 which is also lacking in interrupt flag checking.
 As for the CPU - the machine is a 4 way Opteron, and otherwise
 performs well. It does seem odd that the in-memory sort takes as long
 as it does - the plan may suggest a reason.

Well, there's something awfully fishy here.  Compare the two lower-level
sorts in your EXPLAIN output:

  -  Sort  (cost=2909.44..2944.94 rows=14201 width=320) (actual 
time=78196.698..78239.799 rows=29247 loops=1)
Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, 
docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5, vl6, vl7, vl8, vl9
-  Append  (cost=0.00..1930.02 rows=14201 width=320) (actual 
time=0.052..396.577 rows=29247 loops=1)

  -  Sort  (cost=403.42..403.59 rows=71 width=320) (actual 
time=318.727..339.305 rows=10932 loops=1)
Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, 
docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5, vl6, vl7, vl8, vl9
-  Append  (cost=78.88..401.23 rows=71 width=320) (actual 
time=5.197..192.868 rows=10932 loops=1)

The first one took 77843.222 msec to sort 29247 rows, the second took
146.437 msec to sort 10932 rows with what appears to be the same sort
key.  One of these things is not like the other ...

What are the data types of the sort columns?  Is there anything much
different in the statistics of the two subqueries --- for example,
maybe one produces a lot of output that only differs in the rightmost
sort columns, where the other has keys that always differ in a leading
column?

I forget if you are running 8.1 or not, but if you are, turning on
trace_sort might produce useful information.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-14 Thread Tom Lane
Charles Duffy [EMAIL PROTECTED] writes:
 [ CHECK_FOR_INTERRUPTS inside qsort comparison routine ]

It occurs to me that there's a nonzero risk of problems here, because if
the interrupt occurs qsort() will lose control.  I'm wondering whether
there are any implementations of qsort() that allocate memory and then
release it before returning.  If so, interrupting the sort would lead to
memory leaked permanently (till backend exit anyway).

We might have to just tolerate this, but if it occurs on a lot of
platforms I'd have second thoughts about applying the patch.  Anyone
familiar with the internals of glibc's qsort, in particular?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-14 Thread Peter Eisentraut
Tom Lane wrote:
 We might have to just tolerate this, but if it occurs on a lot of
 platforms I'd have second thoughts about applying the patch.  Anyone
 familiar with the internals of glibc's qsort, in particular?

Doesn't look like it's allocating any nonlocal memory:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12content-type=text/x-cvsweb-markupcvsroot=glibc

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-14 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 We might have to just tolerate this, but if it occurs on a lot of
 platforms I'd have second thoughts about applying the patch.  Anyone
 familiar with the internals of glibc's qsort, in particular?

 Doesn't look like it's allocating any nonlocal memory:

 http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12content-type=text/x-cvsweb-markupcvsroot=glibc

But this file defines _quicksort() not qsort().  I was under the
impression that the latter is actually a mergesort in glibc ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-14 Thread Peter Eisentraut
Tom Lane wrote:
  Doesn't look like it's allocating any nonlocal memory:
 
  http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.
 12content-type=text/x-cvsweb-markupcvsroot=glibc

 But this file defines _quicksort() not qsort().  I was under the
 impression that the latter is actually a mergesort in glibc ...

The merge sort is here:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21content-type=text/x-cvsweb-markupcvsroot=glibc

It uses alloca, so we're good here.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-14 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 The merge sort is here:

 http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21content-type=text/x-cvsweb-markupcvsroot=glibc

 It uses alloca, so we're good here.

Uh ... but it also uses malloc, and potentially a honkin' big malloc at
that (up to a quarter of physical RAM).  So I'm worried again.

Anyway, Qingqing's question still needs to be answered: how can a sort
of under 30k items take so long?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

2006-07-11 Thread Greg Stark

Charles Duffy [EMAIL PROTECTED] writes:

 Their work_mem setting was rather large (100). We determined that when it
 received SIGINT, the backend was always inside qsort(), so it wouldn't
 call ProcessInterrupts() again until it finished this large in-memory
 sort. Upon entering tuplesort_performsort(), state-memtupcount was
 29247.

It occurs to me that this kind of thing is something dtrace could help with.
It might even be able to do something clever like time between consecutive
CHECK_FOR_INTERRUPT calls grouped by the function that postgres spent the most
time in between those points. If not that then something like grouped by the
first function call in the intervening period is probably pretty
straightforward.

Of course this is complicated by CHECK_FOR_INTERRUPTS being a macro... perhaps
a probe could be added in that macro. In fact I suspect many of the locations
we'll need manually added probes will be macros.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly