Re: [HACKERS] Releasing memory during External sorting?

2005-09-23 Thread Meir Maor
Calculating the optimal memory for a disk-based sort comes down to minimizing IO.
A previous post stated that we can merge as many subfiles as we want in a single
pass. This is not accurate: we also want to avoid disk seeks during the merge
phase, so the merge should be done by reading blocks of data from each subfile.
If we have data of size N and memory of size M, then after sorting each subfile
we will have K = N/M subfiles to merge.
In the merge operation, if we want to merge all K subfiles in one pass, we read
M/K of data from each subfile into memory and begin merging, reading another
M/K block whenever a subfile's buffer runs empty.
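
As an illustration of this buffered merge, here is a minimal C sketch (not the
tuplesort.c implementation; it assumes each subfile is a binary file of ints
already sorted ascending, and the caller sizes each buffer to play the role of
M/K):

    /* Minimal sketch of a buffered K-way merge; illustrative only, not the
     * tuplesort.c implementation.  Each run is assumed to be a binary file
     * of ints sorted ascending; buf_ints ints per buffer stand in for M/K. */
    #include <stdio.h>

    typedef struct {
        FILE *fp;       /* the sorted subfile */
        int  *buf;      /* in-memory block of this run (caller allocates) */
        int   len, pos; /* valid ints in buf, and the read cursor */
    } Run;

    /* Refill a run's buffer with its next block; this is where seeks happen. */
    static int refill(Run *r, int buf_ints)
    {
        r->len = (int) fread(r->buf, sizeof(int), (size_t) buf_ints, r->fp);
        r->pos = 0;
        return r->len > 0;
    }

    static void merge_runs(Run *runs, int k, int buf_ints, FILE *out)
    {
        for (int i = 0; i < k; i++)
            refill(&runs[i], buf_ints);

        for (;;)
        {
            int best = -1;
            /* pick the smallest buffered head; a real merge would use a heap */
            for (int i = 0; i < k; i++)
                if (runs[i].pos < runs[i].len &&
                    (best < 0 ||
                     runs[i].buf[runs[i].pos] < runs[best].buf[runs[best].pos]))
                    best = i;
            if (best < 0)
                break;                              /* all runs exhausted */
            fwrite(&runs[best].buf[runs[best].pos++], sizeof(int), 1, out);
            if (runs[best].pos == runs[best].len)
                refill(&runs[best], buf_ints);      /* next M/K block from disk */
        }
    }
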
We would like disk seek time to be negligible compared to sequential IO time.
Notice that we are performing IO in blocks of M/K = M/(N/M) = M^2/N
(equivalently N/K^2). Let us assume that sequential IO runs at 100MB/s and
that a random seek takes ~15ms. Then in the time of one random seek we can
read 1.5MB of data, and if we want seek time to be negligible to within one
order of magnitude, we get optimal performance when performing IO in blocks
of 15MB.
Since the merge algorithm shown above performs IO in blocks of M/K, we would
like M >= K*15MB, which (substituting K = N/M) results in a very large memory
requirement:

    M^2 >= N*15MB
    M >= sqrt(N*15MB)

For example, for sorting 10GB of data, we would like M >= 380MB for optimal
performance.
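
To make the arithmetic concrete, here is a small C program reproducing the
bound under the same assumptions (100MB/s sequential throughput, 15ms seeks,
a 10x margin; all figures are the assumptions from the text, not measurements):

    /* Back-of-envelope check of the memory bound above.  The constants are
     * the assumptions from the text, not measurements.  Link with -lm. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double seq_mb_per_s = 100.0;                 /* sequential throughput */
        double seek_s       = 0.015;                 /* one random seek */
        double margin       = 10.0;                  /* one order of magnitude */
        double block_mb     = seq_mb_per_s * seek_s * margin;   /* = 15MB */
        double n_mb         = 10.0 * 1024.0;         /* N = 10GB of data */
        double m_mb         = sqrt(n_mb * block_mb); /* M >= sqrt(N * 15MB) */

        printf("block size: %.1fMB, required memory: ~%.0fMB\n", block_mb, m_mb);
        return 0;   /* prints ~392MB, close to the ~380MB figure above */
    }
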

Alternatively, we can choose a different algorithm in which we merge only a
constant number of subfiles together at a time, but then we require multiple
passes to merge the entire file: log(K) passes over the entire data. This
approach obviously improves as memory increases.

The first approach requires 2 passes over the entire data and K^2 + K random
seeks; the second approach (merging l subfiles at a time) requires log_l(K)
passes over the data and K*l + K random seeks.
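
A rough comparison of the two approaches under the same assumptions, using the
pass and seek counts claimed above (the 64MB work memory and fan-in of 8 are
example values, not recommendations):

    /* Compare the two merge strategies using the counts given above.
     * Assumptions as in the text: 100MB/s sequential IO, 15ms per seek.
     * Link with -lm. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double n_mb = 10.0 * 1024.0;   /* N = 10GB */
        double m_mb = 64.0;            /* M = 64MB sort memory (example) */
        double k    = n_mb / m_mb;     /* K = N/M subfiles */
        double l    = 8.0;             /* fan-in for the multi-pass approach */

        /* one-pass merge: 2 passes, K^2 + K seeks */
        double t1 = 2.0 * n_mb / 100.0 + (k * k + k) * 0.015;

        /* l-way merge: log_l(K) passes, K*l + K seeks (as claimed above) */
        double passes = ceil(log(k) / log(l));
        double t2 = passes * n_mb / 100.0 + (k * l + k) * 0.015;

        printf("K=%.0f: one-pass %.0fs, %.0f-way merge (%.0f passes) %.0fs\n",
               k, t1, l, passes, t2);
        return 0;
    }
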

On 9/23/05, Simon Riggs [EMAIL PROTECTED] wrote:
> I have concerns about whether we are overallocating memory for use in
> external sorts. (All code relating to this is in tuplesort.c)
>
> When we begin a sort we allocate (work_mem | maintenance_work_mem) and
> attempt to do the sort in memory. If the sort set is too big to fit in
> memory we then write to disk and begin an external sort. The same memory
> allocation is used for both types of sort, AFAICS.
>
> The external sort algorithm benefits from some memory but not much.
> Knuth says that the amount of memory required is very low, with a value
> typically less than 1 kB. I/O overheads mean that there is benefit from
> having longer sequential writes, so the optimum is much larger than
> that. I've not seen any data that indicates that a setting higher than
> 16 MB adds any value at all to a large external sort. I have some
> indications from private tests that very high memory settings may
> actually hinder performance of the sorts, though I cannot explain that
> and wonder whether it is the performance tests themselves that have
> issues.
>
> Does anyone have any clear data that shows the value of large settings
> of work_mem when the data to be sorted is much larger than memory? (I am
> well aware of the value of setting work_mem higher for smaller sorts, so
> any performance data needs to reflect only very large sorts).
>
> If not, I would propose that when we move from qsort to tapesort mode we
> free the larger work_mem setting (if one exists) and allocate only a
> lower, though still optimal setting for the tapesort. That way the
> memory can be freed for use by other users or the OS while the tapesort
> proceeds (which is usually quite a while...).
>
> Feedback, please.
>
> Best Regards, Simon Riggs
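
For reference, here is a hypothetical C sketch of the shrink-on-spill idea in
the quoted proposal. The names SortState, switch_to_tapesort and
TAPE_BUFFER_BYTES are invented for illustration; this is not tuplesort.c code,
and whether free() actually returns memory to the OS depends on the allocator:

    /* Hypothetical sketch only: none of these names exist in tuplesort.c.
     * Idea: once we know the sort will spill to tape, drop the large qsort
     * allocation and keep just a smaller, merge-sized working buffer. */
    #include <stdlib.h>

    #define TAPE_BUFFER_BYTES (16 * 1024 * 1024)   /* e.g. the 16MB figure above */

    typedef struct SortState {
        char  *mem;        /* sort working memory */
        size_t mem_bytes;  /* work_mem at first, smaller after the switch */
    } SortState;

    /* Called when the input is found not to fit within work_mem. */
    static void switch_to_tapesort(SortState *st)
    {
        if (st->mem_bytes > TAPE_BUFFER_BYTES)
        {
            /* ...dump the tuples sorted so far to the first tape run... */
            free(st->mem);                      /* give the big block back */
            st->mem = malloc(TAPE_BUFFER_BYTES);
            st->mem_bytes = TAPE_BUFFER_BYTES;  /* tapesort needs far less */
        }
    }
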


Re: [HACKERS] beginning hackers

2005-08-24 Thread Meir Maor
IMHO (as a wannabe pgsql hacker), tasks should be marked as suitable for
beginners when they do not require in-depth knowledge of the pgsql codebase,
not according to how easy they are in other respects. For example, if a task
requires a significant amount of new, non-trivial code that has little to do
with existing code and is just plugged in at one small place, I would consider
that task suitable for beginners: I do not assume beginner pgsql hackers are
incompetent or even inexperienced programmers; they are simply new to pgsql.

  Meir

On 8/24/05, Jim C. Nasby [EMAIL PROTECTED] wrote:
> On Mon, Aug 22, 2005 at 07:09:10PM -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > Can someone turn these items into a beginning hacker's TODO as has
> > > been discussed before? Or find a way to mark them on the main TODO?
> > >
> > > If someone wants to tell me how this should be done and give me whatever
> > > files need to be changed I'd be happy to submit a patch.
> >
> > Sure, submit a diff against doc/TODO and mark them with something like
> > %.
>
> Here's my stab at marking items. I picked items that I thought would
> either be well-contained or that would be pretty straightforward. But
> since I'm not very familiar with the code itself a lot of these could be
> way off-base. I tried to err on the side of marking things that might be
> borderline, since presumably it's easier for someone to see a marked
> item and veto it rather than look at the entire list and try to find
> new items. In any case, it wouldn't hurt for someone to make another
> pass after this is applied and look for easy items that I missed.
>
> BTW, while I was doing this it struck me that it might make sense to
> have a difficulty ranking of, say 1-5, instead of just marking beginner
> items. Thoughts?
> --
> Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
> Pervasive Software      http://pervasive.com      512-569-9461
 
 
 

