Re: [HACKERS] bitmap AM design
On March 2, 2005, at 21:17, Hannu Krosing wrote:
>> On Tuesday, March 1, 2005, at 14:54 -0500, [EMAIL PROTECTED] wrote:
>>> Now, it occurs to me that if my document reference table can refer to something other than an indexed primary key, I can save a lot of index processing time in PostgreSQL if I can have a safe analogy to CTID.
>> I guess you could work on making hash indexes better (for concurrent access).

You should have a look at this thread:
http://archives.postgresql.org/pgsql-hackers/2005-02/msg00263.php

And take a look at this paper about a lock-free parallel hash table:
http://www.cs.rug.nl/~wim/mechver/hashtable/

> "A safe analogy to CTID" looks remarkably like a hash index.
> -- Hannu Krosing [EMAIL PROTECTED]

Regards,
Jean-Gérard Pailloncy
Re: [HACKERS] bitmap AM design
Pailloncy Jean-Gerard wrote:
> You should have a look at this thread:
> http://archives.postgresql.org/pgsql-hackers/2005-02/msg00263.php
> Take a look at this paper about a lock-free parallel hash table:
> http://www.cs.rug.nl/~wim/mechver/hashtable/

Is this relevant? Hash indexes are on-disk data structures, so ISTM lock-free algorithms aren't really applicable.

(BTW, is poor concurrency really the biggest issue with hash indexes? If so, there is some low-hanging fruit that I noticed a few years ago but never got around to fixing: _hash_doinsert() write-locks the hash metapage on every insertion merely to increment a tuple counter. This could be improved by acquiring the lock only with probability 1/k, and incrementing the counter by k when we do -- or some similar statistical approximation. IMHO there are bigger issues with hash indexes, like the lack of WAL safety, the very slow index build times, and their relatively poor performance -- i.e. no better than btree for any workload I've seen. If someone wants to step up to the plate and fix some of that, I'll improve hash index concurrency -- any takers? :-) )

-Neil
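For illustration, the statistical approximation Neil sketches could look roughly like this (a minimal sketch in C, not PostgreSQL code; the lock helpers, the counter variable, and the sampling factor are all stand-ins):

#include <stdlib.h>

#define COUNTER_SAMPLE_K 16          /* hypothetical sampling factor */

static long metapage_ntuples;        /* stands in for the metapage tuple count */

static void lock_metapage(void)   { /* acquire metapage write lock */ }
static void unlock_metapage(void) { /* release it */ }

/* Called once per index insertion.  Locks the metapage only with
 * probability 1/K, adding K when it does, so the counter stays correct
 * in expectation while the lock is taken K times less often. */
static void
count_insertion(void)
{
    if (random() % COUNTER_SAMPLE_K == 0)
    {
        lock_metapage();
        metapage_ntuples += COUNTER_SAMPLE_K;
        unlock_metapage();
    }
}

The trade-off, as Tom notes downthread, is that the counter becomes approximate, and anything that depends on its accuracy (bucket-split decisions) pays for the reduced locking.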
Re: [HACKERS] bitmap AM design
Neil Conway wrote:
> (BTW, is poor concurrency really the biggest issue with hash indexes? [...] IMHO there are bigger issues with hash indexes, like the lack of WAL safety, the very slow index build times, and their relatively poor performance -- i.e. no better than btree for any workload I've seen. If someone wants to step up to the plate and fix some of that, I'll improve hash index concurrency -- any takers? :-) )

As always, I'd love to have the time to do this stuff, but as always, I'm not in a position to spend any time on it. It's frustrating.

Anyway, IMHO, hash indexes would be dramatically improved if you could specify your own hashing function and declare an initial table size. If you could do that, and work on the assumption that the hash index was for fairly static data, it could handle many needs. As it stands, hash indexes are virtually useless.
Re: [HACKERS] refactoring fork() and EXEC_BACKEND
> While going through the usual motions needed to fork a child process of the postmaster, it occurred to me that there's a fair bit of duplicated code involved. There are also #ifdefs for various situations (BeOS, LINUX_PROFILE, and EXEC_BACKEND), which make the code yet more ugly. I think we could make this a lot cleaner. I'd like to define an API like so:

This is a lot like what I was planning to work towards with the refactoring of the forkexec code I promised to do for 8.1. Glad to hear you think in the same direction.

> pid_t fork_process(int proc_type);
> pid_t fork_backend(Port *port);
>
> If the process needs to add a lot of private information to the argv in the case of EXEC_BACKEND, it could invoke a third variant:
>
> #ifdef EXEC_BACKEND
> pid_t forkexec_process(int proc_type, int argc, char **argv);
> #endif
>
> (Or possibly using varargs, if that is cleaner for most call sites.) Hopefully most call sites could just use fork_process().

I was actually thinking of not passing these on the command line at all, in order to avoid possible quoting issues etc. (recall all the problems with the stupid command-line processing on win32), and instead moving it into a struct that is appended to the end of the backend variable file/shared memory.

[snip]

> So, most call sites would be quite nice:
>
> pid_t result = fork_process(PROC_TYPE_FOO);
> if (result == -1)
> {
>     /* fork failed, in parent */
> }
> else if (result == 0)
> {
>     /* in child */
> }
> else
> {
>     /* in parent, `result' is pid of child */
> }

You're not going to be able to get the "in child" branch there for an execed process, are you? It has to be called somewhere in the new process, and thus it would have to be a function, wouldn't it?

> I'd also like to move the implementation of fork_process() and friends, as well as internal_forkexec(), into a separate file -- I'd rather not clutter up postmaster.c with it.

That was also what I was thinking. Let me know if you want to split the load somewhere :-)

//Magnus
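For concreteness, the wrapper being discussed might be shaped something like this (a sketch only, under the assumptions in the mail above; internal_forkexec()'s signature and the PROC_TYPE tags are illustrative, not the actual patch):

#include <sys/types.h>
#include <unistd.h>

#define PROC_TYPE_BACKEND   0        /* hypothetical process-type tags */
#define PROC_TYPE_BGWRITER  1

#ifdef EXEC_BACKEND
extern pid_t internal_forkexec(int proc_type);   /* illustrative signature */
#endif

/* One place for the BeOS/LINUX_PROFILE/EXEC_BACKEND noise, so call
 * sites see plain fork()-like semantics. */
pid_t
fork_process(int proc_type)
{
    pid_t result;

    (void) proc_type;                /* only needed on the forkexec path */

#ifdef LINUX_PROFILE
    /* save profiling timer state here; restore it in the child */
#endif

#ifdef EXEC_BACKEND
    /* fork-and-exec: "in child" code cannot run inline, since the child
     * re-execs the postgres binary -- Magnus's point above */
    result = internal_forkexec(proc_type);
#else
    result = fork();
#endif

    return result;
}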
Re: [HACKERS] Where to see the patch queue (was Re: [PATCHES] Patch
On Thursday 03 March 2005 19:08, Neil Conway wrote:
> Thomas F. O'Connell wrote: [...]
>
> Committers, myself included, deserve some blame for not making more rapid progress on the queue of unapplied patches for 8.1. In the meanwhile, the queue should be easier for folks to find (why is the pgpatches queue the only one linked from postgresql.org, even though it is almost empty?)

Wow, I hadn't realized just how big that queue had gotten. I've got no qualms about adding another link to the website, but can we get a scheme for the links to the various patch queues? Is pgpatches for the current stable branch and pgpatches2 for the current development branch? (Well, that can't be true, since before the 8.0 release they were both for unstable branches...)

Incidentally, I think DEV_FAQ 1.4 should be modified to mention the patch queues.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] SQL99 Hierarchical queries
Hello hackers,

> Thanks very much for doing this work. Is there some way you can summarize what you did so others can join you in working on it? If it is easier for you to write this in some language other than English, please do, and we'll find translators :) I noticed that the patch touches the parser, the optimizer and the executor. What does it do with each? What did I miss?

Here is some information about the implementation.

Parser. The WITH-aliased queries are stored as a list of SUBQUERY nodes, each of which is parsed as usual. The already-analyzed part of the WITH query list is stored in pstate->withClause. When the next WITH subquery is analyzed, it is added to the subquery list inside the withClause node, so any WITH subquery can see all previous WITH subqueries. In the FROM clause, all WITH aliases are represented by a special type of RangeTblEntry, RTE_WITH_SUBQUERY, which stores a reference to the WithClause node and its own index in the WITH subquery list. Var analysis gains an extra step that searches the WITH aliases when they are present.

Recursiveness support (parser). Before a WITH subquery is analyzed, its alias is stored in pstate->withClause->calias. So, when a FROM-clause item is transformed and its relation name is found only in pstate->withClause->calias, the query is marked as recursive. SQL99 recursive queries are written using UNION, and the first UNION subquery must be non-recursive (this isn't currently checked). Thus, when the set-operation statement is transformed, after the first statement is analyzed its RTE is stored in pstate->withClause->cRTE, and all checks for Vars in this WITH subquery are made against that RTE.

Optimizer. The WithClause node is transformed into a With node, and scans of RTEs of type RTE_WITH_SUBQUERY become WithScan nodes. Each WITH subquery is wrapped in a WithSubPlan node, which stores the result and working tuplestores, plus some flags (used to prevent double initialization and execution) for each subquery. Nothing extraordinary is done here.

Executor. When the executor tries to fetch the first tuple from any WithScan node, that node checks whether the With node has been executed; if not, it executes it, and then fetches all of its tuples from its result tuplestore. Executing the With node simply calls ExecNode on each subplan in its list, storing the tuples in the result tuplestore.

Recursiveness support (executor). This is all done in the Append node. If it is marked as recursive, it changes its behaviour a little. Tuples fetched from the subplans are stored in the working table. When Append reaches the end of its subplan list, it calls nodeWithSwitchTables. For the query being executed, this function appends the result table to the final table, moves the working table to the result table, and cleans the working table. After this, Append begins the next loop of subquery execution, starting from the second subquery; thus the first Append subquery is executed only once. Execution ends when no tuples are fetched from any subquery. This approach allows WithScan nodes to fetch the data fetched by Append in the previous loop.

Regards,
Evgen.
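To make the executor part easier to follow, here is a schematic of the recursive Append loop described above, with the tuplestores reduced to row counters (illustrative only; nodeWithSwitchTables is the name from the description, the rest is not code from the patch):

#include <stdio.h>

static int working_table;   /* rows produced in the current pass */
static int result_table;    /* rows from the previous pass, visible to WithScan */
static int final_table;     /* accumulated output */

/* Hypothetical recursive subplan: derives new rows from the previous
 * pass; here it just shrinks until nothing is produced. */
static int
exec_recursive_subplan(void)
{
    return result_table > 0 ? result_table - 1 : 0;
}

/* The nodeWithSwitchTables step described above. */
static void
switch_tables(void)
{
    final_table += result_table;    /* append result table to final table */
    result_table = working_table;   /* working table becomes result table */
    working_table = 0;              /* clean working table */
}

int
main(void)
{
    working_table = 5;              /* the non-recursive first subplan runs once */
    switch_tables();

    while (result_table > 0)        /* stop when a pass fetches no tuples */
    {
        working_table += exec_recursive_subplan();   /* subplans 2..n */
        switch_tables();
    }
    printf("final rows: %d\n", final_table);         /* 5+4+3+2+1 = 15 */
    return 0;
}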
Re: [HACKERS] Solving hash table overrun problems
Aaron Birkland [EMAIL PROTECTED] writes:
> This also brings up a line of thought I had a while ago on a related topic. Something like a HashDistinct might be useful, if it had no startup cost. It would basically be a plan node in the executor that would dynamically build a hashtable so that it can pull rows from its child node (discarding them if they appear in the hashtable) until it can pass on a novel row. I have some reservations about it, though.

We already have that: the planner will use a HashAgg node in this fashion in some contexts (I think just as one of the ways to do IN, at the moment). It's not yet bright enough to consider doing it for SELECT DISTINCT. The DISTINCT planning code is old and crufty and pretty tightly intertwined with ORDER BY ... it needs work.

regards, tom lane
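As a rough illustration of the no-startup-cost behaviour being discussed, a streaming hash-distinct loop looks like this (a toy sketch with integer keys and a fixed-size open-addressed table, not the HashAgg code):

#include <stdbool.h>
#include <stdio.h>

#define TABSIZE 1024        /* assumed to exceed the number of distinct keys */

static int  keys[TABSIZE];
static bool used[TABSIZE];

/* Return true if key was already seen; remember it otherwise.
 * Open addressing with linear probing stands in for the real hashtable. */
static bool
seen_before(int key)
{
    unsigned slot = (unsigned) key % TABSIZE;

    while (used[slot])
    {
        if (keys[slot] == key)
            return true;                 /* duplicate: caller discards it */
        slot = (slot + 1) % TABSIZE;     /* probe the next slot */
    }
    used[slot] = true;                   /* novel key: remember and emit */
    keys[slot] = key;
    return false;
}

int
main(void)
{
    int child_rows[] = {3, 1, 3, 2, 1, 4};   /* rows pulled from the child node */
    int n = sizeof(child_rows) / sizeof(child_rows[0]);

    for (int i = 0; i < n; i++)
        if (!seen_before(child_rows[i]))
            printf("emit %d\n", child_rows[i]);  /* rows flow out immediately */
    return 0;
}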
Re: [HACKERS] bitmap AM design
Neil Conway [EMAIL PROTECTED] writes:
> (BTW, is poor concurrency really the biggest issue with hash indexes? If so, there is some low-hanging fruit that I noticed a few years ago, but never got around to fixing: _hash_doinsert() write-locks the hash metapage on every insertion merely to increment a tuple counter.

Given the short amount of time that lock is held, this wouldn't win anything worth noticing. Also, it's not merely to increment a counter --- the counter drives decisions about whether to split buckets, so any decrease in accuracy would lead directly to losses in overall performance. The lack of WAL support is a much bigger issue.

regards, tom lane
[HACKERS] db cluster ?
Hello hackers,

I'm wondering if it is possible to somehow spread a pretty big database (approx. 50 GB) over a few boxes to get more speed. If anyone has done that, I'd be glad to get some pointers in the right direction.

Thanks and best regards,
Bostjan
Re: [HACKERS] bitmap AM design
[EMAIL PROTECTED] writes:
> Anyway, IMHO, hash indexes would be dramatically improved if you could specify your own hashing function

That's called a custom operator class.

> and declare initial table size.

It would be interesting to see if setting up the hashtable with about the right number of buckets initially would make CREATE INDEX enough faster to be a win ... but that doesn't mean I want to make the user deal with it. We could probably hack hashbuild() to estimate the size of the parent table using the same code that the planner is now using (ie, actual size in pages times a possibly-dead-reckoning rows-per-page estimate).

regards, tom lane
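The estimate Tom describes could be sketched like this (illustrative C, not the hashbuild() code; the rows-per-page constant and the target fill per bucket are assumptions):

#include <stdio.h>

#define ASSUMED_ROWS_PER_PAGE 60        /* dead-reckoning fallback */

/* Round up to the next power of two, since hash buckets double on split. */
static unsigned long
next_pow2(unsigned long n)
{
    unsigned long p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Size the initial bucket count from the parent table's size in pages
 * times a rows-per-page figure, as suggested above. */
static unsigned long
estimate_initial_buckets(unsigned long rel_pages, double fill_per_bucket)
{
    double est_rows = (double) rel_pages * ASSUMED_ROWS_PER_PAGE;

    return next_pow2((unsigned long) (est_rows / fill_per_bucket));
}

int
main(void)
{
    /* e.g. a 10,000-page table, aiming for ~10 tuples per bucket */
    printf("%lu buckets\n", estimate_initial_buckets(10000, 10.0));
    return 0;
}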
Re: [HACKERS] Solving hash table overrun problems
On Thu, Mar 03, 2005 at 17:05:40 -0500, Tom Lane [EMAIL PROTECTED] wrote:
> * Estimate the number of batches N using the planner's estimate. We will always choose N a power of 2. A tuple's batch number is ((H div K) mod N).

If K is way low this could be very slow. Is there a way to do something similar by changing the hash function to H div KN? If you went down this road, you would probably want to use distinct primes for each new N.

> * Now begin scanning the outer join input. Tuples of batch number zero (according to the current calculation) can be matched to the current hashtable contents. Tuples of higher batch numbers are dropped into holding files for the outer input, one per batch.

For new keys at this step, do you know their final disposition, so that making new hash entries won't be necessary?
Re: [HACKERS] Solving hash table overrun problems
Bruno Wolff III [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] wrote:
>> * Estimate the number of batches N using the planner's estimate. We will always choose N a power of 2. A tuple's batch number is ((H div K) mod N).
> If K is way low this could be very slow.

How so? You're not concerned about the time to do the division itself, are you?

>> * Now begin scanning the outer join input. [...]
> For new keys at this step do you know their final disposition so that making new hash entries won't be necessary?

Well, we probably have a pretty fair idea of N at this point, so it'll usually be right --- but we reserve the right to increase N again later, in case we have to do so because one of the later inner batches is much bigger than the zero batch we are currently processing.

regards, tom lane
Re: [HACKERS] Solving hash table overrun problems
On Fri, Mar 04, 2005 at 10:42:08 -0500, Tom Lane [EMAIL PROTECTED] wrote:
> Bruno Wolff III [EMAIL PROTECTED] writes:
>> If K is way low this could be very slow.
> How so? You're not concerned about the time to do the division itself, are you?

No, rather about having lots of entries in the same hash buckets. I was thinking about recent discussions where there was a large number of rows, with almost all of the keys having just a few values but a lot of unique keys overall, and analyze doesn't see enough of the unique ones to make a good estimate for K.

>> For new keys at this step do you know their final disposition so that making new hash entries won't be necessary?
> Well, we probably have a pretty fair idea of N at this point, so it'll usually be right --- but we reserve the right to increase N again later [...]

I just noticed that it wasn't mentioned that an overflow could occur at this step. I didn't think it would be hard to handle if needed, but I was wondering if knowing that a key couldn't match (because it was in the current batch 0 and didn't match an existing key in that batch) was enough to emit or discard the row.
Re: [HACKERS] Solving hash table overrun problems
Bruno Wolff III [EMAIL PROTECTED] writes:
>> How so? You're not concerned about the time to do the division itself, are you?
> No, rather having lots of entries in the same hash buckets.

That won't happen, because we are going to set K with an eye to the maximum number of rows we intend to hold in memory (given work_mem). With the addition of the dynamic batch-splitting logic, that number of rows is actually reasonably accurate. The only way this scheme can really lose badly is if there are large numbers of tuples with exactly the same hash code, so that no matter how much we increase N we can't split up the bucketload. That is a risk for *any* hashing scheme, however. In practice we have to rely on the planner not to choose hashing when there are only a few distinct values for the key.

> I just noticed that it wasn't mentioned that an overflow could occur at this step.

It can't, because we aren't loading the outer tuples into the hash table. We are just considering them one at a time and probing for matches.

regards, tom lane
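For reference, the bucket/batch arithmetic under discussion works out like this (a small sketch; K and the hash value are illustrative):

#include <stdint.h>
#include <stdio.h>

#define K 1024u                 /* in-memory hash buckets (illustrative) */

/* A tuple's bucket is H mod K; its batch is (H div K) mod N, with N a
 * power of 2.  When N doubles, a tuple in batch b maps either to b or
 * to b + old N, which is how dynamic batch splitting offloads part of
 * an overflowing batch without rehashing everything. */
static void
bucket_and_batch(uint32_t hashval, uint32_t nbatch,
                 uint32_t *bucket, uint32_t *batch)
{
    *bucket = hashval % K;
    *batch = (hashval / K) % nbatch;
}

int
main(void)
{
    uint32_t bucket, batch;
    uint32_t h = 123456789u;

    bucket_and_batch(h, 4, &bucket, &batch);
    printf("nbatch=4: bucket=%u batch=%u\n", bucket, batch);

    bucket_and_batch(h, 8, &bucket, &batch);
    printf("nbatch=8: bucket=%u batch=%u\n", bucket, batch);
    return 0;
}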
[HACKERS] I am in Copenhagen
I am in Copenhagen, am speaking tomorrow, and will return on Sunday. When I return I will get back to the patches queue. I have been delayed working on my own patches. :-)

--
Bruce Momjian                   | http://candle.pha.pa.us
pgman@candle.pha.pa.us          | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [HACKERS] bitmap AM design
> [EMAIL PROTECTED] writes:
>> Anyway, IMHO, hash indexes would be dramatically improved if you could specify your own hashing function
> That's called a custom operator class.

Would I also be able to query the bucket size and all that?

>> and declare initial table size.
> It would be interesting to see if setting up the hashtable with about the right number of buckets initially would make CREATE INDEX enough faster to be a win ... but that doesn't mean I want to make the user deal with it. We could probably hack hashbuild() to estimate the size of the parent table using the same code that the planner is now using (ie, actual size in pages times a possibly-dead-reckoning rows-per-page estimate).

I know a linear hash is different from a classic simple hash table, but a classic simple hash table has some great advantages at the expense of disk space. IMHO, being able to use the hash index as something closer to the classic theoretical hash table, falling back to the linear behaviour only if the table grows beyond the initial estimates, would be a big win. It could actually get to 1:1 data retrieval (one probe per lookup) on properly estimated tables.

Estimation is a great idea; something like the first prime after 2*NROWS (with a GUC limit, I guess) would probably make hash indexes the fastest, most disk-hogging index around.
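The sizing rule proposed here is easy to sketch (a toy example; note that real hash indexes size buckets in powers of two, so the prime-based rule is this poster's suggestion, not current behaviour):

#include <stdbool.h>
#include <stdio.h>

static bool
is_prime(unsigned long n)
{
    if (n < 2)
        return false;
    for (unsigned long d = 2; d * d <= n; d++)
        if (n % d == 0)
            return false;
    return true;
}

/* First prime after 2*NROWS, capped by a GUC-style limit. */
static unsigned long
initial_buckets(unsigned long nrows, unsigned long guc_limit)
{
    unsigned long n = 2 * nrows + 1;

    while (!is_prime(n))
        n++;
    return n < guc_limit ? n : guc_limit;
}

int
main(void)
{
    /* e.g. 500,000 estimated rows -> 1000003 buckets */
    printf("%lu\n", initial_buckets(500000, 4000000));
    return 0;
}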
[HACKERS] buildfarm issues
Now that we've been running for a while, there are a few buildfarm issues that I need to address.

First, do we keep the right data about the members? Essentially, we keep: operating_system, os_version, compiler, compiler_version, architecture. For Linux, we generally ask for the distribution/distro-version instead of the OS/os-version. However, that leads to interesting situations - Gentoo, for example, is so flexible that in version 2004.03 you might easily be using kernel version 2.4.x or 2.6.x ... in fact it's almost impossible to tell what might be installed on a Gentoo system, or how it was compiled. So I'm really not sure how we should treat such systems.

Second is the fact that systems change over time. People upgrade their machines. I'm considering a facility to allow people to change the os-version and compiler-version aspects of their registered personality - these will become essentially timestamped pieces of information, so we'll still be able to tie a set of values to a history item.

Third, what can be done to improve the usefulness of the server / web site? I already have the following items on my list of TODOs:
. allow sorting of the member list by personality feature (e.g. see all the OS/X boxes together)
. filter the dashboard page to only show certain members (this already works, but you have to construct the list manually, e.g. http://www.pgbuildfarm.org/cgi-bin/show_status.pl?member=cat&member=dog&member=loris&member=gibbon )
. an email alerting system for when a system changes state

Fourth, there are some significant gaps in the buildfarm coverage. I am currently personally supplying the Windows and Cygwin members, but since that is on my laptop, which is frequently required for other work, it's not ideal, and there can be large gaps. Other gaps include HPUX, for example. If you run a machine that should be covered and can help, please do. After the initial setup there should be very little work involved.

Finally, a volunteer or two to help me with both running and administering this would be great. Knowledge of Perl and Template Toolkit is a decided advantage.

Buildfarm was created to assist the principal hackers - so any and all (constructive) input is welcome.

cheers

andrew
Re: [HACKERS] buildfarm issues
On Friday 04 March 2005 10:11, Andrew Dunstan wrote:
> First, do we keep the right data about the members? Essentially, we keep: operating_system, os_version, compiler, compiler_version, architecture. [...]
>
> Second is the fact that systems change over time. People upgrade their machines. I'm considering a facility to allow people to change the os-version and compiler-version aspects of their registered personality [...]

What about using uname(1), cc -v, etc. to glean this information and post it with each event logged? I believe you already have all this stuff in the config.log that is used?
Re: [HACKERS] buildfarm issues
Darcy Buskermolen wrote:
> What about using uname(1), cc -v, etc. to glean this information and post it with each event logged? I believe you already have all this stuff in the config.log that is used?

See the previous paragraphs - on Linux we want the distro name and version, not "Linux" plus the kernel version, and uname doesn't seem to help much there. Also, I have no idea how portable cc -v is. Can we guarantee to have the compiler version properly identified on every platform?

cheers

andrew
Re: [HACKERS] 8.0.X and the ARC patent
On Wed, 2005-03-02 at 20:55 -0500, Tom Lane wrote:
> Michael Adler [EMAIL PROTECTED] writes:
>> Looking at the Response Time Charts
>> 8.0.1/ARC: http://www.osdl.org/projects/dbt2dev/results/dev4-010/309/rt.html
>> 20050301 with 2Q patch: http://www.osdl.org/projects/dbt2dev/results/dev4-010/313/rt.html
>> It seems like the average response time has gone down, but the worst-case ceiling has risen about 35%.
> The worst cases are associated with checkpoints. I'm not sure why a checkpoint would have a greater effect on the 2Q system than an ARC system --- checkpoint doesn't request any new buffers, so you'd think it'd be independent. Maybe this says that the bgwriter is less effective with 2Q, so that there are more dirty buffers remaining to be written at the checkpoint? But why?

The pattern seems familiar. Reduced average response time increases total throughput, which on this test means we have more dirty buffers to write at checkpoint time. I would not necessarily suspect 2Q over ARC, at least initially. The pattern of behaviour is similar across ARC, 2Q and Clock, though the checkpoint spikes differ in intensity. The latter makes me suspect BufMgrLock contention or similar.

There is a two-level effect at checkpoint time: first we have the write from PostgreSQL buffers to the OS cache, then we have the write from the OS cache to disk by the pdflush daemons. At this point, I'm not certain whether the delay is caused by the checkpointing or by the pdflush daemons. Mark and I had discussed some investigations around that. This behaviour is new in the 2.6 kernel, so it is possible there is an unpleasant interaction there, though I do not wish to cast random blame.

Checkpoint doesn't request new buffers, but it does require the BufMgrLock in order to write all of the dirty buffers. It could be that the I/Os go straight to the OS cache, so that the tight loop writing out dirty buffers causes such an extreme backlog for the BufMgrLock that it takes more than a minute to clear and return to normal contention. It could also be that at checkpoint time the number of writes exceeds dirty_ratio, and the kernel forces the checkpoint process to bypass the cache and the pdflush daemons altogether, performing the I/O itself. Single-threaded, this would display the scalability profile we see. Some kernel-level questions in there...

There is no documented event-state model for LWLock acquisition, so it might be possible that there is a complex bottleneck in amongst them.

Amdahl's Law tells me that looking at the checkpoints is the next best action for tuning, since they add considerably to the average response time. Looking at the oprofile for the run as a whole misses the delayed-transaction behaviour that occurs during checkpoints. I would like to try to catch an oprofile of the system while it is performing a checkpoint, as a way to give us some clues. Perhaps that could be achieved by forcing a manual checkpoint as superuser, and making that interaction cause a switch to a new oprofile output file.

Best Regards, Simon Riggs
Re: [HACKERS] 8.0.X and the ARC patent
Simon Riggs [EMAIL PROTECTED] writes:
> Checkpoint doesn't request new buffers, but it does require the BufMgrLock in order to write all of the dirty buffers.

There is no BufMgrLock anymore in the clock-algorithm code, so I'm not sure how much of this analysis is still relevant.

regards, tom lane
Re: [HACKERS] db cluster ?
> I'm wondering if it is possible to somehow spread a pretty big database (approx. 50 GB) over a few boxes to get more speed. If anyone has done that, I'd be glad to get some pointers in the right direction.

I have done various elements of clustering with PostgreSQL on a per-task basis, but not a fully comprehensive generic distributed cluster. There are a couple of tools you can use if you are an engineering sort of fellow.

Slony is a replication cluster; all the data is on all the machines.

There is also a project that shows promise as a distributed data system: contrib/dblink. One could segment the database across a number of logical data managers and use dblink to incorporate the data from one database into the queries on another. It won't be transparent, but it could be fairly manageable if you use views to implement the links.

I guess the real question is: what performance do you need to improve? If it is just read performance, then Slony is probably your best bet. Put a number of redundant machines behind a load balancer and you are all set. If you need to increase write performance, well, that can be problematic. What is it that you want to accomplish?
Re: [HACKERS] buildfarm issues
Andrew,

A couple of things:

1. We need to develop a matrix of systems/OS/compiler to see what coverage we do have, and compare it to the INSTALL guide.
2. run_build.pl should be changed to keep the information on the system up to date (and have the matrix in 1 change with it).
3. run_build.pl should check the buildfarm CVS to see if there is a new version of the buildfarm code, and either download it and/or alert the owner of the system.

I am able and willing to help out with all of the above. I am also willing to help fill the gap in systems if needed. I have been using Perl for over 10 years now (since the Perl 4 days) and have been doing web/Postgres coding for the last 6 years.

Let me know what I can do to help.

Jim

--- Original Message ---
From: Andrew Dunstan [EMAIL PROTECTED]
To: Darcy Buskermolen [EMAIL PROTECTED]
Cc: PostgreSQL-development pgsql-hackers@postgresql.org
Sent: Fri, 04 Mar 2005 14:28:09 -0500
Subject: Re: [HACKERS] buildfarm issues
[snip]
Re: [HACKERS] buildfarm issues
Andrew Dunstan [EMAIL PROTECTED] writes:
> Also, I have no idea how portable cc -v is.

Not at all.

regards, tom lane
Re: [HACKERS] hi all
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Qu Tianlian) wrote:
> Hi all: I have a question: how do I add a table in Slony? I tried to add a table to an existing database that uses Slony, but the table's replication doesn't happen. I am using PostgreSQL version 7.4.2 and Slony version 1.0. Can you help me? Thanks.

You should take this to the Slony-I mailing list, as that is where the people who know hang out.

By the way, there was never a release of something called "slony version 1.0." Versions of Slony-I that have been released include 1.0.1, 1.0.2, and 1.0.5, but there was no 1.0 version.

--
output = (cbbrowne @ gmail.com)
http://linuxdatabases.info/info/spreadsheets.html
"The social dynamics of the net are a direct consequence of the fact that nobody has yet developed a Remote Strangulation Protocol." -- Larry Wall
Re: [HACKERS] bitmap AM design
Some more thoughts and questions.

The main idea behind bitmaps is narrowing a data set's possible values down to yes and no (i.e. 1 and 0) and then organizing the data as a series of bits, where a bit's position determines the value it describes. In cases where several indexed attributes are used in the WHERE clause, it's possible to AND/OR the bitmaps together before returning any results to the caller. For large tables with a high number of low-cardinality attributes, using bitmaps can give a certain speed-up.

For the on-disk bitmaps I'm working on, each value of each indexed attribute has its own bitmap (i.e. a series of bits, with a bit set to 1 for each row whose corresponding field has that bitmap's value). Scanning the bitmap, we end up with an array of positions of 1-bits, and we need to convert those positions to CTIDs, as the executor expects. So the index should also keep a CTID table that corresponds to the bitmap's data.

This CTID table will be the same for all bitmap indexes created on one table, so having two bitmap indexes means wasting some amount of disk space storing absolutely identical data. To save space, we have two possibilities:
1) create a new relkind for the CTID table (maybe usable not only for on-disk bitmaps);
2) make all "create index ... using bitmap" statements actually create/extend an existing bitmap index. This also implies that the planner/executor should try using the multicolumn bitmap index when at least one indexed field is present in the WHERE clause.

I'm working on the 2nd option, because the 1st one requires more work beyond the access method + executor + planner area. It is also possible to keep things as they are and note in the documentation that it is better to have one multicolumn bitmap index than several single-column ones, and that the planner will still use a multicolumn index even if not all columns are involved. Any comments/ideas here?

After implementing the bitmap index access method, it'll be necessary to teach the planner and executor to use multicolumn bitmaps for any number of scan attributes. Also, I cannot say in what circumstances the planner should prefer a bitmap scan to a seqscan; I thought of cases where it estimates the result set to be about 60% of the relation. What does the community have to say here?

Also, as Tom is planning to work on in-memory bitmaps (maybe something is done already, I don't know), I thought that it might be possible to cooperate. As I have no clue at the moment how the in-memory bitmaps are going to work, is it possible to hear from you some draft of the forthcoming implementation? And what prerequisites would be required to join both kinds of bitmaps somehow?

Waiting for your thoughts/comments.

--
Victor Y. Yegorov
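As a rough picture of the scan described above, ANDing two per-value bitmaps and mapping the surviving bit positions through a shared CTID table might look like this (types, sizes, and the GCC builtin are illustrative, not from the patch):

#include <stdint.h>
#include <stdio.h>

#define NWORDS 2                                   /* bitmap length in 64-bit words */

typedef struct { uint32_t block; uint16_t offset; } Ctid;   /* stand-in for a heap TID */

/* one shared CTID table serves every bitmap index on the table;
 * left zero-filled here, it would map bit positions to heap rows */
static const Ctid ctid_table[NWORDS * 64];

static void
scan_and(const uint64_t *a, const uint64_t *b)
{
    for (int w = 0; w < NWORDS; w++)
    {
        uint64_t word = a[w] & b[w];               /* WHERE x = v1 AND y = v2 */

        while (word)
        {
            int bit = __builtin_ctzll(word);       /* lowest set bit */
            int pos = w * 64 + bit;
            const Ctid *t = &ctid_table[pos];

            printf("match at row %d -> ctid (%u,%u)\n",
                   pos, t->block, t->offset);
            word &= word - 1;                      /* clear that bit */
        }
    }
}

int
main(void)
{
    uint64_t x_is_v1[NWORDS] = {0x0F, 0x01};
    uint64_t y_is_v2[NWORDS] = {0x05, 0x03};

    scan_and(x_is_v1, y_is_v2);                    /* rows 0, 2 and 64 match */
    return 0;
}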
Re: [HACKERS] 8.0.X and the ARC patent
Simon Riggs [EMAIL PROTECTED] writes:
> Amdahl's Law tells me that looking at the checkpoints is the next best action for tuning, since they add considerably to the average response time.

Even aside from the effect it has on average response time, I would point out that many applications are governed by the worst case more than by average throughput. For a web server, for example (or any OLTP application in general), it doesn't matter if the database can handle x transactions/s on average. What matters is that, 100% of the time, it can keep up with the actual rate of requests. If every 30 minutes the latency suddenly spikes beyond that, even for only a minute, then it will fall behind on requests, and the user will effectively see a completely unresponsive web server.

So I would really urge you to focus your attention on the maximum latency figure. It's at least as important as, if not *more* important than, the average throughput number.

PS: That's why I was pushing before for the idea that the server should try to spread the I/O from one checkpoint out over more or less the time interval between checkpoints. If it's been 30 minutes since the last checkpoint, then you have about 30 minutes to do the I/O for this checkpoint. (Though I would suggest a safety factor of aiming to be finished within 50% of the time.)

-- greg
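The pacing idea in the PS can be sketched like this (illustrative only; the helper names are stand-ins, and the 50% budget is the safety factor suggested above):

#include <stdio.h>
#include <unistd.h>

#define SAFETY_FACTOR 0.5       /* aim to finish within 50% of the interval */

static void
write_dirty_buffer(int buf)
{
    /* stand-in for writing one dirty shared buffer */
}

/* Spread the checkpoint's writes across a budget derived from the time
 * since the last checkpoint, instead of issuing them in one burst. */
static void
paced_checkpoint(int ndirty, double interval_secs)
{
    double     budget = interval_secs * SAFETY_FACTOR;
    useconds_t pause = (useconds_t) (budget / ndirty * 1e6);

    for (int buf = 0; buf < ndirty; buf++)
    {
        write_dirty_buffer(buf);
        usleep(pause);          /* sleep between writes to pace the I/O */
    }
}

int
main(void)
{
    /* 10,000 dirty buffers, 30 minutes since the last checkpoint:
     * the writes are spread over about 15 minutes */
    paced_checkpoint(10000, 30.0 * 60.0);
    printf("checkpoint complete\n");
    return 0;
}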