[HACKERS] Group by count() and indexes
Consider the following query on a large table with lots of different `id's: SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10; It has an (usually unique) index on id. Obviously, the index helps to evaluate count(id) for a given value of id, but count()s for all the `id's should be evaluated, so sort() will take most of the time. Is there a way to improve performance of this query? If not, please give some indication to do a workaround on the source itself, so perhaps I may be able to work out a patch. Thanks in advance. Anuradha -- Debian GNU/Linux (kernel 2.4.21-pre4) It is contrary to reasoning to say that there is a vacuum or space in which there is absolutely nothing. -- Descartes ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Group by, count, order by and limit
My 3rd attempt to post ... Consider this query on a large table with lots of different IDs: SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10; It has an index on id. Obviously, the index helps to evaluate count(id) for a given value of id, but count()s for all the `id's should be evaluated, so sort() will take most of the time. Is there a way to improve performance of this query? If not, please give some indication to do a workaround on the source itself, so perhaps I may be able to come out with a patch. Thanks in advance. Anuradha -- Debian GNU/Linux (kernel 2.4.21-pre4) There are three ways to get something done: (1) Do it yourself. (2) Hire someone to do it for you. (3) Forbid your kids to do it. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Postgresql and multithreading
On Fri, Oct 18, 2002 at 10:28:38AM -0400, Tom Lane wrote: Greg Copeland [EMAIL PROTECTED] writes: On Thu, 2002-10-17 at 22:20, Tom Lane wrote: Simple: respond to 'em all with a one-line answer: convince us why we should use it. The burden of proof always seems to fall on the wrong end in these discussions. ... Now, it seems, that people don't want to answer questions at all as it's bothering the developers. Not at all. But rehashing issues that have been talked out repeatedly is starting to bug some of us ;-). Perhaps the correct standard answer is more like this has been discussed before, please read the list archives. Let me explain my posting which started this `thread': - The developer's FAQ section 1.9 explains why PostgreSQL doesn't use threads (and many times it has been discussed on the list). - The TODO list has an item `Experiment with multi-threaded backend' and points to a mailing list discussion about the implementation by Myron Scott. His final comment is that he didn't `gain much performance' and `ended up with some pretty unmanagable code'. He also says that he wouldn't `personally try this again ... but there probably was a better way'. - I was going through the TODO list, and was wondering if I should try on this. But before doing that, naturally, I wanted to figure out if any of the core developers themselves have any plans of doing it. Now, I am trying hard to figure out why this `are you going to do this? otherwise I can try it', type posting was not differentiated from numerous `why don't YOU implement this feature' type postings ;) Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Life is too important to take seriously. -- Corky Siegel ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Postgresql and multithreading
On Wed, Oct 16, 2002 at 02:08:21PM -0400, Curtis Faith wrote: 2) Including the pros and cons of the feature/implementation and how close the group is to deciding whether something would be worth doing. - I can also do this. The pros and cons of many such features have been discussed over the lists as well as on the FAQs. But the second matter, the group's likehood their implementation cannot always be deduced from those communications or from docs. Therefore suggested material into the FAQs are going to be extremely useful to like-to-be developers. They also would hopefully reduce unnecessary traffic on the list. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) I have found little that is good about human beings. In my experience most of them are trash. -- Sigmund Freud ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Postgresql and multithreading
On Wed, Oct 16, 2002 at 12:59:57AM -0400, Bruce Momjian wrote: Anuradha Ratnaweera wrote: Is there any plans to make postgresql multithreading? We don't think it is needed, except perhaps for Win32 and Solaris, which have slow process creation times. Thanks, Bruce. But what I want to know is whether multithreading is likely to get into in postgresql, say somewhere in 8.x, or even in 9.x? (as they did with Apache). Are there any plans to do so, or is postgres going to remain rather a multi-process application? Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) One nice thing about egotists: they don't talk about other people. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Postgresql and multithreading
On Wed, Oct 16, 2002 at 01:25:23AM -0400, Bruce Momjian wrote: Anuradha Ratnaweera wrote: ... what I want to know is whether multithreading is likely to get into in postgresql, say somewhere in 8.x, or even in 9.x? It may be optional some day, most likely for Win32 at first, but we see little value to it on most other platforms; of course, we may be wrong. In that case, I wonder if it is worth folking a new project to add threading support to the backend? Of course, keeping in sync with the original would be lot of work. In that way, one should be able to test the hypothesis (whether threads improve things, or the other way round - if one likes it it that way :)) without messing around with stable postgres code, as they did and do with postgresql-R. And a minor question is wheter it is legal to keep the _changes_ in such a project GPL? I am also not sure if it is a big win on Apache either; I think the jury is still out on that one, hence the slow adoption of 2.X, As far as we are concened, it is the stability, rather than speed which still keeps us in 1.3. and we don't want to add threads and make a mess of the code or slow it down, which does often happen. Fully agreed. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Equality is not when a female Einstein gets promoted to assistant professor; equality is when a female schlemiel moves ahead as fast as a male schlemiel. -- Ewald Nyquist ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Peer to peer replication of Postgresql databases
On Fri, Oct 11, 2002 at 08:30:55AM -0500, Greg Copeland wrote: I'd be curious to hear in a little more detail what constitutes not good for postgres on a mosix cluster. It seems that almost all the postgres processes remain in the `home' node. Please notice that I am not underestimating Mosix in any way. We have tested many programs from our parallel processing project with extreme success on our mosix cluster. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Ginger snap. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Peer to peer replication of Postgresql databases
On Fri, Oct 11, 2002 at 07:10:26PM +0530, Shridhar Daithankar wrote: On 11 Oct 2002 at 8:30, Greg Copeland wrote: I'd be curious to hear in a little more detail what constitutes not good for postgres on a mosix cluster. Well, I guess in kind of replication we are talking here, the performance will be enhanced only if separate instances of psotgresql runs on separate machine. Now if mosix kernel applies some AI and puts all of them on same machine, it isn't going to be any good for the purpose replication is deployed. Exactly. First, since we know what is going on, it is not necessary for the OS to decide what's going on. Secondly, database replication is not looked after at all, unless we do some crude tricks on the filesystem. Still it won't be efficient. I guess that's what she meant.. ^^^ Correction: that's what _HE_ meant... ;) Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) All other things being equal, a bald man cannot be elected President of the United States. -- Vic Gold ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Peer to peer replication of Postgresql databases
On Fri, Oct 11, 2002 at 12:07:00PM -0400, Neil Conway wrote: [ pgsql-patches removed from Cc: list ] Anuradha Ratnaweera [EMAIL PROTECTED] writes: I am trying to add some replication features to postgres (yes, I have already looked at ongoing work), in a peer to peer manner. Did you look at the research behind Postgres-R, and the pgreplication stuff? Am looking at the research papers related to it now. - When a frontend process sends a read query, each backend process does that from its own data area. Surely that's not correct -- a SELECT can be handled by *any one* node, not each and every one, right? Yes. Sorry about my careless wording. Unless anything is kind of locked, each node has a copy of the database, so each one can handle SELECTs individually. The actual situation will be far from this simple, because there will be database writes going on and generating consistent SELECTs would need careful handling of concurency issues. - There are two types of write queries. Postmasters use seperate communication channels for each. One is the sequencial channel which carries writes whose order is important, and the non-sequencial channel carries write queries whose order is not important. How do you distinguish between these? Nope. We assume that all the communication should go through the sequencial channel unless indicated by the client. In that case, we will have to find a way to indicate this from the client's side. This doesn't sound very elegant, may be we can figure out a better way. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Being against torture ought to be sort of a bipartisan thing. -- Karl Lehenbauer ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Peer to peer replication of Postgresql databases
On Fri, Oct 11, 2002 at 04:04:29PM +0530, Shridhar Daithankar wrote: On 11 Oct 2002 at 16:29, Anuradha Ratnaweera wrote: On Fri, Oct 11, 2002 at 03:54:15PM +0530, Shridhar Daithankar wrote: I will look at it, too. Thanks for the link. In some cases, starting anew is faster than learning unmaintained existing code. Ok. Checked out what usogres is. It is not what I want. I don't want a static `main database'. It should simply a cluster of them - just like a set of Raid-0 disks, may be with a tempory controller for some tasks. Also, as a matter of fact, usogres is not unmaintained code. While that's true, usogres code is just few files. I wouldn't take more than half an hour to read up the things. And besides it contain postgresql protocol implementation necessary which would take some time to test and debug, Great. I will look into this over the weekend. And it's in C++. I like that..;-) And I DON'T like that ;) Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) QOTD: I ain't broke, but I'm badly bent. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Peer to peer replication of Postgresql databases
On Fri, Oct 11, 2002 at 04:29:53PM +0530, Shridhar Daithankar wrote: Well, I don't think adding support for multiple slaves to usogres would be that problematic. Of course if you want to load balance your application queries, application has to be aware of that. I will not do sending requests to a mosix cluster anyway. Have already tested postgres on a mosix cluster, and as expected results are not good. (although mosix does the correct thing in keeping all the database backend processes on one node). Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Remember: Silly is a state of Mind, Stupid is a way of Life. -- Dave Butler ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Peer to peer replication of Postgresql databases
Hi all, I am trying to add some replication features to postgres (yes, I have already looked at ongoing work), in a peer to peer manner. The goal is to achive `nearly complete fault tolerence' by replicating data. The basic framework I have in mind is somewhat like this. - Postmasters are running on different computers on a networked cluster. Their data areas are identical at the beginning and recide on local storage devices. - Each postmaster is aware that they are a part of a cluster and they can communicate with each other, send multicast requests and look for each other's presence (like heartbeat in linux-ha project). - When a frontend process sends a read query, each backend process does that from its own data area. - There are two types of write queries. Postmasters use seperate communication channels for each. One is the sequencial channel which carries writes whose order is important, and the non-sequencial channel carries write queries whose order is not important. - When a frontend process sends non-sequencial write query to a backend, it is directly written to the local data area and a multicast is sent (preferably asynchronously) to the other postmasters who will also update their respective local areas. May be we can simply duplicate what goes to WAL into a TCP/IP socket (with some header info, of course). - When a sequencial-write query is requested, the corresponding postmaster informs a main-postmaster (more about in the next point), waits for his acknowledgement, and proceeds the same way as the non-sequencial write. - Each postmaster is assigned a priority. The one with the highest priority is doing some bookkeeping to handle concurrency issues etc. If he goes away, another one takes charge. Or maybe we can completely ignore the main-postmaster concept and let the clients broadcast a request to obtain locks etc. - When a new postmaster, hence a computer, joins the cluster, he will replicate the current database from one of the clients. Suggessions and critisisms are welcome. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) The best audience is intelligent, well-educated and a little drunk. -- Maurice Baring ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] pg_ident.conf
I have Red Hat Linux 6.2 , PostgreSQL 7.0.2. Could anybody help me to configure ident daemon using the file pg_ident.conf Thanks in advance, anuradha