[HACKERS] Group by count() and indexes

2003-02-18 Thread Anuradha Ratnaweera

Consider the following query on a large table with lots of different
`id's:

SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;

It has an (usually unique) index on id.  Obviously, the index helps to
evaluate count(id) for a given value of id, but count()s for all the
`id's should be evaluated, so sort() will take most of the time.

Is there a way to improve performance of this query?  If not, please
give some indication to do a workaround on the source itself, so perhaps
I may be able to work out a patch.

Thanks in advance.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.21-pre4)

It is contrary to reasoning to say that there is a vacuum or space in
which there is absolutely nothing.
-- Descartes


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Group by, count, order by and limit

2003-02-17 Thread Anuradha Ratnaweera

My 3rd attempt to post ...

Consider this query on a large table with lots of different IDs:

SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;

It has an index on id.  Obviously, the index helps to evaluate count(id)
for a given value of id, but count()s for all the `id's should be
evaluated, so sort() will take most of the time.

Is there a way to improve performance of this query?  If not, please
give some indication to do a workaround on the source itself, so perhaps
I may be able to come out with a patch.

Thanks in advance.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.21-pre4)

There are three ways to get something done:
(1) Do it yourself.
(2) Hire someone to do it for you.
(3) Forbid your kids to do it.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Postgresql and multithreading

2002-10-19 Thread Anuradha Ratnaweera
On Fri, Oct 18, 2002 at 10:28:38AM -0400, Tom Lane wrote:
 Greg Copeland [EMAIL PROTECTED] writes:
  On Thu, 2002-10-17 at 22:20, Tom Lane wrote:
  Simple: respond to 'em all with a one-line answer: convince us why we
  should use it.  The burden of proof always seems to fall on the wrong
  end in these discussions.
 
  ... Now, it seems, that
  people don't want to answer questions at all as it's bothering the
  developers.
 
 Not at all.  But rehashing issues that have been talked out repeatedly
 is starting to bug some of us ;-).  Perhaps the correct standard
 answer is more like this has been discussed before, please read the
 list archives.

Let me explain my posting which started this `thread':

- The developer's FAQ section 1.9 explains why PostgreSQL doesn't use
  threads (and many times it has been discussed on the list).

- The TODO list has an item `Experiment with multi-threaded backend' and
  points to a mailing list discussion about the implementation by Myron
  Scott.  His final comment is that he didn't `gain much performance'
  and `ended up with some pretty unmanagable code'.  He also says that
  he wouldn't `personally try this again ... but there probably was a
  better way'.

- I was going through the TODO list, and was wondering if I should try
  on this. But before doing that, naturally, I wanted to figure out if
  any of the core developers themselves have any plans of doing it.

Now, I am trying hard to figure out why this `are you going to do this?
otherwise I can try it', type posting was not differentiated from
numerous `why don't YOU implement this feature' type postings ;)

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

Life is too important to take seriously.
-- Corky Siegel


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Postgresql and multithreading

2002-10-17 Thread Anuradha Ratnaweera

On Wed, Oct 16, 2002 at 02:08:21PM -0400, Curtis Faith wrote:
 
 2) Including the pros and cons of the feature/implementation and how close
 the group is to deciding whether something would be worth doing. - I can
 also do this.

The pros and cons of many such features have been discussed over the
lists as well as on the FAQs. But the second matter, the group's
likehood their implementation cannot always be deduced from those
communications or from docs.

Therefore suggested material into the FAQs are going to be extremely
useful to like-to-be developers. They also would hopefully reduce
unnecessary traffic on the list.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

I have found little that is good about human beings.  In my experience
most of them are trash.
-- Sigmund Freud


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Postgresql and multithreading

2002-10-15 Thread Anuradha Ratnaweera

On Wed, Oct 16, 2002 at 12:59:57AM -0400, Bruce Momjian wrote:
 Anuradha Ratnaweera wrote:
  
  Is there any plans to make postgresql multithreading?
 
 We don't think it is needed, except perhaps for Win32 and Solaris, which
 have slow process creation times.

Thanks, Bruce.  But what I want to know is whether multithreading is
likely to get into in postgresql, say somewhere in 8.x, or even in 9.x?
(as they did with Apache).  Are there any plans to do so, or is postgres
going to remain rather a multi-process application?

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

One nice thing about egotists: they don't talk about other people.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Postgresql and multithreading

2002-10-15 Thread Anuradha Ratnaweera

On Wed, Oct 16, 2002 at 01:25:23AM -0400, Bruce Momjian wrote:
 Anuradha Ratnaweera wrote:

  ... what I want to know is whether multithreading is likely to get
  into in postgresql, say somewhere in 8.x, or even in 9.x?
 
 It may be optional some day, most likely for Win32 at first, but we see
 little value to it on most other platforms;  of course, we may be wrong.

In that case, I wonder if it is worth folking a new project to add
threading support to the backend?  Of course, keeping in sync with the
original would be lot of work.

In that way, one should be able to test the hypothesis (whether threads
improve things, or the other way round - if one likes it it that way :))
without messing around with stable postgres code, as they did and do
with postgresql-R.

And a minor question is wheter it is legal to keep the _changes_ in such
a project GPL?

 I am also not sure if it is a big win on Apache either;  I think the
 jury is still out on that one, hence the slow adoption of 2.X,

As far as we are concened, it is the stability, rather than speed which
still keeps us in 1.3.

 and we don't want to add threads and make a mess of the code or slow
 it down, which does often happen.

Fully agreed.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

Equality is not when a female Einstein gets promoted to assistant
professor; equality is when a female schlemiel moves ahead as fast as a
male schlemiel.
-- Ewald Nyquist


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Peer to peer replication of Postgresql databases

2002-10-13 Thread Anuradha Ratnaweera

On Fri, Oct 11, 2002 at 08:30:55AM -0500, Greg Copeland wrote:

 I'd be curious to hear in a little more detail what constitutes not
 good for postgres on a mosix cluster.

It seems that almost all the postgres processes remain in the `home'
node.

Please notice that I am not underestimating Mosix in any way.  We have
tested many programs from our parallel processing project with extreme
success on our mosix cluster.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

Ginger snap.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Peer to peer replication of Postgresql databases

2002-10-13 Thread Anuradha Ratnaweera

On Fri, Oct 11, 2002 at 07:10:26PM +0530, Shridhar Daithankar wrote:
 On 11 Oct 2002 at 8:30, Greg Copeland wrote:
 
  I'd be curious to hear in a little more detail what constitutes not
  good for postgres on a mosix cluster.
 
 Well, I guess in kind of replication we are talking here, the
 performance will be enhanced only if separate instances of psotgresql
 runs on separate machine.  Now if mosix kernel applies some AI and
 puts all of them on same machine, it isn't going to be any good for
 the purpose replication is deployed.

Exactly.  First, since we know what is going on, it is not necessary for
the OS to decide what's going on.  Secondly, database replication is not
looked after at all, unless we do some crude tricks on the filesystem.
Still it won't be efficient.

 I guess that's what she meant..
  ^^^
Correction: that's what _HE_ meant... ;)

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

All other things being equal, a bald man cannot be elected President of
the United States.
-- Vic Gold


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Peer to peer replication of Postgresql databases

2002-10-13 Thread Anuradha Ratnaweera

On Fri, Oct 11, 2002 at 12:07:00PM -0400, Neil Conway wrote:
 [ pgsql-patches removed from Cc: list ]
 
 Anuradha Ratnaweera [EMAIL PROTECTED] writes:
  I am trying to add some replication features to postgres (yes, I
  have already looked at ongoing work), in a peer to peer manner.
 
 Did you look at the research behind Postgres-R, and the pgreplication
 stuff?

Am looking at the research papers related to it now.

  - When a frontend process sends a read query, each backend process
does that from its own data area.
 
 Surely that's not correct -- a SELECT can be handled by *any one*
 node, not each and every one, right?

Yes.  Sorry about my careless wording.  Unless anything is kind of
locked, each node has a copy of the database, so each one can handle
SELECTs individually.

The actual situation will be far from this simple, because there will be
database writes going on and generating consistent SELECTs would need
careful handling of concurency issues.

  - There are two types of write queries.  Postmasters use seperate
communication channels for each.  One is the sequencial channel which
carries writes whose order is important, and the non-sequencial
channel carries write queries whose order is not important.
 
 How do you distinguish between these?

Nope.  We assume that all the communication should go through the
sequencial channel unless indicated by the client.  In that case, we
will have to find a way to indicate this from the client's side.  This
doesn't sound very elegant, may be we can figure out a better way.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

Being against torture ought to be sort of a bipartisan thing.
-- Karl Lehenbauer


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Peer to peer replication of Postgresql databases

2002-10-11 Thread Anuradha Ratnaweera
On Fri, Oct 11, 2002 at 04:04:29PM +0530, Shridhar Daithankar wrote:
 On 11 Oct 2002 at 16:29, Anuradha Ratnaweera wrote:
 
  On Fri, Oct 11, 2002 at 03:54:15PM +0530, Shridhar Daithankar wrote:
  I will look at it, too.  Thanks for the link.  In some cases, starting
  anew is faster than learning unmaintained existing code.

Ok.  Checked out what usogres is.  It is not what I want.  I don't want
a static `main database'.  It should simply a cluster of them - just like
a set of Raid-0 disks, may be with a tempory controller for some tasks.

Also, as a matter of fact, usogres is not unmaintained code.

 While that's true, usogres code is just few files. I wouldn't take more than 
 half an hour to read up the things. And besides it contain postgresql protocol 
 implementation necessary which would take some time to test and debug,

Great.  I will look into this over the weekend.

 And it's in C++. I like that..;-)

And I DON'T like that ;)

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

QOTD:
I ain't broke, but I'm badly bent.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Peer to peer replication of Postgresql databases

2002-10-11 Thread Anuradha Ratnaweera
On Fri, Oct 11, 2002 at 04:29:53PM +0530, Shridhar Daithankar wrote:
 
 Well, I don't think adding support for multiple slaves to usogres would be that 
 problematic. Of course if you want to load balance your application queries, 
 application has to be aware of that. I will not do sending requests to a mosix 
 cluster anyway.

Have already tested postgres on a mosix cluster, and as expected results
are not good.  (although mosix does the correct thing in keeping all the
database backend processes on one node).

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

Remember: Silly is a state of Mind, Stupid is a way of Life.
-- Dave Butler


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Peer to peer replication of Postgresql databases

2002-10-11 Thread Anuradha Ratnaweera

Hi all,

I am trying to add some replication features to postgres (yes, I have
already looked at ongoing work), in a peer to peer manner.  The goal
is to achive `nearly complete fault tolerence' by replicating data.

The basic framework I have in mind is somewhat like this.

- Postmasters are running on different computers on a networked cluster.
  Their data areas are identical at the beginning and recide on local
  storage devices.

- Each postmaster is aware that they are a part of a cluster and they
  can communicate with each other, send multicast requests and look for
  each other's presence (like heartbeat in linux-ha project).

- When a frontend process sends a read query, each backend process
  does that from its own data area.

- There are two types of write queries.  Postmasters use seperate
  communication channels for each.  One is the sequencial channel which
  carries writes whose order is important, and the non-sequencial
  channel carries write queries whose order is not important.

- When a frontend process sends non-sequencial write query to a backend,
  it is directly written to the local data area and a multicast is
  sent (preferably asynchronously) to the other postmasters who will
  also update their respective local areas.

  May be we can simply duplicate what goes to WAL into a TCP/IP socket
  (with some header info, of course).

- When a sequencial-write query is requested, the corresponding
  postmaster informs a main-postmaster (more about in the next point),
  waits for his acknowledgement, and proceeds the same way as the
  non-sequencial write.

- Each postmaster is assigned a priority.  The one with the highest
  priority is doing some bookkeeping to handle concurrency issues etc.
  If he goes away, another one takes charge.

  Or maybe we can completely ignore the main-postmaster concept and
  let the clients broadcast a request to obtain locks etc.

- When a new postmaster, hence a computer, joins the cluster, he
  will replicate the current database from one of the clients.

Suggessions and critisisms are welcome.

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.18-xfs-1.1)

The best audience is intelligent, well-educated and a little drunk.
-- Maurice Baring


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] pg_ident.conf

2000-12-03 Thread anuradha

I have  Red Hat Linux  6.2 , PostgreSQL 7.0.2.
Could anybody help me to configure ident daemon using the file
pg_ident.conf



Thanks in advance,

anuradha