Re: [HACKERS] Can't subscribe or get CVS

2001-09-28 Thread Christopher Kings-Lynne

Hi Noel,

The correct CVSROOT is now:

export CVSROOT=:pserver:[EMAIL PROTECTED]:/projects/cvsroot

And the password is blank or 'postgresql'

Chris

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of
 [EMAIL PROTECTED]
 Sent: Friday, 21 September 2001 12:04 AM
 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: [HACKERS] Can't subscribe or get CVS
 
 
 
 I just tried to get PostgreSQL from CVS, but it rejected the password
 'postgresql' for user 'anoncvs':
 
 $ export CVSROOT=:pserver:[EMAIL PROTECTED]:/home/projects/pgsql/cvsroot
 $ cvs login
 (Logging in to [EMAIL PROTECTED])
 CVS password:
 cvs login: authorization failed: server postgresql.org 
 rejected access to /home/projects/pgsql/cvsroot for user anoncvs
 
 
 Then, I tried to post this to pgsql-hackers, but my subscription
 failed, too!
 
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED] [EMAIL PROTECTED]
 Subject: Majordomo results
 Date: Thu, 20 Sep 2001 11:55:29 -0400 (EDT)
 
  subscribe
  Illegal command!
 
  Skipped 1 line of trailing unparseable text.
 
 No valid commands processed.
 
 Are majordomo and CVS broken, or do I need different instructions?
 
 --Noel
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread Bruce Momjian

 
 Sounds cool to me ... definitely something to fix before v7.2, if it's as
 easy as you make it sound ... I'm expecting the new drive to be
 installed today (if all goes well) ... Thomas still has his date/time stuff
 to finish off, now that CVSup is fixed ...
 
 Let's try and target Monday for Beta then?  I think the only two
 outstanding items are you and Thomas right now?
 
 Bruce, that latest rtree patch looks intriguing also ... can anyone
 comment positive/negative about it, so that we can try and get that in
 before Beta?

I put it in the queue and will apply in a day or two.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] patch for contrib/intarray (current CVS)

2001-09-28 Thread Oleg Bartunov

Please apply attached patch to current CVS tree.

Changes:

 1. gist__int_ops is now non-lossy
 2. added sort entry in picksplit

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



patch_intarray.gz
Description: Binary data


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] multibyte performance

2001-09-28 Thread Tatsuo Ishii

 pgbench unfortunately seems quite irrelevant to this issue, since it
 performs no textual operations whatsoever.

Yup.

  It'd be interesting to
 modify pgbench so that it updates the filler column somehow on each
 update (perhaps store a text copy of the new balance there), and then
 repeat the tests.

Maybe. I'm not sure if it would show significant differences though.

Anyway, the things I'm interested in include:

o regexp/like/ilike operations
o very long text handling

I'll come up with more tests...
--
Tatsuo Ishii

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread mlw

Bruce Momjian wrote:
 
  Bruce Momjian wrote:
  
Save for the fact that the kernel can switch between threads faster than
it can switch processes considering threads share the same address space,
stack, code, etc.  If need be, sharing the data between threads is much
easier than sharing between processes.
  
   Just a clarification but because we fork each backend, don't they share
   the same code space?  Data/stack is still separate.
 
  In Linux and many modern UNIX systems, you share everything at fork time. The
  data and stack pages are marked copy-on-write, which means that if you touch
  one, the processor traps and drops into the memory manager code. A new page is
  created and mapped into your address space in place of the page you were
  about to write to.
 
 Yes, very true.  My point was that backends already share code space and
 non-modified data space.  It is just modified data and stack that is
 non-shared, but then again, they would have to be non-shared in a
 threaded backend too.

In a threaded system everything would be shared, depending on the OS, even the
stacks. The stacks could be allocated out of the same global pool.

You would need something like thread-local storage to deal with isolating
variables from one thread to another. That always seemed like more trouble than
it was worth. Either that, or go through each and every global variable in
PostgreSQL and make it a member of a structure, and create an instance of this
structure for each new thread.

IMHO, once you go down the road of using thread-local memory, you are getting to
the same level of difficulty (for the OS) in task switching as just switching
processes. The exception to this is Windows, where process switches are such a
big hit.

I think threaded software is quite useful, and I have a number of thread-based
servers in production. However, my experience tells me that the work of trying to
move PostgreSQL to a threaded environment would be extensive and have little or
no tangible benefit.

I would rather see stuff like 64-bit OIDs, three options for function definition
(short cache, nocache, long cache), etc., than to waste time making PostgreSQL
threaded. That's just my opinion.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread mlw

Bruce Momjian wrote:
 
  Save for the fact that the kernel can switch between threads faster than
  it can switch processes considering threads share the same address space,
  stack, code, etc.  If need be, sharing the data between threads is much
  easier than sharing between processes.
 
 Just a clarification but because we fork each backend, don't they share
 the same code space?  Data/stack is still separate.

In Linux and many modern UNIX systems, you share everything at fork time. The
data and stack pages are marked copy-on-write, which means that if you touch
one, the processor traps and drops into the memory manager code. A new page is
created and mapped into your address space in place of the page you were
about to write to.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread mlw

Lincoln Yeoh wrote:
 
 At 10:02 AM 9/27/01 -0400, mlw wrote:
 D. Hageman wrote:
  I agree with everything you wrote above except for the first line.  My
  only comment is that process boundaries are only *truly* a powerful
  barrier if the processes are different pieces of code and are not
  dependent on each other in crippling ways.  Forking the same code with the
  bug in it - and only 1 in 5 die - is still 4 copies of buggy code running
  on your system ;-)
 
 This is simply not true. All software has bugs; it is an undeniable fact. Some
 bugs are more likely to be hit than others. With 5 processes, when one process
 hits a bug, that does not mean the other 4 will hit the same bug. Obscure bugs
 kill software all the time; the trick is to minimize the impact. Software is
 not perfect, and assuming it can be is a mistake.
 
 A bit off topic, but that really reminded me of how Microsoft does their
 forking in hardware.
 
 Basically they fork (cluster) FIVE windows machines to run the same buggy
 code all on the same IP. That way if one process (machine) goes down, the
 other 4 stay running, thus minimizing the impact ;).
 
 They have many of these clusters put together.
 
 See: http://www.microsoft.com/backstage/column_T2_1.htm
 From Microsoft.com Backstage [1]
 
 OK so it's old (1998), but from their recent articles I believe they're
 still using the same method of achieving 100% availability. And they brag
 about it like it's a good thing...
 
 When I first read it I didn't know whether to laugh or get disgusted or
 whatever.

Believe me, I don't think anyone should be shipping software with serious bugs in
it, and I deplore Microsoft's complete lack of accountability when it comes to
quality, but come on now, let's not lie to ourselves. No matter which god you
may pray to, you have to accept that people are not perfect and mistakes will
be made.

At issue is how well programs are isolated from one another (one of the
purposes of operating systems) and how to deal with programmatic errors. I am
not advocating releasing bad software, I am just saying that you must code
defensively, assume a caller may pass the wrong parameters, don't trust that
malloc worked, etc. Stuff happens in the real world. Code to deal with it. 

In the end, no matter what you do, you will have a crash at some point (the
Tao of Programming). Accept it. Just try to make the damage as minimal as
possible.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread Lincoln Yeoh

At 10:02 AM 9/27/01 -0400, mlw wrote:
D. Hageman wrote:
 I agree with everything you wrote above except for the first line.  My
 only comment is that process boundaries are only *truly* a powerful
 barrier if the processes are different pieces of code and are not
 dependent on each other in crippling ways.  Forking the same code with the
 bug in it - and only 1 in 5 die - is still 4 copies of buggy code running
 on your system ;-)

This is simply not true. All software has bugs; it is an undeniable fact. Some
bugs are more likely to be hit than others. With 5 processes, when one process
hits a bug, that does not mean the other 4 will hit the same bug. Obscure bugs
kill software all the time; the trick is to minimize the impact. Software is
not perfect, and assuming it can be is a mistake.

A bit off topic, but that really reminded me of how Microsoft does their
forking in hardware.

Basically they fork (cluster) FIVE windows machines to run the same buggy
code all on the same IP. That way if one process (machine) goes down, the
other 4 stay running, thus minimizing the impact ;).

They have many of these clusters put together.

See: http://www.microsoft.com/backstage/column_T2_1.htm
From Microsoft.com Backstage [1]

OK so it's old (1998), but from their recent articles I believe they're
still using the same method of achieving 100% availability. And they brag
about it like it's a good thing...

When I first read it I didn't know whether to laugh or get disgusted or
whatever.

Cheerio,
Link.

[1]
http://www.microsoft.com/backstage/
http://www.microsoft.com/backstage/archives.htm



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread Bruce Momjian

 Bruce Momjian wrote:
  
   Save for the fact that the kernel can switch between threads faster than
   it can switch processes considering threads share the same address space,
   stack, code, etc.  If need be, sharing the data between threads is much
   easier than sharing between processes.
  
  Just a clarification but because we fork each backend, don't they share
  the same code space?  Data/stack is still separate.
 
 In Linux and many modern UNIX systems, you share everything at fork time. The
 data and stack pages are marked copy-on-write, which means that if you touch
 one, the processor traps and drops into the memory manager code. A new page is
 created and mapped into your address space in place of the page you were
 about to write to.

Yes, very true.  My point was that backends already share code space and
non-modified data space.  It is just modified data and stack that is
non-shared, but then again, they would have to be non-shared in a
threaded backend too.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] Glitch in handling of postmaster -o options

2001-09-28 Thread Tom Lane

I have just noticed a flaw in the handling of -o backend-options
postmaster parameters.  To wit: although these options will be passed
to all backends launched by the postmaster, they aren't passed to
checkpoint, xlog startup, and xlog shutdown subprocesses (everything
that goes through BootstrapMain).  Since BootstrapMain doesn't
recognize the same set of options that PostgresMain does, this is
a necessary restriction.  Unfortunately it means that checkpoint etc.
don't necessarily run with the same options as normal backends.

The particular case that I ran into is that I've been in the habit
of running test postmasters with -o -F to suppress fsync.  Kernel
tracing showed that checkpoint processes were issuing fsyncs anyway,
and I eventually realized why: they're not seeing the command line
option.

While that's not a fatal problem, I could imagine *much* more serious
misbehavior from inconsistent settings of some GUC parameters.  Since
backends believe that these parameters have PGC_POSTMASTER priority,
they'll accept changes that they probably oughtn't.  For example,
	postmaster -o "--shared_buffers=N"
will cause things to blow up very nicely indeed: backends will have
a value of NBuffers that doesn't agree with what the postmaster has.

I wonder whether we should retire -o.  Or change it so that the
postmaster parses the given options for itself (consequently adjusting
its copies of GUC variables) instead of passing them on to backends
for parsing at backend start time.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] Preparation for Beta

2001-09-28 Thread Bruce Momjian

OK, I think we are on track for Monday beta.  Marc, will you be
packaging a beta1 tarball on Monday or waiting a few days?  I need to
run pgindent and pgjindent either right before or after beta starts.

Also, what are we doing with the toplevel /ChangeLogs?  I never
understood the purpose of it, and I know others have similar questions.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread Bruce Momjian

 We have been doing some scalability testing just recently here at Red
 Hat. The machine I was using was a 4-way 550 MHz Xeon SMP machine, I
 also ran the machine in uniprocessor mode to make some comparisons. All
 runs were made on Red Hat Linux running 2.4.x series kernels. I've
 examined a number of potentially interesting cases -- I'm still
 analyzing the results, but some of the initial results might be
 interesting:

Let me add a little historical information here.  I think the first
report of bad performance on SMP machines was from Tatsuo, where he had
1000 backends running in pgbench.  He was seeing poor
transactions/second with little CPU or I/O usage.  It was clear
something was wrong.

Looking at the code, it was easy to see that on SMP machines, the
spinlock select() was a problem.  Later tests on various OS's found that
no matter how small your select interval was, select() couldn't sleep
for less than one cpu tick, which is typically 100Hz or 10ms.  At that
point we knew that the spinlock backoff code was a serious problem.  On
multi-processor machines that could hit the backoff code on lock
failure, there were hundreds of backends sleeping for 10ms, then all
waking up, one getting the lock, and the others sleeping again.

On single-cpu machines, the backoff code doesn't get hit too much, but
it is still a problem.  Tom's implementation changes backoffs in all
cases by placing them in a semaphore queue and reducing the amount of
code protected by the spinlock.

We have these TODO items out of this:

* Improve spinlock code [performance]
o use SysV semaphores or queue of backends waiting on the lock
o wakeup sleeper or sleep for less than one clock tick
o spin for lock on multi-cpu machines, yield on single cpu machines
o read/write locks




-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Spinlock performance improvement proposal

2001-09-28 Thread Bruce Momjian


FYI, I have added a number of these emails to the 'thread' TODO.detail list.

 On Wed, 26 Sep 2001, D. Hageman wrote:
 
Save for the fact that the kernel can switch between threads faster than
it can switch processes considering threads share the same address space,
stack, code, etc.  If need be, sharing the data between threads is much
easier than sharing between processes.
   
   When using a kernel threading model, it's not obvious to me that the
   kernel will switch between threads much faster than it will switch
   between processes.  As far as I can see, the only potential savings is
   not reloading the pointers to the page tables.  That is not nothing,
   but it is also
 major snippage
I can't comment on the isolate data line.  I am still trying to figure 
that one out.
   
   Sometimes you need data which is specific to a particular thread.
  
  When you need data that is specific to a thread you use a TSD (Thread 
  Specific Data).  
 Which Linux does not support with a vengeance, to my knowledge.
 
 As a matter of fact, a quote from Linus on the matter was something like
 "Solution to slow process switching is fast process switching, not another
 kernel abstraction" [referring to threads and TSD]. TSDs make
 implementation of thread switching complex, and fork() complex.
 
 The question about threads boils down to: Is there far more data that is
 shared than unshared? If yes, threads are better, if not, you'll be
 abusing TSD and slowing things down. 
 
 I believe right now, postgresql's model of sharing only the things that need to
 be shared is pretty damn good. The only slight problem is the overhead of
 forking another backend, but it's still _fast_.
 
 IMHO, threads would not bring large improvement to postgresql.
 
  Actually, if I remember, there was someone who ported postgresql (I think
 it was 6.5) to be multithreaded with major pain, because the requirement
 was to integrate with CORBA. I believe that person posted some benchmarks
 which were essentially identical to non-threaded postgres...
 
 -alex
 
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Fragmenting tables in postgres

2001-09-28 Thread Bruce Momjian

 [EMAIL PROTECTED] (Karthik Guruswamy) writes:
  Anyone tried fragmenting tables into multiple sub tables 
  transparently through Postgres rewrite rules ? I'm having 
  a table with 200,000 rows with varchar columns and noticed 
  that updates,inserts take a lot longer time compared to a 
  few rows in the same table.
 
 That's not a very big table ... there's no reason for inserts to
 take a long time, and not much reason for updates to take long either
 if you have appropriate indexes to help find the rows to be updated.
 Have you VACUUM ANALYZEd this table recently (or ever)?  Have you
 tried EXPLAINing the queries to see if they use indexes?
 
  I have a lot of memory in my 
  machine like 2Gig and 600,000 buffers. 
 
 You mean you set -B to 600000?  That's not a bright idea.  A few
 thousand will be plenty, and will probably perform lots better.

This is a good question.  When does too many buffers become a
performance problem?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] Plpython bug with int8?

2001-09-28 Thread Bradley McLean

Can someone else run this and confirm the results against the tip
of the CVS repository?

I'm trying to trace this bug (help welcome too).

(it was hidden in a trigger and a pain to narrow to this point)
-Brad

-

drop function mul1(int4,int4);
drop function mul2(int4,int4);
drop function callmul1();
drop function callmul2a();
drop function callmul2b();
create function mul1(int4,int4) returns int8 as 'select int8($1) * int8($2)' language 'sql';
create function mul2(int4,int4) returns int8 as 'select int8($1) * 4294967296::int8 + int8($2)' language 'sql';
create function callmul1() returns int8 as 'return plpy.execute("select mul1(6,7) as x")[0]["x"]' language 'plpython';
create function callmul2a() returns int8 as 'select mul2(7,8)' language 'sql';
create function callmul2b() returns int8 as 'return plpy.execute("select mul2(7,8) as x")[0]["x"]' language 'plpython';
select mul1(3,4);
select callmul1();
select mul2(5,6);
select callmul2a();
select callmul2b();

Results:

...
 callmul1 
----------
       42
(1 row)

    mul2     
-------------
 21474836486
(1 row)

  callmul2a  
-------------
 30064771080
(1 row)

psql:bug:14: pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
psql:bug:14: connection to server was lost

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])