Re: [HACKERS] Can't subscribe or get CVS
Hi Noel,

The correct CVSROOT is now:

    export CVSROOT=:pserver:[EMAIL PROTECTED]:/projects/cvsroot

and the password is blank or 'postgresql'.

Chris

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Friday, 21 September 2001 12:04 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [HACKERS] Can't subscribe or get CVS

I just tried to get PostgreSQL from CVS, but it rejected the password 'postgresql' for user 'anoncvs':

    $ export CVSROOT=:pserver:[EMAIL PROTECTED]:/home/projects/pgsql/cvsroot
    $ cvs login
    (Logging in to [EMAIL PROTECTED])
    CVS password:
    cvs login: authorization failed: server postgresql.org rejected access to /home/projects/pgsql/cvsroot for user anoncvs

Then I tried to post this to pgsql-hackers, but my subscription failed, too!

    From: [EMAIL PROTECTED]
    To: [EMAIL PROTECTED] [EMAIL PROTECTED]
    Subject: Majordomo results
    Date: Thu, 20 Sep 2001 11:55:29 -0400 (EDT)

    subscribe
    Illegal command!
    Skipped 1 line of trailing unparseable text.
    No valid commands processed.

Are majordomo and CVS broken, or do I need different instructions?

--Noel

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlock performance improvement proposal
Sounds cool to me ... definitely something to fix before v7.2, if it's as easy as you make it sound ... I'm expecting the new drive to be installed today (if all goes well) ... Thomas still has his date/time stuff to finish off, now that CVSup is fixed ... Let's try and target Monday for beta then? I think the only two outstanding items are you and Thomas right now? Bruce, that latest rtree patch looks intriguing also ... can anyone comment positive/negative about it, so that we can try and get that in before beta?

I put it in the queue and will apply it in a day or two.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
[HACKERS] patch for contrib/intarray (current CVS)
Please apply the attached patch to the current CVS tree. Changes:

1. gist__int_ops is now without lossy
2. added sort entry in picksplit

Regards,
Oleg
_____
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

patch_intarray.gz
Description: Binary data
Re: [HACKERS] multibyte performance
> pgbench unfortunately seems quite irrelevant to this issue, since it performs no textual operations whatsoever.

Yup.

> It'd be interesting to modify pgbench so that it updates the filler column somehow on each update (perhaps store a text copy of the new balance there), and then repeat the tests.

Maybe. I'm not sure it would show significant differences, though. Anyway, what I'm interested in includes:

o regexp/like/ilike operations
o very long text handling

I'll come up with more tests.
--
Tatsuo Ishii
Re: [HACKERS] Spinlock performance improvement proposal
Bruce Momjian wrote:

Save for the fact that the kernel can switch between threads faster than it can switch processes, considering threads share the same address space, stack, code, etc. If need be, sharing the data between threads is much easier than sharing between processes.

Just a clarification, but because we fork each backend, don't they share the same code space? Data/stack is still separate.

In Linux and many modern UNIX programs, you share everything at fork time. The data and stack pages are marked copy-on-write, which means that if you touch one, the processor traps and drops into the memory-manager code. A new page is created and replaced into your address space where the page to which you were going to write was.

Yes, very true. My point was that backends already share code space and non-modified data space. It is just modified data and stack that are non-shared, but then again, they would have to be non-shared in a threaded backend too.

In a threaded system everything would be shared, depending on the OS, even the stacks. The stacks could be allocated out of the same global pool. You would need something like thread-local storage to deal with isolating variables from one thread to another. That always seemed more trouble than it was worth. Either that, or go through each and every global variable in PostgreSQL and make it a member of a structure, and create an instance of this structure for each new thread. IMHO, once you go down the road of using thread-local memory, you are getting to the same level of difficulty (for the OS) in task switching as just switching processes. The exception to this is Windows, where task switches are such a big hit. I think threaded software is quite useful, and I have a number of thread-based servers in production. However, my experience tells me that the work of trying to move PostgreSQL to a threaded environment would be extensive and have little or no tangible benefit.

I would rather see stuff like 64-bit OIDs, three options for function definition (short cache, nocache, long cache), etc., than to waste time making PostgreSQL threaded. That's just my opinion.
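The copy-on-write behavior of fork described above can be observed from user space. This is a minimal sketch (POSIX-only, using Python's os.fork as a stand-in for the backend's C code) showing that a child's write to an inherited variable lands on the child's private copy of the page and never reaches the parent:

```python
import os

counter = 42          # lives on a data page shared copy-on-write after fork

pid = os.fork()
if pid == 0:
    # Child: this store triggers the copy-on-write trap; the kernel gives
    # the child its own private copy of the page before the write lands.
    counter = 100
    os._exit(counter)  # report the child's value via its exit status (0-255)
else:
    _, status = os.waitpid(pid, 0)
    child_value = os.WEXITSTATUS(status)
    # The parent's page was never copied or modified: its value is unchanged.
```

The same mechanism is why forked backends share unmodified data pages for free, while each pays for private copies only of the pages it actually writes.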
Re: [HACKERS] Spinlock performance improvement proposal
Lincoln Yeoh wrote:

At 10:02 AM 9/27/01 -0400, mlw wrote:

D. Hageman wrote: I agree with everything you wrote above except for the first line. My only comment is that process boundaries are only *truly* a powerful barrier if the processes are different pieces of code and are not dependent on each other in crippling ways. Forking the same code with the bug in it - and only 1 in 5 die - is still 4 copies of buggy code running on your system ;-)

This is simply not true. All software has bugs; it is an undeniable fact. Some bugs are more likely to be hit than others. With 5 processes, when one process hits a bug, that does not mean the other 4 will hit the same bug. Obscure bugs kill software all the time; the trick is to minimize the impact. Software is not perfect, and assuming it can be is a mistake.

A bit off topic, but that really reminded me of how Microsoft does their forking in hardware. Basically, they fork (cluster) FIVE Windows machines to run the same buggy code, all on the same IP. That way, if one process (machine) goes down, the other 4 stay running, thus minimizing the impact ;). They have many of these clusters put together. See: http://www.microsoft.com/backstage/column_T2_1.htm from Microsoft.com Backstage [1]. OK, so it's old (1998), but from their recent articles I believe they're still using the same method of achieving 100% availability. And they brag about it like it's a good thing... When I first read it, I didn't know whether to laugh or get disgusted or whatever.

Believe me, I don't think anyone should be shipping software with serious bugs in it, and I deplore Microsoft's complete lack of accountability when it comes to quality, but come on now, let's not lie to ourselves. No matter which god you may pray to, you have to accept that people are not perfect and mistakes will be made. At issue is how well programs are isolated from one another (one of the purposes of operating systems) and how to deal with programmatic errors.

I am not advocating releasing bad software; I am just saying that you must code defensively: assume a caller may pass the wrong parameters, don't trust that malloc worked, etc. Stuff happens in the real world. Code to deal with it. In the end, no matter what you do, you will have a crash at some point. (The Tao of Programming.) Accept it. Just try to make the damage as minimal as possible.
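The defensive style being advocated can be shown in miniature (the function names here are invented for the example, not taken from PostgreSQL): validate what a caller hands you, and turn a bad request into an error report rather than letting it take the whole process down:

```python
def divide_balance(balance, parts):
    # Don't trust the caller: check the parameters before using them.
    if not isinstance(parts, int) or parts <= 0:
        raise ValueError("parts must be a positive integer")
    return balance / parts

def handle_request(balance, parts):
    # A bad request should damage only itself, not the server as a whole.
    try:
        return ("ok", divide_balance(balance, parts))
    except (TypeError, ValueError) as exc:
        return ("error", str(exc))

good = handle_request(100.0, 4)   # succeeds normally
bad = handle_request(100.0, 0)    # reported as an error, not a crash
```

The isolation boundary here is the try/except around each request, which plays the same role for one call that process boundaries play for one backend.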
Re: [HACKERS] Spinlock performance improvement proposal
At 10:02 AM 9/27/01 -0400, mlw wrote:

D. Hageman wrote: I agree with everything you wrote above except for the first line. My only comment is that process boundaries are only *truly* a powerful barrier if the processes are different pieces of code and are not dependent on each other in crippling ways. Forking the same code with the bug in it - and only 1 in 5 die - is still 4 copies of buggy code running on your system ;-)

This is simply not true. All software has bugs; it is an undeniable fact. Some bugs are more likely to be hit than others. With 5 processes, when one process hits a bug, that does not mean the other 4 will hit the same bug. Obscure bugs kill software all the time; the trick is to minimize the impact. Software is not perfect, and assuming it can be is a mistake.

A bit off topic, but that really reminded me of how Microsoft does their forking in hardware. Basically, they fork (cluster) FIVE Windows machines to run the same buggy code, all on the same IP. That way, if one process (machine) goes down, the other 4 stay running, thus minimizing the impact ;). They have many of these clusters put together. See: http://www.microsoft.com/backstage/column_T2_1.htm from Microsoft.com Backstage [1]. OK, so it's old (1998), but from their recent articles I believe they're still using the same method of achieving 100% availability. And they brag about it like it's a good thing... When I first read it, I didn't know whether to laugh or get disgusted or whatever.

Cheerio,
Link.

[1] http://www.microsoft.com/backstage/ http://www.microsoft.com/backstage/archives.htm
[HACKERS] Glitch in handling of postmaster -o options
I have just noticed a flaw in the handling of "-o backend-options" postmaster parameters. To wit: although these options will be passed to all backends launched by the postmaster, they aren't passed to checkpoint, xlog startup, and xlog shutdown subprocesses (everything that goes through BootstrapMain). Since BootstrapMain doesn't recognize the same set of options that PostgresMain does, this is a necessary restriction. Unfortunately, it means that checkpoint etc. don't necessarily run with the same options as normal backends.

The particular case that I ran into is that I've been in the habit of running test postmasters with "-o -F" to suppress fsync. Kernel tracing showed that checkpoint processes were issuing fsyncs anyway, and I eventually realized why: they're not seeing the command-line option.

While that's not a fatal problem, I could imagine *much* more serious misbehavior from inconsistent settings of some GUC parameters. Since backends believe that these parameters have PGC_POSTMASTER priority, they'll accept changes that they probably oughtn't. For example, "postmaster -o --shared_buffers=N" will cause things to blow up very nicely indeed: backends will have a value of NBuffers that doesn't agree with what the postmaster has.

I wonder whether we should retire -o. Or change it so that the postmaster parses the given options for itself (consequently adjusting its copies of GUC variables) instead of passing them on to backends for parsing at backend start time.

regards, tom lane
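One workaround in the meantime, assuming the settings in question are GUC variables that can live in the configuration file: put them in postgresql.conf rather than passing them via -o, so the postmaster and every subprocess it launches read the same values. A sketch (the exact option names and syntax depend on the server version):

```
# postgresql.conf -- read at postmaster startup and inherited by all
# subprocesses, including the checkpoint and xlog startup/shutdown
# processes that never see "-o" backend options.
fsync = false            # instead of: postmaster -o -F
shared_buffers = 1024    # instead of: postmaster -o --shared_buffers=1024
```

This avoids the inconsistency entirely, since no process ever parses the options independently of the postmaster.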
[HACKERS] Preparation for Beta
OK, I think we are on track for a Monday beta. Marc, will you be packaging a beta1 tarball on Monday, or waiting a few days? I need to run pgindent and pgjindent either right before or right after beta starts.

Also, what are we doing with the toplevel /ChangeLogs? I never understood the purpose of it, and I know others have similar questions.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Spinlock performance improvement proposal
We have been doing some scalability testing just recently here at Red Hat. The machine I was using was a 4-way 550 MHz Xeon SMP machine; I also ran the machine in uniprocessor mode to make some comparisons. All runs were made on Red Hat Linux running 2.4.x series kernels. I've examined a number of potentially interesting cases. I'm still analyzing the results, but some of the initial results might be interesting:

Let me add a little historical information here. I think the first report of bad performance on SMP machines was from Tatsuo, where he had 1000 backends running in pgbench. He was seeing poor transactions/second with little CPU or I/O usage. It was clear something was wrong. Looking at the code, it was easy to see that on SMP machines, the spinlock select() was a problem. Later tests on various OSes found that no matter how small your select interval was, select() couldn't sleep for less than one CPU tick, which is typically 100 Hz, or 10 ms.

At that point we knew that the spinlock backoff code was a serious problem. On multi-processor machines that could hit the backoff code on lock failure, there were hundreds of backends sleeping for 10 ms, then all waking up; one gets the lock, and the others sleep again. On single-CPU machines, the backoff code doesn't get hit too much, but it is still a problem. Tom's implementation changes backoffs in all cases by placing waiters in a semaphore queue and reducing the amount of code protected by the spinlock.
We have these TODO items out of this:

* Improve spinlock code [performance]
  o use SysV semaphores or queue of backends waiting on the lock
  o wake up sleeper or sleep for less than one clock tick
  o spin for lock on multi-CPU machines, yield on single-CPU machines
  o read/write locks

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
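The "spin for a while, then back off" idea in the TODO list can be sketched in miniature. This is purely illustrative (Python threads standing in for C spinlocks over shared memory): try the lock nonblocking some number of times, and only sleep after the spin budget is exhausted, so an uncontended acquire never pays the cost of a clock-tick sleep:

```python
import threading
import time

class SpinThenSleepLock:
    """Spin on a nonblocking acquire before falling back to a sleep,
    instead of sleeping on the very first failed attempt."""

    def __init__(self, spins=100, backoff=0.001):
        self._lock = threading.Lock()
        self._spins = spins
        self._backoff = backoff

    def acquire(self):
        while True:
            for _ in range(self._spins):
                if self._lock.acquire(blocking=False):
                    return          # fast path: got it while spinning
            time.sleep(self._backoff)  # back off instead of burning CPU

    def release(self):
        self._lock.release()

lock = SpinThenSleepLock()
counter = 0

def worker(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1        # protected increment: no lost updates
        lock.release()

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The sleep granularity problem described above is exactly why the backoff branch is so expensive: once a waiter sleeps, it is gone for at least a full tick no matter how short the sleep was requested to be.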
Re: [HACKERS] Spinlock performance improvement proposal
FYI, I have added a number of these emails to the 'thread' TODO.detail list.

On Wed, 26 Sep 2001, D. Hageman wrote:

Save for the fact that the kernel can switch between threads faster than it can switch processes, considering threads share the same address space, stack, code, etc. If need be, sharing the data between threads is much easier than sharing between processes.

When using a kernel threading model, it's not obvious to me that the kernel will switch between threads much faster than it will switch between processes. As far as I can see, the only potential savings is not reloading the pointers to the page tables. That is not nothing, but it is also [major snippage]

I can't comment on the isolate data line. I am still trying to figure that one out.

Sometimes you need data which is specific to a particular thread. When you need data that is specific to a thread, you use TSD (Thread-Specific Data).

Which Linux does not support, with a vengeance, to my knowledge. As a matter of fact, a quote from Linus on the matter was something like "The solution to slow process switching is fast process switching, not another kernel abstraction" [referring to threads and TSD]. TSDs make the implementation of thread switching complex, and fork() complex.

The question about threads boils down to: is there far more data that is shared than unshared? If yes, threads are better; if not, you'll be abusing TSD and slowing things down. I believe that right now, PostgreSQL's model of sharing only the things that need to be shared is pretty damn good. The only slight problem is the overhead of forking another backend, but it's still _fast_.

IMHO, threads would not bring a large improvement to PostgreSQL. Actually, if I remember, there was someone who ported PostgreSQL (I think it was 6.5) to be multithreaded, with major pain, because the requirement was to integrate with CORBA. I believe that person posted some benchmarks which were essentially identical to non-threaded postgres...

-alex

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
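For reference, the thread-specific data being debated above looks like this in a threaded runtime (Python's threading.local standing in for pthread_setspecific): one global name, but each thread reads and writes only its own slot behind it:

```python
import threading

tsd = threading.local()   # one logical variable, one value per thread
results = {}

def worker(name, value):
    tsd.value = value              # stores into this thread's slot only
    results[name] = tsd.value      # reads back the same thread's value

threads = [threading.Thread(target=worker, args=("t%d" % i, i * 10))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is the mechanism that would have to replace every global variable in a threaded backend, which is exactly the conversion cost the thread above is arguing about.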
Re: [HACKERS] Fragmenting tables in postgres
[EMAIL PROTECTED] (Karthik Guruswamy) writes: Anyone tried fragmenting tables into multiple sub-tables transparently through Postgres rewrite rules? I'm having a table with 200,000 rows with varchar columns, and noticed that updates/inserts take a lot longer compared to a few rows in the same table.

That's not a very big table ... there's no reason for inserts to take a long time, and not much reason for updates to take long either, if you have appropriate indexes to help find the rows to be updated. Have you VACUUM ANALYZEd this table recently (or ever)? Have you tried EXPLAINing the queries to see if they use indexes?

I have a lot of memory in my machine, like 2 GB, and 600,000 buffers.

You mean you set -B to 600000? That's not a bright idea. A few thousand will be plenty, and will probably perform lots better.

This is a good question. When do too many buffers become a performance problem?

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
[HACKERS] Plpython bug with int8?
Can someone else run this and confirm the results against the tip of the CVS repository? I'm trying to trace this bug (help welcome, too). (It was hidden in a trigger and was a pain to narrow down to this point.)

-Brad

-----

drop function mul1(int4,int4);
drop function mul2(int4,int4);
drop function callmul1();
drop function callmul2a();
drop function callmul2b();

create function mul1(int4,int4) returns int8 as
  'select int8($1) * int8($2)' language 'sql';
create function mul2(int4,int4) returns int8 as
  'select int8($1) * 4294967296::int8 + int8($2)' language 'sql';
create function callmul1() returns int8 as
  'return plpy.execute("select mul1(6,7) as x")[0]["x"]' language 'plpython';
create function callmul2a() returns int8 as
  'select mul2(7,8)' language 'sql';
create function callmul2b() returns int8 as
  'return plpy.execute("select mul2(7,8) as x")[0]["x"]' language 'plpython';

select mul1(3,4);
select callmul1();
select mul2(5,6);
select callmul2a();
select callmul2b();

Results:

...
 callmul1
----------
       42
(1 row)

 mul2
-------------
 21474836486
(1 row)

 callmul2a
-------------
 30064771080
(1 row)

psql:bug:14: pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
psql:bug:14: connection to server was lost
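One guess at why callmul2b dies where callmul2a survives, shown with plain arithmetic (this is a hypothesis about the symptom, not a diagnosis of the plpython code): mul2(7,8) = 7 * 2^32 + 8 = 30064771080, which cannot be represented in 32 bits, so any path that forces the int8 result through a 32-bit integer either mangles it or fails. callmul1's result of 42 fits and works fine.

```python
value = 7 * 2**32 + 8        # mul2(7,8) from the test case above

# What survives if the value is forced into an unsigned 32-bit slot:
# only the low word, the "+ 8" part.
low32 = value & 0xFFFFFFFF

# Does the full value fit in a signed 32-bit integer at all?
fits_int32 = -2**31 <= value <= 2**31 - 1
```

If plpython's result conversion assumes a 32-bit integer somewhere, that would explain why every case with a result under 2^31 works and the first case over it kills the backend.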