Re: [PERFORM] Tuning PostgreSQL
On 2003-07-22 09:04:42 +0200, Alexander Priem wrote:
> Hi all, Vincent. You said that using RAID1, you don't have real redundancy.
> But RAID1 is mirroring, right? So if one of the two disks should fail, there
> should be no data lost, right?

Right. But the proposal was a single disk for WAL, without redundancy, and I
argued that wasn't really safe. RAID1 by itself is extremely safe, possibly
even the safest RAID type there is.

> I have been thinking some more. 18Gb drives are cheaper than 36 or 72Gb
> drives. I don't know if I can get the money for this, but how would the
> following setup sound?
> Two 18Gb (15.000rpm) disks in RAID1 array for Operating System + WAL.
> Four 18Gb (15.000rpm) disks in RAID5 array for data.

Our own testing has shown that a 6-disk RAID-10 array is faster than what you
describe. Of course, this is very much dependent on how many INSERTs/UPDATEs
you generate (which tax your WAL more), so your mileage may vary.

> For the same amount of money, I could also get:
> Two 36Gb (10.000rpm) disks in RAID1 array for Operating System + WAL.
> Five/Six 36Gb (10.000rpm) disks in RAID5 array for data.

It is said that a higher RPM is particularly useful for a WAL disk. So you
might consider using two 18GB 15K rpm drives for a RAID-1 WAL disk (+OS and
swap), and using 36GB 10K rpm disks in a RAID-5 array if you need that
diskspace.

> Which would be the best of the above? The one with four 15k-rpm disks or
> the one with five/six 10k-rpm disks? Would these configs be better than all
> disks in one huge RAID5 array? There are so many possible configs with
> RAID...

15K rpm disks are significantly faster than 10K rpm disks. If your only
concern is performance, buy 15K rpm disks. If you want more diskspace for
your money, fall back to larger 10K rpm disks. I personally think separate
WAL disks are vastly overrated, since they haven't shown a big performance
gain in our own tests.
But as I have said, this is extremely dependent on the type of load you
generate, so only your own tests can tell you what you should do in this
respect.

About RAID types: the fastest RAID type by far is RAID-10. However, this will
cost you a lot of usable diskspace, so it isn't for everyone. You need at
least 4 disks for a RAID-10 array. RAID-5 is a nice compromise if you want as
much usable diskspace as possible and still want to be redundant. RAID-1 is
very useful for small (2-disk) arrays.

If you have the time and are settled on buying 6 disks, I'd test the
following scenarios:
- 6-disk RAID-10 array (should perform best)
- 4-disk RAID-10 array containing data, 2-disk RAID-1 array for WAL, OS, etc
- 4-disk RAID-5 array containing data, 2-disk RAID-1 array for WAL, OS, etc
- 6-disk RAID-5 array (will probably perform worst)

Hope this helps.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Tuning PostgreSQL
On Mon, 2003-07-21 at 04:33, Shridhar Daithankar wrote:
> Hi Alexander. On 21 Jul 2003 at 11:23, Alexander Priem wrote:
> [snip]
> > I use ext3 filesystem, which probably is not the best performer, is it?
> No. You also need to check ext2, reiser and XFS. There is no agreement
> between users as to what works best. You need to benchmark and decide.

According to Jeremy Allison of SAMBA: "They used ext3, which is one of the
slowest filesystems on Linux," Allison said. "In a real comparative test, you
would use XFS." http://www.linuxworld.com/story/32673.htm

-- 
Ron Johnson, Jr.    Home: [EMAIL PROTECTED]
Jefferson, LA USA
"I'm not a vegetarian because I love animals, I'm a vegetarian because I hate
vegetables!" -- unknown

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Tuning PostgreSQL
Wow, I never figured how many different RAID configurations one could think
of :)

After reading lots of material, forums and of course this mailing list, I
think I am going for a RAID5 configuration of 6 disks (18Gb, 15.000 rpm
each), one of those six disks being a 'hot spare'. I will just put the OS,
the WAL and the data on one volume. RAID10 is way too expensive :)

If I understand correctly, this will give great read performance, but less
write performance. But since this server will be equipped with an embedded
RAID controller featuring 128Mb of battery-backed cache, I figure that this
controller will negate that (at least somewhat). I will need to find out
whether this cache can be configured so that it will ONLY cache WRITES, not
READS.

Also because of this battery-backed cache controller, I will go for the ext2
file system, mounted with 'noatime'. I will use a UPS, so I don't think I
need the journaling of ext3. XFS is not natively supported by RedHat and I
will go for the easy way here :)

1 Gb of RAM should be enough, I think. That is about the only point that
almost everyone agrees on :) Do you think ECC is very important? The server I
have in mind does not support it. Another one does, but it is about 1.000
euros more expensive :( One CPU should also be enough.

As for postgresql.conf settings, I think I will start with the following:

  max_connections = 128
  superuser_reserved_connections = 1
  shared_buffers = 8192
  max_fsm_relations = 1000
  max_fsm_pages = 10
  wal_buffers = 32
  sort_mem = 2048
  vacuum_mem = 32768
  effective_cache_size = 28672  (this one I'm not sure about, maybe it needs
                                 to be higher)
  random_page_cost = 2
  geqo_threshold = 20

This pretty much sums it up. What do you think about this config? It may not
be the fastest, but a server like this will cost about 4750 euros, and that
is including an Intel Xeon 2.4GHz cpu and a redundant power supply, WITHOUT
the UPS. Seems very reasonable to me...

Kind regards,
Alexander Priem.
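[Editor's note: since shared_buffers and effective_cache_size are both
denominated in 8KB pages, a quick sanity check of what the proposed settings
amount to in memory can be done in psql itself. This is a sketch assuming the
default 8192-byte page size; a custom build could differ.]

```sql
-- Convert page-denominated settings to megabytes (assumes 8KB pages).
SELECT 8192  * 8192 / (1024 * 1024) AS shared_buffers_mb,     -- 64 MB
       28672 * 8192 / (1024 * 1024) AS effective_cache_mb;    -- 224 MB
```

So effective_cache_size = 28672 tells the planner to expect about 224Mb of OS
disk cache; on a dedicated 1Gb machine the kernel will often cache
considerably more than that, which is why a higher value may be warranted.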
Re: [PERFORM] Tuning PostgreSQL
AP == Alexander Priem [EMAIL PROTECTED] writes:

AP> Hmmm. I keep changing my mind about this. My DB would be mostly
AP> 'selecting', but there would also be pretty much inserting and updating
AP> done. But most of the work would be selects. So would this config be OK?

I'm about to order a new server. I haven't decided exactly how many disks I
will get, but my plan is to get an 8-disk RAID10 with 15k RPM drives. I don't
need the volume, just the speed and number of spindles, so I'm buying the
smallest drives that meet my speed needs, probably 18Gb each (sheesh! I
remember getting my first 5Mb disk for my 8088 PC in college and thinking
that was too much space).

My mix is nearly even read/write, but probably a little biased towards the
reading. This machine is replacing a 5-disk box that was switched from RAID5
to 4-disk RAID10 for data plus one system disk in January (what a pain that
was to re-index, but that's another story). The switch from RAID5 to RAID10
made an enormous improvement in performance. The speedup wasn't from
recreating the database: it was restored from a file-level backup, so the
actual files were not compacted or secretly improved in any way, other than
my occasional reindexing.

So I think your 6-disk RAID10 will be good.

-- 
Vivek Khera, Ph.D.    Khera Communications, Inc.
Internet: [EMAIL PROTECTED]    Rockville, MD    +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Dual Xeon + HW RAID question
Mindaugas Riauba wrote:
> > I missed your orig. post, but AFAIK multiprocessing kernels will handle
> > HT CPUs as 2 CPUs each. Thus, our dual Xeon 2.4 is recognized as 4 Xeon
> > 2.4 CPUs. This way, I don't think HT would improve any single query
> > (afaik no postgres process uses more than one cpu), but overall
> > multi-query performance has to improve.
>
> > When you use hyperthreading, each virtual cpu runs at 70% of a full CPU,
> > so hyperthreading could be slower than non-hyperthreading. On a fully
> > loaded dual cpu system, you are looking at 2.8 cpu's (0.70 * 4), while
> > if it isn't loaded, you are looking at slowing down if you are only
> > using 1 or 2 cpu's.
>
> Virtual cpus are not running at 70% of real cpus :). Slowdown will happen
> if the scheduler runs 2 processes on the same real cpu. And I read that
> there are patches for the Linux kernel to fix that. Sooner rather than
> later they will appear in Linus' kernel.

Right, I simplified it. The big deal is whether the OS favors the second real
CPU over one of the virtual CPU's on the same die --- by default, it doesn't.
Even if it did work perfectly, you are talking about going from 1 to 1.4 or
2 to 2.8, which doesn't seem like much.

-- 
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [PERFORM] Tuning PostgreSQL
On Tue, Jul 22, 2003 at 11:40:35 +0200, Vincent van Leeuwen
[EMAIL PROTECTED] wrote:
> About RAID types: the fastest RAID type by far is RAID-10. However, this
> will cost you a lot of usable diskspace, so it isn't for everyone. You
> need at least 4 disks for a RAID-10 array. RAID-5 is a nice compromise if
> you want as much usable diskspace as possible and still want to be
> redundant. RAID-1 is very useful for small (2-disk) arrays.

Note that while raid 10 requires 4 disks, you get the space of 2 disks. This
is the same ratio as for raid 1.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Dual Xeon + HW RAID question
"by default" -- do you mean there is a way to tell Linux to favor the second
real cpu over the HT one? How?

G.
--- cut here ---
- Original Message -
From: Bruce Momjian [EMAIL PROTECTED]
Sent: Tuesday, July 22, 2003 6:26 PM
Subject: Re: [PERFORM] Dual Xeon + HW RAID question

> Right, I simplified it. The big deal is whether the OS favors the second
> real CPU over one of the virtual CPU's on the same die --- by default, it
> doesn't. Even if it did work perfectly, you are talking about going from 1
> to 1.4 or 2 to 2.8, which doesn't seem like much.

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] Dual Xeon + HW RAID question
SZUCS Gábor wrote:
> "by default" -- do you mean there is a way to tell Linux to favor the
> second real cpu over the HT one? How?

Right now there is no way the kernel can tell which virtual cpu's are on
which physical cpu's, and that is the problem. Once there is a way,
hyperthreading will be more useful, but even then, it doesn't double your
CPU throughput, it just increases it by about 40%.

> > Right, I simplified it. The big deal is whether the OS favors the second
> > real CPU over one of the virtual CPU's on the same die --- by default,
> > it doesn't. Even if it did work perfectly, you are talking about going
> > from 1 to 1.4 or 2 to 2.8, which doesn't seem like much.

-- 
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Wrong plan or what ?
Gaetano,

> QUERY PLAN
> Hash Join (cost=265.64..32000.76 rows=40612 width=263)
>           (actual time=11074.21..11134.28 rows=10 loops=1)
>   Hash Cond: ("outer".id_user = "inner".id_user)
>   -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48)
>                            (actual time=0.02..8530.21 rows=1258966 loops=1)

OK, here's your problem: the planner thinks that you're going to get 40612
rows out of the final join, not 10. If the row estimate were correct, then
the Seq Scan would be a reasonable plan. But it's not. Here are some steps
you can take to clear things up for the planner:

1) Make sure you've VACUUM ANALYZEd.
2) Adjust the following postgresql.conf statistics:
   a) effective_cache_size: increase to 70% of available (not used by other
      processes) RAM.
   b) random_page_cost: decrease, maybe to 2.
   c) default_statistics_target: try increasing to 100 (warning: this will
      significantly increase the time required to do ANALYZE).

Then test again!

-- 
-Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
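[Editor's note: the steps above can be sketched as a short psql session. The
table and column names are the ones from the plan quoted above; SET applies
per-session, while permanent changes belong in postgresql.conf.]

```sql
-- 1) Refresh the planner's statistics
VACUUM ANALYZE user_logs;

-- 2b) Lower the cost estimate for random I/O (session-level trial)
SET random_page_cost = 2;

-- 2c) Raise the statistics target for the join column, then re-analyze
--     (per-column form; the global default can also be set in postgresql.conf)
ALTER TABLE user_logs ALTER COLUMN id_user SET STATISTICS 100;
ANALYZE user_logs;
```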
Re: [PERFORM] Tuning FreeBSD and PostgreSQL
BM == Bruce Momjian [EMAIL PROTECTED] writes:

BM> I know Linux has pagable shared memory, and you can resize the maximum
BM> in a running kernel, so it seems they must have abandoned the linkage
BM> between shared page tables and the kernel. This looks interesting:

Thanks for the info. You can resize it in FreeBSD as well, using the sysctl
command to set the various kern.ipc.shm* values.

-- 
Vivek Khera, Ph.D.    Khera Communications, Inc.
Internet: [EMAIL PROTECTED]    Rockville, MD    +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [PERFORM] Dual Xeon + HW RAID question
Jord Tanner wrote:
> On Tue, 2003-07-22 at 10:39, Bruce Momjian wrote:
> > But CPU affinity isn't related to hyperthreading, as far as I know. CPU
> > affinity tries to keep processes on the same cpu in case there is still
> > valuable info in the cpu cache.
>
> It is true that CPU affinity is designed to prevent the dump of valuable
> CPU cache. My thought is that if you are trying to prevent CPU contention,
> you could use CPU affinity to prevent 2 postmaster processes from running
> simultaneously on the same die. Am I out to lunch here? I've not worked
> with CPU affinity before, so I'm not familiar with the intimate details.

I guess you could, but it is the backends that use the cpu. I don't think
manually specifying affinity will work for most applications.

-- 
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [PERFORM] Dual Xeon + HW RAID question
On Tue, 2003-07-22 at 11:50, Bruce Momjian wrote:
> Jord Tanner wrote:
> > [snip: using CPU affinity to prevent 2 postmaster processes from running
> > simultaneously on the same die]
> I guess you could but it is the backends that use the cpu. I don't think
> manually specifying affinity will work for most applications.

This is beating a dead horse, but I'll take one more kick at it. CPU affinity
is defined by a bit mask, so multiple processors can be selected. It is also
inherited by child processes, so assigning CPU 0 and CPU 2 (which I assume
would be on different dies in a dual processor hyper-threading system) to the
parent postmaster should prevent CPU contention with respect to the postgres
backend.

I would be very interested to see if any advantage could be gained by a
combination of multiple HT processors and cpu affinity over multiple non-HT
processors. Yet Another Performance Testing To Do (YAPTTD)!

-- 
Jord Tanner [EMAIL PROTECTED]

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Wrong plan or what ?
Gaetano,

> SELECT * from user_logs where id_user in ( 10943, 10942, 10934, 10927,
> 10910, 10909 );
> [SNIPPED]
> Why the planner or the executor ( I don't know ) do not follow the same
> strategy ?

It is, actually, according to the query plan. Can you post the EXPLAIN
ANALYZE for the above query?

-- 
-Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Wrong plan or what ?
Josh Berkus [EMAIL PROTECTED] wrote:
> > SELECT * from user_logs where id_user in ( 10943, 10942, 10934, 10927,
> > 10910, 10909 );
> > [SNIPPED]
> > Why the planner or the executor ( I don't know ) do not follow the same
> > strategy ?
> It is, actually, according to the query plan. Can you post the EXPLAIN
> ANALYZE for the above query?

Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs,
idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs
  (cost=0.00..5454.21 rows=2498 width=48)
  (actual time=0.09..0.28 rows=10 loops=1)
  Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934)
           OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909))
Total runtime: 0.41 msec
(3 rows)

Thank you,
Gaetano

PS: if I execute the query I obtain 10 rows, instead of the 3 that the
EXPLAIN ANALYZE says.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Wrong plan or what ?
Josh Berkus [EMAIL PROTECTED] wrote:
> OK, here's your problem: the planner thinks that you're going to get 40612
> rows out of the final join, not 10. If the row estimate was correct, then
> the Seq Scan would be a reasonable plan. But it's not. Here's some steps
> you can take to clear things up for the planner:
> 1) Make sure you've VACUUM ANALYZEd
> 2) Adjust the following postgresql.conf statistics:
>    a) effective_cache_size: increase to 70% of available RAM.
>    b) random_page_cost: decrease, maybe to 2.
>    c) default_statistics_target: try increasing to 100 (warning: this will
>       significantly increase the time required to do ANALYZE)
> Then test again!

No improvement at all. I pushed default_statistics_target to 1000, but the
expected rows are still 40612 :-( Of course I restarted the postmaster, and
I VACUUM ANALYZEd the DB.

Thank you,
Gaetano

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
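[Editor's note: one further diagnostic worth trying at this point — a common
trick, though not one suggested in the thread so far — is to disable
sequential scans for the session, forcing the planner to cost the index plan,
and compare the two EXPLAIN ANALYZE timings:]

```sql
-- Diagnostic only; never leave this set in a production config.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;  -- re-run the original join query here

-- Restore the default afterwards.
SET enable_seqscan = on;
```

If the forced index plan is much faster, that confirms the bad row estimate
(not the executor) is the problem.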
Re: [PERFORM] One table or many tables for data set
On Tue, 2003-07-22 at 20:34, Castle, Lindsay wrote:
> Hi all,
> I'm working on a project that has a data set of approximately 6 million
> rows with about 12,000 different elements; each element has 7 columns of
> data.

Are these 7 columns the same for each element?
Re: [PERFORM] One table or many tables for data set
Ok.. Unless I'm missing something, the data will be static (or near static).

It also sounds as if the structure is common for elements, so you probably
only want 2 tables. One with 6 million rows and any row information. The
other with the element data, linking to the row information line with an
identifier, and an 'element type' (I assume there are 12000 types of
elements -- or something of that nature). Unique constraint on
(row_identifier, element_type).

The speed you achieve will be based on what indexes you create. If you spend
most of your time with one or a few (5% or less of the structure) element
types, create a partial index for those element types only, and a partial
index for all of the others. If you have a standard mathematical operation on
num1, num2, etc. you may want to make use of functional indexes to index the
result of the calculation.

Be sure to create the tables WITHOUT OIDS, be prepared for the data load to
take a while, and CLUSTER the table based on your most commonly used index
(once they've been set up). To help with speed, we would need to see EXPLAIN
ANALYZE results and the query being performed.

On Tue, 2003-07-22 at 21:00, Castle, Lindsay wrote:
> All rows have the same structure; the data itself will be different for
> each row. The structure is something like this:
>   element  date  num1  num2  num3  num4  units
>
> Thanks,
> Lindsay Castle
> EDS Australia
> Midrange Distributed Tools
> Infrastructure Tools AP
> Ph: +61 (0)8 8464 7101
> Fax: +61 (0)8 8464 2135
>
> -----Original Message-----
> From: Rod Taylor [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 23 July 2003 10:24 AM
> To: Castle, Lindsay
> Cc: Postgresql Performance
> Subject: Re: One table or many tables for data set
>
> > On Tue, 2003-07-22 at 20:34, Castle, Lindsay wrote:
> > > Hi all,
> > > I'm working on a project that has a data set of approximately 6
> > > million rows with about 12,000 different elements, each element has 7
> > > columns of data.
> > Are these 7 columns the same for each element?
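[Editor's note: a minimal sketch of the two-table layout Rod describes. All
table, column, and index names here are illustrative, not from the original
post; the partial-index predicate assumes a handful of "hot" element types.
WITHOUT OIDS matters on the PostgreSQL versions of that era.]

```sql
-- Row-level bookkeeping: one row per data line.
CREATE TABLE row_info (
    row_id  integer PRIMARY KEY,
    rdate   date NOT NULL
) WITHOUT OIDS;

-- Element data, linked to the row line; one row per (line, element type).
CREATE TABLE element_data (
    row_id        integer NOT NULL REFERENCES row_info,
    element_type  integer NOT NULL,
    num1 numeric, num2 numeric, num3 numeric, num4 numeric,
    units numeric,
    UNIQUE (row_id, element_type)   -- the constraint Rod suggests
) WITHOUT OIDS;

-- Partial index covering only the heavily-used element types
-- (types 1 and 2 are placeholders for your own hot types).
CREATE INDEX element_data_hot_idx ON element_data (element_type)
    WHERE element_type = 1 OR element_type = 2;
```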
Re: [PERFORM] One table or many tables for data set
Apologies, let me clear this up a bit (hopefully) :-)

The data structure looks like this:
  element  date  num1  num2  num3  num4  units

There are approx 12,000 distinct elements for a total of about 6 million
rows of data. The scanning technology I want to use may need a different
number of rows and different columns depending on the scan formula; eg
  scan1 may need num1, num2 and num3 from the last 200 rows for element x
  scan2 may need num1, units from the last 10 rows for element y

I can either do the scans and calculate what I need within SQL, or drag the
data out and process it outside of SQL. My preference is to go inside SQL,
as I've assumed that would be faster and less development work.

If I went with the many-tables design I would not expect to need to join
between tables; there is no relationship between the different elements that
I need to cater for.

Cheers,
Linz

Castle, Lindsay wrote and snipped:
> I'm working on a project that has a data set of approximately 6 million
> rows with about 12,000 different elements, each element has 7 columns of
> data.

I, for one, am having difficulty understanding exactly what your data looks
like, so it's hard to give advice. Maybe some concrete examples of what you
are calling rows, elements, and columns would help. Does each of 6 million
rows have 12000 elements, each with 7 columns? Or do you mean that out of 6
million rows, there are 12000 distinct kinds of elements?

> Can I do anything with Indexing to help with performance? I suspect for
> the majority of scans I will need to evaluate an outcome based on 4 or 5
> of the 7 columns of data.
Again, this isn't clear to me -- but maybe I'm just being dense ;-) Does this mean you expect 4 or 5 items in your WHERE clause? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] One table or many tables for data set
Thanks Rod. My explanations will be better next time. :-)

-----Original Message-----
From: Rod Taylor [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 23 July 2003 11:41 AM
To: Castle, Lindsay
Cc: Postgresql Performance
Subject: Re: One table or many tables for data set

> On Tue, 2003-07-22 at 21:50, Rod Taylor wrote:
> > Ok.. Unless I'm missing something, the data will be static (or near
> > static). It also sounds as if the structure is common for elements, so
> > you probably only want 2 tables.
>
> I misunderstood. Do what Joe suggested.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [PERFORM] One table or many tables for data set
On Tue, 2003-07-22 at 21:50, Rod Taylor wrote:
> Ok.. Unless I'm missing something, the data will be static (or near
> static). It also sounds as if the structure is common for elements, so
> you probably only want 2 tables.

I misunderstood. Do what Joe suggested.
[PERFORM] One table or many tables for data set
Hi all,

I'm working on a project that has a data set of approximately 6 million rows
with about 12,000 different elements; each element has 7 columns of data.

I'm wondering what would be faster from a scanning perspective (SELECT
statements with some calculations) for this type of set up:
  - one table for all the data
  - one table for each data element (12,000 tables)
  - one table per subset of elements (eg all elements that start with "a" in
    a table)

The data is static once it's in the database; only new records are added on
a regular basis. I'd like to run quite a few different formulated scans in
the longer term, so having efficient scans is a high priority.

Can I do anything with Indexing to help with performance? I suspect for the
majority of scans I will need to evaluate an outcome based on 4 or 5 of the
7 columns of data.

Thanks in advance :-)
Linz

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] One table or many tables for data set
Castle, Lindsay wrote:
> The data structure looks like this:
>   element  date  num1  num2  num3  num4  units
> There are approx 12,000 distinct elements for a total of about 6 million
> rows of data.

Ahh, that helps! So are the elements evenly distributed, i.e. are there
approx 500 rows of each element? If so, it should be plenty quick to put all
the data in one table with an index on element (and maybe a multicolumn key,
depending on other factors).

> The scanning technology I want to use may need a different number of rows
> and different columns depending on the scan formula; eg
>   scan1 may need num1, num2 and num3 from the last 200 rows for element x
>   scan2 may need num1, units from the last 10 rows for element y

When you say "last X rows", do you mean sorted by date? If so, you might
want that index to be on (element, date). Then do:

  SELECT num1, num2, num3 FROM mytable
  WHERE element = 'an_element'
  ORDER BY date DESC LIMIT 20;

Replace num1, num2, num3 by whatever columns you want, and LIMIT X as the
number of rows you want.

HTH,
Joe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
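[Editor's note: Joe's query relies on the multicolumn index existing. A
minimal sketch of creating it, using the same illustrative names ("mytable",
"element", "date") from the example above; the CLUSTER step follows Rod's
earlier advice about ordering the table by its most-used index after the
initial load:]

```sql
-- Multicolumn index supporting "last X rows for one element" lookups
CREATE INDEX mytable_element_date_idx ON mytable (element, date);

-- Optionally re-order the table physically by that index after loading
CLUSTER mytable_element_date_idx ON mytable;
ANALYZE mytable;
```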