Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 12:43:56 -0500, Alexandre Leclerc [EMAIL PROTECTED] wrote:
> On Thu, 27 Jan 2005 17:27:40 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
>> On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc [EMAIL PROTECTED] wrote:
>>> Here is a normal listing of design.product_department_time:
>>>
>>>  product_id | department_id | req_time
>>> ------------+---------------+----------
>>>         906 | A             |     3000
>>>         906 | C             |     3000
>>>         906 | D             |     1935
>>>         907 | A             |     1500
>>>         907 | C             |     1500
>>>         907 | D             |     4575
>>>         924 | A             |     6000
>>>         924 | C             |     1575
>>
>> Well, I did something like this recently; it can be done, though maybe not very efficiently... Unfortunately we will need a rowtype with all the departments:
>>
>>   CREATE TYPE departments AS (a int, b int, c int, d int, ...);
>
> Thank you for this help Dawid, I'll have to take some time to look at this suggestion. If I must create a type with all the departments I'll have a problem, because the user creates and deletes departments as it pleases him. Any counter-ideas?

I have exactly the same problem with my proposal [1]. I just wish there were some native rows-to-columns aggregate. The other approach I used was something like this:

  SELECT product_id, a, b, c
    FROM (SELECT product_id, a FROM pdt) AS a
    FULL OUTER JOIN (SELECT product_id, b FROM pdt) AS b USING (product_id)
    FULL OUTER JOIN (SELECT product_id, c FROM pdt) AS c USING (product_id);

...or similar (I'm typing from memory ;)). Anyway, it was good for getting the whole table, but the performance, well, wasn't the greatest. ;)

Regards, Dawid

[1]: I was thinking about a trigger on the departments table, and then recreating the aggregate and view as needed, but it isn't the kind of dynamic I had in mind. ;)

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
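An alternative to the chain of FULL OUTER JOINs above is conditional aggregation (one SUM(CASE WHEN ...) per department); this technique is not named in the thread, so treat it as a sketch of the same rows-to-columns idea. Python's sqlite3 stands in for PostgreSQL here so the example runs standalone; the generated SQL works the same way in PostgreSQL.

```python
import sqlite3

# Rows-to-columns ("pivot") via conditional aggregation, using the sample
# data from design.product_department_time quoted above. The column list
# is still fixed per department, which is the limitation being discussed.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_department_time (
        product_id    INTEGER,
        department_id TEXT,
        req_time      INTEGER
    )""")
conn.executemany(
    "INSERT INTO product_department_time VALUES (?, ?, ?)",
    [(906, 'A', 3000), (906, 'C', 3000), (906, 'D', 1935),
     (907, 'A', 1500), (907, 'C', 1500), (907, 'D', 4575),
     (924, 'A', 6000), (924, 'C', 1575)])

# CASE without ELSE yields NULL, so a product with no row for a given
# department shows up as NULL in that column (as with the outer joins).
rows = conn.execute("""
    SELECT product_id,
           SUM(CASE WHEN department_id = 'A' THEN req_time END) AS a,
           SUM(CASE WHEN department_id = 'C' THEN req_time END) AS c,
           SUM(CASE WHEN department_id = 'D' THEN req_time END) AS d
      FROM product_department_time
     GROUP BY product_id
     ORDER BY product_id
""").fetchall()
for r in rows:
    print(r)
```

This replaces N-1 joins with a single pass over the table, which tends to scale better as the number of departments grows.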
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Fri, 28 Jan 2005 09:07:59 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
> On Thu, 27 Jan 2005 12:43:56 -0500, Alexandre Leclerc [EMAIL PROTECTED] wrote:
>> Thank you for this help Dawid, I'll have to take some time to look at this suggestion. If I must create a type with all the departments I'll have a problem, because the user creates and deletes departments as it pleases him. Any counter-ideas?
>
> I have exactly the same problem with my proposal [1]. I just wish there were some native rows-to-columns aggregate.
>
> [1]: I was thinking about a trigger on the departments table, and then recreating the aggregate and view as needed, but it isn't the kind of dynamic I had in mind. ;)

Yep, this is the only thing I also thought of: a trigger to add / remove columns when the user adds or removes a department... but this is not exactly what I wanted (this is not a very nice db design, from my perspective).

Thank you for your help.

-- Alexandre Leclerc
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Mon, Jan 24, 2005 at 01:28:29AM +0200, Hannu Krosing wrote:
> IIRC it hates pg_dump mainly on master. If you are able to run pg_dump from slave, it should be ok.

For the sake of the archives, that's not really a good idea. There is some work afoot to solve it, but at the moment dumping from a slave gives you a useless database dump.

A

-- Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
At this point I will interject a couple of benchmark numbers based on a new system we just configured, as food for thought.

System A (old system): Compaq Proliant Dual Pentium III 933 with Smart Array 5300, one RAID 1, one 3-disk RAID 5 on 10k RPM drives, 2GB PC133 RAM. Original price: $6500

System B (new system): Self-built Dual Opteron 242 with 2x 3ware 9500S-8MI SATA, one RAID 1 (OS), one 4-drive RAID 10 (pg_xlog), one 6-drive RAID 10 (data) on 10k RPM Raptors, 4GB PC3200 RAM. Current price: $7200

System A for our large insert job: 125 minutes
System B for our large insert job: 10 minutes

There is no logical way there should be a 12x performance difference between these two systems; maybe 2x or even 4x, but not 12x. Bad controller cards/configuration will seriously ruin your day. 3ware Escalade cards are very well supported on Linux, and work excellently. Compaq Smart Array cards are not. Bonnie++ benchmarks show a 9MB/sec write, 29MB/sec read on the RAID 5, but a 172MB/sec write on the 6x RAID 10, and a 66MB/sec write on the RAID 1 on the 3ware. With the right configuration you can get very serious throughput. The new system is processing over 2500 insert transactions per second. We don't need more RAM with this config. The disks are fast enough. 2500 transactions/second is pretty damn fast.

Alex Turner

On Fri, 28 Jan 2005 10:31:38 -0500, Andrew Sullivan [EMAIL PROTECTED] wrote:
> On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote:
>> I was thinking the same! I'd like to know how other databases such as Oracle do it.
>
> You mean how Oracle does it. They're the only ones in the market that really have this technology.
>
> A
Re: [PERFORM] Poor Performance on Postgres 8.0
Tom Lane wrote:
> Pallav Kalva [EMAIL PROTECTED] writes:
>> I am running this query on postgres 8 beta version and it is not using the right index, whereas if i run the same query on postgres 7.4 version it uses the right index.
>
> 1. Beta which, exactly?

Beta 4

> 2. Have you ANALYZEd both tables lately?

Yes

> 3. If so, try this to see what it thinks the cost of the reverse plan is:
>      begin;
>      alter table common.string drop constraint pk_string_stringid;
>      explain analyze ... same query ...
>      rollback;
>
> regards, tom lane

What do you mean by "rollback" exactly? I can drop the pk constraint and run explain analyze and see how it behaves.
Re: [PERFORM] Poor Performance on Postgres 8.0
Pallav Kalva [EMAIL PROTECTED] writes:
> still doesn't make use of the index on common.attribute table.

What do you get from just plain

  explain analyze select * from common.string text1_ where text1_.value='squareFeet';

I get the impression that it must think this will yield a lot of rows.

regards, tom lane
Re: [PERFORM] Poor Performance on Postgres 8.0
explain analyze select * from common.string text1_ where text1_.value='squareFeet';

                                    QUERY PLAN
 Seq Scan on string text1_  (cost=0.00..4.41 rows=1 width=21) (actual time=0.283..0.322 rows=1 loops=1)
   Filter: (value = 'squareFeet'::text)
 Total runtime: 0.492 ms

I am not worried about this table, as common.string has only 190 records; it's the other table, common.attribute, which is very big (200k records), that I want to use an index scan on. The matching column in the common.attribute table has only 175 distinct values; do you think that's the problem? Here is the full query again:

  select attribute0_.attributeid as attribut1_,
         attribute0_.stringvalue as stringva2_,
         attribute0_.bigStringvalue as bigStrin3_,
         attribute0_.integervalue as integerv4_,
         attribute0_.numericvalue as numericv5_,
         attribute0_.datevalue as datevalue,
         attribute0_.booleanvalue as booleanv7_,
         attribute0_.fknamestringid as fknamest8_
    from common.attribute attribute0_, common.string text1_
   where (text1_.value='squareFeet' and attribute0_.fknamestringid=text1_.stringid)
     and (numericValue='775.0')

Tom Lane wrote:
> Pallav Kalva [EMAIL PROTECTED] writes:
>> still doesn't make use of the index on common.attribute table.
>
> What do you get from just plain
>
>   explain analyze select * from common.string text1_ where text1_.value='squareFeet';
>
> I get the impression that it must think this will yield a lot of rows.
>
> regards, tom lane
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Fri, 28 Jan 2005 10:24:37 -0800, Joe Conway [EMAIL PROTECTED] wrote:
> Alexandre Leclerc wrote:
>> I'm a little bit confused on how to install this contrib. I know my contrib package is installed, but I don't know how to make it work in postgresql. (Using 7.4.5-1mdk on Mandrake Linux.)
>
> Find the file tablefunc.sql and redirect it into your database, e.g.
>
>   psql mydatabase < /path/to/contrib/scripts/tablefunc.sql
>
> I have no idea where that would be on Mandrake, but you could probably do:
>
>   locate tablefunc.sql
>
> On Fedora Core 1 I find it here: /usr/share/pgsql/contrib/tablefunc.sql
>
> Also find and read README.tablefunc.
>
> HTH,

WHOA! Yes! Exactly the thing! Amazing! :)))

Regards, Dawid
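Worth noting: contrib/tablefunc's crosstab() still wants a fixed output column list, so the "departments come and go" problem from earlier in the thread remains. One possible workaround (a sketch, not anything from the thread) is to generate the pivot query from whatever departments currently exist. Python's sqlite3 stands in for PostgreSQL; the generated SQL is plain conditional aggregation, so the same idea ports directly. Caveat: this interpolates department ids into the SQL text, so it assumes they are trusted, simple values.

```python
import sqlite3

# Build the pivot column list dynamically from the current departments,
# so no fixed rowtype or domain has to be maintained by a trigger.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdt (product_id INTEGER, department_id TEXT, req_time INTEGER)")
conn.executemany("INSERT INTO pdt VALUES (?, ?, ?)",
                 [(906, 'A', 3000), (906, 'C', 3000), (907, 'A', 1500)])

# Discover the departments that exist right now...
depts = [r[0] for r in conn.execute(
    "SELECT DISTINCT department_id FROM pdt ORDER BY department_id")]

# ...and emit one aggregate column per department.
cols = ", ".join(
    "SUM(CASE WHEN department_id = '%s' THEN req_time END) AS dept_%s" % (d, d)
    for d in depts)
sql = ("SELECT product_id, %s FROM pdt "
       "GROUP BY product_id ORDER BY product_id" % cols)
rows = conn.execute(sql).fetchall()
print(rows)  # one column per department that currently exists
```

The query has to be rebuilt whenever the department list changes, but that happens at query time in the application rather than via schema-altering triggers.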
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
[EMAIL PROTECTED] (Andrew Sullivan) writes:
> On Mon, Jan 24, 2005 at 01:28:29AM +0200, Hannu Krosing wrote:
>> IIRC it hates pg_dump mainly on master. If you are able to run pg_dump from slave, it should be ok.
>
> For the sake of the archives, that's not really a good idea. There is some work afoot to solve it, but at the moment dumping from a slave gives you a useless database dump.

That overstates things a tad; I think it's worth elaborating on a bit.

There's a problem with the results of dumping the _schema_ from a Slony-I 'subscriber' node; you want to get the schema from the origin node. The problem has to do with triggers; Slony-I suppresses RI triggers and suchlike on subscriber nodes in a fashion that leaves the dumped schema a bit broken with regard to triggers.

But there's nothing wrong with the idea of using "pg_dump --data-only" against a subscriber node to get you the data without putting a load on the origin. And then pulling the schema from the origin, which oughtn't be terribly expensive there.

-- cbbrowne,@,ca.afilias.info http://dev6.int.libertyrms.com/ Christopher Browne (416) 673-4124 (land)
Re: [PERFORM] Bitmap indexes
[EMAIL PROTECTED] (Alex Turner) writes:
> I was wondering about index types. Oracle has an index type called a 'bitmap' index. They describe this as an index for low-cardinality fields, where only the cardinal values are indexed in a b-tree, and then it uses a bitmap below that to describe rows. They say that this type of index is very fast when combined with queries that use the indexed row in 'AND' clauses in a sql statement, as the index can 'mask' the results very fast. I have not been able to benchmark the actual effectiveness of this kind of index, but I was wondering if anyone has had experience with this and believes it might be a useful feature for postgres? Yes, I have a vested interest in this, because a lot of my searches are masked against low-cardinality fields ('Y' or 'N' type things) where this could potentially benefit me...

There are some ideas on this; nothing likely to be implemented in the very short term.

If you do a lot of queries on this sort of basis, there's something in PostgreSQL known as a partial index that could be used to improve some queries. What you might do is something like:

  create index partial_y_for_field_a on some_table (id_column) where field_a = 'Y';
  create index partial_n_for_field_a on some_table (id_column) where field_a = 'N';

That could provide a speedup for queries that might do joins on id_column where your query has the qualifiers "where field_a = 'Y'" or "where field_a = 'N'".

That's not going to provide a generalized answer to star queries, but it is an immediate answer for some cases.

-- cbbrowne,@,ca.afilias.info http://dev6.int.libertyrms.com/ Christopher Browne (416) 673-4124 (land)
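The two partial indexes above can be exercised end to end. SQLite also supports partial indexes, so the sketch below uses Python's sqlite3 to stay self-contained; the table and column names (some_table, id_column, field_a) come from the post, while the data distribution is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id_column INTEGER, field_a TEXT)")
# Invented skew: 'Y' is the rare value (1 row in 10), 'N' is common.
conn.executemany("INSERT INTO some_table VALUES (?, ?)",
                 [(i, 'Y' if i % 10 == 0 else 'N') for i in range(1000)])

# One small index per interesting value of the low-cardinality column,
# mirroring the SQL in the post:
conn.execute("CREATE INDEX partial_y_for_field_a ON some_table (id_column) "
             "WHERE field_a = 'Y'")
conn.execute("CREATE INDEX partial_n_for_field_a ON some_table (id_column) "
             "WHERE field_a = 'N'")

ys = conn.execute(
    "SELECT count(*) FROM some_table WHERE field_a = 'Y'").fetchone()[0]
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id_column FROM some_table "
    "WHERE field_a = 'Y'").fetchall()
print(ys)    # rows carrying the rare 'Y' value
print(plan)  # the planner can satisfy this from the small partial index
```

The payoff is that partial_y_for_field_a only contains the 'Y' rows, so it stays tiny even when the table is large; a plain index on field_a would index every row.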
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
PFC wrote:
> So, here is something annoying with the current approach: updating rows in a table bloats ALL indices, not just those whose indexed values have been actually updated. So if you have a table with many indexed fields and you often update some obscure timestamp field, all the indices will bloat, which will of course be corrected by VACUUM, but vacuum will have extra work to do.

The MVCC approach probably doesn't leave you with many choices here. The index entries point directly to the rows in the table, and since an update creates a new row (it's the equivalent of doing an insert then a delete), all indexes have to be updated to reflect the location of the new row.

Unless my understanding of how this works is completely off...

-- Kevin Brown [EMAIL PROTECTED]
Re: [PERFORM] Poor Performance on Postgres 8.0
Pallav Kalva [EMAIL PROTECTED] writes:
> On 8:
>  schemaname        | common
>  tablename         | attribute
>  attname           | fknamestringid
>  null_frac         | 0
>  avg_width         | 4
>  n_distinct        | 80
>  most_common_vals  | {2524,2434,2530,2522,2525,2523,2527,2526,2574,2531}
>  most_common_freqs | {0.219333,0.199333,0.076,0.064,0.0616667,0.05,0.045,0.042,0.04,0.0286667}
>  histogram_bounds  | {2437,2528,2529,2538,2539,2540,2554,2562,2575,2584,2637}
>  correlation       | 0.0274016

Given those stats, the planner is going to estimate that about 1/80th of the attribute table matches any particular fknamestringid, and that's what's driving it away from using the indexscan. I cannot tell whether there are indeed a couple of thousand rows joining to the 'squareFeet' string row (in which case the condition numericValue='775.0' must be really selective) or whether this is an outlier case that joins to just a few attribute rows.

The slightly different stats values for 7.4 would have given it a slightly lower value for the cost of an indexscan by idx_attribute_fknamestringid, but certainly not as low as your original message shows. Perhaps you have some difference in parameter settings in your 7.4 installation --- most likely a lower random_page_cost.

regards, tom lane
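The 1/80th estimate can be made concrete with a little arithmetic. The row counts below are approximations taken from other messages in this thread (about 200k rows in common.attribute, and roughly 800 actual matches for 'squareFeet'), so this is just a back-of-the-envelope sketch of the planner's n_distinct-based selectivity, not output from PostgreSQL itself.

```python
# Planner's default equality selectivity for a key that is not in the
# most-common-values list is roughly 1/n_distinct of the table.
n_rows = 200_000      # approximate size of common.attribute (from the thread)
n_distinct = 80       # from the pg_stats row quoted above
estimated_matches = n_rows / n_distinct

actual_matches = 800  # approximate count reported later for 'squareFeet'

print(estimated_matches)  # what the planner plans for
print(actual_matches)     # what the data actually contains
```

With the planner expecting ~2500 matching heap rows, a sequential scan looks cheaper than ~2500 random index probes, which is exactly the behavior being debugged.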
Re: [PERFORM] Poor Performance on Postgres 8.0
The random_page_cost value is the same on both versions; the only difference between the 7.4 version and the 8 version is that the 7.4 version has 100k fewer records.

For now, I created an index on the numericvalue column of the attribute table, and it used that index and is much faster that way; it came down to 24 msec.

Also, I tried to see the matching ids for squareFeet in the attribute table: there are 800-some records in the attribute table for version 8 and 700-something for 7.4.

Tom Lane wrote:
> Given those stats, the planner is going to estimate that about 1/80th of the attribute table matches any particular fknamestringid, and that's what's driving it away from using the indexscan. I cannot tell whether there are indeed a couple of thousand rows joining to the 'squareFeet' string row (in which case the condition numericValue='775.0' must be really selective) or whether this is an outlier case that joins to just a few attribute rows.
>
> The slightly different stats values for 7.4 would have given it a slightly lower value for the cost of an indexscan by idx_attribute_fknamestringid, but certainly not as low as your original message shows. Perhaps you have some difference in parameter settings in your 7.4 installation --- most likely a lower random_page_cost.
>
> regards, tom lane
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Hervé Piedvache wrote:
>> My point being is that there is no free solution. There simply isn't. I don't know why you insist on keeping all your data in RAM, but the mysql cluster requires that ALL data MUST fit in RAM all the time.
>
> I don't insist on having data in RAM, but when you use PostgreSQL with a big database you know that for quick access, just for reading the index file for example, it's better to have as much RAM as possible ... I just want to be able to get quick access with a growing and growing database ...

If it's an issue of RAM and not CPU power, think about this scenario. Let's just say you *COULD* partition your DB over multiple servers. What are your plans then? Are you going to buy 4 dual-Xeon servers? Ok, let's price that out. For a full-blown rackmount server w/ RAID, 6+ SCSI drives and so on, you are looking at roughly $4000 per machine. So now you have 4 machines: a total of 16GB of RAM over the 4 machines.

On the other hand, let's say you spent that money on a quad Opteron instead. 4x 850 will cost you roughly $8000. 16GB of RAM using 1GB DIMMs is $3000. If you went with 2GB DIMMs, you could stuff 32GB of RAM into that machine for $7500.

Let's review the math:

  4x server cluster, total 16GB RAM = $16K
  1 beefy server w/ 16GB RAM        = $11K
  1 beefy server w/ 32GB RAM        = $16K

I know what I would choose. I'd get the mega server w/ a ton of RAM and skip all the trickiness of partitioning a DB over multiple servers. Yes, your data will grow to a point where even XXGB can't cache everything. On the other hand, memory prices drop just as fast. By that time, you can ebay your original 16/32GB and get 64/128GB.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Fri, 28 Jan 2005 11:54:57 -0500, Christopher Weimann [EMAIL PROTECTED] wrote:
> On 01/28/2005-10:59AM, Alex Turner wrote:
>> At this point I will interject a couple of benchmark numbers based on a new system we just configured, as food for thought.
>>
>> System A (old system): Compaq Proliant Dual Pentium III 933 with Smart Array 5300, one RAID 1, one 3-disk RAID 5 on 10k RPM drives, 2GB PC133 RAM. Original price: $6500
>>
>> System B (new system): Self-built Dual Opteron 242 with 2x 3ware 9500S-8MI SATA, one RAID 1 (OS), one 4-drive RAID 10 (pg_xlog), one 6-drive RAID 10 (data) on 10k RPM Raptors, 4GB PC3200 RAM. Current price: $7200
>>
>> System A for our large insert job: 125 minutes
>> System B for our large insert job: 10 minutes
>>
>> There is no logical way there should be a 12x performance difference between these two systems; maybe 2x or even 4x, but not 12x.
>
> Your system A has the absolute worst-case RAID 5, 3 drives. The more drives you add to RAID 5 the better it gets, but it will never beat RAID 10. On top of it being the worst case, pg_xlog is not on a separate spindle.

True for writes, but not for reads.

> Your system B has a MUCH better config. RAID 10 is faster than RAID 5 to begin with, but on top of that you have more drives involved, plus pg_xlog is on a separate spindle.

I absolutely agree, it is a much better config; that's why we bought it ;).. In system A, the xlog was actually on the RAID 1, so it was in fact on a separate spindle set.

> I'd say I am not surprised by your performance difference.

I'm not surprised at all that the new system outperformed the old; it's more the factor of improvement. 12x is a _VERY_ big performance jump.

>> Bad controller cards/configuration will seriously ruin your day. 3ware Escalade cards are very well supported on Linux, and work excellently. Compaq Smart Array cards are not. Bonnie++ benchmarks show a 9MB/sec write, 29MB/sec read on the RAID 5, but a 172MB/sec write on the 6x RAID 10, and a 66MB/sec write on the RAID 1 on the 3ware.
>
> What does bonnie say about the RAID 1 on the Compaq? Comparing the two RAID 1s is really the only valid comparison that can be made between these two machines. Other than that you are comparing apples to snow shovels.

My main point is that you can spend $7k on a server and believe you have a fast system. The person who bought the original system was under the delusion that it would make a good DB server. For the same $7k a different configuration can yield vastly different performance output. This means that it's not quite apples to snow shovels. People who _believe_ they have an adequate config are often sorely mistaken, and ask misguided questions about needing 20GB of RAM because the system can't page to disk fast enough, when what they really need is a good RAID 10 with a high-quality controller. A six-drive RAID 10 is going to run a bit less than 20G of SSD.

Alex Turner
NetEconomist
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
William Yu [EMAIL PROTECTED] writes:
> 1 beefy server w/ 32GB RAM = $16K
>
> I know what I would choose. I'd get the mega server w/ a ton of RAM and skip all the trickiness of partitioning a DB over multiple servers. Yes, your data will grow to a point where even XXGB can't cache everything. On the other hand, memory prices drop just as fast. By that time, you can ebay your original 16/32GB and get 64/128GB.

a) What do you do when your calculations show you need 256G of RAM? [Yes, such machines exist, but you're no longer in the realm of "simply add more RAM". Administering such machines is nigh as complex as clustering.]

b) What do you do when you find you need multiple machines anyway, to divide up the CPU or I/O or network load? Now you need n big beefy servers when n servers 1/nth as large would really have sufficed. This is a big difference when you're talking about the difference between colocating 16 1U boxen with 4G of RAM vs 16 4U Opterons with 64G of RAM...

All that said, yes, speaking as a user I think the path of least resistance is to build n complete slaves using Slony and then just divide the workload. That's how I'm picturing going when I get to that point.

Even if I just divide the workload randomly, it's easier than building a machine with n times the cpu and i/o. And if I divide the workload up in a way that correlates with data in the database, I can probably get close to the same performance as clustering. The actual cost of replicating the unused data is slight. And the simplicity of master-slave makes it much more appealing than full-on clustering.

-- greg