Re: [PERFORM] Benchmark comparing PostgreSQL, MySQL and Oracle
On Fri, Feb 20, 2009 at 8:40 PM, Denis Lussier denis.luss...@enterprisedb.com wrote: Hi all, As the author of BenchmarkSQL and the founder of EnterpriseDB, I can assure you that BenchmarkSQL was NOT written specifically for PostgreSQL. It is intended to be a completely database-agnostic, TPC-C-like, Java-based benchmark. With the exception that it analyzes Postgres tables but not Oracle or InnoDB, I agree with that. The goal of BenchmarkSQL was to be a database-agnostic benchmark kit. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [PERFORM] Benchmark comparing PostgreSQL, MySQL and Oracle
On Fri, Feb 20, 2009 at 6:28 AM, Sergio Lopez sergio.lo...@nologin.es wrote: Hi, I've made a benchmark comparing PostgreSQL, MySQL and Oracle under three environments: GNU/Linux-x86, Solaris-x86 (same machine as GNU/Linux) and Solaris-SPARC. I think you might find it interesting: http://blogs.nologin.es/slopez/archives/17-Benchmarking-Databases-I.-Volatile-Storage..html

Sorry Sergio, in addition to violating your Oracle license, you need to learn a couple of things about benchmarking.

First of all, you need to do some research on the benchmark kit itself, rather than blindly downloading and using one. BenchmarkSQL has significant bugs in it which affect the result. I can say that authoritatively, as I worked on/with it for quite a while. Don't trust any result that comes from BenchmarkSQL. If you fix the bugs, Oracle (out of the box in OLTP config) will come out 60% faster. Oracle comes out twice as fast as PG on Linux. And, unless you're using a significant number of warehouses, MySQL+InnoDB will come out better than PG as well.

Second, I didn't see anything in your Oracle settings for parallelism and I/O tuning. Did you set them? And, based on what you presented, you didn't configure the SGA appropriately given the hardware mentioned. What was your log buffer set to?

Third, did you manually analyze the Oracle/MySQL databases? BenchmarkSQL will automatically analyze Postgres' tables to help the optimizer... did you do the same for the other databases?

Fourth, it didn't look like you tuned PG properly either. What were shared_buffers, wal_buffers, and wal_sync_method set to?

Fifth, did you do an out-of-the-box install of Oracle, or a custom one? If out of the box, did you choose OLTP or General?

There are lots of other things I could go on about in regard to flushing all the caches prior to starting the benchmarks, filesystem options, etc. Not trying to be rude, but *THIS* is why Oracle, IBM, Microsoft, et al. don't want people running benchmarks without their permission. When performing benchmarks, there are a lot of things to take into consideration. If you're just performing out-of-the-box tests, then that's fine, but you have to make sure the benchmark kit doesn't optimize itself for any one of those databases (which it does for PG). -- Jonah H. Harris, Senior DBA myYearbook.com
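The Postgres settings asked about above (shared_buffers, wal_buffers, wal_sync_method) live in postgresql.conf; a minimal sketch follows, with values that are purely illustrative assumptions and not numbers from this thread:

```ini
# Illustrative values only -- the right numbers depend on RAM, workload,
# and Postgres version; none of these come from the benchmark itself.
shared_buffers = 2GB          # often sized to roughly 25% of RAM on a dedicated box
wal_buffers = 16MB            # staging area for WAL before it is flushed
wal_sync_method = fdatasync   # the fastest safe option varies by platform
```

Leaving any of these at stock defaults while hand-tuning the other databases is exactly the kind of asymmetry being criticized here.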
Re: [PERFORM] Benchmark comparing PostgreSQL, MySQL and Oracle
On Fri, Feb 20, 2009 at 1:15 PM, Sergio Lopez sergio.lo...@nologin.es wrote: On the other hand, I've never said that what I've done is the Perfect-Marvelous-Definitive Benchmark, it's just a personal project, and I don't have an infinite amount of time to invest in it. When you make comments such as "As for databases, both Oracle and MySQL show nice numbers, but it's PostgreSQL who stands at the top, giving consistent results with each environment and workload," you should make sure that your test is correct. Otherwise you're making statements without any real basis in fact. Having said this, the benchmark is not as unfair as you thought. I've taken care to prepare all databases to meet similar values for their cache, buffers and I/O configuration (to what's possible given their differences), and I've left the rest as it comes by default (for Oracle I've used the OLTP template). Oracle's buffer cache is different than Postgres'. And there are several other tuning parameters which control how the buffer cache and I/O between cache and disk is performed. Making them the same size means nothing. And, as I said, you still didn't mention other important tuning parameters in MySQL, Postgres, or Oracle. So either you don't know about them, or you didn't bother to tune them, which is odd if you were trying to run a truly comparative benchmark. Yes, BenchmarkSQL is NOT the perfect tool for database benchmarking and it is NOT a valid TPC-C test (I've made this clear in the article), but I've looked at its source (you assume I blindly used it, but actually I've even made some changes to make it work with Ingres for other purposes) and I find it fair enough due to the simplicity of the queries it executes. I found no other evident optimization than the vacuum analyze in the LoadData application. Did you fix the bug in, I believe, the Order Status transaction that can cause an endless loop?
I would call giving the Postgres optimizer correct statistics and leaving Oracle and MySQL with defaults an optimization. Obviously, you can optimize the queries to perform better in Oracle, the same way you can do with any other DB, but doing that would be cheating. The key here is to keep the queries as simple as possible, and BenchmarkSQL does this nicely. BenchmarkSQL is flawed. You need to review the code more closely. Of course, my benchmark is somewhat peculiar in the fact (which you haven't mentioned) that all database files reside in volatile storage (RAM) by using tmpfs, which makes it something similar (but not identical) to using DIRECT_IO with an extremely fast storage device. But, again, all databases are given equal consideration. You're right, it's not the same. Oracle can benefit by using real direct I/O, not half-baked simulations which still cause double-buffering between the Linux page cache and the database buffer cache. Finally, about the license issue, (also) not trying to be rude, forbidding people to publish benchmarks of their products is simply stupid (and it lacks legal basis in most countries). The only reason they do this is to scare kids and be able to make up their own results. Of course, if you allow people to publish benchmarks there will be some loosely done, but there'll also be others properly made (and made by people not related to any database vendor). Your benchmark was flawed. You made condescending statements about Oracle and MySQL based on your bad data. That's why they don't let you do it. IMHO, worse than having loosely done benchmarks is having people saying things like "if you fix the bugs, Oracle (out of the box in OLTP config) will come out 60%" or "Oracle comes out twice as fast as PG on Linux" without any proof to support these words. At least benchmarks are refutable by using logic. Your benchmark was flawed, you didn't tune correctly, and you made statements based on bad data; refute that logic :) -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [PERFORM] Benchmark comparing PostgreSQL, MySQL and Oracle
On Fri, Feb 20, 2009 at 2:35 PM, Robert Haas robertmh...@gmail.com wrote: First of all, you need to do some research on the benchmark kit itself, rather than blindly downloading and using one. BenchmarkSQL has significant bugs in it which affect the result. I can say that authoritatively as I worked on/with it for quite awhile. Don't trust any result that comes from BenchmarkSQL. If you fix the bugs, Oracle (out of the box in OLTP config) will come out 60%. 60% what? Faster than PG 8.3-dev with 100 warehouses (when I last tested it). Oracle comes out twice as fast as PG on Linux. And, unless you're using a significant number of warehouses, MySQL+InnoDB will come out better than PG as well. I can believe that MySQL could come out faster than PG because I've had previous experience with it being blindingly fast. Of course I've also had experience with it having amazingly poor data integrity. That was MySQL+InnoDB. I haven't really had any integrity problems in that configuration. I would be pretty surprised if Oracle were in general twice as fast as PG - what are they doing that much better than what we're doing? I could certainly imagine it being true in cases that rely on specific features we lack (e.g. join removal)? DIO + AIO + multiple DBWR processes + large buffer cache + properly sized logs/log buffers makes a big difference. There are also several other concurrency-related tunables which contribute to it as well. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [PERFORM] Benchmark comparing PostgreSQL, MySQL and Oracle
On Fri, Feb 20, 2009 at 2:48 PM, Jonah H. Harris jonah.har...@gmail.com wrote: Having said this, the benchmark is not as unfair as you thought. I've taken care to prepare all databases to meet similar values for their cache, buffers and I/O configuration (to what's possible given their differences), and I've left the rest as it comes by default (for Oracle I've used the OLTP template). Oracle's buffer cache is different than Postgres'. And there are several other tuning parameters which control how the buffer cache and I/O between cache and disk is performed. Making them the same size means nothing. And, as I said, you still didn't mention other important tuning parameters in MySQL, Postgres, or Oracle. So either you don't know about them, or you didn't bother to tune them, which is odd if you were trying to run a truly comparative benchmark. Also forgot to ask: what block size did you use in Oracle? You mentioned tuning the shared pool, but you didn't specify db_cache_size or whether you were using automatic SGA tuning. Were those not tuned? -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [PERFORM] Benchmark comparing PostgreSQL, MySQL and Oracle
On Fri, Feb 20, 2009 at 3:40 PM, Merlin Moncure mmonc...@gmail.com wrote: ISTM you are the one throwing out unsubstantiated assertions without data to back it up. OP ran benchmark. showed hardware/configs, and demonstrated result. He was careful to hedge expectations and gave rationale for his analysis methods. As I pointed out in my last email, he makes claims about PG being faster than Oracle and MySQL based on his results. I've already pointed out significant tuning considerations, for both Postgres and Oracle, which his benchmark did not take into account. This group really surprises me sometimes. For such a smart group of people, I'm not sure why everyone seems to have a problem pointing out design flaws, etc. in -hackers, yet when we want to look good, we'll overlook blatant flaws where benchmarks are concerned. If you think he's wrong, instead of picking on him why don't you run some tests showing alternative results and publish them...leave off the oracle results or use a pseudo-name or something. One of these days I'll get some time and post my results. I'm just pointing out obvious flaws in this benchmark. If Sergio wants to correct them and/or qualify them, that's cool with me. I just don't like people relying on questionable and/or unclear data. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [PERFORM] Identifying the nature of blocking I/O
On Fri, Aug 22, 2008 at 7:52 AM, Peter Schuller [EMAIL PROTECTED] wrote: Is there currently a way of dumping such information? I.e., asking PG "what are backends waiting on right now?". Unfortunately, not within Postgres itself. The question "what is the database waiting on?" is a good one, and one Oracle understood in the early 90's. It is for that reason that EnterpriseDB added RITA, the Runtime Instrumentation and Tracing Architecture, to their Advanced Server product. RITA gives DBAs some of the same information as the Oracle Wait Interface does regarding what the database is waiting for, such as locks, I/O, and which relation/block. While it's not as efficient as DTrace due to Linux's lack of a good high-resolution user-mode timer, no one has found it to have a noticeable overhead on the throughput of a system in benchmarks or real-world applications. If you're on a DTrace platform, I would suggest using it. Otherwise, you can try to use strace/ltrace on Linux, but that's probably not going to get you the answers you need quickly or easily enough. Until enough users ask for this type of feature, the community isn't going to see it as valuable enough to add to the core engine. IIRC, systemtap is pretty much dead :( -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Fusion-io ioDrive
On Mon, Jul 7, 2008 at 9:23 AM, Merlin Moncure [EMAIL PROTECTED] wrote: I have a lot of problems with your statements. First of all, we are not really talking about 'RAM' storage... I think your comments would be more on point if we were talking about mounting database storage directly from the server memory, for example. Server memory and CPU are involved to the extent that the o/s is using them for caching and filesystem things, and inside the device driver. I'm not sure how those cards work, but my guess is that the CPU will go 100% busy (with a near-zero I/O wait) on any sizable workload. In this case, the current pgbench configuration being used is quite small and probably won't resemble this. Also, your comments seem to indicate that having a slower device leads to higher concurrency because it allows the process to yield and do other things. This is IMO simply false. Argue all you want, but this is a fairly well known (20+ year-old) behavior. With faster storage, cpu loads will increase, but only because the overall system throughput increases and cpu/memory 'work' increases in terms of overall system activity. Again, I said that response times (throughput) would improve. I'd like to see your argument explaining how you can handle more CPU-only operations when 0% of the CPU is free for use. Presumably, as storage approaches the speeds of main system memory, the algorithms for dealing with it will become simpler (not having to go through acrobatics to try to make everything sequential) and thus faster. We'll have to see. I also find the remarks of software 'optimizing' for strict hardware assumptions (L1+L2 cache) a little suspicious. In some old programs I remember keeping a giant C 'union' of critical structures that was exactly 8k to fit in the 486 cpu cache.
In modern terms I think that type of programming (sans some specialized environments) is usually counter-productive... I think PostgreSQL's approach of deferring as much work as possible to the o/s is a great approach. All of the major database vendors still see immense value in optimizing their algorithms and memory structures for specific platforms and CPU caches. Hence, if they're *paying* money for very specialized industry professionals to optimize in this way, I would hesitate to say there isn't any value in it. In fact, Postgres doesn't have those low-level resources, so for the most part, I have to agree that they have to rely on the OS. -Jonah
Re: [PERFORM] Fusion-io ioDrive
On Tue, Jul 1, 2008 at 8:18 PM, Jeffrey Baker [EMAIL PROTECTED] wrote: Basically the ioDrive is smoking the RAID. The only real problem with this benchmark is that the machine became CPU-limited rather quickly. That's traditionally the problem with everything being in memory. Unless the database algorithms are designed to exploit L1/L2 cache and RAM, which is not the case for a disk-based DBMS, you generally lose some concurrency due to the additional CPU overhead of playing only with memory. This is generally acceptable if you're going to trade off higher concurrency for faster service times. And, it isn't only evidenced in single systems where a disk-based DBMS is 100% cached, but also in most shared-memory clustering architectures. In most cases, when you're waiting on disk I/O, you can generally support higher concurrency because the OS can utilize the CPU's free cycles (during the wait) to handle other users. In short, sometimes, disk I/O is a good thing; it just depends on what you need. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Hot Issue
On Wed, Jul 2, 2008 at 8:31 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: Performance of HOT was much better on 30 June as compared to 2nd July. Did you happen to VACUUM FULL or CLUSTER anything? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Hot Issue
On Wed, Jul 2, 2008 at 9:11 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: HOT has a limitation that it does not work if the index column is updated. But that's not the case over here. Another limitation is that HOT won't work if there's not enough space to fit the update on the same page. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Hot Issue
On Wed, Jul 2, 2008 at 9:44 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: ok.. But we have set fill_factor = 80 for all the indexes on table1. You need the fill factor on the heap table, not the index. Is there a way to check if the page is full and the update is going to a new page? IIRC, I don't think so. I think you'd have to use something like pg_filedump to see if you have rows migrated to other blocks due to updates. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
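The heap-level setting being described can be applied with a standard reloption; the table name comes from the thread, and the comments are a hedged sketch of the behavior:

```sql
-- Reserve ~20% free space on each heap page so HOT updates can land
-- on the same page. (The index fillfactor set earlier does not help HOT.)
ALTER TABLE table1 SET (FILLFACTOR = 80);

-- Note: already-written pages keep their old layout; a table rewrite
-- (e.g. CLUSTER) is needed for existing rows to honor the new setting.
```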
Re: [PERFORM] Federated Postgresql architecture ?
On Mon, Jun 30, 2008 at 9:16 AM, Marko Kreen [EMAIL PROTECTED] wrote: But I want to clarify it's goal - it is not to run pre-determined queries. It is to run pre-determined complex transactions. Yes. And to make those work in a federated database takes huge amount of complexity that PL/Proxy simply sidesteps. At the price of requiring function-based API. But as the function-based API has other advantages even without PL/Proxy, it seems fine tradeoff. Agreed. PL/Proxy has its own set of advantages. As usual, it really just depends on the application and its requirements. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Federated Postgresql architecture ?
On Thu, Jun 26, 2008 at 4:33 PM, kevin kempter [EMAIL PROTECTED] wrote: Anyone have any experiences to share per setting up a federated architecture with PostgreSQL ? I wonder if the dblink contrib works well in a federated scenario, specifically in the setup of the federated views which equate to a select * from the same table on each federated server ? Because Postgres currently lacks the ability to push down predicates to individual nodes over a database link, you have to spend a good amount of time writing PL set-returning functions capable of adding appropriate WHERE clauses to queries sent over the link. There are other things you can do, but it's mostly hackery at this point in time. IIRC, David Fetter is trying to get some of the required predicate information exposed for use in DBI-Link. Not to self-plug, but if you require it, EnterpriseDB includes Oracle-style database links (SELECT col FROM [EMAIL PROTECTED]) which support predicate push-down. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
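The set-returning-function workaround described above might look like the following sketch. The table, columns, and connection string are invented for illustration; the point is that the predicate is pasted into the remote query text by hand, since the local planner cannot push it through dblink:

```sql
-- Hypothetical wrapper: the WHERE clause travels inside the query string,
-- so only matching rows cross the link instead of the whole remote table.
CREATE OR REPLACE FUNCTION remote_orders(p_customer_id integer)
RETURNS SETOF record AS $$
  SELECT * FROM dblink(
    'dbname=node1',
    'SELECT order_id, customer_id, total FROM orders WHERE customer_id = '
      || p_customer_id::text)
  AS t(order_id integer, customer_id integer, total numeric);
$$ LANGUAGE sql;
```

Every predicate you want pushed down needs its own parameter and its own string-splicing, which is the "good amount of time" and "hackery" being referred to.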
Re: [PERFORM] Federated Postgresql architecture ?
On Thu, Jun 26, 2008 at 5:41 PM, Josh Berkus [EMAIL PROTECTED] wrote: Not to self-plug, but if you require it, EnterpriseDB includes Oracle-style database links (SELECT col FROM [EMAIL PROTECTED]) which support predicate push-down. Also check out Skytools: http://skytools.projects.postgresql.org/doc/ Hmm, I didn't think the Skype tools could really provide federated database functionality without a good amount of custom work. Or, am I mistaken? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Federated Postgresql architecture ?
On Thu, Jun 26, 2008 at 6:31 PM, Josh Berkus [EMAIL PROTECTED] wrote: Sure, what do you think pl/proxy is for? Well, considering that an application must be written specifically to make use of it, and for very specific scenarios, I wouldn't consider it as making PostgreSQL a federated database. The pl/proxy architecture certainly doesn't resemble federated in the sense of the other database vendors. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Hardware vs Software RAID
On Wed, Jun 25, 2008 at 11:24 AM, Greg Smith [EMAIL PROTECTED] wrote: SMART doesn't detect 100% of drive failures in advance, but you'd be silly to setup a database system where you don't get to take advantage of the ~50% it does catch before you lose any data. Can't argue with that one. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Big O notation for postgres?
On Wed, May 21, 2008 at 10:10 AM, H. Hall [EMAIL PROTECTED] wrote: Does anyone know if there is a source that provides Big O notation for postgres's aggregate functions and operations? For example is count(*) = O(1) or O(n)? I don't know of any document containing the complexity of each aggregate, but it's sometimes left as a comment in the source code. IIRC, COUNT (non-distinct) is currently O(n), where n also includes evaluation of tuples not represented in the final count (due to Postgres' MVCC design). -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
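A quick way to see that n includes dead row versions is a sketch like this (table name and row count invented for illustration):

```sql
-- count(*) must visit and visibility-check every heap tuple, so dead
-- versions left by the DELETE still cost scan time until VACUUM runs.
CREATE TABLE t AS SELECT generate_series(1, 100000) AS i;
DELETE FROM t WHERE i % 2 = 0;             -- half the rows become dead versions
EXPLAIN ANALYZE SELECT count(*) FROM t;    -- scan cost still reflects ~100k tuples
VACUUM t;                                  -- reclaims the dead versions
```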
Re: [PERFORM] Re: Query Optimization with Kruskal’s Algorithm
Repost to -hackers, you're more likely to get a response on this topic. On Sat, May 10, 2008 at 1:31 PM, Rauan Maemirov [EMAIL PROTECTED] wrote: On May 8, 2:09 am, [EMAIL PROTECTED] (Alexander Staubo) wrote: On 5/7/08, Tarcizio Bini [EMAIL PROTECTED] wrote: I'm working on optimizing queries using the Kruskal algorithm (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4318118). That paper looks very interesting. I would love to hear what the PostgreSQL committers think of this algorithm. Alexander. I also would like to hear from them, but it seems like the thread got lost in a ton of other threads. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: Re: [PERFORM] Re: Query Optimization with Kruskal’s Algorithm
On Sat, May 10, 2008 at 5:12 PM, Tom Lane [EMAIL PROTECTED] wrote: Jonah H. Harris [EMAIL PROTECTED] writes: Repost to -hackers, you're more likely to get a response on this topic. Probably not, unless you cite a more readily available reference. (I dropped my IEEE membership maybe fifteen years ago ...) Yeah, I don't have one either. Similarly, I couldn't find anything applicable to the PG implementation except references to the paper. Wikipedia has the algorithm itself (http://en.wikipedia.org/wiki/Kruskal's_algorithm), but I was more interested in the actual applicability to PG and any issues they ran into. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
On Mon, Apr 28, 2008 at 5:37 PM, Adonias Malosso [EMAIL PROTECTED] wrote: Thank you for the answer. Good to know about this EnterpriseDB feature. No problem. I'll follow using pgloader. That's fine. Though, I'd really suggest pg_bulkload; it's quite a bit faster. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso [EMAIL PROTECTED] wrote: I'd like to know what's the best practice to LOAD a 70 million row, 101 column table from ORACLE to PGSQL. The fastest and easiest method would be to dump the data from Oracle into CSV/delimited format using something like ociuldr (http://www.anysql.net/en/ociuldr.html) and load it back into PG using pg_bulkload (which is a helluva lot faster than COPY). Of course, you could try other things as well... such as setting up generic connectivity to PG and inserting the data into a PG table over the database link. Similarly, while I hate to see shameless self-plugs in the community, the *fastest* method you could use is dblink_ora_copy, contained in EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI connection to COPY the data directly from Oracle into Postgres, which also saves you the intermediate step of dumping the data. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
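For the CSV route described above, the Postgres-side load step is a plain COPY (pg_bulkload replaces this step with its own, faster loader). The path and table name below are placeholders:

```sql
-- Server-side COPY of the exported file into the target table.
-- Use psql's \copy instead if the CSV lives on the client machine.
COPY big_table FROM '/tmp/big_table.csv' WITH CSV;
```

Dropping indexes before the load and recreating them afterward typically speeds up a bulk load of this size considerably.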
Re: [PERFORM] viewing source code
On Dec 14, 2007 2:03 PM, Bill Moran [EMAIL PROTECTED] wrote: I disagree here. If they're connecting remotely to PG, they have no direct access to the disk. pg_read_file? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
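The point being made: a remote superuser connection can still read files under the data directory over an ordinary SQL session, e.g.:

```sql
-- pg_read_file is superuser-only, but it reads server-side files over a
-- plain remote connection, so "no direct access to the disk" doesn't hold.
SELECT pg_read_file('postgresql.conf', 0, 200);  -- first 200 bytes of the file
```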
Re: [PERFORM] viewing source code
On Dec 14, 2007 4:24 PM, Andreas Kretschmer [EMAIL PROTECTED] wrote: Some days ago I saw a pl/pgsql code obfuscator, IIRC somewhere under http://www.pgsql.cz/index.php/PostgreSQL, but I don't know how it works, and I can't find the correct link now, I'm sorry... I started one a while ago... but it may have been part of my mass purge for disk space. I searched that site and can't find one... but it would be a nice-to-have for a lot of users. Of course, I know it's easy to get around obfuscation, but it makes people *think* it's secure, and as JD always says, it just makes it difficult for the average user to understand what it's doing. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Utilizing multiple cores for one query
On Dec 1, 2007 8:21 AM, henk de wit [EMAIL PROTECTED] wrote: I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3) are able to utilize multiple cores for the execution of a single query? Nope. This is one thing that systems like SQL Server and Oracle have been able to do for quite some time. I haven't seen much in the documentation that hints that this may be possible in PG, nor did I find much in the mailinglists about this. The only thing I found was a topic that discussed some patches that may eventually lead to a sequence scan being handled by multiple cores. I believe the threads you're talking about were related to scanning, not parallel query. Though, when Qingqing and I were discussing parallel query a little over a year ago, I do seem to recall several uninformed opinions stating that sequential scans were the only thing it could be useful for. Could someone shed some light on the current or future abilities of PG for making use of multiple cores to execute a single query? Currently, the only way to parallelize a query in Postgres is to use pgpool-II. http://pgpool.projects.postgresql.org/ -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Utilizing multiple cores for one query
On Dec 1, 2007 9:42 AM, henk de wit [EMAIL PROTECTED] wrote: Wouldn't this offer some opportunities for running things on multiple cores? No, it's not actually parallel in the same sense. Yes, I noticed this project before. At the time it was not really clear how stable and/or how well supported this is. It indeed seems to support parallel queries automatically by being able to rewrite standard queries. It does seem it needs different DB nodes and is thus probably not able to use multiple cores of a single DBMS. I've seen it actually set up to use multiple connections to the same DBMS. How well it would work is pretty much dependent on your application and the amount of parallelization you could actually gain. Also, I could not really find how well pgpool-II is doing at making judgments of the level of parallelization it's going to use. E.g. when there are 16 nodes in the system with a currently low utilization, a single query may be split into 16 pieces. On the other hand, when 8 of these nodes are heavily utilized, splitting to 8 pieces might be better. etc. IIRC, it doesn't plan parallelization that way. It looks at what is partitioned (by default) on different nodes and parallelizes based on that. As I said earlier, you can partition a single node and put pgpool-II on top of it to gain some parallelization. Unfortunately, it isn't capable of handling things like parallel index builds or other useful maintenance features... but it can do fairly good query result parallelization. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] tuning for TPC-C benchmark
On Nov 22, 2007 10:45 AM, Kevin Grittner [EMAIL PROTECTED] wrote: I suggest testing with some form of connection pooling. Yeah, that's one of the reasons I suggested DBT-2. It pools connections and is the most mature TPC-C-like test for Postgres. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 16, 2007 10:56 AM, Dave Dutcher [EMAIL PROTECTED] wrote: I don't know about that. There are times when it is the right plan: Agreed. IMHO, there's nothing wrong with nested-loop join as long as it's being used properly. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 16, 2007 3:36 PM, Josh Trutwin [EMAIL PROTECTED] wrote: Agreed. IMHO, there's nothing wrong with nested-loop join as long as it's being used properly. Can you explain further please? (I'm not disagreeing with you, just want to know when nested loops are not used properly - does the planner make mistakes that you have to watch out for?) As long as statistics are updated properly, it's generally not an issue. You just don't want the system using a nested-loop join incorrectly (like when table sizes are equal, the outer table is larger than the inner table, or the inner table itself is overly large). -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
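As a sketch of what "statistics updated properly" means in practice (table and column names here are hypothetical, not from the thread):

```sql
-- Refresh optimizer statistics so join selectivity estimates are sane.
ANALYZE orders;
ANALYZE customers;

-- Inspect which join strategy the planner actually chose.
EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;

-- A nested loop over two large tables usually signals stale statistics;
-- re-run ANALYZE before resorting to SET enable_nestloop = off, which
-- is a diagnostic tool rather than a production setting.
```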
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 9, 2007 7:06 AM, Ivan Voras [EMAIL PROTECTED] wrote: I just read this document and thought I should share it with this list: http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf Nice presentation. Thanks for posting it on here. Among other things (FreeBSD advocacy, mostly :) ), it contains a direct comparison between MySQL and PostgreSQL on various platforms, with PostgreSQL winning! :) -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] need help with a query
On 10/20/07, Pavel Velikhov [EMAIL PROTECTED] wrote: Left the query running for 10+ hours and had to kill it. I guess there really was no need to have lots of shared buffers (the hope was that postgresql will cache the whole table). I ended up doing this step inside the application as a pre-processing step. Can't have postgres running with different fsync options since this will be part of an easy to install and run app, that should just require a typical PostgreSQL installation. Is the size always different? If not, you could limit the updates: UPDATE links SET target_size = size FROM articles WHERE articles.article_id = links.article_to AND links.target_size != articles.size; Since this is a huge operation, what about trying: CREATE TABLE links_new AS SELECT l.col1, l.col2, a.size as target_size, l.col4, ... FROM links l, articles a WHERE a.article_id = l.article_to; Then truncate links, copy the data from links_new. Alternatively, you could drop links, rename links_new to links, and recreate the constraints. I guess the real question is application design. Why doesn't this app store size at runtime instead of having to batch this huge update? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
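The drop-and-rename variant described above might look like the following sketch (the column names beyond target_size mirror the col1/col2/col4 placeholders used in the message, not the real schema):

```sql
BEGIN;
-- Build the replacement table with the joined-in size in one pass.
CREATE TABLE links_new AS
SELECT l.col1, l.col2, a.size AS target_size, l.col4
FROM links l
JOIN articles a ON a.article_id = l.article_to;

DROP TABLE links;
ALTER TABLE links_new RENAME TO links;
-- Recreate here whatever indexes and constraints the old table had.
COMMIT;

-- Give the planner fresh statistics for the rebuilt table.
ANALYZE links;
```

Note the trade-off: the rebuild takes a full table's worth of extra disk space while it runs, but avoids the dead-tuple bloat a 90M-row in-place UPDATE would leave behind.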
Re: [PERFORM] need help with a query
On 10/19/07, Pavel Velikhov [EMAIL PROTECTED] wrote: Hi, I am updating a big table (90M records) with data from another rather large table (4M entries). Here is my update query: update links set target_size = ( select size from articles where articles.article_id = links.article_to) try: UPDATE links SET target_size = size FROM articles WHERE articles.article_id = links.article_to; -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Performance problems with prepared statements
On 10/10/07, Theo Kramer [EMAIL PROTECTED] wrote: When running the query directly from psql it returns the required rows in less than 100 milli-seconds. However, when using a prepared statement from my C application on the above query and executing it the query duration is as follows ... Row[s] = 25, Duration = 435409.474 ms How are you timing it? Does it really take 435 seconds to complete? Try the following in psql: postgres# PREPARE yourplan (VARCHAR, INT, INT) AS SELECT oid, * FROM calllog WHERE calllog_mainteng = $1 AND calllog_phase = $2 AND calllog_self $3 OR calllog_mainteng = $1 AND calllog_phase 8 ORDER BY calllog_mainteng DESC, calllog_phase DESC, calllog_self DESC limit 25; postgres# EXECUTE yourplan('124 ', 8, 366942); -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
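One hedged way to diagnose a gap like this (assuming the PREPARE above has been run in psql) is to compare the plan the prepared statement gets against the plan for the same query with literals inlined, since a generic parameterized plan can be far worse than a literal-specific one:

```sql
-- Assuming yourplan has been PREPAREd as shown above:
EXPLAIN ANALYZE EXECUTE yourplan('124     ', 8, 366942);

-- Compare against EXPLAIN ANALYZE of the same SELECT with the literal
-- values written directly into the WHERE clause. A large difference
-- implicates the generic plan rather than the C client code.
```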
Re: [PERFORM] Possible explanations for catastrophic performance deterioration?
You didn't specify the database size, but my guess is that the total data size is about enough to fit in shared_buffers or kernel cache. On the new system (or dropped/recreated database), it would've all or mostly fit in memory, which would make things like count(*) work quickly. On the old database, you probably had a lot of fragmentation which would've caused significantly more I/O to be performed, thereby causing a slowdown. You could compare relation sizes to check easily. My guess is that a vacuum full would've brought the other database back up to speed. In the future, you probably want to set fillfactor to a reasonable amount to account for updates to blocks between vacuums and capture as few row migrations as possible. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
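The fillfactor suggestion above might be sketched as follows (table name is a placeholder; ALTER TABLE ... SET (fillfactor) assumes 8.2 or later, and the rewrite step is needed because the setting only affects newly written pages):

```sql
-- Leave ~20% of each heap page free so updates can stay on the same
-- page instead of migrating rows elsewhere between vacuums.
ALTER TABLE customer SET (fillfactor = 80);

-- Rewrite existing pages so they honor the new setting; CLUSTER is an
-- alternative that also re-sorts the table by an index.
VACUUM FULL customer;
ANALYZE customer;
```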
Re: [PERFORM] Possible explanations for catastrophic performance deterioration?
On 9/23/07, Carlos Moreno [EMAIL PROTECTED] wrote: Wait a second --- am I correct in understanding then that the bloating you guys are referring to occurs *in memory*?? No, bloating occurs on-disk; but this does affect memory. Bloat means that even though your table data may take up 1G after the initial load, due to poor vacuuming, table layout, etc., it grows to something more... say 2G. The thing is, even though the table only stores 1G of data, it is now physically 2G. So, anything that needs to read the entire table (like COUNT(*)), or large sections of it sequentially, performs twice as many I/Os to do so. Which means you're actually waiting on two things: I/O, and additional CPU time reading blocks that have very little viable data in them. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Possible explanations for catastrophic performance deterioration?
On 9/23/07, Carlos Moreno [EMAIL PROTECTED] wrote: Yes, that part I understand --- I think I now know what the error is in my logic. I was thinking as follows: We read 2GB of which 1900MB are dead tuples. But then, once they're read, the system will only keep in memory the 100MB that are valid tuples. Yes, this is wrong. I'm now thinking that the problem with my logic is that the system does not keep anything in memory (or not all tuples, in any case), since it is only counting, so it does not *have to* keep them, and since the total amount of reading from the disk exceeds the amount of physical memory, then the valid tuples are pushed out of memory. Yes, it does keep some in memory, but not all of it. So, the second time I execute the query, it will still need to scan the disk (in my mind, the way I was seeing it, the second time I execute the select count(*) from customer, the entire customer table would be in memory from the previous time, and that's why I was thinking that the bloating would not explain why the second time it is still slow). Yes, it is still performing additional I/Os and additional CPU work to read bloated data. Am I understanding it right? Now, I think so. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] utilising multi-cpu/core machines?
On 9/5/07, Thomas Finneid [EMAIL PROTECTED] wrote: how does pg utilise multi cpus/cores, i.e. does it use more than one core? and possibly, how, are there any documentation about this. Unlike other systems which manage their own affinity and prioritization, Postgres relies solely on the OS to handle process management across multiple CPUs/cores. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL publishes first real benchmark
On 7/9/07, Jignesh K. Shah [EMAIL PROTECTED] wrote: I think this result will be useful for performance discussions of postgresql against other databases. I'm happy to see an industry-standard benchmark result for PostgreSQL. Great work guys! -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] PostgreSQL publishes first real benchmark
On 7/9/07, Greg Smith [EMAIL PROTECTED] wrote: There's just enough hardware differences between the two configurations that it's not quite a fair fight. For example, the MySQL test had 10K RPM drives in the database server storage array, while the PostgreSQL one had 15K RPM ones. A few other small differences as well if you dig into the configurations, all of which I noted favored the PG system. Agreed. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] PostgreSQL publishes first real benchmark
On 7/9/07, Joshua D. Drake [EMAIL PROTECTED] wrote: PostgreSQL still beats MySQL ;) Agreed. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Filesystem Direct I/O and WAL sync option
On 7/9/07, Jim C. Nasby [EMAIL PROTECTED] wrote: BTW, it might be worth trying the different wal_sync_methods. IIRC, Jonah's seen some good results from open_datasync. On Linux, using ext3, reiser, or jfs, I've seen open_sync perform quite better than fsync/fdatasync in most of my tests. But, I haven't done significant testing with direct I/O lately. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
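A minimal postgresql.conf sketch for running such a comparison (the value shown is not a recommendation; each method should be benchmarked on your own filesystem and hardware, since results differ widely across ext3/reiser/jfs and storage configurations):

```
# Restart-and-benchmark with each candidate in turn:
wal_sync_method = open_sync   # alternatives: fsync, fdatasync, open_datasync
```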
Re: [PERFORM] [ADMIN] Postgres VS Oracle
On 6/18/07, David Tokmatchi [EMAIL PROTECTED] wrote: Scalability ? Performance? Benchmark ? Availability ? Architecture ? Limitation : users, volumes ? Resouces needed ? Support ? Aside from the Wikipedia database comparison, I'm not aware of any direct PostgreSQL-to-Oracle comparison. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] [ADMIN] Postgres VS Oracle
On 6/18/07, Andreas Kostyrka [EMAIL PROTECTED] wrote: As a cynic, I might ask, what Oracle is fearing? As a realist, I might ask, how many times do we have to answer this type of anti-commercial-database flamewar-starting question? -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [pgsql-advocacy] [PERFORM] [ADMIN] Postgres VS Oracle
On 6/18/07, Joshua D. Drake [EMAIL PROTECTED] wrote: Depends? How many times are you going to antagonize the people that ask? As many times as necessary. Funny how the anti-proprietary-database arguments can continue forever and no one brings up the traditional RTFM-like response of, hey, this was already discussed in thread XXX, read that before posting again. 1. It has *nothing* to do with anti-commercial. It is anti-proprietary, which is perfectly legitimate. As long as closed-mindedness is legitimate, sure. 2. Oracle, Microsoft, and IBM have a lot to fear in the sense of a database like PostgreSQL. We can compete in 90-95% of cases where people would traditionally purchase a proprietary system for many, many thousands (if not hundreds of thousands) of dollars. They may well have a lot to fear, but that doesn't mean they do; any statement in that area is pure assumption. I'm in no way saying we can't compete, I'm just saying that the continued closed-mindedness and inside-the-box thinking only serves to perpetuate malcontent toward the proprietary vendors by turning personal experiences into sacred-mailing-list gospel. All of us have noticed the anti-MySQL bashing based on problems with MySQL 3.23... Berkus and others (including yourself, if I am correct) have corrected people for making invalid comparisons against ancient versions. I'm only doing the same where Oracle, IBM, and Microsoft are concerned. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [pgsql-advocacy] [PERFORM] [ADMIN] Postgres VS Oracle
On 6/18/07, Joshua D. Drake [EMAIL PROTECTED] wrote: Yeah funny how you didn't do that ;) (of course neither did I). I agree, an oops on my part :) It is amazing how completely misguided you are in this response. I haven't said anything closed minded. I only responded to your rather antagonistic response to a reasonably innocuous question of: As a cynic, I might ask, what Oracle is fearing? I wasn't responding to you, just to the seemingly closed-mindedness of the original question/statement. We're all aware of the reasons, for and against, proprietary system licenses prohibiting benchmarking. It is a good question to ask, and a good question to discuss. Certainly, but can one expect to get a realistic answer to an "is Oracle fearing something" question on the PostgreSQL list? Or was it just a backhanded attempt at pushing the topic again? My vote is for the latter; it served no purpose other than to push the competitiveness topic again. I haven't seen any bashing going on yet. Shall we start with the closed mindedness and unfairness of per cpu license and support models? Not preferably, you make me type too much :) -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [GENERAL] [pgsql-advocacy] [PERFORM] [ADMIN] Postgres VS Oracle
On 6/18/07, Andrew Sullivan [EMAIL PROTECTED] wrote: It would appear that this was the flame-fest that was predicted. Particularly as this has been copied to five lists. If you all want to have an argument about what Oracle should or should not do, could you at least limit it to one list? Yeah, Josh B. asked that it be toned down and taken to the one list that should've been involved in the first place. Which I think should be pgsql-admin or pgsql-advocacy... your thoughts? I think the Oracle discussion is over; David T. just needs URL references IMHO. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Thousands of tables versus on table?
On 6/6/07, Craig James [EMAIL PROTECTED] wrote: They're blowing smoke if they think Oracle can do this. Oracle could handle this fine. Oracle fell over dead, even with the best indexing possible, tuned by the experts, and using partitions keyed to the customerID. I don't think so, whoever tuned this likely didn't know what they were doing. It's telling that Oracle's license contract prohibits you from publishing comparisons and benchmarks. You have to wonder why. They did this for the same reason as everyone else. They don't want non-experts tuning the database incorrectly, writing a benchmark paper about it, and making the software look bad. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Thousands of tables versus on table?
On 6/6/07, Craig James [EMAIL PROTECTED] wrote: Last time I checked, Oracle didn't have anything close to this. When did you check, 15 years ago? Oracle has direct-path import/export and Data Pump; both of which make generic COPY look like a turtle. The new PostgreSQL bulk-loader takes similar concepts from Oracle and is considerably faster than COPY. Don't get me wrong, I'm pro-PostgreSQL... but spouting personal observations on other databases as facts just bolsters a PostgreSQL-centric, egotistical view of the world. If you don't tune Oracle, it will suck. If you don't understand Oracle architecture when you tune an application, it will suck; just like PostgreSQL. People who don't have extensive experience in the other databases just hear what you say and regurgitate it as fact; which it is not. Look at how many people in these lists still go on and on about MySQL flaws based on their experience with MySQL 3.23. Times change and it doesn't do anyone any good to be ignorant of other databases. If you're going to speak about another database in a comparison, please stay current or specify the version you're comparing against. This is nothing against you, but it always starts an avalanche of, look how perfect we are compared to everyone else. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Thousands of tables versus on table?
On 6/6/07, Andrew Sullivan [EMAIL PROTECTED] wrote: Well, you will always have to deal with the sort of people who will base their technical prescriptions on the shiny ads they read in SuperGlobalNetworkedExecutiveGoFast, or whatever rag they're reading these days. Always. I usually encourage such people actually to perform the analysis of the license, salary, contingency, and migrations costs Yes, this is the best way. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: control of benchmarks (was: [PERFORM] Thousands of tables)
On 6/6/07, Andrew Sullivan [EMAIL PROTECTED] wrote: But I think the above is giving Oracle Corp a little too much credit. Perhaps. However, Oracle has a thousand or so knobs which can control almost every aspect of every subsystem. If you know how they interact with each other and how to use them properly, they can make a huge difference in performance. Most people do not know all the knobs or understand what difference each can make given the theory and architecture of the system, which results in poor general configurations. Arguably, there is a cost associated with having someone staffed and/or consulted that has the depth of knowledge required to tune it in such a manner which goes back to a basic cost/benefit analysis. Oracle, while seeming like a one-size-fits-all system, has the same basic issue as PostgreSQL and everyone else; to get optimum performance, it has to be tuned specifically for the application/workload at hand. Corporations exist to make money, and the reason they prohibit doing anything with their software and then publishing it without their approval is because they want to control all the public perception of their software, whether deserved or not. Of course. Which is why audited benchmarks like SPEC and TPC are around. While they may not represent one's particular workload, they are the only way to fairly demonstrate comparable performance. Every user of any large software system (Oracle or otherwise) has their favourite horror story about the grotty corners of that software; Of course, but they also never say why it was caused. With Oracle, almost all bad-performance cases I've seen are related to improper tuning and/or hardware; even by experienced DBAs. -- Jonah H. 
Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Thousands of tables versus on table?
On 6/6/07, Craig James [EMAIL PROTECTED] wrote: You didn't read my message. I said that *BOTH* Oracle and Postgres performed well with table-per-customer. Yes, I did. My belief is that Oracle can handle all customers in a single table. The technical question is simple: Table-per-customer or big-table-for-everyone. The answer is, it depends. I agree, it does depend on the data, workload, etc. No one-size-fits-all answer there. The reason I assert (and stand by this) that They're blowing smoke when they claim Oracle has the magic cure, is because Oracle and Postgres are both relational databases, they write their data to disks, and they both have indexes with O(log(N)) retrieval/update times. Oracle doesn't have a magical workaround to these facts, nor does Postgres. Agreed that they are similar on the basics, but they do use significantly different algorithms and optimizations. Likewise, there is more tuning that can be done with Oracle given the amount of time and money one has to spend on it. Again, cost/benefit analysis on this type of an issue... but you're right, there is no magic cure. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] setting up raid10 with more than 4 drives
On 5/29/07, Luke Lonergan [EMAIL PROTECTED] wrote: AFAIK you can't RAID1 more than two drives, so the above doesn't make sense to me. Yeah, I've never seen a way to RAID-1 more than 2 drives either. It would have to be his first one: D1 + D2 = MD0 (RAID 1) D3 + D4 = MD1 ... D5 + D6 = MD2 ... MD0 + MD1 + MD2 = MDF (RAID 0) -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
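On Linux, the nested RAID 1+0 layout sketched above might be built with mdadm roughly as follows (device names are illustrative and the commands require root; this is a sketch of the concept, not a tested recipe):

```
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
# Stripe across the three mirrors to form the final RAID 1+0 array:
mdadm --create /dev/md3 --level=0 --raid-devices=3 /dev/md0 /dev/md1 /dev/md2
```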
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On 4/28/07, Harald Armin Massa [EMAIL PROTECTED] wrote: about your feature proposal: as I learned, nearly all performance configuration can be done by editing the .INI file and making the Postmaster re-read it. Um, shared_buffers is one of the most important initial parameters to set and it most certainly cannot be set after startup. So, WHY at all should those parameters be guessed at the installation of the database? Because a lot of good assumptions can be made on the initial install. Likewise, some of the most important parameters cannot be tuned after startup. Maybe there is even a pointy flashy version possible (perhaps even for money :) which gives nice graphics and optimized, like those Windows Optimizers. :) I am sure, some DBAs in BIGCOMPs would be thrilled :) I'd suggest that you not make snide remarks about someone else's design when your own analysis is somewhat flawed. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance of count(*)
On 3/22/07, Merlin Moncure [EMAIL PROTECTED] wrote: As others suggest select count(*) from table is very special case which non-mvcc databases can optimize for. Well, other MVCC databases still do it faster than we do. However, I think we'll be able to use the dead space map to speed this up a bit, wouldn't we? -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
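When an exact count isn't required, a common workaround is to read the planner's row estimate instead of scanning the heap. A hedged sketch (the table name is a placeholder, and accuracy depends on how recently the table was ANALYZEd or autovacuumed):

```sql
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'your_table';
```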
Re: [PERFORM] increasing database connections
On 3/1/07, Shiva Sarna [EMAIL PROTECTED] wrote: I am sorry if it is a repeat question but I want to know if database performance will decrease if I increase max_connections to 2000. At present it is 100. Most certainly. Adding connections over 200 will degrade performance dramatically. You should look into pgpool or connection pooling from the application. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
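As a sketch of the pooling approach, here is a minimal configuration using PgBouncer as one example of an external pooler (values are illustrative, not recommendations): 2000 application connections attach to the pooler on port 6432, which multiplexes them over roughly 20 real server connections.

```
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
pool_mode = transaction
default_pool_size = 20
max_client_conn = 2000
```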
Re: [PERFORM] profiling PL/pgSQL?
On 11/3/06, Richard Huxton dev@archonet.com wrote: There's a GUI debugger from EnterpriseDB I believe, but I've no idea how good it is. Any users/company bods care to let us know? If you visit: http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/edb-debugger/#dirlist We have both a PL/pgSQL profiler and tracer available. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimizer internals
On 16 Jun 2006 07:23:26 -0400, Greg Stark [EMAIL PROTECTED] wrote: The flip side is that Oracle and others like it have to do a lot of extra footwork to do if you query data that hasn't been committed yet. That footwork has performance implications. Not disagreeing here at all, but considering that Oracle, DB2, and SQL Server, et al. have proven themselves to perform extremely well under heavy load (in multiple benchmarks), the overhead of an UNDO implementation has a calculable break-even point. Feel free to debate it, but the optimistic approach adopted by nearly every commercial database vendor is *generally* a better approach for OLTP. Consider Weikum & Vossen (p. 442): We also need to consider the extra work that the recovery algorithm incurs during normal operation. This is exactly the catch with the class of no-undo/no-redo algorithms. By and large, they come at the expense of a substantial overhead during normal operations that may increase the execution cost per transaction by a factor of two or even higher. In other words, it reduces the achievable transaction throughput of a given server configuration by a factor of two or more. Now, if we're considering UPDATEs (the worst case for PostgreSQL's current MVCC architecture), then this is (IMHO) a true statement. There aren't many *successful* commercial databases that incur the additional overhead of creating another version of the record, marking the old one as having been updated, inserting N-number of new index entries to point to said record, and having to WAL-log all aforementioned changes. I have yet to see any successful commercial RDBMS using some sort of no-undo algorithm that avoids the factor-of-two-or-more performance reduction. However, if you consider an INSERT or DELETE in PostgreSQL, those are implemented much better than in most commercial database systems due to PostgreSQL's MVCC design.
I've done a good amount of research on enhancing PostgreSQL's MVCC under UPDATE-heavy conditions and believe there is a nice happy medium for us.

/me waits for the obligatory and predictable "the benchmarks are flawed" response.
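The write amplification being described can be made concrete with a toy model. The sketch below is purely illustrative (the per-operation counts are assumptions for the sake of the argument, not measurements of either engine): it tallies the physical writes one UPDATE incurs under a PostgreSQL-style MVCC design that versions the whole row, versus an in-place/undo design that only touches indexes on changed columns.

```python
def writes_per_update_mvcc(num_indexes: int) -> int:
    """Toy model of a PostgreSQL-style UPDATE: write a new heap tuple
    version, mark the old tuple, add one new entry per index, and
    WAL-log every one of those changes."""
    new_tuple = 1
    mark_old_tuple = 1
    index_entries = num_indexes            # every index gets a new pointer
    wal_records = 1 + 1 + num_indexes      # WAL for all of the above
    return new_tuple + mark_old_tuple + index_entries + wal_records

def writes_per_update_undo(num_indexes: int, changed_indexed_cols: int = 0) -> int:
    """Toy model of an in-place/undo UPDATE: overwrite the row, write
    one undo record, and touch only indexes on columns that changed."""
    in_place = 1
    undo_record = 1
    index_entries = changed_indexed_cols
    wal_records = 1 + changed_indexed_cols
    return in_place + undo_record + index_entries + wal_records

if __name__ == "__main__":
    # The gap widens with the number of indexes on the table.
    for idx in (1, 3, 5):
        print(idx, writes_per_update_mvcc(idx), writes_per_update_undo(idx))
```

With three indexes and no indexed columns changed, the model gives 10 writes versus 3, which is the shape of the "factor of two or more" claim; it also shows why INSERT/DELETE (where both designs must touch every index anyway) don't suffer the same penalty.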
Re: [PERFORM] Optimizer internals
On 16 Jun 2006 09:21:01 -0400, Greg Stark [EMAIL PROTECTED] wrote:
> Well Oracle has to do almost all that same work, it's just doing it in
> a separate place called a rollback segment.

Well, it's not really the same work. The process by which Oracle manages UNDO is actually pretty simple and efficient, though complex in its implementation. There have also been significant performance improvements in this area in both 9i and 10g.

> There are pros and cons especially where it comes to indexes, but also
> where it comes to what happens when the new record is larger than the
> old one.

Certainly, you want to avoid row chaining at all costs, which is why PCTFREE is there. I have researched update-in-place for PostgreSQL and can avoid row chaining... so I think we can get the same benefit without the management and administration cost.

> IMHO the biggest problem Postgres has is when you're updating a lot of
> records in a table with little free space.

Yes, this is certainly the most noticeable case, and one reason I'm behind the freespace patch. Unfortunately, a lot of inexperienced people use VACUUM FULL and don't understand why plain VACUUM (which frees block-level space and updates the FSM) is *generally* better, assuming they have enough disk space for the database.

> That and of course the visibility bitmap that has been much-discussed

I'd certainly like to see it.

> I wouldn't say the benchmarks are flawed but I also don't think you can
> point to any specific design feature and say it's essential just on the
> basis of bottom-line results. You have to look at the actual benefit of
> the specific wins.

True.
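The PCTFREE/row-chaining interaction above can be sketched with a toy page model. Everything here is illustrative (the `Page` class, 8 KB page size, and byte counts are assumptions, not any engine's actual layout): inserts may not eat into a reserved fraction of the page, so a row that grows on UPDATE can usually still fit in place instead of being chained to another page.

```python
class Page:
    """Toy disk page: rows stored contiguously, with a fraction of the
    page reserved for future row growth (in the spirit of PCTFREE)."""
    def __init__(self, size: int = 8192, pct_free: int = 10):
        self.size = size
        self.reserve = size * pct_free // 100  # headroom kept for UPDATEs
        self.used = 0
        self.rows = {}

    def insert(self, row_id: int, length: int) -> bool:
        # INSERTs must leave the reserved headroom untouched.
        if self.used + length > self.size - self.reserve:
            return False
        self.rows[row_id] = length
        self.used += length
        return True

    def update_in_place(self, row_id: int, new_length: int) -> bool:
        """Grow a row in place; False means the row would have to be
        chained/migrated to another page."""
        growth = new_length - self.rows[row_id]
        if self.used + growth > self.size:  # headroom may absorb growth
            return False
        self.rows[row_id] = new_length
        self.used += growth
        return True
```

With `pct_free=0`, a page packed full of 1 KB rows chains on the first growing update; with 10% reserved, the same update succeeds in place, at the cost of slightly fewer rows per page.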
Re: [PERFORM] Postgres fsync off (not needed) with NetApp
On 14 Jun 2006 23:33:53 -0400, Greg Stark [EMAIL PROTECTED] wrote:
> In fact the benefit of the NVRAM is precisely that it makes sure you
> *don't* have any reason to turn fsync off. It should make the fsync
> essentially free.

Having run PostgreSQL on a NetApp with input from NetApp, this is correct. fsync should be left on; you just won't incur the *real* direct-to-disk cost of the sync, because it completes direct-to-NVRAM.
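To see what is actually being made "essentially free" here, a minimal sketch of a commit-style write is below (plain Python, hypothetical file names). The `os.fsync` call blocks until the device acknowledges the write; on a spinning disk that means a platter write, while on an NVRAM-backed filer the acknowledgment comes from battery-backed memory, which is why the call stays cheap without sacrificing durability.

```python
import os
import tempfile
import time

def timed_commit(path: str, data: bytes, use_fsync: bool = True) -> float:
    """Append data and (optionally) force it to stable storage,
    returning the elapsed time. What 'stable storage' costs depends on
    the device: disk platter vs. NVRAM write cache."""
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, data)
        if use_fsync:
            os.fsync(fd)  # block until the device acknowledges the write
    finally:
        os.close(fd)
    return time.perf_counter() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "wal")
        print("with fsync:   ", timed_commit(p, b"commit\n", True))
        print("without fsync:", timed_commit(p, b"commit\n", False))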
Re: [PERFORM] Postgres fsync off (not needed) with NetApp
On 6/15/06, Dan Gorman [EMAIL PROTECTED] wrote:
> shelves. Any generic advice other than the NetApp (their NFS Oracle
> tuning options) that might be useful? (e.g. turning off snapshots)

I was using PostgreSQL on a 980c, but feature-wise they're probably pretty close. What type of application are you running? OLTP? If so, what sort of transaction volume? Are you planning to use any Flex* or Snap* features? What type of volume layouts are you using?
Re: [PERFORM] Postgres fsync off (not needed) with NetApp
On 6/15/06, Jonah H. Harris [EMAIL PROTECTED] wrote:
> What type of application are you running? OLTP? If so, what sort of
> transaction volume? Are you planning to use any Flex* or Snap*
> features? What type of volume layouts are you using?

Also, you mentioned NFS... is that what you were planning? If you licensed iSCSI, it's a bit better for the database from a performance angle.
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
On 5/30/06, Waldomiro [EMAIL PROTECTED] wrote:
> Is there a command to insert a record if it does not exist, and update
> it if it exists?

Sure, it's called MERGE. See http://en.wikipedia.org/wiki/Merge_%28SQL%29

> I mean postgres should test if a record exists before insert, and if
> it exists then postgres must do an update instead of an insert.

PostgreSQL does not support MERGE at the moment, sorry.
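In the absence of MERGE, the usual workaround is the update-first-then-insert pattern. The sketch below illustrates the pattern only: it uses Python's stdlib sqlite3 (not PostgreSQL) so it is self-contained, and the `kv` table and `upsert` function are hypothetical names. Note the caveat in the comment: under concurrency a real implementation needs a retry loop, since another session can insert the row between the two statements.

```python
import sqlite3

def upsert(conn: sqlite3.Connection, key: int, value: str) -> str:
    """Emulate MERGE: try the UPDATE first; if no row matched, fall
    back to INSERT. Under concurrency this pair of statements is racy,
    so a production version must retry on a duplicate-key error."""
    cur = conn.execute("UPDATE kv SET value = ? WHERE key = ?", (value, key))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO kv (key, value) VALUES (?, ?)", (key, value))
        return "inserted"
    return "updated"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (key INTEGER PRIMARY KEY, value TEXT)")
    print(upsert(conn, 1, "a"))  # inserted
    print(upsert(conn, 1, "b"))  # updated
```

The same flow can of course be wrapped in a server-side function so the test-and-write round trips happen inside the database rather than in the client.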
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Last night I implemented a non-recursive introsort in C... let me test it a bit more and then I'll post it here for everyone else to try out.

On 2/16/06, Markus Schaber [EMAIL PROTECTED] wrote:
> Hi, Ron,
>
> Ron wrote:
> > ...and of course if you know enough about the data to be sorted so
> > as to constrain it appropriately, one should use a non-comparison-
> > based O(N) sorting algorithm rather than any of the general
> > comparison-based O(N lg N) methods.
>
> Sounds interesting, could you give us some pointers (names, URLs,
> papers) to such algorithms?
>
> Thanks a lot,
> Markus
>
> --
> Markus Schaber | Logical Tracking & Tracing International AG
> Dipl. Inf.     | Software Development GIS
>
> Fight against software patents in EU! www.ffii.org
> www.nosoftwarepatents.org

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324