Re: [HACKERS] Multi-Dimensional Histograms
On Mon, Jun 29, 2009 at 10:22 PM, Robert Haas robertmh...@gmail.com wrote: I'm finding myself unable to follow all the terminology on this thread. What's dimension reduction? What's PCA? [snip] Imagine you have a dataset with two variables, say height in inches and age in years. For the purpose of discussion let's pretend for a moment that all the people in your sample have height the same as their age. You could create a 2d histogram of your data:

 |0002
 |0060
a|0300
g|4000
e|0003
 |0010
 |0100
 |
  height

You could store this 2d histogram as-is and use it for all the things you'd use histograms for, or you could observe the structure, apply a rotation, and flatten the data into a 1d histogram [0113426200...], which is far more compact. Often data has significant correlation, so it's often possible to reduce the dimensionality without greatly reducing the selectivity of the histogram. This becomes tremendously important as the number of dimensions goes up, because the volume of an N-dimensional space increases incredibly fast as the number of dimensions increases. PCA is one method of dimensionality reduction. In PCA you find a linear transformation (scaling, rotation) of the data that aligns the axes with the orientations of greatest variance in the data-space. I have no clue how you would apply PCA to PostgreSQL histograms, since building the PCA transform requires some non-trivial operations on the data. Perhaps PCA could be done on a random sample of a table, then that transformation could be stored and used to compute the histograms. I'm sure there has been a lot of research on this. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
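The rotate-and-flatten idea above can be sketched in a few lines: project correlated 2-D (height, age) data onto its first principal component and histogram the 1-D projections. This is purely illustrative pseudocode in Python; nothing here reflects the actual PostgreSQL planner, and all names are made up for the example.

```python
# Illustrative sketch only: PCA-style flattening of a 2-D dataset to 1-D.
import math, random

random.seed(0)
data = []
for _ in range(10_000):
    age = random.uniform(0, 80)
    height = age + random.gauss(0, 2)   # height ~ age, as in the example
    data.append((height, age))

# Centre the data.
mh = sum(h for h, _ in data) / len(data)
ma = sum(a for _, a in data) / len(data)
centered = [(h - mh, a - ma) for h, a in data]

# 2x2 covariance matrix.
n = len(centered)
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n

# First principal component: eigenvector of the larger eigenvalue.
# For a 2x2 symmetric matrix this has a closed form.
lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
vx, vy = sxy, lam - sxx                  # direction of greatest variance
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Rotate/flatten: one coordinate per row, then a 10-bucket 1-D histogram.
proj = [x * vx + y * vy for x, y in centered]
lo, hi = min(proj), max(proj)
hist = [0] * 10
for p in proj:
    b = min(int((p - lo) / (hi - lo) * 10), 9)
    hist[b] += 1
print(sum(hist))    # all 10,000 rows survive in the 1-D histogram
```

Because height tracks age in this sample, the principal component comes out close to the diagonal (1,1)/sqrt(2), which is exactly the "rotation" in the ASCII picture above.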
Re: [HACKERS] Significantly larger toast tables on 8.4?
On Fri, Jan 2, 2009 at 5:48 PM, Martijn van Oosterhout klep...@svana.org wrote: So you compromise. You split the data into say 1MB blobs and compress each individually. Then if someone does a substring at offset 3MB you can find it quickly. This barely costs you anything in compression ratio. Implementation, though, that's harder. The size of the blobs is tunable also. I imagine the optimal value will probably be around 100KB (12 blocks uncompressed). Or have the database do that internally: with the available fast compression algorithms (zlib, lzo, lzf, etc.) the diminishing return from larger compression block sizes kicks in rather quickly. Other algorithms like LZMA or bzip2 gain more from bigger block sizes, but I expect all of them are too slow to ever consider using in PostgreSQL. So I expect that the compression loss from compressing in chunks of 64 kbytes would be minimal. The database could then include a list of offsets for the 64-kbyte chunks at the beginning of the field, or something like that. A short substring would then require decompressing just one or two blocks, far less overhead than decompressing everything. It would probably be worthwhile to graph compression ratio vs. block size for some reasonable input. I'd offer to do it, but I doubt I have a reasonable test set for this.
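The chunk-plus-offset-table scheme described above can be sketched like this: compress in fixed-size chunks, record where each compressed chunk starts, and have substring() decompress only the chunks it touches. This is a hypothetical illustration in Python, not the actual TOAST code; all function names are invented for the example.

```python
# Sketch of chunked compression with an offset index for cheap substring().
import zlib

CHUNK = 64 * 1024   # 64 kbyte uncompressed chunks, as proposed in the post

def compress_chunked(data: bytes):
    chunks, offsets, pos = [], [], 0
    for i in range(0, len(data), CHUNK):
        c = zlib.compress(data[i:i + CHUNK])
        offsets.append(pos)          # start of this compressed chunk
        chunks.append(c)
        pos += len(c)
    return offsets, b"".join(chunks)

def substring_chunked(offsets, blob, start, length):
    # Decompress only the chunks overlapping [start, start + length).
    first, last = start // CHUNK, (start + length - 1) // CHUNK
    out = []
    for i in range(first, last + 1):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
        out.append(zlib.decompress(blob[offsets[i]:end]))
    joined = b"".join(out)
    skip = start - first * CHUNK
    return joined[skip:skip + length]

data = bytes(range(256)) * 2048          # 512 KB of sample data
offsets, blob = compress_chunked(data)
# A substring at offset 70,000 decompresses two chunks, not the whole field.
assert substring_chunked(offsets, blob, 70_000, 100) == data[70_000:70_100]
```

Graphing len(blob) against CHUNK for a realistic corpus would answer the "compression ratio vs. block size" question raised at the end of the post.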
Re: [HACKERS] Spinlock backoff algorithm
On Nov 14, 2007 10:12 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: http://www.intel.com/performance/server/xeon/intspd.htm http://www.intel.com/performance/server/xeon/fpspeed.htm That says precisely nothing about the matter at hand. Someone should simply change it and benchmark it in pgsql. I doubt you'll see a difference there on regular AMD/Intel ... and if it makes the sun hyperthreaded cpu happier... ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] GIST and GIN indexes on varchar[] aren't working in CVS.
There seems to be some behavior change in current CVS with respect to gist and gin indexes on varchar[]. Some side effect of the tsearch2 merge?

\d search_pages
     Table "public.search_pages"
  Column   |        Type         | Modifiers
-----------+---------------------+-----------
 page_name | character varying   |
 cats      | character varying[] |
Indexes:
    "search_pages_page" UNIQUE, btree (page_name)

create index search_pages_cats on search_pages using gin (cats);
ERROR: missing support function 1 for attribute 1 of index search_pages_cats

create index search_pages_cats on search_pages using gist (cats);
ERROR: data type character varying[] has no default operator class for access method gist
HINT: You must specify an operator class for the index or define a default operator class for the data type.

This works fine in 8.2, for example:

\d search_pages
     Table "public.search_pages"
  Column   |        Type         | Modifiers
-----------+---------------------+-----------
 page_name | character varying   |
 cats      | character varying[] |
Indexes:
    "search_pages_page" UNIQUE, btree (page_name)
    "search_pages_cats" gin (cats)
Re: [HACKERS] Load Distributed Checkpoints test results
On 6/15/07, Gregory Stark [EMAIL PROTECTED] wrote: While in theory spreading out the writes could have a detrimental effect I think we should wait until we see actual numbers. I have a pretty strong suspicion that the effect would be pretty minimal. We're still doing the same amount of I/O total, just with a slightly smaller chance for the elevator algorithm to optimize the pattern. ...and the sorted-writes patch suggests that the OS's elevator isn't doing a great job for large flushes in any case. I wouldn't be shocked to see load-distributed checkpoints cause an unconditional improvement, since they may do better at avoiding the huge burst behavior that is overrunning the OS elevator in any case.
Re: [HACKERS] Sorted writes in checkpoint
On 6/14/07, Simon Riggs [EMAIL PROTECTED] wrote: On Thu, 2007-06-14 at 16:39 +0900, ITAGAKI Takahiro wrote: Greg Smith [EMAIL PROTECTED] wrote: On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote: If the kernel can treat sequential writes better than random writes, is it worth sorting dirty buffers in block order per file at the start of checkpoints? I wrote and tested the attached sorted-writes patch based on Heikki's ldc-justwrites-1.patch. There was an obvious performance win on the OLTP workload.

          tests            | pgbench | DBT-2 response time (avg/90%/max)
---------------------------+---------+-----------------------------------
 LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
 + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
 + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s

(*) Don't write buffers that were dirtied after starting the checkpoint.

machine : 2GB RAM, SCSI*4 RAID-5
pgbench : -s400 -t4 -c10 (about 5GB of database)
DBT-2   : 60WH (about 6GB of database)

I'm very surprised by the BM_CHECKPOINT_NEEDED results. What percentage of writes has been saved by doing that? We would expect a small percentage of blocks only, and so that shouldn't make a significant difference. I thought we discussed this before, about a year ago. It would be easy to get that wrong and avoid writing a block that had been re-dirtied after the start of the checkpoint but was already dirty beforehand. How long was the write phase of the checkpoint, and how long between checkpoints? I can see the sorted writes having an effect because the OS may not receive blocks within a sufficient time window to fully optimise them. That effect would grow with increasing sizes of shared_buffers and decrease with the size of the controller cache. How big was the shared_buffers setting? What OS scheduler are you using? The effect would be greatest when using Deadline.
Linux has some instrumentation that might be useful for this testing:

echo 1 > /proc/sys/vm/block_dump

will have the kernel log all physical I/O (disable syslog writing to disk before turning it on if you don't want the system to blow up). Certainly the OS elevator should be working well enough not to see that much of an improvement. Perhaps frequent fsync behavior is having an unintended interaction with the elevator? ... It might be worthwhile to contact some Linux kernel developers and see if there is some misunderstanding.
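The core of the sorted-writes patch discussed above fits in a few lines of pseudocode: at checkpoint start, order the dirty buffers per file in block order so the OS elevator sees mostly-sequential writes. A minimal sketch in Python, with hypothetical (relfile, block) tuples standing in for buffer descriptors:

```python
# Illustrative only: flush order for dirty buffers at checkpoint start.
dirty_buffers = [("rel_a", 917), ("rel_b", 3), ("rel_a", 12),
                 ("rel_b", 411), ("rel_a", 13)]

def checkpoint_write_order(buffers):
    # Sort per file, in block order, as the patch does before writing.
    return sorted(buffers)

print(checkpoint_write_order(dirty_buffers))
# -> [('rel_a', 12), ('rel_a', 13), ('rel_a', 917), ('rel_b', 3), ('rel_b', 411)]
```

The block_dump logging described above is one way to verify that the resulting physical I/O actually arrives at the disk in something close to this order.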
Re: [HACKERS] [GENERAL] Index greater than 8k
On 11/1/06, Teodor Sigaev [EMAIL PROTECTED] wrote: [snip] Brainstorm method: develop a dictionary which returns all substrings of a lexeme; for example, for the word foobar it will be 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob bar ba ar'. And make a GIN functional index over your column (to save disk space). [snip] Search time in GIN depends only weakly on the number of words (opposite to tsearch2/GiST), but insertion of a row may be slow enough. With the right folding the number of possible trigrams for ASCII text is fairly small... much smaller than the number of words used in a large corpus of text, so the GIN performance for searches should be pretty good. Real magic would be to teach the regex operator to transparently make use of such an index. ;)
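Trigram extraction with case folding, as alluded to above, can be sketched like so. This is a simplified assumption of how such folding might work, not the pg_trgm implementation; the padding rule here is invented for the example.

```python
# Illustrative sketch: case-folded trigram extraction for index keys.
def trigrams(word: str) -> set[str]:
    padded = "  " + word.lower() + " "    # pad so word edges produce trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

print(sorted(trigrams("Foobar")))
# -> ['  f', ' fo', 'ar ', 'bar', 'foo', 'oba', 'oob']
```

With folding to lowercase ASCII there are at most roughly 27^3 distinct trigrams, which is why the posting-list count stays small compared to a word dictionary over a large corpus.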
Re: [HACKERS] New CRC algorithm: Slicing by 8
On 10/24/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I wasn't aware that a system could protect against this. :-) I write 8 Kbytes - how can I guarantee that the underlying disk writes all 8 Kbytes before it loses power? And why isn't the CRC a valid means of dealing with this? :-) [snip] A file system with an appropriate transaction method could do this. In *theory* reiser4 write()s are atomic. No one has verified, however, that there is no torn-page risk introduced in some other part of the kernel. I'm not aware of any other system which can guarantee the atomicity of 8k writes.
Re: [HACKERS] New CRC algorithm: Slicing by 8
On 10/21/06, Tom Lane [EMAIL PROTECTED] wrote: [snip] It hasn't even been tested. One thing I'd want to know about is the performance effect on non-Intel machines. On an Opteron 265 his test code shows SB8 (the Intel algorithm) is 2.48x faster for checksum and 1.95x faster for verify for the 800 * 1024 blocks of 1 KB each workload. For 10 blocks of 8k I got similar results as well. It looks like the new code may have a larger cache footprint, so actual performance may differ from the microbenchmark.
Re: [HACKERS] Replication
On 8/21/06, Alvaro Herrera [EMAIL PROTECTED] wrote: But the confirmation that needs to come is that the WAL changes have been applied (fsync'ed), so the performance will be terrible. So bad, that I don't think anyone will want to use such a replication system ... Okay. I give up... Why is waiting for fsync on a fast local network which takes 15us to send a message (infiniband is cheap..) an unimaginable delay when we tolerate a local 8ms fsync delay on systems without writeback cache? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] How does the planner deal with multiple possible indexes?
On 7/19/06, Jim C. Nasby [EMAIL PROTECTED] wrote: [snip] \d does list bdata__ident_filed_departure before bdata_ident; I'm wondering if the planner is finding the first index with ident_id in it and stopping there? From my own experience it was grabbing the first that has the requested field as its first member.. I haven't looked at the code to see if that is the intended behavior. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compression and on-disk sorting
Oh come on, Sorry to troll but this is too easy. On 5/15/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: You guys have to kill your Windows hate - in jest or otherwise. It's zealous, and blinding. [snip] Why would it be assumed, that a file system designed for use from a desktop, would be optimal at all for database style loads? It wouldn't. Why would someone use a desktop OS for a database? Why would you call the result of answering the previous question zealous and blinding? PG's use of the OS's block cache is a good move because it makes PG tend to 'just work' where the alternatives require non-trivial tuning (sizing their caches not to push the OS into swap). The advantages of this are great enough that if additional smarts are needed in the OS cache it might well be worth the work to add it there and to ask for new fadvise flags to get the desired behavior. That's something that would be easy enough for a dedicated hacker to do, or easy enough to collaborate with the OS developers if the need could be demonstrated clearly enough. What reasonable OS couldn't you do that with? :) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Support Parallel Query Execution in Executor
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote: Gregory Maxwell [EMAIL PROTECTED] writes: For example, one case made in this thread involved bursty performance with seqscans presumably because the I/O was stalling while processing was being performed. Actually, the question that that raised in my mind is why isn't the kernel doing read-ahead properly? When we're doing nonsequential access like an indexscan, it's unsurprising that the kernel can't guess which block we need next, but in a plain seqscan you'd certainly expect the read-ahead algorithm to kick in and ensure that the next block is fetched before we need it. So before we go inventing complicated bits of code with lots of added overhead, we should first find out exactly why the system doesn't already work the way it's supposed to. But is that really the behavior we should expect? How much memory can we expect the OS to spend on opportunistic read-in? How much disk access should be spent on a guess? There is an intrinsic tradeoff here: applications tend to be bursty, so just because you're reading a lot now doesn't mean you'll continue... and the filesystem will have fragmentation, so a failed guess can translate into a lot of pointless seeking. As I recall, in Linux 2.6 you have something like a maximum of 128KB of readahead. Given that and a disk subsystem that reads at 200MB/sec, you can't spend more than about 600us processing before requesting enough additional blocks to put the disk back into readahead, or you will stall the disk. Stalling the disk costs more than you'd expect: due to FS fragmentation there can be terrific gains from allowing the OS and disk to issue reads out of order from a large request queue. It would probably be reasonable to say that the OS should be using much larger readahead buffers, especially on systems with fast disk subsystems...
But that doesn't come for free and can slaughter performance for many workloads (consider: what if it triggered 5MB of file-oriented read-ahead for every index scan seek we did?). There is an adaptive readahead patch for Linux which should improve things (http://lwn.net/Articles/176279/ and if you google around there are some benchmarks), but I doubt that even that would be able to keep a 200MB/sec+ disk subsystem saturated with the sort of access patterns PG has... Addressing this in a cross-platform way will be a challenge. I doubt Linux is alone in having skimpy readahead (because big readahead translates into huge losses if you get it wrong). Given this information, a stupid 'fault-in' process should probably give huge gains for seqscans... but I think the extra work required to find a solution which is also useful for index operations is probably worth it as well.
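The ~600us figure quoted above is easy to check back-of-the-envelope: with a 128KB readahead window draining at 200MB/sec, the processing budget before the disk stalls is simply window size divided by drain rate.

```python
# Sanity check of the stall-budget arithmetic from the post.
readahead_bytes = 128 * 1024            # Linux 2.6 default max readahead
disk_rate = 200 * 1024 * 1024           # 200 MB/s disk subsystem, in bytes/s

stall_budget_us = readahead_bytes / disk_rate * 1_000_000
print(round(stall_budget_us))           # -> 625, i.e. ~600us as stated
```

So the backend has roughly 625 microseconds of CPU time per 128KB window before it must issue the next requests, which matches the "can't spend more than about 600us" claim.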
Re: [HACKERS] Support Parallel Query Execution in Executor
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote: Certainly. If the OS has readahead logic at all, it ought to think that a seqscan of a large table qualifies. Your arguments seem to question whether readahead is useful at all --- but they would apply *just as well* to an app doing its own readahead, which is what is really getting proposed in this thread. We know we're going to read the whole table; the OS doesn't. We can be confident that we will not use our read-ahead when we're really doing random accesses. The OS has to deal with many applications with many workloads running on a wider spectrum of hardware. It's important that it does the right thing, but probably more important that it doesn't do the wrong thing. This encourages erring on the side of small readahead. Before we go replacing a standard OS-level facility with our own version, we need to have a much clearer idea of why the OS isn't getting the job done for us. Otherwise we're likely to write a large amount of code and find out that it doesn't work very well either. That's a fair position... It would be useful to know how much readahead PG needs in order to keep a high-speed disk subsystem saturated. This would involve profiling how frequently PG requests data, and how much it requests, when running out of a hot cache. We could then say that the OS would need to read ahead xMB to keep a yMB/s disk subsystem saturated. It would be good to know how much FreeBSD will read ahead... It might also be interesting for someone with the right testing rig on Linux to try the adaptive readahead patch to see if that improves PG's ability to keep the disk busy.
Re: [HACKERS] Support Parallel Query Execution in Executor
On 4/9/06, Luke Lonergan [EMAIL PROTECTED] wrote: Gregory, On 4/9/06 1:36 PM, Gregory Maxwell [EMAIL PROTECTED] wrote: It might also be interesting for someone with the right testing rig on Linux to try the adaptive readahead patch to see if that improves PG's ability to keep the disk busy. The adaptive readahead patch? Did I miss one? We will happily test experimental patches that improve I/O utilization. We have an assortment of gear with high speed I/O, mostly Linux now. Linux kernel patch; I'd mentioned it in a prior post. http://www.vanheusden.com/ara/ It increases Linux's maximum readahead from 128K to 1MB... and it should be smart enough that you could crank it up further without too much risk of hurting performance elsewhere. If PG is bottlenecked on seqscan due to insufficient readahead, I'd expect this to show an improvement... although I am still somewhat doubtful that it'll be enough to keep the disk saturated if PG's behavior is highly unoptimal.
Re: [HACKERS] Support Parallel Query Execution in Executor
On 4/8/06, Tom Lane [EMAIL PROTECTED] wrote: This is exactly the bit of optimism I was questioning. We've already been sweating blood trying to reduce multiprocessor contention on data structures in which collisions ought to be avoidable (ie, buffer arrays where you hope not everyone is hitting the same buffer at once). I think passing large volumes of data between different processes is going to incur quite a lot of locking overhead, pipeline stalls for cache line transfers, etc, etc, because heavy contention for the transfer buffer is simply not going to be avoidable. We should consider true parallel execution and overlapping execution with I/O as distinct cases. For example, one case made in this thread involved bursty performance with seqscans presumably because the I/O was stalling while processing was being performed. In general this can be avoided without parallel execution through the use of non-blocking I/O and making an effort to keep the request pipeline full. There are other cases where it is useful to perform parallel I/O without parallel processing.. for example: a query that will perform an index lookup per row can benefit from running some number of those lookups in parallel in order to hide the lookup latency and give the OS and disk elevators a chance to make the random accesses a little more orderly. This can be accomplished without true parallel processing. (Perhaps PG does this already?) Parallel execution to get access to more CPU and memory bandwidth is a fine thing, and worth the costs in many cases... but it shouldn't be used as an easy way to get parallel IO without careful consideration. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] [GENERAL] A real currency type
On 3/21/06, Jim C. Nasby [EMAIL PROTECTED] wrote: ISTM that having a currency type is pretty common for most databases; I don't really see any reason not to just include it. Likewise for a type that actually stores timezone info with a timestamp. This really should be generalized to work with all the base types because there are strong use cases for each. (timezones with timestamps, currencies with numeric, physical units with floats and ints) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
On 2/17/06, Ragnar [EMAIL PROTECTED] wrote: Say again? Let us say you have 1 billion rows, where the column in question contains strings like baaaaaa baaaaab baaaaac ... not necessarily in this order on disc, of course. The minimum value would be keyed as 0001h, the next one as 0002h and so on. Now insert new value 'a'. Not only will you have to update 1 billion records, but also all the values in your map. please explain No comment on the usefulness of the idea overall... but the solution would be to insert the new value with the same key as the nearest existing value less than it. Such keys will falsely claim equality, which you then must fix with a second local sort; that should be fast because you only need to sort the duplicates/false dupes. If you insert too much then this obviously becomes completely useless.
Re: [HACKERS] Why don't we allow DNS names in pg_hba.conf?
On 2/13/06, Joshua D. Drake [EMAIL PROTECTED] wrote: Well, as one of the people that deploys and manages many, many PostgreSQL installations, I can say I have never run into the need to have DNS names, and the thought of DNS names honestly seems silly. It will increase overhead and dependencies that I just wouldn't want in my installations. It is not uncommon for an environment that has already suffered through one forced renumbering to forbid the use of hard-set IPs in application software. With IPv6 we will just see more and more of that.
[HACKERS] Fixing row comparison semantics
On 12/26/05, Pavel Stehule [EMAIL PROTECTED] wrote:

(1,1) * (1,2) = true
(1,2) * (2,1) is NULL
(2,3) * (1,2) = false

it's useful for multi-criteria optimization This is indeed a sane and useful function which should be adopted by the SQL standard... in PostgreSQL this could easily enough be implemented as a user function, so I'm not sure we need special support for it. The idea is that in a multidimensional comparison you can only sometimes say that one tuple is strictly less than (or greater than) another, because differing dimensions are incomparable. So, as in his example, we cannot say whether (1,2) is lesser or greater than (2,1), because saying so would require some priority of the dimensions which may not be known or may not exist; it is only clear that they are not equal.
Re: [HACKERS] Upcoming PG re-releases
On 12/8/05, Bruce Momjian pgman@candle.pha.pa.us wrote: A script which identifies non-UTF-8 characters and provides some context, line numbers, etc., would greatly speed up the process of remedying the situation. I think the best we can do is the iconv -c with the diff idea, which is already in the release notes. I suppose we could merge the iconv and diff into a single command, but I don't see a portable way to send the iconv output to stdout, /dev/stdin not being portable. No, what is needed for people who care about fixing their data is a loadable strip_invalid_utf8() that works in older versions... then just select * from bar where foo != strip_invalid_utf8(foo); The function would be useful in general; for example, if you have an application which doesn't already have much UTF-8 logic, you want to use a text field, and stripping is the behaviour you want. For example, lots of simple web applications.
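The semantics of the proposed strip_invalid_utf8() are easy to sketch. Note that strip_invalid_utf8 is the poster's proposed name, not an existing PostgreSQL function; in Python the same effect falls out of the standard "ignore" error handler, which silently drops bytes that don't form valid UTF-8.

```python
# Illustrative sketch of the proposed SQL function's behaviour.
def strip_invalid_utf8(raw: bytes) -> str:
    return raw.decode("utf-8", errors="ignore")

mixed = b"caf\xc3\xa9 \xff\xfe ok"      # valid UTF-8 plus two stray bytes
print(strip_invalid_utf8(mixed))        # -> "café  ok"
```

A row-finding query like the one above would then flag exactly the rows where stripping changes the value, i.e. the rows containing invalid byte sequences.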
Re: [HACKERS] Replication on the backend
On 12/6/05, Jan Wieck [EMAIL PROTECTED] wrote: IMO this is not true. You can get affordable 10GBit network adapters, so you can have plenty of bandwidth in a db server pool (if they are located in the same area). Even 1GBit Ethernet helps greatly here, and would make it possible to balance read-intensive (and not write-intensive) applications. We are using the Linux bonding interface with 2 gbit NICs, and 200 MBytes/sec throughput is something you need quite a few hard disks to reach. Latency is not bad either. It's not so much the bandwidth but more the roundtrips that limit your maximum transaction throughput. Remember, whatever the priority, you can't increase the speed of light. Eh, why would a light-limited delay be any slower than a disk on FC the same distance away? :) In any case, performance of PG on iSCSI is just fine. You can't blame the network... Doing multimaster replication is hard because the locking primitives that are fine on a simple multiprocessor system (with a VERY high bandwidth, very low latency interconnect between processors) just don't work across a network, so you're left finding other methods and making them work... But again, multimaster isn't hard because of some inherently slow property of networks.
Re: [HACKERS] Reduce NUMERIC size by 2 bytes, reduce max length to 508 digits
On 12/5/05, Tom Lane [EMAIL PROTECTED] wrote: Not only does 4000! not work, but 400! doesn't even work. I just lost demo wow factor points! It looks like the limit would be about factorial(256). The question remains, though, is this computational range good for anything except demos? I've hesitated commenting, because I think it might be a silly reason, but perhaps it's one other people share. ... I use PG as a calculator for big numbers because it's the only user friendly thing on my system that can do factorial(300) - factorial(280). I'd rather use something like octave, but I've found its pretty easy to escape its range. If the range for computation is changed, then I'll probably keep an old copy around just for this, though I'm not quite sure how much I'd be affected.. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Upcoming PG re-releases
On 12/4/05, Tom Lane [EMAIL PROTECTED] wrote: Paul Lindner [EMAIL PROTECTED] writes: On Sun, Dec 04, 2005 at 11:34:16AM -0500, Tom Lane wrote: Paul Lindner [EMAIL PROTECTED] writes: iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql Is that really a one-size-fits-all solution? Especially with -c? I'd say yes, and the -c flag is needed so iconv strips out the invalid characters. That's exactly what's bothering me about it. If we recommend that, we had better put a large THIS WILL DESTROY YOUR DATA warning first. The problem is that the data is not invalid from the user's point of view --- more likely, it's in some non-UTF8 encoding --- and so just throwing away some of the characters is unlikely to make people happy. Nor is it even guaranteed to make the data load: if the column is unique-constrained and the removal of the non-UTF8 characters makes two rows have the same data where they didn't before... The way to preserve the data is to switch the column to be a bytea.
Re: [HACKERS] Reducing relation locking overhead
On 02 Dec 2005 15:25:58 -0500, Greg Stark [EMAIL PROTECTED] wrote: I suspect this comes out of a very different storage model from Postgres's. Postgres would have no trouble building an index of the existing data using only shared locks. The problem is that any newly inserted (or updated) records could be missing from such an index. To do it you would then have to gather up all those newly inserted records. And of course while you're doing that new records could be inserted. And so on. There's no guarantee it would ever finish, though I suppose you could detect the situation if the size of the new batch wasn't converging to 0 and throw an error. After you're mostly caught up, change the locking behavior to block further updates while the final catchup happens. This could be driven by a heuristic that says: make up to N attempts to catch up without blocking; after that, just take a lock and finish the job. Presumably the catchup would be short compared to the rest of the work. Are there environments which could not tolerate even this minimal hit? Probably, which leaves the choice of telling them 'don't reindex then' or providing a knob which would tell it to never block (it would just try N times and then give up, failing the reindex).
Re: [HACKERS] generalizing the planner knobs
On 02 Dec 2005 15:49:02 -0500, Greg Stark [EMAIL PROTECTED] wrote: Rod Taylor [EMAIL PROTECTED] writes: The missing capability in this case is to be able to provide or generate (self-learning?) statistics for a function that describe a typical result and the cost of getting that result. Ok, try WHERE radius_authenticate(user, (select ...), ?) The point is that you can improve the estimates the planner gets. But you can never make them omniscient. There will always be cases where the user knows his data better than the planner. And those hints are still valid when a new optimizer has new plans available. Actually... if a statistics engine stored the entire query as well and used that as a key, I don't see why it couldn't figure this out. I.e., in queries that look like Z, operation X has historically had selectivity Y. The instruction to the user is then simple: 'make sure that queries with different results look different'. This is often naturally the case. The challenge becomes how you group together queries which are mostly the same, so that you get enough data, without falsely clustering queries with different statistics. The simplest way: check the statistics list for the most similar query match, and use that information. If the result is similar to what was expected, use it to update the statistics record. If the measured selectivity is too different, make a new record which will then attract similar queries. Sounds like a good research project for someone.
Re: [HACKERS] generalizing the planner knobs
On 12/1/05, Pollard, Mike [EMAIL PROTECTED] wrote: Optimizer hints were added because some databases just don't have a very smart optimizer. But you are much better served tracking down cases in which the optimizer makes a bad choice, and teaching the optimizer how to make a better one. That way, all users get the benefit of the fix. Remember, the purpose of SQL is to isolate the end user from having to care about how the data is retrieved; that is the RDBMS' problem. (the other thing forgotten was that it was supposed to be a natural language. NVL. Bah.) The flipside there is that a good set of hinting options may increase the amount of detailed feedback we get from users on improvements needed in the optimizer. The current knobs are pretty blunt and don't do as much as I'd like when trying to track down exactly where the optimiser has gone wrong. If we'd really like to avoid people using the knobs to rig queries, how about making them only work with explain analyze, useful for debugging but not so useful for actual queries. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Improving count(*)
On 11/21/05, Jim C. Nasby [EMAIL PROTECTED] wrote: What about Greg Stark's idea of combining Simon's idea of storing per-heap-block xmin/xmax with using that information in an index scan? ISTM that's the best of everything that's been presented: it allows for faster index scans without adding a lot of visibility overhead to the index heap, and it also allows VACUUM to hit only pages that need vacuuming. Presumably this could also be used as the on-disk backing for the FSM, or it could potentially replace the FSM. This should be a big win all around, especially now that in-memory bitmaps make it more likely that some classes of queries will be pure index. I still think it would be useful to have an estimated_count() which switches to whatever method is needed to get a reasonably accurate count quickly (statistics when there are no WHERE clauses we can't predict, sampling when the involved tables are large, and a normal count in other cases).
Re: [HACKERS] Improving count(*)
On 11/18/05, Merlin Moncure [EMAIL PROTECTED] wrote: In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL Server) the leaf level of the narrowest index on the table is scanned, following a linked list of leaf pages. Leaf pages can be pretty dense under Sybase, because they do use prefix compression. A count(*) on a table with 100 million rows is going to take a few minutes, but it is going to be at least an order of magnitude faster than a data page scan -- maybe two orders of magnitude faster. MS SQL Server (pre 2005) is not an MVCC database, so it's not apples to apples with pg. Many of the people who wander onto this list and complain about count(*) come from one of those or some other non-MVCC database, or worse, a flat-file xbase-type system. A performance comparison between MS 2005 and pg would be much more interesting. Personally, I don't know what all the fuss is about [although I wouldn't complain about an optimization ;)]. count(*) WHERE 1 is indeed a corner case that few to no real applications should care about... If we were having to choose between improving that case and preserving the performance and maintainability of PG then I think the discussion would already be over. However, some great ideas have been proposed here which would not only help in that case but would otherwise be quite useful. *Inclusion of an 'MVCC in-flight' bit in indexes, which would allow skipping MVCC checks in clumps of an index scan which have no pending changes. This would further close the performance gap between PG and non-MVCC databases for some workloads. *Introduction of high-performance table sampling, which would be useful in many applications (including counting where there is a WHERE clause) as well as for testing and ad hoc queries. and *an estimate_count() that provides the planner estimate, which would return right away and provide what is really needed most of the time people try to count(*) on a large table. 
So, while this is a silly case to optimize for, it's one where it appears that the proposed solutions will make PG better all around.
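The estimate_count()/estimated_count() dispatch described in this thread can be sketched as follows. All names and thresholds here are invented for illustration; the real logic would of course live in the backend, not in a wrapper like this:

```python
# Hypothetical dispatch for an estimated count: use the cheapest strategy
# available, falling back to an exact count only when nothing faster applies.
def estimated_count(table, where=None, *, large_threshold=1_000_000,
                    stats_count=None, sample_count=None, exact_count=None,
                    table_size=0):
    if where is None and stats_count is not None:
        return stats_count()      # planner statistics: effectively instant
    if table_size >= large_threshold and sample_count is not None:
        return sample_count()     # sampling: fast, approximate
    return exact_count()          # small table or no shortcut: just count
```

Callers supply the three strategies as callables, so the dispatcher itself stays trivial.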
Re: Réf. : RE: [HACKERS] Running PostGre on DVD
On 11/15/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I don't understand why a user can't WILLINGLY (by EXPLICITLY setting an OPTION) allow a privileged administrator to run PostGre. It is a MAJOR problem for me, that will force me to use another database, because my database will be on a DVD and I'm not sure that on the PC on which it will be executed the user isn't an admin, or that I can create an unprivileged user. To summarize, I don't want my user to be unable to run my application for that. The persons specifying this option would know perfectly well the risks linked to it. I'm starting to think the PostGre developers think the users are children. I'm deeply disappointed to be forced to compile my own PostGre and I will not. You can do it. Modify the source; it's a one-line change. Be grateful that you have this privilege, which you would lack with a proprietary database. Running as an administrator isn't a matter of taste, it's fundamentally broken from a security perspective. Just as you are (usually) asked to jump through hoops to break the normal promises that the database provides, you will be asked to do so on this one. If you are unable to make a one-line change to the source and rebuild the application then you probably are unable to understand the security implications of your decision. I wouldn't call this treating you like a child; I'd call this expecting you to be an adult.
Re: [HACKERS] MERGE vs REPLACE
On 11/13/05, Robert Treat [EMAIL PROTECTED] wrote: On Saturday 12 November 2005 04:06, Matteo Beccati wrote: | 1 |1 | NULL | Wow, that seems ugly maybe there's a reason for it, but I'm not sure we could deviate from my$ql's behavior on this even if we wanted... they are the standard here. I don't think that's ugly, I think that's exactly working as advertised. Replace behaves exactly like deleting the record with the matching primary key and inserting the provided input. ... not merging together old data with new. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] SIGSEGV taken on 8.1 during dump/reload
On 11/8/05, Tom Lane [EMAIL PROTECTED] wrote: Teodor Sigaev [EMAIL PROTECTED] writes: Layout of GIST_SPLITVEC struct has been changed from 8.0, I'm afraid that an old .so is used. spl_(right|left)valid fields were added to GIST_SPLITVEC. Does look a bit suspicious ... Robert, are you *sure* you've got the right version of pgsphere linked in? Did you compile it against the right set of Postgres header files? So it turned out that he didn't... Is this a sign that we need to include a versioning symbol in .so files so we can give a nice clear error message: 'module foo compiled for PostgreSQL 8.0.2, this is PostgreSQL 8.1'? Is there ever a case where we want people using modules compiled against an old version; are there cases where users can't recompile their modules but the old ones would work?
Re: [HACKERS] Interval aggregate regression failure (expected seems
On 07 Nov 2005 14:22:37 -0500, Greg Stark [EMAIL PROTECTED] wrote: IIRC, floating point registers are actually longer than a double, so if the entire calculation is done in registers and then the result rounded off to store in memory it may get the right answer. Whereas if it loses the extra bits on the intermediate values (the infinite repeating fractions) that might be where you get the imprecise results. Hm. I thought -march=pentium4 -mcpu=pentium4 implied -mfpmath=sse. SSE is a much better choice on P4 for performance reasons, and never has excess precision. I'm guessing from the above that I'm incorrect, in which case we should always compile with -mfpmath=sse -msse2 when we are compiled with -march=pentium4; this should remove problems caused by excess precision. The same behavior can be had on non-SSE platforms with -ffloat-store.
Re: [HACKERS] Reducing the overhead of NUMERIC data
On 11/4/05, Martijn van Oosterhout kleptog@svana.org wrote: Yeah, and while one way of removing that dependence is to use ICU, that library wants everything in UTF-16. So we replace copying to add a NUL to a string with converting UTF-8 to UTF-16 on each call. Ugh! The argument for UTF-16 is that if you're using a language that doesn't use ASCII at all, UTF-8 gets inefficient pretty quickly. Is this really the case? Only Unicode values 000800 - 00FFFF are smaller in UTF-16 than in UTF-8, and in their case it's three bytes vs two. Cyrillic, Arabic, Greek, Latin, etc. are all two bytes in both. So, yes, in some cases UTF-8 will use three bytes where UTF-16 would be two, but that's less inefficient than UTF-16 is for ASCII, which many people find acceptable. Locale sensitive, efficient storage, fast comparisons: pick any two! I don't know that the choices are that limited; as I indicated earlier in the thread, I think it's useful to think of all of these encodings as just different compression algorithms. If our desire was to have all three, the backend could be made NUL-safe and we could use the locale-sensitive and fast representation (probably UTF-16 or UTF-32) in memory, and store on disk whatever is most efficient for storage (LZ-compressed UTF-whatever for fat fields, UTF-8 for mostly-ASCII small fields, SCSU for non-ASCII short fields (http://www.unicode.org/reports/tr6/), etc.). My guess is that in the long run there would be two basic string datatypes: one a UTF-8, null-terminated string used in the backend code as a standard C string, default collation strcmp; the other UTF-16 for user data that wants to be able to collate in a locale-dependent way. So if we need locale-dependent collation we suffer 2x inflation for many texts, and multibyte complexity is still required if we are to collate correctly when there are characters outside of the BMP. Yuck. Disk storage type, memory storage type, user API type, and collation should be decoupled. 
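The byte arithmetic above is easy to check; a quick sketch, with Python used purely as a calculator:

```python
# Byte cost of UTF-8 vs UTF-16 for a few scripts: Cyrillic/Greek cost two
# bytes per character in both encodings, only U+0800..U+FFFF (e.g. CJK) is
# cheaper in UTF-16 (2 bytes vs 3), and ASCII is half the size in UTF-8.
samples = {
    "ascii":    "hello",
    "cyrillic": "\u043f\u0440\u0438\u0432\u0435\u0442",  # "privet"
    "greek":    "\u03b3\u03b5\u03b9\u03b1",              # "geia"
    "cjk":      "\u4f60\u597d",                          # "ni hao"
}
for name, text in samples.items():
    u8 = len(text.encode("utf-8"))
    u16 = len(text.encode("utf-16-le"))  # -le: no BOM, payload bytes only
    print(f"{name:8s} utf-8={u8:2d} bytes  utf-16={u16:2d} bytes")
```

The output matches the mail's claim: the Cyrillic and Greek strings cost the same in both encodings, only the CJK string is smaller in UTF-16, and ASCII doubles in size.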
Re: [HACKERS] Reducing the overhead of NUMERIC data
On 11/4/05, Tom Lane [EMAIL PROTECTED] wrote: Martijn van Oosterhout kleptog@svana.org writes: Yeah, and while one way of removing that dependance is to use ICU, that library wants everything in UTF-16. Really? Can't it do UCS4 (UTF-32)? There's a nontrivial population of our users that isn't satisfied with UTF-16 anyway, so if that really is a restriction of ICU then we're going to have to look elsewhere :-( The correct question to ask is something like 'Does it support non-BMP characters?' or 'Does it really support UTF-16 or just UCS2?' UTF-16 is (now) a variable-width encoding which is a strict superset of UCS2 and which allows the representation of all Unicode characters. UCS2 is fixed width and only supports characters from the basic multilingual plane. UTF-32 and UCS4 are (now) effectively the same thing and can represent all Unicode characters with a 4-byte fixed-length word. The code can demand UTF-16 but still be fine for non-BMP characters. However, many things which claim to support UTF-16 really only support UCS2, or at least have bugs in their handling of non-BMP characters. Software that supports UTF-8 is somewhat more likely to support non-BMP characters correctly, since the variable-length code paths get more of a workout in many environments. :)
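The UCS2/UTF-16 distinction above comes down to surrogate pairs, which is easy to demonstrate:

```python
# UTF-16 is variable width: a character outside the BMP takes a surrogate
# pair (two 16-bit code units), which is exactly what UCS2 cannot represent.
bmp = "\u00e9"         # e-acute, inside the BMP: one code unit
astral = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP
print(len(bmp.encode("utf-16-le")))     # 2 bytes: one code unit
print(len(astral.encode("utf-16-le")))  # 4 bytes: a surrogate pair
print(len(astral.encode("utf-32-le")))  # 4 bytes: UTF-32/UCS4 is fixed width
```

A library that is really UCS2-only would choke on the second string even though it "speaks UTF-16".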
Re: [HACKERS] Reducing the overhead of NUMERIC data
On 11/4/05, Martijn van Oosterhout kleptog@svana.org wrote: [snip] : ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not : support surrogates, and UTF-16 does support surrogates. This means : that UCS-2 only supports UTF-16's Base Multilingual Plane (BMP). The : notion of UCS-2 is deprecated and dead. Unicode 2.0 in 1996 changed : its default encoding to UTF-16. snip This means it's fine: ICU's use of UTF-16 will not break our support for all of Unicode. Conversion to and from UTF-16 isn't cheap, however, if you're doing it all the time. Storing ASCII in UTF-16 is pretty lame. Widespread use of UTF-16 tends to hide bugs in the handling of non-BMP characters. ... I would be somewhat surprised to see a substantial performance difference in working with UTF-16 data over UTF-8, but then again ... they'd know and I wouldn't. Another lame aspect of using Unicode encodings other than UTF-8 internally is that it's harder to figure out what is text in GDB output and such, which can make debugging more difficult.
Re: [HACKERS] Reducing the overhead of NUMERIC data
On 11/3/05, Martijn van Oosterhout kleptog@svana.org wrote: That's called UTF-16 and is currently not supported by PostgreSQL at all. That may change, since the locale library ICU requires UTF-16 for everything. UTF-16 doesn't get us out of the variable-length character game; for that we need UTF-32... unless we were to only support UCS-2, which is what some databases do for their Unicode support. I think that would be a huge step back and, as you pointed out below, it is not efficient. :) The question is, if someone declares a field CHAR(20), do they really mean to fix 40 bytes of storage for each and every row? I doubt it; that's even more wasteful of space than a varlena header. Which puts you right back to variable-length fields. Another way to look at this is in the context of compression: with Unicode, characters are really 32-bit values... but only a small range of these values is common. So we store and work with them in a compressed format, UTF-8. The cost of compression is that fixed-width fields can not be fixed width, and some operations are much more expensive than they would be otherwise. As such it might be more interesting to ask some other questions, like: are we using the best compression algorithm for the application, and why do we sometimes stack two compression algorithms? For longer fields, would we be better off working with UTF-32 and being more aggressive about where we LZ-compress the fields? I dunno... no opinion on the matter here, but I did want to point out that the field can be fixed length without a header. Those proposing such a change, however, should accept that this may result in an overall expense. The only time this may be useful is for *very* short fields, on the order of 4 characters or less. Else the overhead swamps the varlena header... Not even 4 characters if we are to support all of Unicode... Length + UTF-8 is a win vs UTF-32 in most cases for fields with more than one character. 
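The storage arithmetic behind the CHAR(20) question can be sketched with a couple of invented helpers; the 4-byte header below mirrors the varlena overhead mentioned in the thread:

```python
# Fixed-width encodings reserve the worst case per character; a length
# header plus UTF-8 pays a small fixed cost and then only what the text needs.
def fixed_width_bytes(n_chars, bytes_per_char):
    return n_chars * bytes_per_char

def length_prefixed_utf8_bytes(text, header=4):  # varlena-style 4-byte header
    return header + len(text.encode("utf-8"))

print(fixed_width_bytes(20, 2))           # CHAR(20) as UCS-2/UTF-16: 40 bytes
print(fixed_width_bytes(20, 4))           # CHAR(20) as UTF-32: 80 bytes
print(length_prefixed_utf8_bytes("Bob"))  # 4 + 3 = 7 bytes
```

For mostly-ASCII data the header pays for itself after a handful of characters, which is the "length + UTF-8 wins above one character vs UTF-32" point made above.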
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)-lp_flags
On 10/31/05, Jim C. Nasby [EMAIL PROTECTED] wrote: On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote: There is no way if the system has some incorrect value whether that would later corrupt the data or not. Anything the system does that it shouldn't do is a potential corruption problem. But is it safe to say that there are areas where a failed assert is far more likely to result in data corruption? And that there's also areas where there's likely to be difficult/impossible to find bugs, such as race conditions? ISTM that it would be valuable to do some additional checking in these critical areas. There are, no doubt, also places where an assert has minimal to no performance impact. I'd wager a guess that the intersection of low impact asserts, and asserts which measure high risk activities, is small enough to be uninteresting. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Differences in UTF8 between 8.0 and 8.1
On 10/26/05, Christopher Kings-Lynne [EMAIL PROTECTED] wrote: iconv -c -f UTF8 -t UTF8 < dump_in.sql > dump_out.sql, or: recode UTF-8..UTF-8 < dump_in.sql > dump_out.sql I've got a file with characters that pg won't accept that recode does not fix but iconv does. iconv is fine for my application, so I'm just posting to the list so that anyone looking for why recode didn't work for them will find the suggestion to use iconv.
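For anyone curious what `iconv -c` is doing to the dump here, the effect (silently dropping byte sequences that are not valid UTF-8) can be mimicked in a couple of lines:

```python
# Strip bytes that are not valid UTF-8, as iconv -c -f UTF8 -t UTF8 does.
# 0xFF and 0xFE can never appear anywhere in valid UTF-8, so they are dropped.
raw = b"ok\xff\xfe!"
cleaned = raw.decode("utf-8", errors="ignore").encode("utf-8")
print(cleaned)  # b'ok!'
```

This is lossy by design, which is exactly why it "fixes" dumps that a strict round-trip tool leaves broken.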
Re: [HACKERS] enums
On 10/27/05, Andrew Dunstan [EMAIL PROTECTED] wrote: Yes, MySQL is broken in some regards, as usual. However, the API isn't bad (except for the fact that it doesn't care what invalid crap you throw at it), and more importantly there are thousands of apps and developers who think around that interface. We should copy it without the brokenness as much as possible unless we have good cause otherwise. mmm ... no. It is too broken. We should do enums orthogonally with other type definitions in PostgreSQL. Where I would like to get to is that we have a flavor of CREATE TYPE that will create the enum type for us, including all the support that I build into my little kit. And if you want to change the enumeration set on a column, you would use ALTER TABLE foo ALTER COLUMN bar TYPE newtype USING ... eh, well, that we have a reasonable user-extensible type system is reason enough. What I was mostly objecting to was the use of lexical collation; the 'don't mess with what people already expect' argument was just the handiest strawman available. :) And in doing so you could insert an enum value in the middle of the existing list without breaking the values already in the table? If so that would be very useful. Inline declaration of enums does not strike me as good. You're right, it's a property of a type.
Re: [HACKERS] enums
On 10/27/05, Jim Nasby [EMAIL PROTECTED] wrote: Adding -hackers back to the list... You could as equally say that it's ordering it by the order of the enum declaration, which seems quite reasonable to me. I don't really see why that's considered reasonable, especially as a default. I could maybe see an argument for having a means to see this ordering, but IMO anything depending on that is broken. I don't think we should be making any guarantees about how enums are stored in the database (including ordering). Your examples show why I don't think it's a good idea to use MySQL as a guide for how to do enums. Yes, MySQL is broken in some regards, as usual. However, the API isn't bad (except for the fact that it doesn't care what invalid crap you throw at it), and more importantly there are thousands of apps and developers who think around that interface. We should copy it without the brokenness as much as possible unless we have good cause otherwise. If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an artifact of our storage mechanism. This means supporting things like being able to re-order the accepted values in an enum. But like I said, I just don't see the use case for doing that. So what do you propose we do for a default ordering? I hope you don't think we should force a sort as though the enum labels were text... That's almost certainly incorrect for most applications of enums, which are used to make opaque values more human-compatible. MySQL's behavior of allowing the user to specify the collation in the typedef makes a lot of sense to me; it doesn't matter that it actually works as an artifact of the storage backend. I'd argue that it would make sense to sort by the specification order even if we changed the backend to use varchars rather than numbers.
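The two candidate default orderings in this debate, declaration order versus lexical, are easy to contrast in any language whose enums preserve declaration order; Python's Enum does, so as an illustration:

```python
# Declaration-order vs lexical ordering for the same set of enum labels.
from enum import Enum

class Size(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

labels = ["MEDIUM", "LARGE", "SMALL"]
by_declaration = sorted(labels, key=lambda s: Size[s].value)
by_lexical = sorted(labels)
print(by_declaration)  # ['SMALL', 'MEDIUM', 'LARGE']
print(by_lexical)      # ['LARGE', 'MEDIUM', 'SMALL']
```

Only the first ordering reflects what the enum means; the second is the "sort as though the labels were text" behavior being argued against.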
Re: [HACKERS] enums
On 10/27/05, Andrew Dunstan [EMAIL PROTECTED] wrote: That seems counter-intuitive. It's also exposing an implementation detail (that the enum is stored internally as a number). No it is not. Not in the slightest. It is honoring the enumeration order defined for the type. That is the ONLY correct behaviour, IMNSHO. Otherwise, you could just as easily use a domain with a check constraint. In fact, mysql's behaviour is laughably, even ludicrously, inconsistent: [snip] So for ORDER BY it honors the enumeration order, but for comparison operators like < it uses the lexical ordering. Lovely, eh? Oh wow. That is broken; I didn't try that case because I figured it would do it right (i.e. use the enum order).
[HACKERS] On externals sorts and other IO bottlenecks in postgresql.
I don't recall this being mentioned in the prior threads: http://www.cs.duke.edu/TPIE/ GPLed, but perhaps it has some good ideas. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PERFORM] A Better External Sort?
On 10/3/05, Ron Peacetree [EMAIL PROTECTED] wrote: [snip] Just how bad is this CPU bound condition? How powerful a CPU is needed to attain a DB IO rate of 25MBps? If we replace said CPU with one 2x, 10x, etc faster than that, do we see any performance increase? If a modest CPU can drive a DB IO rate of 25MBps, but that rate does not go up regardless of how much extra CPU we throw at it... Single threaded was mentioned. Plus even if it's purely cpu bound, it's seldom as trivial as throwing CPU at it, consider the locking in both the application, in the filesystem, and elsewhere in the kernel. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PERFORM] A Better External Sort?
On 9/30/05, Ron Peacetree [EMAIL PROTECTED] wrote: 4= I'm sure we are paying all sorts of nasty overhead for essentially emulating the pg filesystem inside another filesystem. That means ~2x as much overhead to access a particular piece of data. The simplest solution is for us to implement a new VFS compatible filesystem tuned to exactly our needs: pgfs. We may be able to avoid that by some amount of hacking or modifying of the current FSs we use, but I suspect it would be more work for less ROI. On this point, Reiser4 already implements a number of things which would be desirable for PostgreSQL. For example: write()s to reiser4 filesystems are atomic, so there is no risk of torn pages (this is enabled because reiser4 uses WAFL-like logging where data is not overwritten but rather relocated). The filesystem is modular and extensible, so it should be easy to add whatever additional semantics are needed. I would imagine that all that would be needed is some more atomicity operations (single writes are already atomic, but I'm sure it would be useful to batch many writes into a transaction), some layout and packing controls, and some flush controls. A step further would perhaps integrate multiversioning directly into the FS (the wandering logging system provides the write side of multiversioning; a little read-side work would be required). More importantly: the file system was intended to be extensible for this sort of application. It might make a good 'summer of code' project for someone next year... presumably reiser4 will have made it into the mainline kernel by then. :)
Re: [HACKERS] [PERFORM] A Better External Sort?
On 9/28/05, Ron Peacetree [EMAIL PROTECTED] wrote: 2= We use my method to sort two different tables. We now have these very efficient representations of a specific ordering on these tables. A join operation can now be done using these Btrees rather than the original data tables, with less overhead than many current methods. If we want to make joins very fast we should implement them using RD trees. For the example cases where a join against a very large table will produce a much smaller output, an RD tree will provide pretty much optimal behavior at a very low memory cost. On the subject of high-speed tree code for in-core applications, you should check out http://judy.sourceforge.net/ . The performance (insert, remove, lookup, AND storage) is really quite impressive. Producing cache-friendly code is harder than one might expect, and it appears the Judy library has already done a lot of the hard work. Though it is *L*GPLed, so perhaps that might scare some here away from it. :) And good luck directly doing joins with an LC-trie. ;)
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On 9/15/05, Tom Lane [EMAIL PROTECTED] wrote: Yesterday's CVS tip: 1 32s 2 46s 4 88s 8 168s plus no-cmpb and spindelay2: 1 32s 2 48s 4 100s 8 177s plus just-committed code to pad LWLock to 32: 1 33s 2 50s 4 98s 8 179s alter to pad to 64: 1 33s 2 38s 4 108s 8 180s I don't know what to make of the 2-process time going down while 4-process goes up; that seems just weird. But both numbers are repeatable. It is odd. In the two-process case there is, assuming random behavior, a 1/2 chance that you've already got the right line, but in the 4-process case only a 1/4 chance (since we're on a 4-way box). This would explain why we don't see as much cost in the intentionally misaligned case. You'd expect a similar pattern of improvement with the 64-byte alignment (some in the two-process case, but more in the 4-process case), but here we see more improvement in the two-way case. If I had to guess I might say that the 64-byte alignment is removing much of the unneeded line bouncing in the two-process case but is at the same time creating more risk of bouncing caused by aliasing. Since two processes have a 1/2 chance, the aliasing isn't a problem, so the change is a win; but in the four-process case it's no longer a win, because with aliasing there is still a lot of fighting over the cache lines even if you pack well, and the decrease in packing makes odd aliasing somewhat more likely. This might also explain why the misaligned case performed so poorly in the 4-process case, since the misalignment didn't just increase the cost 2x, it also increased the likelihood of a bogus bounce due to aliasing. If this is the case, then it may be possible through very careful memory alignment to make sure that no two high-contention locks that are likely to be contended at once share the same line (through either aliasing or through being directly within the same line). 
Then again I could be completely wrong, my understanding of multiprocessor cache coherency is very limited, and I have no clue how cache aliasing fits into it... So the above is just uninformed conjecture. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pl/Ruby, deprecating plPython and Core
On 8/16/05, Joshua D. Drake [EMAIL PROTECTED] wrote: Sure... it hasn't been found. We can play the it might have or might not have game all day long but it won't get us anywhere. Today, and yesterday, pl/Ruby can be run trusted/untrusted; pl/python can not. Both of these things could be said about Python when it was about the same age Ruby is now. But they can't be said about Python now. Again I love Python but I can't use it the way I want to in the database. I believe that unless plPython can either be fixed Fixed how? Be able to be trusted. Really, a lot of your points seem either to appeal to the fad appeal of Ruby or to be misinformation about Python. It's silliness. The inclusion of pl/ruby should be considered independently of pl/python; they are separate matters. I promise that the aggregate work required for all coders who know Python to switch to Ruby is far, far greater than the work required to fix the issues with pl/python. :) I'd like to propose a more useful goal for consideration: PostgreSQL users should be able to use whatever language they write their frontend in to write their PL code. This doesn't mean it would be reasonable to include everything under the sun in the main distro, just as Linux distros don't normally ship Ada or Haskell compilers. But rather, any PL language which meets a certain level of capability (and yes, I'd propose having trusted support as being one of those requirements in languages where it makes sense) and has a sufficiently large user base that we can reasonably expect it to be well supported should either be included in the main distro, or at least in a side-car PostgreSQL-PL package if driven there due to licensing concerns. Obviously there are costs in maintaining many PLs, but at the same time it doesn't really make sense to say that we're going to include PL/bar and PL/baz but not PL/foo if all have comparable abilities and userbases. 
I see there being two rational paths, 1) support only one (or perhaps two where one is C and the other is something with trusted support) PL and claim that developers need to learn this PL in addition to what they write their frontends in. or 2) support a wealth of PLs with the intention of allowing developers to use the same language for their frontends as their database PL code. Any other position creates silly arguments, like replacing PL/Python with PL/Ruby. In terms of PostgreSQL's competitiveness in the marketplace of databases, my position would serve well: Other databases will have a more difficult time providing broad PL support, since PG already has a good head start there and joe-random application developer who doesn't care what database he uses will feel a lot more comfortable when he knows he can use the same language he's comfortable with for both front and back end support. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pl/Ruby, deprecating plPython and Core
On 8/16/05, David Fetter [EMAIL PROTECTED] wrote: It's not. In PL/parlance, trusted means prevented from ever opening a filehandle or a socket, and PL/PythonU is called PL/Python*U* (U for *un*trusted) because it cannot be so prevented. If somebody has figured out a way to make a PL/Python (without the U), that's great, but nothing has happened on this front in a couple of years, and Guido said that it was a problem with the language that he wasn't going to fix. It's not a problem in the *language*, it's a problem in the implementation. There are other implementations of Python, including one inside the Java VM. It's also one which could be worked around with the existing Python implementation by completely sandboxing the process running Python (i.e. via seccomp on Linux, for example). Yes, it's a problem; yes, it should be fixed. But it is BS to claim that Python fundamentally has a problem and needs to be removed because of it, just as much as it would be BS to claim that Ruby should be forbidden because it permits the same sort of unmaintainable syntax that has plagued Perl for years. :)
Re: [HACKERS] [PATCHES] O_DIRECT for WAL writes
On 6/23/05, Gavin Sherry [EMAIL PROTECTED] wrote: inertia) but seeking to a lot of new tracks to write randomly-positioned dirty sectors would require significant energy that just ain't there once the power drops. I seem to recall reading that the seek actuators eat the largest share of power in a running drive... I've seen discussion about disks behaving this way. There's no magic: they're battery backed. Nah this isn't always the case, for example some of the IBM deskstars had a few tracks at the start of the disk reserved.. if the power failed the head retracted all the way and used the rotational energy to power it long enough to write out the cache.. At start the drive would read it back in and finish flushing it. unfortunately firmware bugs made it not always wait until the head returned to the start to begin writing... I'm not sure what other drives do this (er, well do it correctly :) ). ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] LGPL
On 6/18/05, Tom Lane [EMAIL PROTECTED] wrote: What is important is that it is possible, and useful, to build Postgres in a completely non-GPL environment. If that were not so then I think we'd have some license issues. But the fact that building PG in a GPL-ized environment creates a GPL-ized binary is not a problem from my point of view. You've already bought into the GPL if you're using that environment. Put another way: linking to a GPLed library creates a GPLed result, but being GPLed is completely and totally irrelevant to *users* because the GPL places no restrictions on use whatsoever. ... But is it really the case that PostgreSQL developers are being paid to code because PG is BSDed and proprietary forks are possible? ... There is no harm in being BSDed, but I question whether the users of PostgreSQL are gaining enough advantage that there needs to be so much paranoia about making sure that the code is as easy as possible to make proprietary forks of...
Re: [HACKERS] User/Group Quotas Revisited
- Who has permission to set a user's quota per tablespace: the superuser and the tablespace owner? It would be nice if this were nestable, that is, if the sysadmin could carve out a tablespace quota for a user, and the user could then carve that up into separately quota-limited sub-allocations. The idea being, a user may have several tables, some of which are likely to get big and fat and accumulate lots of crud, but some of which will never grow too big, and you really don't want those to fail just because someone flooded the other table and used up your quota. It would be nice if the user could manage that subdivision as he saw fit without assistance from the admin. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
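To make the nesting idea concrete, here is a purely hypothetical sketch of what the syntax might look like; none of these QUOTA clauses exist in PostgreSQL, and the object names are invented for illustration:

```sql
-- Hypothetical syntax: the admin grants alice a quota on a tablespace...
ALTER TABLESPACE user_space SET QUOTA '1GB' FOR ROLE alice;

-- ...and alice then subdivides her 1GB herself, so the table that is
-- likely to bloat cannot starve the one that must never fail:
ALTER TABLE audit_log   SET QUOTA '800MB';
ALTER TABLE preferences SET QUOTA '100MB';
```

The point is only the delegation model: the admin manages the outer quota, the user manages the inner split.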
[HACKERS] Bloom Filter indexes?
Has any thought been given to adding Bloom filter indexes to PostgreSQL? A Bloom index would be created on a column and could then be used to accelerate exact matches where it is common for the user to query for a value that doesn't exist. For example, with the query select userid from user_table where name='notauser', the failure could be returned almost instantly in most cases. A Bloom filter index could also be used to accelerate joins, especially full outer joins. Insertions into a Bloom filter are very cheap. Updates could be handled as an insert of the new value. Deletes are expensive (either you make a refcounted filter or you regenerate the filter); however, since Bloom filters have false positives, it would be acceptable to regenerate the filter during a vacuum if entries have been deleted or updated. The filter could be resized at vacuum time based on statistics gathered during execution. It would also be useful to have an array Bloom index: store a Bloom filter per record for an array field, as well as a Bloom filter over all records. This would allow membership tests on a field containing large arrays to happen very quickly; perhaps useful for GIS and full-text indexing applications. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
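For anyone unfamiliar with the data structure, here is a minimal standalone Python sketch of a Bloom filter (illustrative only, not a proposal for the index implementation; the class and parameter names are invented). It shows why inserts are cheap (set k bits), why lookups can reject absent values instantly, and why false positives, but never false negatives, are possible:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k salted hashes set/test bits in a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python big int used as the bit array

    def _positions(self, value):
        # Derive k bit positions from salted SHA-256 digests of the value.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        # Insertion is cheap: just set k bits.
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False => definitely absent; True => possibly present (false positive allowed).
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

bf = BloomFilter()
for name in ("alice", "bob", "carol"):
    bf.add(name)

print(bf.might_contain("alice"))     # True (no false negatives, ever)
print(bf.might_contain("notauser"))  # almost certainly False
```

Note that a delete operation is conspicuously absent: clearing a bit could break membership tests for other values that share it, which is exactly why the post suggests refcounting or rebuilding the filter at vacuum time.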