Re: [HACKERS] extended operator classes vs. type interfaces
Hi, First, I like the way you got back to the needs before trying to organize an approach to find a solution. Having said it allows me to cut a lot of your text, it's the one I agree with :) Robert Haas robertmh...@gmail.com writes: Given a type T, I think we'd like to be able to define a type U as the natural type to be added to or subtracted from T. As Jeff pointed out to me, this is not necessarily the same as the underlying type. For example, if T is a timestamp, U is an interval; if T is a numeric, U is also a numeric; if T is a cidr, U is an integer. Then we'd like to define a canonical addition operator and a canonical subtraction operator. I think that would be sufficient for the needs of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be nearly sufficient for range types, but in that case you also need to specify the unit increment within U - i.e. a 1 value for the datatype. It may or may not be worth building the concept of a unit increment into the type interface machinery, though: one could imagine two different range types built over the same base type with different unit increments - e.g. one timestamp range with unit increment = 1s, and one with unit increment = 1m. Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Do we want to enable support for string based ranges, as in the contributed prefix_range type? Thoughts? I like the type interface approach and I think this concept has been studied in great details in math and that we should start from existing concepts, even if most of them are way over my head. The ORDER BY problem refers to a metric space, defined by a distance function. Continuing your proposal the distance function return type would be of domain U. KNNGist is then a way to use the GiST index to sort by distance. http://archives.postgresql.org/pgsql-hackers/2010-02/msg01107.php You'll see in this mail a proposal for an operator group notion, which could get renamed to type interface if we think we won't need rings and such rather than just groups in the future. And there's opportunity for multi-type interfaces too (think families), like what's the distance between a point and a circle? The math groups already have a notion of neutral element, which for the addition is 0 (zero), we could expand our version of it with a unity element, which would be in the T domain. Then the range type could expand on this and provide a different unity value in their own interface, in the U domain this time. IMO tying the precision of the range interval into the type interface is a bad abstraction. As you said we want to possibly have several ranges types atop this. We can say that [1,3] = [1,4) when considering a default integer range because 4-3 = unity(integer). When considering a range over timestamps with a range interval unity of 1s, we are still able to do the math, and we can have another range over timestamps with a range interval unity of 10 mins in the same database. (I'm using this later example with the period datatype in a real application). While speaking of all that, in the prefix_range case, it'd be useful to have a new kind of typemod system at the range level, to allow for defining prefix text range with the '/' separator, say. Then greater_prefix('/etc/bar', '/etc/baz') = '/etc' (or is it '/etc/'?) Whereas currently = select '/etc/baz'::prefix_range | '/etc/bar'; ?column? -- /etc/ba[r-z] (1 row) Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, Apr 8, 2010 at 11:00 PM, Robert Haas robertmh...@gmail.com wrote: I think you could shut it down at the first point at which it is holding no locks, rather than letting it continue recovering and potentially retake some new locks. That would be more consistent with the general idea of what a smart shutdown is supposed to be about. I think the real question is whether it's worth the code complexity. I don't think it's worth. So I agree to just remove the TODO item: Redefine smart shutdown in standby mode to exist as soon as all read-only connections are gone. http://wiki.postgresql.org/wiki/Todo#Standby_server_mode Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How to modify default Type (TSQuery) behaviour?
Hi all I've asked related question on General list, but got no answer, although I have been able to work around that issue a little bit. There is one snag I encountered and I have no idea on how to work it out. I need to modify TSQuery object (add another operator and do some stuff with it) - this is done and works properly at C code level. Now I need to implement changes done in C on PG level. I really do not want to make another type, tsquery2 or such... What I've tried: - the manual states that you can create your own data types with CREATE TYPE command and alter some of their proprieties with ALTER TYPE. But it is not possible to modify INPUT or OUTPUT function. - I have created updated PG versions of the functions to_tsquery (the CAST function) and tsqueryout (the TYPE OUTPUT) function and they are in the public schema. - I have created a CAST from text to tsquery pointing to public.to_tsquery function. - I have changed the search_path so the public schema is first. How do I overwrite the default behaviour of Postgres 8.4, so I can successfully run the following queries: --1. SELECT 'cat dog mouse'::tsquery; --uses the built in function SELECT to_tsquery('cat dog mouse'); --also uses the built in function --but SELECT public.to_tsquery('cat dog mouse'); --uses new functions --2. SELECT CAST('dogs cat' AS tsquery); --uses the built in function --but SELECT CAST('dogs cat'::text AS tsquery); --uses new functions --3. SELECT public.to_tsquery('dog mouse'); --new operator (doublequote), works fine until OUTPUT function is called and as it is the default one an error is displayed --but SELECT public.tsqueryout(CAST('dogs cat'::text AS tsquery)); --uses new functions and displays correctly I thought that setting schema so public has priority over all other would make PG use those functions in the first place. Also is the string between the single quotes in SQL commands not treated as text type? Is this why a CAST to ::text make it work in example 2? Thanks in advance. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enhancing phonetic search support for more languages - GSoC 2010
Hello Please find my project proposal at the following link: https://docs.google.com/fileview?id=0B4sVSOdX9RZKNjI1MDZlNDgtZGU0MS00NDE4LThiZDItMjZhMGZkYjUzMWExhl=en I would be glad to have your review/feedback on the same. -- Regards Dhiraj Lohiya
Re: [HACKERS] Win32 timezone matching
On Wed, Apr 7, 2010 at 00:48, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Wed, Apr 7, 2010 at 00:02, Tom Lane t...@sss.pgh.pa.us wrote: Oh, another thought here: what is the effect of the combination of this with your other proposal to add more timezones to the list? I've applied the patch to add the missing timezone names. Before I did that I wrote a small perlscript that reads pgtz.c and compares what's there to what's in the registry of the current machine. Turns out I had missed one, which is Argentina Standard Time. This script should probably live in CVS, and be run when Microsoft releases new timezone data. Where should I put it - src/timezone or somewhere in src/tools? (it does read pgtz.c in the current directory, but it doesn't actually edit the file - just outputs on stdout a list of changes to be made to the file manually) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
Robert Haas wrote: Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Thoughts? The examples with units look a lot like the IVLPQ datatype from HL7, see http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm About a type interface, the HL7 spec talks about promotion from e.g. a timestamp to an interval (hl7 speak for range) of timestamps (a range), and demotion for the back direction. Every 'quantity type', which is any type with a (possibly partially) lineair ordered domain, can be promoted to an interval of that type. In PostgreSQL terms, this could perhaps mean that by 'tagging' a datatype as a lineair order, it could automatically have a range type defined on it, like done for the array types currently. regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Win32 timezone matching
On Wed, Apr 7, 2010 at 21:06, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: ereport(LOG, (errmsg(could not determine system time zone, defaulting to \%s\, GMT), BTW, does anyone remember the reason for making GMT nonlocalizable in these messages? It seems more straightforward to do Nope, can't recall that. (errmsg(could not determine system time zone, defaulting to \GMT\), I suppose we had a reason for doing it the first way but I can't see what. GMT seems a fairly English-centric way of referring to UTC anyhow; translators might wish to put in UTC instead, or some other spelling. Shouldn't we let them? UTC and GMT aren't actually the same thing. In fact, it might be more sensible to fall back to UTC than GMT. Both in the message *and* the code, in that case. They only differ in fractions of seconds, but we do deal in fractions of seconds... It also carries the nice property that it's *supposed* to be abbreviated the same way regardless of language (which is why it's UTC and not CUT). And either way, it's an abbreviation, and we don't normally translate those, do we? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Win32 timezone matching
On Wed, Apr 7, 2010 at 21:01, Tom Lane t...@sss.pgh.pa.us wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: Even if if keep the current fallback behaviour we should at least fix the windows codepath to do the same as the unix codepath does - as in actually logging that the fallback to GMT happened... +1 for that anyway. There already are WARNING messages for the various Windows failure cases, but compared to the Unix code ereport(LOG, (errmsg(could not determine system time zone, defaulting to \%s\, GMT), errhint(You can specify the correct timezone in postgresql.conf.))); they lack either the note about defaulting to GMT or the hint. I guess we should add both of those to the failure cases in the Windows version of identify_system_timezone. Should we also change the WARNING errlevel to LOG? I think the latter is more likely to actually get into the log. You are suggesting adding this after the could not find match message, correct? Not replacing it? Because if we replace it, we loose the information of what we failed to match. So basically like attached? Also, would LOG be *more* likely to be seen than a WARNING? Why would that be? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ win32_tz_warning.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
Hi all, I am new at open source project however in a user point of view I must confess that usability is a really though issue ,even if the performance of a database is crucial. As to my idea for improve postgresql is ; http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html in cavetaes section is mentioned that The schemes shown here assume that the partition key column(s) of a row never change, or at least do not change enough to require it to move to another partition. An UPDATE that attempts to do that will fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update triggers on the partition tables, but it makes management of the structure much more complicated. Fixing this issue will help to improve the usability of partitions since the users do not want to deal with low-level integrity issues such as CHECK constraint. Roughly, I can say that if we want to deal with this issue,the first operation would be writing a trigger to check if an update operation causes a transfer issue between partitions.Then, if it is inevitable the user should be prompted about they are doing. Warning the system or user would generallry causes more trouble this point we need to decide on possible fixing ways and give more details about which choise will cause in what results. Then, creating a temprory table before commiting something will hellp us to conrol completeness and correctness. I tried to give more details about what I want to do.If you anything should be fixed in my proposal please earn me. Thanks 2010/4/8 Necati Batur necatiba...@gmail.com Benefits of Project Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits: Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table. Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE is far faster than a bulk operation. It also entirely avoids the VACUUM overhead caused by a bulk DELETE. Seldom-used data can be migrated to cheaper and slower storage media. Delivarables *The trigger based operations can be done automatically *The stored procedures can help us to do some functionalities like check constraint problem *manual VACUUM or ANALYZE commands can be handled by using triggers DBMS SQL can help to provide faster executions *Some more functionalities can be added to UPDATE operations to make administrations easy Timeline (not exact but most probably) Start at june 7 and End around 7 september *Warm up to environment to Postgresql(1-2 weeks) *Determine exact operations to be addded on postgresql *Initial coding as to workbreakdown structure *Start implementing on distributed environment to check inital functions work *Write test cases for code *Further implementation to support full functionalities on ideas *Write it to discussion site and collect feedbacks *More support upon feedbacks *Last tests and documentation of final operations About me I am a senior student at computer engineering at iztech in turkey. My areas of inetrests are information management, OOP(Object Oriented Programming) and currently bioinformatics. I have been working with a Asistan Professor(Jens Allmer) in molecular biology genetics department for one year.Firstly, we worked on a protein database 2DB and we presented the project in HIBIT09 organization. The Project was “Database management system independence by amending 2DB with a database access layer”. Currently, I am working on another project (Kerb) as my senior project which is a general sqeuential task management system intend to reduce the errors and increase time saving in biological experiments. We will present this project in HIBIT2010 too. Moreover,I am good at data structures and implementations on C. Contact: e-mails; necatiba...@gmail.com , necati_ba...@hotmail.com(msn)
[HACKERS] is_absolute_path incorrect on Windows
Here's a thread that incorrectly started on the security list, but really is more about functionality. Looking for comments: The function is_absolute_path() is incorrect on Windows. As it's implemented, it considers the following to be an absolute path: * Anything that starts with / * Anything that starst with \ * Anything alphanumerical, followed by a colon, followed by either / or \ Everything else is treated as relative. However, it misses the case with for example E:foo, which is a perfectly valid path on windows. Which isn't absolute *or* relative - it's relative to the current directory on the E: drive. Which will be the same as the current directory for the process *if* the process current directory is on drive E:. In other cases, it's a different directory. This function is used in the genfile.c functions to read and list files by admin tools like pgadmin - to make sure we can only open files that are in our own data directory - by making sure they're either relative, or they're absolute but rooted in our own data directory. (It rejects anything with .. in it already). The latest step in that thread is this comment from Tom: Yeah. I think the fundamental problem is that this code assumes there are two kinds of paths: absolute and relative to CWD. But on Windows there's really a third kind, relative with a drive letter. I believe that is_absolute_path is correct on its own terms, namely to identify a fully specified path. If we change it to allow cases that aren't really fully specified we will break other uses, such as in make_absolute_path. I'm inclined to propose adding an additional path test operator, along the lines of has_drive_specifier(path) (always false on non-Windows), and use that where needed to reject relative-with-drive-letter paths. I think I agree with this point, but we all agreed that we should throw the question out for the wider audience on -hackers for more comments. So - comments? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] is_absolute_path incorrect on Windows
Magnus Hagander mag...@hagander.net wrote: it considers the following to be an absolute path: * Anything that starts with / * Anything that starst with \ These aren't truly absolute, because the directory you find will be based on your current work directory's drive letter; however, if the point is to then check whether it falls under the current work directory, even when an absolute path is specified, it works. * Anything alphanumerical, followed by a colon, followed by either / or \ I assume we reject anything where what precedes the colon doesn't match the current drive's designation? However, it misses the case with for example E:foo, which is a perfectly valid path on windows. Which isn't absolute *or* relative - it's relative to the current directory on the E: drive. This function is used in the genfile.c functions to read and list files by admin tools like pgadmin - to make sure we can only open files that are in our own data directory - by making sure they're either relative, or they're absolute but rooted in our own data directory. (It rejects anything with .. in it already). Well, if that's a good idea, then you would need to reject anything specifying a drive which doesn't match the drive of the data directory. Barring the user from accessing directories on the current drive which aren't under the data directory on that drive, but allowing them to access any other drive they want, is just silly. It does raise the question of why we need to check this at all, rather than counting on OS security to limit access to things which shouldn't be seen. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Fri, Apr 9, 2010 at 9:10 AM, Necati Batur necatiba...@gmail.com wrote: I am new at open source project however in a user point of view I must confess that usability is a really though issue ,even if the performance of a database is crucial. Sure. Nobody is saying otherwise. As to my idea for improve postgresql is ; http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html in cavetaes section is mentioned that The schemes shown here assume that the partition key column(s) of a row never change, or at least do not change enough to require it to move to another partition. An UPDATE that attempts to do that will fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update triggers on the partition tables, but it makes management of the structure much more complicated. Fixing this issue will help to improve the usability of partitions since the users do not want to deal with low-level integrity issues such as CHECK constraint. Roughly, I can say that if we want to deal with this issue,the first operation would be writing a trigger to check if an update operation causes a transfer issue between partitions.Then, if it is inevitable the user should be prompted about they are doing. Warning the system or user would generallry causes more trouble this point we need to decide on possible fixing ways and give more details about which choise will cause in what results. Then, creating a temprory table before commiting something will hellp us to conrol completeness and correctness. I tried to give more details about what I want to do.If you anything should be fixed in my proposal please earn me. This issue is, as Greg says, far more complicated than you realize. I would like to recommend again, as I did previously off-list, that you pick an easier project. Here again is the link to some ideas I wrote up previously. http://archives.postgresql.org/pgsql-hackers/2010-03/msg01034.php If you insist on pursuing a problem that you don't really understand and that is far larger than what you can tackle in one summer, then you are not going to be successful. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga yebhavi...@gmail.com wrote: Robert Haas wrote: Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Thoughts? The examples with units look a lot like the IVLPQ datatype from HL7, see http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm About a type interface, the HL7 spec talks about promotion from e.g. a timestamp to an interval (hl7 speak for range) of timestamps (a range), and demotion for the back direction. Every 'quantity type', which is any type with a (possibly partially) lineair ordered domain, can be promoted to an interval of that type. In PostgreSQL terms, this could perhaps mean that by 'tagging' a datatype as a lineair order, it could automatically have a range type defined on it, like done for the array types currently. The way we've handled array types is, quite frankly, horrible. It's bad enough that we now have two catalog entries in pg_type for each base type; what's even worse is that if we actually wanted to enforce things like the number of array dimensions we'd need even more - say, seven per base type, one for the base type itself, one for a one-dimensional array, one for a two-dimensional array, one for a three-dimensional array. And then if we want to support range types that's another one for every base type, maybe more if there's more than one kind of range over a base type. It's just not feasible to handle derived types in a way that require a new instance of each base type to be created for each kind of derived type. It scales as O(number of base types * number of kinds of derived type), and that rapidly gets completely out of hand ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 10:33 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga yebhavi...@gmail.com wrote: Robert Haas wrote: Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Thoughts? The examples with units look a lot like the IVLPQ datatype from HL7, see http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm About a type interface, the HL7 spec talks about promotion from e.g. a timestamp to an interval (hl7 speak for range) of timestamps (a range), and demotion for the back direction. Every 'quantity type', which is any type with a (possibly partially) lineair ordered domain, can be promoted to an interval of that type. In PostgreSQL terms, this could perhaps mean that by 'tagging' a datatype as a lineair order, it could automatically have a range type defined on it, like done for the array types currently. The way we've handled array types is, quite frankly, horrible. It's bad enough that we now have two catalog entries in pg_type for each base type; what's even worse is that if we actually wanted to enforce things like the number of array dimensions we'd need even more - say, seven per base type, one for the base type itself, one for a one-dimensional array, one for a two-dimensional array, one for a three-dimensional array. And then if we want to support range types that's another one for every base type, maybe more if there's more than one kind of range over a base type. It's just not feasible to handle derived types in a way that require a new instance of each base type to be created for each kind of derived type. It scales as O(number of base types * number of kinds of derived type), and that rapidly gets completely out of hand ...which by the way, doesn't mean that your idea is bad (although it might not be what I would choose to do), just that I don't think our current infrastructure can support it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] is_absolute_path incorrect on Windows
On Fri, Apr 9, 2010 at 16:02, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Magnus Hagander mag...@hagander.net wrote: it considers the following to be an absolute path: * Anything that starts with / * Anything that starst with \ These aren't truly absolute, because the directory you find will be based on your current work directory's drive letter; however, if the point is to then check whether it falls under the current work directory, even when an absolute path is specified, it works. That is true. However, since we have chdir():ed into our data directory, we know which drive we are on. So I think we're safe. * Anything alphanumerical, followed by a colon, followed by either / or \ I assume we reject anything where what precedes the colon doesn't match the current drive's designation? Define reject? We're just answering the question is absolute path?. It's then up to the caller. For example, in the genfiles function, we will take the absolute path and compare it to the path specified for the data directory, to make sure we can't go outside it. However, it misses the case with for example E:foo, which is a perfectly valid path on windows. Which isn't absolute *or* relative - it's relative to the current directory on the E: drive. This function is used in the genfile.c functions to read and list files by admin tools like pgadmin - to make sure we can only open files that are in our own data directory - by making sure they're either relative, or they're absolute but rooted in our own data directory. (It rejects anything with .. in it already). Well, if that's a good idea, then you would need to reject anything specifying a drive which doesn't match the drive of the data directory. Barring the user from accessing directories on the current drive which aren't under the data directory on that drive, but allowing them to access any other drive they want, is just silly. Yes. That's what the code does - once it's determined that it's an absolute directory, it will compare the start of it to the data directory. This will obviously not match if the data directory is on a different drive. It does raise the question of why we need to check this at all, rather than counting on OS security to limit access to things which shouldn't be seen. That is a different question, of course. For reading, it really should. But there was strong opposition to that when the functions were added, so this was added as an extra security check. This is why we're not treating it as a security problem. The callpoints require you to have superuser, so this is really just a way to make it a bit harder to do things wrong. There are other ways to get to the information, so it's not a security issue. It's more about preventing you from doing the wrong thing by mistake. Say a \copy foo e:foo.csv instead of e:/foo.csv, that might overwrite the wrong file by mistake - since the path isn't fully specified. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 4:10 AM, Dimitri Fontaine dfonta...@hi-media.com wrote: Do we want to enable support for string based ranges, as in the contributed prefix_range type? Yes, probably, but that doesn't require as much knowledge of the underlying data type, so I didn't feel it needed to be brought up in this context. There is no x such that ['a','b') = ['a',x]; it's generally impossible to convert between open and closed intervals in this type of range type. That's the case where type interfaces are needed; if you're not converting between different kinds of intervals then you can probably get by with the existing system of using the default btree opclass to find equality and comparison operators. I like the type interface approach and I think this concept has been studied in great details in math and that we should start from existing concepts, even if most of them are way over my head. I'm not too excited about patterning this too closely after mathematical concepts; I think we need to have a pragmatic approach that focuses on what the database actually needs. We need to think generally enough about what we're trying to provide that we don't box ourselves into a corner, but we're not trying to build a theorem-prover. You'll see in this mail a proposal for an operator group notion, which could get renamed to type interface if we think we won't need rings and such rather than just groups in the future. And there's opportunity for multi-type interfaces too (think families), like what's the distance between a point and a circle? Yeah, that needs some thought. The math groups already have a notion of neutral element, which for the addition is 0 (zero), we could expand our version of it with a unity element, which would be in the T domain. I don't know what that would mean in this case. We're trying to add and subtract from T, so a unit or identity element makes sense for U, but not for T. Then the range type could expand on this and provide a different unity value in their own interface, in the U domain this time. IMO tying the precision of the range interval into the type interface is a bad abstraction. As you said we want to possibly have several ranges types atop this. Right - so I think there's no point in specifying this in the type interface at all. We can always add it later if we find a real need for it. We can say that [1,3] = [1,4) when considering a default integer range because 4-3 = unity(integer). When considering a range over timestamps with a range interval unity of 1s, we are still able to do the math, and we can have another range over timestamps with a range interval unity of 10 mins in the same database. (I'm using this later example with the period datatype in a real application). While speaking of all that, in the prefix_range case, it'd be useful to have a new kind of typemod system at the range level, to allow for defining prefix text range with the '/' separator, say. Then greater_prefix('/etc/bar', '/etc/baz') = '/etc' (or is it '/etc/'?) Whereas currently = select '/etc/baz'::prefix_range | '/etc/bar'; ?column? -- /etc/ba[r-z] (1 row) Not sure I'm really following this. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
Hi, All I want to contribute to the project a liitle. I do not claim that I can actually solve all the issues about partitioning. Of course there are lots of ideas ,some looks pretty easy however, the distribution issue seems too attractive to me that I am dying to work on. I have checked the development stages and I know I am focused and I can do something really beneficail to the community too. Thanks all for attention :), PS: Even if I would not be selected for gsoc I would still contribute teh postgresql due to this communication :) 2010/4/9 Robert Haas robertmh...@gmail.com On Fri, Apr 9, 2010 at 9:10 AM, Necati Batur necatiba...@gmail.com wrote: I am new at open source project however in a user point of view I must confess that usability is a really though issue ,even if the performance of a database is crucial. Sure. Nobody is saying otherwise. As to my idea for improve postgresql is ; http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html in cavetaes section is mentioned that The schemes shown here assume that the partition key column(s) of a row never change, or at least do not change enough to require it to move to another partition. An UPDATE that attempts to do that will fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update triggers on the partition tables, but it makes management of the structure much more complicated. Fixing this issue will help to improve the usability of partitions since the users do not want to deal with low-level integrity issues such as CHECK constraint. Roughly, I can say that if we want to deal with this issue,the first operation would be writing a trigger to check if an update operation causes a transfer issue between partitions.Then, if it is inevitable the user should be prompted about they are doing. Warning the system or user would generallry causes more trouble this point we need to decide on possible fixing ways and give more details about which choise will cause in what results. Then, creating a temprory table before commiting something will hellp us to conrol completeness and correctness. I tried to give more details about what I want to do.If you anything should be fixed in my proposal please earn me. This issue is, as Greg says, far more complicated than you realize. I would like to recommend again, as I did previously off-list, that you pick an easier project. Here again is the link to some ideas I wrote up previously. http://archives.postgresql.org/pgsql-hackers/2010-03/msg01034.php If you insist on pursuing a problem that you don't really understand and that is far larger than what you can tackle in one summer, then you are not going to be successful. ...Robert
Re: [HACKERS] extended operator classes vs. type interfaces
Robert Haas wrote: On Fri, Apr 9, 2010 at 10:33 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga yebhavi...@gmail.com wrote: Robert Haas wrote: Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Thoughts? The examples with units look a lot like the IVLPQ datatype from HL7, see http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm About a type interface, the HL7 spec talks about promotion from e.g. a timestamp to an interval (hl7 speak for range) of timestamps (a range), and demotion for the back direction. Every 'quantity type', which is any type with a (possibly partially) lineair ordered domain, can be promoted to an interval of that type. In PostgreSQL terms, this could perhaps mean that by 'tagging' a datatype as a lineair order, it could automatically have a range type defined on it, like done for the array types currently. The way we've handled array types is, quite frankly, horrible. It's bad enough that we now have two catalog entries in pg_type for each base type; what's even worse is that if we actually wanted to enforce things like the number of array dimensions we'd need even more - say, seven per base type, one for the base type itself, one for a one-dimensional array, one for a two-dimensional array, one for a three-dimensional array. And then if we want to support range types that's another one for every base type, maybe more if there's more than one kind of range over a base type. It's just not feasible to handle derived types in a way that require a new instance of each base type to be created for each kind of derived type. It scales as O(number of base types * number of kinds of derived type), and that rapidly gets completely out of hand ...which by the way, doesn't mean that your idea is bad (although it might not be what I would choose to do), just that I don't think our current infrastructure can support it. Well yeah the idea was to 'automagically' have a range type available, if the underlying type would support it, i.e. has a lineair order and therefore ,,= etc defined on it, just like the array types, from a user / datatype developer perspective. From the implementers perspective, IMHO an extra catalog entry in pg_type is not bad on its own, you would have one anyway if the range type was explicitly programmed. About different kinds of range types - I would not know how to 'promote' integer into anything else but just one kind of 'range of integer' type. So the number of extra pg_types would be more like O(number of linear ordered base types). regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
From the implementers perspective, IMHO an extra catalog entry in pg_type is not bad on its own, you would have one anyway if the range type was explicitly programmed. About different kinds of range types - I would not know how to 'promote' integer into anything else but just one kind of 'range of integer' type. So the number of extra pg_types would be more like O(number of linear ordered base types). .. I now see the example of different ranges in your original mail with different unit increments. Making that more general so there could be continuous and discrete ranges and for the latter, what would the increment be.. OTOH is a range of integers with increment x a different type from range of integers with increment y, if xy? Maybe the increment step and continuous/discrete could be typmods. regards Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
Necati Batur escribió: Hi, All I want to contribute to the project a liitle. I do not claim that I can actually solve all the issues about partitioning. Of course there are lots of ideas ,some looks pretty easy however, the distribution issue seems too attractive to me that I am dying to work on. Partitioning is an issue that has had hundreds if not thousands of emails written about it. I suggest you have a look at the archives for previous discussions about how to tackle it. If you think that you can attack a small portion of the problem in a nonconnected way, prepare to be disappointed. The TODO list contains pointers to the previous discussions. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga yebhavi...@gmail.com wrote: From the implementers perspective, IMHO an extra catalog entry in pg_type is not bad on its own, you would have one anyway if the range type was explicitly programmed. About different kinds of range types - I would not know how to 'promote' integer into anything else but just one kind of 'range of integer' type. So the number of extra pg_types would be more like O(number of linear ordered base types). .. I now see the example of different ranges in your original mail with different unit increments. Making that more general so there could be continuous and discrete ranges and for the latter, what would the increment be.. OTOH is a range of integers with increment x a different type from range of integers with increment y, if xy? Maybe the increment step and continuous/discrete could be typmods. Nope, not enough bits available there. This is fundamentally why the typid/typmod system is so broken - representing a type as a fixed size object is extremely limiting. A fixed size object that MUST consist of a 32-bit unsigned OID and a 32-bit signed integer is even more limiting. Fortunately, we don't need to solve that problem in order to implement range types: we can just have people explicitly create the ones they need. This will, for example, avoid creating ranges over every composite type that springs into existence because a table is created, even though in most cases a fairly well-defined range type could be constructed. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Gsoc XQuery
*IDEA:XMLQuery* * * *Abstract* SQL/XML makes it possible to store your XML documents in your SQL database, to query those documents using XPath and XQuery, and to publish your existing SQL data in the form of XML documents. *Benefits of Project* SQL/XML is an ANSI and ISO standard that provides support for using XML in the context of an SQL database system. Because SQL is the standard language for accessing and managing data stored in relational databases, it is natural that enterprises and users worldwide need the ability to integrate their XML data into their relational data through the use of SQL facilities.SQL/XML makes it possible to store your XML documents in your SQL database, to query those documents using XPath and XQuery, and to publish your existing SQL data in the form of XML documents. Nowadays Xml datatype is generally used to store,manipulate and export data between cross-platforms.We intend to implement and XML/SQL query for the postgresql based on XQuery,XPath. Postgresql 8.3 implemetns core functionalities for the SQL/XML but the XQuery is still needed to supported. Since XML is semi-structured it is flexible to use by other programming languages,reporting web-based applications as well as data exchange between different DBMSs. *Delivarables on Timeline* The implementation of XQuery support can ce achieved by; *Understanding the basics of the SQL/XML (1 week) *A research of on other kinds of XQuery supported DBMSs scuh as; Oracle and SQLServer(1-2 week) *The design for the core functionalities of the XQuery (2-4 week) *First few hard-coded functionalities(1-2 week) *Testing and adding more details during an iterative development(1-2 week) *Implementation of a GUI for XQuery for the PostgreSql users (2 week) *Final tests and documentation (1-2 week) *About me* I am a senior student at computer engineering at iztechhttp://english.iyte.edu.tr/main_eng.jsp?pageName=main.htm in turkey. My areas of inetrests are information management, OOP(Object Oriented Programming) and currently bioinformatics. I have been working with a Asistan Professor(Jens Allmer http://jens.allmer.de/) in molecular biology genetics department for one year.Firstly, we worked on a protein database 2DB http://www.2db.de.ms/ and we presented the project in HIBIT09http://hibit09.ii.metu.edu.tr/organization. The Project was “Database management system independence by amending 2DB with a database access layer”. Currently, I am working on another project (Kerb) as my senior project which is a general sqeuential task management system intend to reduce the errors and increase time saving in biological experiments. We will present this project in HIBIT2010http://hibit2010.ii.metu.edu.tr/ too. Moreover,I am good at data structures and implementations on C. Contact: e-mails; necatiba...@gmail.com ,
Re: [HACKERS] extended operator classes vs. type interfaces
On 04/09/2010 07:33 AM, Robert Haas wrote: On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga yebhavi...@gmail.com wrote: 'tagging' a datatype as a lineair order, it could automatically have a range type defined on it, like done for the array types currently. The way we've handled array types is, quite frankly, horrible. It's bad enough that we now have two catalog entries in pg_type for each base type; what's even worse is that if we actually wanted to enforce things like the number of array dimensions we'd need even more - say, seven per base type, one for the base type itself, one for a one-dimensional array, one for a two-dimensional array, one for a three-dimensional array. And then if we want to support range types that's another one for every base type, maybe more if there's more than one kind of range over a base type. It's just not feasible to handle derived types in a way that require a new instance of each base type to be created for each kind of derived type. It scales as O(number of base types * number of kinds of derived type), and that rapidly gets completely out of hand Perhaps off the original topic (and thinking out loud), but I agree with you on the handling of array types. I have long thought (and at least once played with the idea) that a single array type, anyarray, made up of elements, anyelement, could be made to work. Further, anyelement should be defined to be any valid existing type, including anyarray. Essentially, at least by my reading of the SQL spec, a multidimensional array ought to be an array of arrays, which is different in subtle ways from what we have today. Joe signature.asc Description: OpenPGP digital signature
Re: [HACKERS] extended operator classes vs. type interfaces
Robert Haas robertmh...@gmail.com wrote: Given a type T, I think we'd like to be able to define a type U as the natural type to be added to or subtracted from T. As Jeff pointed out to me, this is not necessarily the same as the underlying type. For example, if T is a timestamp, U is an interval; if T is a numeric, U is also a numeric; if T is a cidr, U is an integer. Then we'd like to define a canonical addition operator and a canonical subtraction operator. As it is de rigueur for someone to escalate the proposed complexity of an idea by at least one order of magnitude, and everyone else has fallen down on this one: ;-) I've often thought that if we rework the type system, it would be very nice to support a concept of hierarchy. If you could subclass money to have a subclass like assessable, which in turn has subclasses of fine, fee, restitution, etc. you could then automatically do anything with a subclass which you could do with the superclass, and support such things as treating the sum of various classes as the lowest common subclass. It seems like this sort of approach, if done right, might allow some easier way to establish sensible operations between types (like distance / speed = time or speed * time = distance). Just a thought -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 1:13 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: As it is de rigueur for someone to escalate the proposed complexity of an idea by at least one order of magnitude, and everyone else has fallen down on this one: ;-) Gee, thanks for filling in? I've often thought that if we rework the type system, it would be very nice to support a concept of hierarchy. If you could subclass money to have a subclass like assessable, which in turn has subclasses of fine, fee, restitution, etc. you could then automatically do anything with a subclass which you could do with the superclass, and support such things as treating the sum of various classes as the lowest common subclass. It seems like this sort of approach, if done right, might allow some easier way to establish sensible operations between types (like distance / speed = time or speed * time = distance). Just a thought I dowanna rework the type system. I'm not even 100% sure I want to implement what I actually proposed. I do want to find out if people think the framework makes sense and whether it's the right way forward for those projects that need these features. What you're proposing here sounds suspiciously like something that should be handled by creating domains - but in any case it's almost entirely unrelated to what I was talking about. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] is_absolute_path incorrect on Windows
Magnus Hagander mag...@hagander.net wrote: On Fri, Apr 9, 2010 at 16:02, Kevin Grittner I assume we reject anything where what precedes the colon doesn't match the current drive's designation? Define reject? I guess I made that comment thinking about the example of usage farther down. We're just answering the question is absolute path?. It's then up to the caller. For example, in the genfiles function, we will take the absolute path and compare it to the path specified for the data directory, to make sure we can't go outside it. I would say that a function which tells you whether a path is absolute should, under Windows, return false if there isn't a leading slash or backslash after any drive specification. Whether lack of a drive specification should cause it to return false or whether that should be a separate test doesn't seem like it makes a big difference, as long as it's clear and documented. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to modify default Type (TSQuery) behaviour?
Oleg Bartunov suggested that I should explicitly use set search_path=public,pg_catalog; That's a good tip, as I missed the info in the manual, but it solved only one instance of my problem, namely SELECT to_tsquery('cat dog mouse'); --this now works as it should But the other instances still use the default INPUT/OUTPUT functions and are not affected by schema setting: SELECT 'cat dog mouse'::tsquery; --uses the built in function SELECT CAST('dogs cat' AS tsquery); --uses the built in function SELECT public.to_tsquery('dog mouse'); --new operator (doublequote), works fine until OUTPUT function is called and as it is the default one an error is displayed Is there any other way I could overwrite the default functions for a built in type in Postgres? 2010/4/9 Łukasz Dejneka l.dejn...@gmail.com: Hi all I've asked related question on General list, but got no answer, although I have been able to work around that issue a little bit. There is one snag I encountered and I have no idea on how to work it out. I need to modify TSQuery object (add another operator and do some stuff with it) - this is done and works properly at C code level. Now I need to implement changes done in C on PG level. I really do not want to make another type, tsquery2 or such... What I've tried: - the manual states that you can create your own data types with CREATE TYPE command and alter some of their proprieties with ALTER TYPE. But it is not possible to modify INPUT or OUTPUT function. - I have created updated PG versions of the functions to_tsquery (the CAST function) and tsqueryout (the TYPE OUTPUT) function and they are in the public schema. - I have created a CAST from text to tsquery pointing to public.to_tsquery function. - I have changed the search_path so the public schema is first. How do I overwrite the default behaviour of Postgres 8.4, so I can successfully run the following queries: --1. SELECT 'cat dog mouse'::tsquery; --uses the built in function SELECT to_tsquery('cat dog mouse'); --also uses the built in function --but SELECT public.to_tsquery('cat dog mouse'); --uses new functions --2. SELECT CAST('dogs cat' AS tsquery); --uses the built in function --but SELECT CAST('dogs cat'::text AS tsquery); --uses new functions --3. SELECT public.to_tsquery('dog mouse'); --new operator (doublequote), works fine until OUTPUT function is called and as it is the default one an error is displayed --but SELECT public.tsqueryout(CAST('dogs cat'::text AS tsquery)); --uses new functions and displays correctly I thought that setting schema so public has priority over all other would make PG use those functions in the first place. Also is the string between the single quotes in SQL commands not treated as text type? Is this why a CAST to ::text make it work in example 2? Thanks in advance. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
Robert Haas robertmh...@gmail.com wrote: I dowanna rework the type system. I'm not even 100% sure I want to implement what I actually proposed. I do want to find out if people think the framework makes sense and whether it's the right way forward for those projects that need these features. What you proposed sounds like it would be cleaner and less work than further perverting the index system as a source of information about types or hard-coding knowledge anywhere else. What you're proposing here sounds suspiciously like something that should be handled by creating domains Not really. Unless I've missed something domains are a single-level layer over a data type. I find them very useful and use them heavily, but the standard implementation is rather limited. Perhaps that would be the area to add the functionality I suggested, though. I'm totally at the hand-waving stage on it, with no concrete ideas. I just thought that if you were adding more type information, oriented aournd the types themselves rather than index AMs, some form of inheritence might fit in gracefully. in any case it's almost entirely unrelated to what I was talking about. OK -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
Robert Haas wrote: On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga yebhavi...@gmail.com wrote: .. I now see the example of different ranges in your original mail with different unit increments. Making that more general so there could be continuous and discrete ranges and for the latter, what would the increment be.. OTOH is a range of integers with increment x a different type from range of integers with increment y, if xy? Maybe the increment step and continuous/discrete could be typmods. Nope, not enough bits available there. This is fundamentally why the typid/typmod system is so broken - representing a type as a fixed size object is extremely limiting. A fixed size object that MUST consist of a 32-bit unsigned OID and a 32-bit signed integer is even more limiting. Fortunately, we don't need to solve that problem in order to implement range types: we can just have people explicitly create the ones they need. This will, for example, avoid creating ranges over every composite type that springs into existence because a table is created, even though in most cases a fairly well-defined range type could be constructed. Ok, no typmod, not default extra types for base types, but the concept of an still there is one aspect of ranges (may I say intervals?) of 'things' is something generic, and code to handle intervals of things could be shared between datatype implementations. A way to have generic support without automatic new types could be with something that looks like: What about CREATE TYPE ivl_int AS INTERVAL OF integer; SELECT '[1;2]'::ivl_int; etc regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
Alvaro Herrera alvhe...@commandprompt.com writes: Necati Batur escribió: Hi, All I want to contribute to the project a liitle. I do not claim that I can actually solve all the issues about partitioning. Of course there are lots of ideas ,some looks pretty easy however, the distribution issue seems too attractive to me that I am dying to work on. Partitioning is an issue that has had hundreds if not thousands of emails written about it. I suggest you have a look at the archives for previous discussions about how to tackle it. If you think that you can attack a small portion of the problem in a nonconnected way, prepare to be disappointed. The TODO list contains pointers to the previous discussions. I guess a GSoC of reasonable size would be to define a spec for how to implement partitioning in PostgreSQL with a sound and accepted proposal on independent steps to contribute separately, in order to reach the full implementation in an incremental fashion and by different hackers. Then you could pick up one of those items. By then I mean after the summary and the plan both have been accepted by core people and by contributors who said in the past they wanted to spend precious hours on the topic. But I don't know if a GSoC can be completed without even coding. -- dim Please, if this first step is in good shape, give us pointers to a current document with the details, I'd happily stand corrected! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 4:01 PM, Yeb Havinga yebhavi...@gmail.com wrote: Robert Haas wrote: On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga yebhavi...@gmail.com wrote: .. I now see the example of different ranges in your original mail with different unit increments. Making that more general so there could be continuous and discrete ranges and for the latter, what would the increment be.. OTOH is a range of integers with increment x a different type from range of integers with increment y, if xy? Maybe the increment step and continuous/discrete could be typmods. Nope, not enough bits available there. This is fundamentally why the typid/typmod system is so broken - representing a type as a fixed size object is extremely limiting. A fixed size object that MUST consist of a 32-bit unsigned OID and a 32-bit signed integer is even more limiting. Fortunately, we don't need to solve that problem in order to implement range types: we can just have people explicitly create the ones they need. This will, for example, avoid creating ranges over every composite type that springs into existence because a table is created, even though in most cases a fairly well-defined range type could be constructed. Ok, no typmod, not default extra types for base types, but the concept of an still there is one aspect of ranges (may I say intervals?) of 'things' is something generic, and code to handle intervals of things could be shared between datatype implementations. A way to have generic support without automatic new types could be with something that looks like: What about CREATE TYPE ivl_int AS INTERVAL OF integer; SELECT '[1;2]'::ivl_int; etc Yeah, that's how it has to work, I think. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc XQuery
Necati Batur wrote: *IDEA:XMLQuery* * * *Abstract* SQL/XML makes it possible to store your XML documents in your SQL database, to query those documents using XPath and XQuery, and to publish your existing SQL data in the form of XML documents. *Benefits of Project* SQL/XML is an ANSI and ISO standard that provides support for using XML in the context of an SQL database system. Because SQL is the standard language for accessing and managing data stored in relational databases, it is natural that enterprises and users worldwide need the ability to integrate their XML data into their relational data through the use of SQL facilities.SQL/XML makes it possible to store your XML documents in your SQL database, to query those documents using XPath and XQuery, and to publish your existing SQL data in the form of XML documents. Nowadays Xml datatype is generally used to store,manipulate and export data between cross-platforms.We intend to implement and XML/SQL query for the postgresql based on XQuery,XPath. Postgresql 8.3 implemetns core functionalities for the SQL/XML but the XQuery is still needed to supported. Since XML is semi-structured it is flexible to use by other programming languages,reporting web-based applications as well as data exchange between different DBMSs. *Delivarables on Timeline* The implementation of XQuery support can ce achieved by; *Understanding the basics of the SQL/XML (1 week) *A research of on other kinds of XQuery supported DBMSs scuh as; Oracle and SQLServer(1-2 week) *The design for the core functionalities of the XQuery (2-4 week) *First few hard-coded functionalities(1-2 week) *Testing and adding more details during an iterative development(1-2 week) *Implementation of a GUI for XQuery for the PostgreSql users (2 week) *Final tests and documentation (1-2 week) *About me* I am a senior student at computer engineering at iztech http://english.iyte.edu.tr/main_eng.jsp?pageName=main.htm in turkey. My areas of inetrests are information management, OOP(Object Oriented Programming) and currently bioinformatics. I have been working with a Asistan Professor(Jens Allmer http://jens.allmer.de/) in molecular biology genetics department for one year.Firstly, we worked on a protein database 2DB http://www.2db.de.ms/ and we presented the project in HIBIT09 http://hibit09.ii.metu.edu.tr/organization. The Project was “Database management system independence by amending 2DB with a database access layer”. Currently, I am working on another project (Kerb) as my senior project which is a general sqeuential task management system intend to reduce the errors and increase time saving in biological experiments. We will present this project in HIBIT2010 http://hibit2010.ii.metu.edu.tr/ too. Moreover,I am good at data structures and implementations on C. Have you reviewed the discussions that have already occurred regarding XQuery? See for example here: http://archives.postgresql.org/pgsql-hackers/2010-02/msg01350.php and following emails in the thread. This is not something to be done in isolation. (Personally I wish some SOC student would do LATERAL. It would be useful and is a feature of about the right size, I think). cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GSoC - proposal - Materialized Views in PostgreSQL
Hello, I am sending my proposal on GSoC. Details are listed below. Please, if you have ideas, tips, or if you only want to say you opinion about my project, go ahead. thanks, Pavel Baros Abstract: It is effort to implement snapshot materialized view (are only updated when refreshed) in PostgreSQL. In this time I finished some part of it and I am trying to publish my present work on git.postgresql.org Benefits to the PostgreSQL Community First of all, it would be the best if my work is helpful to everybody who misses materialized views in PostgreSQL, because PostgreSQL do not have still implemented materialized views. In addition, MV is mentioned as feature in TODO list. Deliverables First of all, at the end of whole my project is not only writing bachelors thesis, but finish it as patch and if possible, get patch into next PostgrSQL release, or keep git repository actual to last PosgreSQL version. I have also personal goals. Arouse the interest about implementing MV in PostgreSQL, or at least arouse discussion about it. Project Schedule My work goes quite good, I am on good way to finish main parts on backend in few weeks. After that I will make and run tests and implement related changes to PosgreSQL tools (psql, pg_dump, etc.). I am also waiting for approval for my repository named materialized_view on git.postgresql.org, so I could publish completed parts. For now next step will be to discuss implementation on postgresql.hackers. Bio I am from Czech Republic and I am studying on Faculty of Electrical Engineering on Czech Technical University in Prague www.fel.cvut.cz/en/. My bachelor thesis is based on this project, implementing MV in PostgreSQL. I've experienced many different jobs. The best experience for me was, when I've worked as tester and software engineer in C/C++ and C# for Radiant Systems Inc. for more than year. After that I've worked as Web developer with Internet technologies (PHP, HTML, CSS, ...), where the goal was to make an internal system for an advertising agency. Finally my recent job was as Windows Mobile Developer. Except the first experience, others lasts only few months mainly because those were temporary projects. For now I am looking for some part time job, of course, preferably something closer to database systems. Implementation: could be divided to few steps: 1) create materialized view - modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ... - change executor, so that it will create physical table defined by select statement 2) change rewriter - usually, view is relation with defined rule and when rewriting, rule is fired and relation (view) is replaced by definition of view. If relation do not have rule, planner and executor behave to it as physical table (relation). In case of materialized view we want to rewrite select statement only in case when we refreshing MV. In other cases rewriter should skip rewriting and pick up physical relation. Exclude situation when other rewrite rules which are not related to MV definition are specified. 3) create command that takes snapshot (refresh MV) - modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH; - taking snapshot (refreshing) is similar to command SELECT INTO ... and I decided to follow the way it works. After parsing query and before transformation is MANUALLY created tree representation of SELECT * INTO ... with flag IntoClause-isrefresh set true, indicating it is refreshing materialized view. Everithing acts as it would be regular SELECT INTO ... except functions OpenIntoRel() and CloseIntoRel(). In function OpenIntoRel is created temp table (without catalog) and set as destination for result of select. In function CloseIntoRel executor swap relfilenode's of temp table and original table and finally delete temp table. Behavior of CloseIntoRel function is inspired by CLUSTER statement. Contacts: baro...@seznam.cz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] testing hot standby
Hi, i'm startint to try Hot Standby Streaming Replication, so i started a replication: 1) Install master server with regression database 2) Start WAL archive (archive_mode=on, archive_command='cp %p /usr/local/pgsql/wal_archive/%f') 3) select pg_start_backup('standby test'); 4) cp -R /usr/local/pgsql/9.0/data /usr/local/pgsql/9.0slave/data 5) select pg_stop_backup(); at this point i checked wal_archive directory: postg...@casanova14:/usr/local/pgsql/9.0$ ls ../wal_archive/ 00010003 00010004 00010004.0020.backup 6) started standby recovery (archive_mode=off, standy_mode=on, primary_conninfo = 'host=127.0.0.1 port=5432 user=postgres') wait a little and check logs: LOG: database system was interrupted; last known up at 2010-04-09 14:48:16 ECT LOG: entering standby mode LOG: restored log file 00010004 from archive LOG: redo starts at 0/420 LOG: consistent recovery state reached at 0/500 LOG: database system is ready to accept read only connections LOG: restored log file 00010005 from archive cp: no se puede efectuar `stat' sobre «/usr/local/pgsql/wal_archive/00010006»: No existe el fichero ó directorio LOG: unexpected pageaddr 0/200 in log file 0, segment 6, offset 0 cp: no se puede efectuar `stat' sobre «/usr/local/pgsql/wal_archive/00010006»: No existe el fichero ó directorio LOG: streaming replication successfully connected to primary mmm... are we waiting for a WAL file that doesn't exist? 7) i then, restart standby server LOG: received smart shutdown request FATAL: terminating walreceiver process due to administrator command LOG: shutting down LOG: database system is shut down LOG: database system was interrupted while in recovery at log time 2010-04-09 15:06:23 ECT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. LOG: entering standby mode cp: no se puede efectuar `stat' sobre «/usr/local/pgsql/wal_archive/00010006»: No existe el fichero ó directorio LOG: invalid record length at 0/680 cp: no se puede efectuar `stat' sobre «/usr/local/pgsql/wal_archive/00010006»: No existe el fichero ó directorio 8) i initialize pgbench tables, which create missing WAL files (bin/pgbench -i) and then it could connect to the primary, and some minutes later it could accept connections LOG: streaming replication successfully connected to primary FATAL: the database system is starting up FATAL: the database system is starting up FATAL: the database system is starting up LOG: redo starts at 0/680 LOG: consistent recovery state reached at 0/6A0 FATAL: the database system is starting up LOG: database system is ready to accept read only connections but, my main concern is why it was asking for 00010006? is this normal? is this standby's way of saying i'm working but i have nothing to do? when that happens after a standby restart, is normal that i have to wait until the file is created before it can accept connections? sorry, if this questions sound very simple but i haven't following all the design details :) -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL
pavelbaros baro...@seznam.cz wrote: I am also waiting for approval for my repository named materialized_view on git.postgresql.org They seem to prefer that you get a repository under your name and use materialized_view as a branch name. See my account on git.postgresql.org and its serializable branch for an example. I learned by putting in a request similar to your pending one. ;-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc XQuery
On Fri, Apr 9, 2010 at 4:37 PM, Andrew Dunstan and...@dunslane.net wrote: (Personally I wish some SOC student would do LATERAL. It would be useful and is a feature of about the right size, I think). Actually, I think that requires two rounds of significant executor refactoring. This is round two: http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php Round one, which Tom said he was already planning to do for 9.1, is to eliminate the hack by which outer tuples are passed down into nest-loops-with-inner-indexscan. Once those two things are done the remaining work might be suitable for a GSoC project. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc XQuery
Necati Batur escribió: *Delivarables on Timeline* The implementation of XQuery support can ce achieved by; *Understanding the basics of the SQL/XML (1 week) *A research of on other kinds of XQuery supported DBMSs scuh as; Oracle and SQLServer(1-2 week) *The design for the core functionalities of the XQuery (2-4 week) *First few hard-coded functionalities(1-2 week) *Testing and adding more details during an iterative development(1-2 week) *Implementation of a GUI for XQuery for the PostgreSql users (2 week) What, you will *also* implement a GUI? Gimme a break. There's no way you can achieve most of these in 1-2 weeks. Are you Tom Lane perchance? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
Well, If the project criterias and other neccessary information were collected under a single link that would be great for not only gsoc students but also for other enthusiastic students :).By provided info we would spend less time to understand the project phases and requirements and more time on dediciding part we will develop and do more research on how to develop. As far as I can see the mailing list is a good way of communication but not a really good way to inform newbies. It may be another open-source project to have a project idea to have a single site and a treeview of all projects and the detailed project information part to have a really good way of information exchange :) Thanks all, 2010/4/8 Necati Batur necatiba...@gmail.com Benefits of Project Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits: Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table. Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE is far faster than a bulk operation. It also entirely avoids the VACUUM overhead caused by a bulk DELETE. Seldom-used data can be migrated to cheaper and slower storage media. Delivarables *The trigger based operations can be done automatically *The stored procedures can help us to do some functionalities like check constraint problem *manual VACUUM or ANALYZE commands can be handled by using triggers DBMS SQL can help to provide faster executions *Some more functionalities can be added to UPDATE operations to make administrations easy Timeline (not exact but most probably) Start at june 7 and End around 7 september *Warm up to environment to Postgresql(1-2 weeks) *Determine exact operations to be addded on postgresql *Initial coding as to workbreakdown structure *Start implementing on distributed environment to check inital functions work *Write test cases for code *Further implementation to support full functionalities on ideas *Write it to discussion site and collect feedbacks *More support upon feedbacks *Last tests and documentation of final operations About me I am a senior student at computer engineering at iztech in turkey. My areas of inetrests are information management, OOP(Object Oriented Programming) and currently bioinformatics. I have been working with a Asistan Professor(Jens Allmer) in molecular biology genetics department for one year.Firstly, we worked on a protein database 2DB and we presented the project in HIBIT09 organization. The Project was “Database management system independence by amending 2DB with a database access layer”. Currently, I am working on another project (Kerb) as my senior project which is a general sqeuential task management system intend to reduce the errors and increase time saving in biological experiments. We will present this project in HIBIT2010 too. Moreover,I am good at data structures and implementations on C. Contact: e-mails; necatiba...@gmail.com , necati_ba...@hotmail.com(msn)
Re: [HACKERS] C-Language Fun on VC2005 ERROR: could not load library
2010/4/9 chaoyong wang catcher_w...@hotmail.com: Hi, I'm using VC2005 to create PG C-language Fun in my contrib xml_index, which import other library, and I have add the include and lib directory by changing Mkvcbuild.pm and config.pl. But after I executed the following commands: build DEBUG perl install.pl C:\Program Files\PostgreSQL\8.3 initdb.exe -D C:\Program Files\PostgreSQL\8.3\data -E UTF8 --locale=C pg_ctl -D C:/Program Files/PostgreSQL/8.3/data -l logfile start createdb test psql test when I trying to create the function by the following commands: CREATE OR REPLACE FUNCTION create_xml_value_i ndex(text,text,text) RETURNS bool AS '$libdir/xml_index' LANGUAGE C STRICT IMMUTABLE; It reports an ERROR: could not load library C:\Program Files\PostgreSQL\8.3\lib/xml_index.dll: The specified module could not be found. I checked the directory C:\Program Files\PostgreSQL\8.3\lib, xml_index.dll exists. I tried to changed $libdir/xml_index by C:\\Program Files\\PostgreSQL\\8.3\\lib\\xml_index, error remains. Has anyone ever encountered this problem ? Thanks in advance. Does your library xml_index.dll depend on another library which isn't on your PATH? Regards, Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, 2010-04-09 at 12:50 -0500, Kevin Grittner wrote: I just thought that if you were adding more type information, oriented aournd the types themselves rather than index AMs, some form of inheritence might fit in gracefully. There are already some specific proposals for inheritance in database theory literature. For instance: Databases, Types, and the Relational Model by C.J. Date addresses inheritance explicitly (and the appendices have some interesting discussion). I'm not sure how compatible it is with SQL, though; and I am not very optimistic that we could accomplish such a restructuring of the type system while maintaining a reasonable level of backwards compatibility. Either way, I think it's a separate topic. Two types that are not related by any subtype/supertype relationship (like strings and ints) can conform to the same interface (total ordering); while the very same type can conform to two different interfaces. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc XQuery
On Fri, Apr 9, 2010 at 5:02 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Necati Batur escribió: *Delivarables on Timeline* The implementation of XQuery support can ce achieved by; *Understanding the basics of the SQL/XML (1 week) *A research of on other kinds of XQuery supported DBMSs scuh as; Oracle and SQLServer(1-2 week) *The design for the core functionalities of the XQuery (2-4 week) *First few hard-coded functionalities(1-2 week) *Testing and adding more details during an iterative development(1-2 week) *Implementation of a GUI for XQuery for the PostgreSql users (2 week) What, you will *also* implement a GUI? Gimme a break. There's no way you can achieve most of these in 1-2 weeks. Are you Tom Lane perchance? I think the problem with this and the other proposal from the same student is that, according to his submissions and statements, he doesn't really know what the design for any of these features is going to be. So his plan is to first figure out the design, and then implement it. I don't believe he's done any work reading through existing mailing list discussions, wiki pages, or even maybe our core documentation. I believe we should really be expecting students to have a fairly detailed design at the time they submit the project - the summer is for implementing it, not for figuring out what it is and then implementing it. This might be just barely acceptable if he started with a small project for which we already pretty much know what the design has to be. But for a major undertaking like the two he's proposed so far, you can't think that you're going to start with hand-waving and end up with something useful. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, 2010-04-09 at 11:14 -0400, Robert Haas wrote: range of integers with increment y, if xy? Maybe the increment step and continuous/discrete could be typmods. Nope, not enough bits available there. I think the problem is deeper than that. Typmods aren't carried along as part of the result of a function call, so typmod is not really a part of the type at all -- it's more a property of the column. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Thu, 2010-04-08 at 22:29 -0400, Robert Haas wrote: 1. knngist wants to use index scans to speed up queries of the form SELECT ... ORDER BY column op constant (as opposed to the existing machinery which only knows how to use an index for SELECT ... ORDER BY column). 2. Window functions want to define windows over a range of values defined by the underlying data type. To do this, we need to define what addition and subtraction mean for a particular data type. 3. Jeff Davis is interested in implementing range types. When the underlying base type is discrete, e.g. integers, you can say that [1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in that order). To give some context, I started a thread a while ago: http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php Tom provided some interesting suggestions in that thread, but I'm not sure they would work for #1 or #2. It may or may not be worth building the concept of a unit increment into the type interface machinery, though: one could imagine two different range types built over the same base type with different unit increments - e.g. one timestamp range with unit increment = 1s, and one with unit increment = 1m. Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Right. Part of the interface could be a unit() function, and that can return whatever you want. I was originally thinking about it in terms of next() and prev(), but you could build those from +, -, and unit(). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 5:49 PM, Jeff Davis pg...@j-davis.com wrote: It may or may not be worth building the concept of a unit increment into the type interface machinery, though: one could imagine two different range types built over the same base type with different unit increments - e.g. one timestamp range with unit increment = 1s, and one with unit increment = 1m. Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Right. Part of the interface could be a unit() function, and that can return whatever you want. I was originally thinking about it in terms of next() and prev(), but you could build those from +, -, and unit(). The advantage of specifying a + and a - in the type interface is that the unit definition can then be specified as part of the type declaration itself. So you can do: CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); All of the stuff about defining + and - is hidden from the user - it's part of the type interface, which is pre-created. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL
2010/4/9 pavelbaros baro...@seznam.cz: Implementation: could be divided to few steps: 1) create materialized view - modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ... - change executor, so that it will create physical table defined by select statement This basically needs to work the same was as CREATE TABLE ... AS SELECT ... - save that it should also stuff the rewritten query someplace, so that it can be re-executed. I think one of the important design questions here is figuring out exactly where that someplace should be. I also suspect that we want to block any write access to the relation except for view refreshes. IOW, INSERT, UPDATE, and DELETE on the underlying relation should be rejected (though perhaps rewrite rules redirecting such operations to other tables could be allowed). 2) change rewriter - usually, view is relation with defined rule and when rewriting, rule is fired and relation (view) is replaced by definition of view. If relation do not have rule, planner and executor behave to it as physical table (relation). In case of materialized view we want to rewrite select statement only in case when we refreshing MV. In other cases rewriter should skip rewriting and pick up physical relation. Exclude situation when other rewrite rules which are not related to MV definition are specified. 3) create command that takes snapshot (refresh MV) - modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH; - taking snapshot (refreshing) is similar to command SELECT INTO ... and I decided to follow the way it works. After parsing query and before transformation is MANUALLY created tree representation of SELECT * INTO ... with flag IntoClause-isrefresh set true, indicating it is refreshing materialized view. Everithing acts as it would be regular SELECT INTO ... except functions OpenIntoRel() and CloseIntoRel(). In function OpenIntoRel is created temp table (without catalog) and set as destination for result of select. In function CloseIntoRel executor swap relfilenode's of temp table and original table and finally delete temp table. Behavior of CloseIntoRel function is inspired by CLUSTER statement. I'll have to read the code before I can comment on the rest of this in detail. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
The advantage of specifying a + and a - in the type interface is that the unit definition can then be specified as part of the type declaration itself. So you can do: CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); All of the stuff about defining + and - is hidden from the user - it's part of the type interface, which is pre-created. The disadvantage is that it does not permit irregularly spaced units. -Nathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] testing HS/SR - 1 vs 2 performance
Using 9.0devel cvs HEAD, 2010.04.08. I am trying to understand the performance difference between primary and standby under a standard pgbench read-only test. server has 32 GB, 2 quadcores. primary: tps = 34606.747930 (including connections establishing) tps = 34527.078068 (including connections establishing) tps = 34654.297319 (including connections establishing) standby: tps = 700.346283 (including connections establishing) tps = 717.576886 (including connections establishing) tps = 740.522472 (including connections establishing) transaction type: SELECT only scaling factor: 1000 query mode: simple number of clients: 20 number of threads: 1 duration: 900 s both instances have max_connections = 100 shared_buffers = 256MB checkpoint_segments = 50 effective_cache_size= 16GB See also: http://archives.postgresql.org/pgsql-testers/2010-04/msg5.php (differences with scale 10_000) I understand that in the scale=1000 case, there is a huge cache effect, but why doesn't that apply to the pgbench runs against the standby? (and for the scale=10_000 case the differences are still rather large) Maybe these differences are as expected. I don't find any explanation in the documentation. thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 7:18 PM, Nathan Boley npbo...@gmail.com wrote: The advantage of specifying a + and a - in the type interface is that the unit definition can then be specified as part of the type declaration itself. So you can do: CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); All of the stuff about defining + and - is hidden from the user - it's part of the type interface, which is pre-created. The disadvantage is that it does not permit irregularly spaced units. True. The only types I can think of that have irregularly spaced units would be things based on floating points, and I was assuming that people would only want continuous intervals on those. If someone really wants to be able to deduce that [1.0,3.0) = [1.0,3.0-epsilon), then we need a different design. But I find it hard to believe that's very useful. Maybe you feel otherwise? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended operator classes vs. type interfaces
On Fri, Apr 9, 2010 at 7:53 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Apr 9, 2010 at 7:18 PM, Nathan Boley npbo...@gmail.com wrote: The advantage of specifying a + and a - in the type interface is that the unit definition can then be specified as part of the type declaration itself. So you can do: CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); All of the stuff about defining + and - is hidden from the user - it's part of the type interface, which is pre-created. The disadvantage is that it does not permit irregularly spaced units. True. The only types I can think of that have irregularly spaced units would be things based on floating points, and I was assuming that people would only want continuous intervals on those. If someone really wants to be able to deduce that [1.0,3.0) = [1.0,3.0-epsilon), then we need a different design. But I find it hard to believe that's very useful. Maybe you feel otherwise? Er, that [1.0,3.0) = [1.0,3.0-epsilon], rather. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc XQuery
On Fri, Apr 9, 2010 at 5:02 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Necati Batur escribió: *Delivarables on Timeline* ... *Implementation of a GUI for XQuery for the PostgreSql users (2 week) What, you will *also* implement a GUI? Gimme a break. There's no way you can achieve most of these in 1-2 weeks. Are you Tom Lane perchance? Indeed. I've learned it's not a good idea to throw a GUI into a proposal when the rest of the project is already substantial, as it can lead to embarrassment when it doesn't get realized ;-) I did that last year. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Fri, Apr 9, 2010 at 4:08 PM, Dimitri Fontaine dfonta...@hi-media.com wrote: I guess a GSoC of reasonable size would be to define a spec for how to implement partitioning in PostgreSQL with a sound and accepted proposal on independent steps to contribute separately, in order to reach the full implementation in an incremental fashion and by different hackers. Then you could pick up one of those items. By then I mean after the summary and the plan both have been accepted by core people and by contributors who said in the past they wanted to spend precious hours on the topic. But I don't know if a GSoC can be completed without even coding. According to the link below, GSoC proposals for documentation aren't accepted. This probably extends to other non-coding work as well. http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#doc_proposals -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL
pavelbaros wrote: I am also waiting for approval for my repository named materialized_view on git.postgresql.org, so I could publish completed parts. Presuming that you're going to wander there and get assigned what looks like an official repo name for this project is a bit...optimistic. I would recommend that you publish to something like github instead (you can fork http://github.com/postgres/postgres ), and if the work looks good enough that it gets picked up by the community maybe you migrate it onto the main site eventually. git.postgresql.org is really not setup to be general hosting space for everyone who has a PostgreSQL related project; almost every repo on there belongs to someone who has already been a steady project contributor for a number of years. (Switching to boilerplate mode for a paragraph...) You have picked a PostgreSQL feature that is dramatically more difficult than it appears to be, and I wouldn't expect you'll actually finish even a fraction of your goals in a summer of work. You're at least in plentiful company--most students do the same. As a rule, if you see a feature on our TODO list that looks really useful and fun to work on, it's only still there because people have tried multiple times to build it completely but not managed to do so because it's harder than it appears. This is certainly the case with materialized views. You've outlined a reasonable way to build a prototype that does a limited implementation here. The issue is what it will take to extend that into being production quality for the real-world uses of materialized views. How useful your prototype is depends on how well it implements a subset of that in a way that will get used by the final design. The main hidden complexity in this particular project relates to handling view refreshes. The non-obvious problem is that when the view updates, you need something like a SQL MERGE to really handle that in a robust way that doesn't conflict with concurrent access to queries against the materialized view. And work on MERGE support is itself blocked behind the fact that PostgreSQL doesn't have a good way to lock access to a key value that doesn't exist yet--what other databases call key range locking. See the notes for Add SQL-standard MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo for more information. You can work around that to build a prototype by grabbing a full table lock on the materialized view when updating it, but that's not a production quality solution. Solving that little detail is actually more work than the entire project you've outlined. Your suggested implementation--In function CloseIntoRel executor swap relfilenode's of temp table and original table and finally delete temp table--is where the full table lock is going to end up at. The exact use cases that need materialized views cannot handle a CLUSTER-style table recreation each time that needs an exclusive lock to switchover, so that whole part of your design is going to be a prototype that doesn't work at all like what needs to get built to make this feature committable. It's also not a reasonable assumption that you have enough disk space to hold a second copy of the MV in a production system. Once there's a good way to merge updates, how to efficiently generate them against the sort of large data sets that need materalized views--so you just write out the updates rather than a whole new copy--is itself a large project with a significant quantity of academic research to absorb before starting. Dan Colish at Portland State has been playing around with prototypes for the specific problem of finding a good algorithm for view refreshing that is compatible with PostgreSQL's execution model. He's already recognized the table lock issue here and for the moment is ignoring that part. I don't have a good feel yet for how long the targeted update code will take to mature, but based on what I do know I suspect that little detail is also a larger effort than the entire scope you're envisioning. There's a reason why the MIT Press compendium Materialized Views: Techniques, Implementations, and Applications is over 600 pages long--I hope you've already started digging through that material. Now, with all that said, that doesn't mean there's not a useful project for you buried in this mess. The first two steps in your plan: 1) create materialized view 2) change rewriter Include building a prototype grammer, doing an initial executor implementation, and getting some sort of rewriter working. That is potentially good groundwork to lay here. I would suggest that you completely drop your step 3: 3) create command that takes snapshot (refresh MV) Because you cannot built that in a way that will be useful (and by that I mean committable quality) until there's a better way to handle updates than writing a whole new table and grabbing a full
Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL
2010/4/9 Greg Smith g...@2ndquadrant.com: The main hidden complexity in this particular project relates to handling view refreshes. The non-obvious problem is that when the view updates, you need something like a SQL MERGE to really handle that in a robust way that doesn't conflict with concurrent access to queries against the materialized view. And work on MERGE support is itself blocked behind the fact that PostgreSQL doesn't have a good way to lock access to a key value that doesn't exist yet--what other databases call key range locking. See the notes for Add SQL-standard MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo for more information. You can work around that to build a prototype by grabbing a full table lock on the materialized view when updating it, but that's not a production quality solution. Solving that little detail is actually more work than the Hmm... I am not sure you're right about this. It's not obvious to me that a brief full-table lock wouldn't be acceptable for an initial implementation. Obviously it wouldn't be suitable for every use case but since we're talking about manually refreshed views that was bound to be true anyway. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing hot standby
On Fri, Apr 9, 2010 at 3:39 PM, Jaime Casanova jcasa...@systemguards.com.ec wrote: Hi, i'm startint to try Hot Standby Streaming Replication, so i started a replication: i think make standbycheck needs a little more work, why it isn't accesible from top of source dir? For now, to excercise it i have to do (on the standby server): - cd src/test/regress - psql -h ip_primary -f sql/hs_primary_setup.sql - make standbycheck (and 2 tests fail for differences in the messages, patch to fix attached) just an idea, can't we use the info about primary_conninfo to know how to execute the script hs_primary_setup.sql on the primary? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157 Index: src/test/regress/expected/hs_standby_disallowed.out === RCS file: /home/postgres/pg_repo/pgsql/src/test/regress/expected/hs_standby_disallowed.out,v retrieving revision 1.1 diff -c -r1.1 hs_standby_disallowed.out *** src/test/regress/expected/hs_standby_disallowed.out 19 Dec 2009 01:32:45 - 1.1 --- src/test/regress/expected/hs_standby_disallowed.out 10 Apr 2010 04:26:48 - *** *** 11,50 WARNING: there is no transaction in progress -- SELECT select * from hs1 FOR SHARE; ! ERROR: transaction is read-only select * from hs1 FOR UPDATE; ! ERROR: transaction is read-only -- DML BEGIN; insert into hs1 values (37); ! ERROR: transaction is read-only ROLLBACK; BEGIN; delete from hs1 where col1 = 1; ! ERROR: transaction is read-only ROLLBACK; BEGIN; update hs1 set col1 = NULL where col1 0; ! ERROR: transaction is read-only ROLLBACK; BEGIN; truncate hs3; ! ERROR: transaction is read-only ROLLBACK; -- DDL create temporary table hstemp1 (col1 integer); ! ERROR: transaction is read-only BEGIN; drop table hs2; ! ERROR: transaction is read-only ROLLBACK; BEGIN; create table hs4 (col1 integer); ! ERROR: transaction is read-only ROLLBACK; -- Sequences SELECT nextval('hsseq'); ! ERROR: cannot be executed during recovery -- Two-phase commit transaction stuff BEGIN; SELECT count(*) FROM hs1; --- 11,50 WARNING: there is no transaction in progress -- SELECT select * from hs1 FOR SHARE; ! ERROR: cannot execute SELECT FOR SHARE in a read-only transaction select * from hs1 FOR UPDATE; ! ERROR: cannot execute SELECT FOR UPDATE in a read-only transaction -- DML BEGIN; insert into hs1 values (37); ! ERROR: cannot execute INSERT in a read-only transaction ROLLBACK; BEGIN; delete from hs1 where col1 = 1; ! ERROR: cannot execute DELETE in a read-only transaction ROLLBACK; BEGIN; update hs1 set col1 = NULL where col1 0; ! ERROR: cannot execute UPDATE in a read-only transaction ROLLBACK; BEGIN; truncate hs3; ! ERROR: cannot execute TRUNCATE TABLE in a read-only transaction ROLLBACK; -- DDL create temporary table hstemp1 (col1 integer); ! ERROR: cannot execute CREATE TABLE in a read-only transaction BEGIN; drop table hs2; ! ERROR: cannot execute DROP TABLE in a read-only transaction ROLLBACK; BEGIN; create table hs4 (col1 integer); ! ERROR: cannot execute CREATE TABLE in a read-only transaction ROLLBACK; -- Sequences SELECT nextval('hsseq'); ! ERROR: cannot execute nextval() in a read-only transaction -- Two-phase commit transaction stuff BEGIN; SELECT count(*) FROM hs1; *** *** 54,60 (1 row) PREPARE TRANSACTION 'foobar'; ! ERROR: cannot be executed during recovery ROLLBACK; BEGIN; SELECT count(*) FROM hs1; --- 54,60 (1 row) PREPARE TRANSACTION 'foobar'; ! ERROR: cannot execute PREPARE TRANSACTION during recovery ROLLBACK; BEGIN; SELECT count(*) FROM hs1; *** *** 64,70 (1 row) COMMIT PREPARED 'foobar'; ! ERROR: cannot be executed during recovery ROLLBACK; BEGIN; SELECT count(*) FROM hs1; --- 64,70 (1 row) COMMIT PREPARED 'foobar'; ! ERROR: COMMIT PREPARED cannot run inside a transaction block ROLLBACK; BEGIN; SELECT count(*) FROM hs1; *** *** 74,80 (1 row) PREPARE TRANSACTION 'foobar'; ! ERROR: cannot be executed during recovery ROLLBACK PREPARED 'foobar'; ERROR: current transaction is aborted, commands ignored until end of transaction block ROLLBACK; --- 74,80 (1 row) PREPARE TRANSACTION 'foobar'; ! ERROR: cannot execute PREPARE TRANSACTION during recovery ROLLBACK PREPARED 'foobar'; ERROR: current transaction is aborted, commands ignored until end of transaction block ROLLBACK; *** *** 86,137 (1 row) ROLLBACK PREPARED 'foobar'; ! ERROR: cannot be executed during recovery ROLLBACK; -- Locks BEGIN; LOCK hs1; ! ERROR: cannot be executed during recovery COMMIT; BEGIN; LOCK hs1 IN SHARE UPDATE EXCLUSIVE MODE; ! ERROR: cannot be executed during recovery