Re: [Pljava-dev] [HACKERS] Re: Should creating a new base type require superuser status?
Kris Jurka wrote:
>> Thomas Hallgren wrote:
>>> Kris Jurka wrote:
>>>> 3) By value: pljava does not handle passed-by-value types correctly, allowing access to random memory.
>>> This is simply not true. There's no way a Java developer can access random memory through PL/Java.
>
> No, the point is that the Java developer can provide some data which can convince PostgreSQL to fetch random data for the user. Consider the attached type, which is simply an int4 equivalent. Depending on whether you define it as passed by value or passed by reference, it will or will not work (attached). This looks like it works:
>
> jurka=# select '1'::intbyref, '2'::intbyval;
>  intbyref | intbyval
> ----------+----------
>  1        | 2
> (1 row)
>
> But it doesn't really:
>
> jurka=# create table inttest (a intbyref, b intbyval);
> CREATE TABLE
> jurka=# insert into inttest values ('1', '2');
> INSERT 0 1
> jurka=# select * from inttest;
>  a |     b
> ---+------------
>  1 | 2139062143
> (1 row)

It seems the pointer is confused with the actual value, which means that writing the value back will corrupt the pointer. That's bad, of course, but I would classify this as a bug rather than a general security problem. PL/Java is designed to handle all types securely and to completely hide the concepts of 'by value' and 'by reference' from the Java developer, since such concepts are meaningless in Java.

- thomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Pljava-dev] [HACKERS] Re: Should creating a new base type require superuser status?
Kris Jurka wrote:
> 3) By value: pljava does not handle passed-by-value types correctly, allowing access to random memory.

This is simply not true. There's no way a Java developer can access random memory through PL/Java.

- thomas
Re: [HACKERS] Re: [Pljava-dev] Should creating a new base type require superuser status?
Tom Lane wrote:
>> This is a non-issue in PL/Java. An integer parameter is never passed by reference and there's no way the PL/Java user can get direct access to backend memory.
> So what exactly does happen when the user deliberately specifies wrong typlen/typbyval/typalign info when creating a type based on PL/Java functions?

Everything is converted into instances of Java classes such as String, byte[], etc.

>> I think that assumption is without ground. Java doesn't permit you to access memory unless you use Java classes (the java.nio stuff) explicitly designed to do that, and you need native code to set such things up. A PL/Java user cannot do that unless he is able to link other shared objects or DLLs into the backend process.
> PL/Java itself must be doing unsafe things in order to interface with PG at all. So your argument really is that you have managed to securely sandbox the user-written code you are calling. That might or might not be true, but I don't think that worrying about it is without foundation.

It would be presumptuous to claim that I provide the sandbox. All PL/Java does is provide the type mapping. The sandbox as such is implicit in Java, much in the same way it is for web browsers etc. Regardless of that, I think there's some difference between expressing a worry that might or might not have a foundation and claiming that there must indeed be a security hole a mile wide ;-)

- thomas
Re: [HACKERS] Re: [Pljava-dev] Should creating a new base type require superuser status?
Tom Lane wrote:
> Thomas Hallgren [EMAIL PROTECTED] writes:
>> Tom, could you please elaborate where you see a security hole?
> The problem that we've seen in the past shows up when the user lies in the CREATE TYPE command, specifying type representation properties that are different from what the underlying functions expect. In particular, if it's possible to pass a pass-by-value integer to a function that's expecting a pass-by-reference datum, you can misuse the function to access backend memory.

This is a non-issue in PL/Java. An integer parameter is never passed by reference and there's no way the PL/Java user can get direct access to backend memory.

> I gather from looking at the example that Kris referenced that there's some interface code between the SQL function call and the user's Java code, and that that interface code is itself looking at the declared properties of the SQL type to decide what to do. So to the extent that that code is (a) bulletproof against inconsistencies and (b) not subvertible by the PL/Java user, it might be that there's no hole in practice. But assumption (b) seems pretty fragile to me.

I think that assumption is without ground. Java doesn't permit you to access memory unless you use Java classes (the java.nio stuff) explicitly designed to do that, and you need native code to set such things up. A PL/Java user cannot do that unless he is able to link other shared objects or DLLs into the backend process. Based on that, I claim that your statement about a security hole a mile wide is incorrect. PL/Java is not subject to issues relating to misuse of backend memory.

Regards, Thomas Hallgren
[HACKERS] Re: [Pljava-dev] Should creating a new base type require superuser status?
It seems perfectly safe to me too, for the reason that Kris mentions. Tom, could you please elaborate where you see a security hole?

Regards, Thomas Hallgren

Tom Lane wrote:
> Kris Jurka [EMAIL PROTECTED] writes:
>> On Wed, 30 Jul 2008, Alvaro Herrera wrote:
>>> I do agree that creating base types should require a superuser though. It too seems dangerous just on principle, even if today there's no actual hole (that we already know of).
>> pl/java already allows non-superusers to create functions returning cstring and base types built off of these functions.
> So in other words, if pl/java is installed we have a security hole a mile wide.
>
> regards, tom lane

___
Pljava-dev mailing list
[EMAIL PROTECTED]
http://pgfoundry.org/mailman/listinfo/pljava-dev
Re: [HACKERS] [PATCHES] Patch for UUID datatype (beta)
[EMAIL PROTECTED] wrote:
> On Tue, Sep 19, 2006 at 11:21:51PM -0400, Alvaro Herrera wrote:
>> [EMAIL PROTECTED] wrote:
>>> On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
>>>> On Mon, Sep 18, 2006 at 07:45:07PM -0400, [EMAIL PROTECTED] wrote:
>>>>> I would not use a 100% random number generator for a UUID value as was suggested. I prefer inserting the MAC address and the time, to at least allow me to control whether a collision is possible.
>>>> This is not easy to do using a few lines of C code. I'd rather have a UUID type in core with no generation routine than no UUID type in core because the code is too complicated to maintain, or not portable enough. As others have mentioned, using the MAC address doesn't remove the possibility of a collision.
>>> It does, as I control the MAC address.
>> What happens if you have two postmasters running on the same machine? Could be bad things. :-)
> For the case of two postmaster processes, I assume you mean two different databases? If you never intend to merge the data between the two databases, the problem is irrelevant. There is a much greater chance that any UUID form is unique, or can be guaranteed to be unique, within a single application instance than across all application instances in existence. If you do intend to merge the data, you may have a problem.

You may. But it's not very likely, since (a) there is a 14-bit random number in addition to the MAC address (the clock sequence) and (b) the timestamp has a granularity of 100 nanoseconds. An implementation could be made to prevent clock-sequence collisions on the same machine and thereby avoid this altogether.

Kind Regards, Thomas Hallgren

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] TODO item: GUID
[EMAIL PROTECTED] wrote:
> ... I *like* sorting by time, as it allows the UUID to be used similarly to a sequence, leaving older, less accessed UUIDs in the past.

And don't forget: an automatic timestamp of when a record is created might be useful for other purposes.

Regards, Thomas Hallgren
Re: [HACKERS] UUID/GUID discussion leading to request for hexstring bytea?
Gevik Babakhani wrote:
> In my opinion, a GUID type needs to provide the following in the database:
> 1. The GUID type must accept the correct string format(s), with or without extra '-'.
> 2. The GUID type must internally be stored as small as possible.
> 3. The GUID type must be comparable with ==, !=, LIKE and (NOT) IS NULL.
> 4. The GUID type must have the ability to be indexed, grouped, ordered, DISTINCT... but not MAX(), MIN() or SUM().

Where do you see a need for LIKE on a GUID?

Regards, Thomas Hallgren
Re: [HACKERS] UUID/GUID discussion leading to request for hexstring bytea?
Gevik Babakhani wrote:
> LIKE could come in handy if someone wants to abuse the uuid datatype to store MD5 hash values. However, I am not going to implement it if there is no need for it (assuming it will pass the acceptance test).

Perhaps providing LIKE just to encourage abuse is not such a good idea? IMHO, a GUID should be comparable for equality and NULL only, not LIKE. I also think that ordering is feasible only when looking at parts of the GUID, i.e. ordering by the result of a function that extracts a timestamp or a node address. Magnitude comparison on the GUID as a whole makes no sense to me.

Regards, Thomas Hallgren
Re: [HACKERS] UUID/GUID discussion leading to request for hexstring bytea?
[EMAIL PROTECTED] wrote:
> On Mon, Sep 18, 2006 at 11:12:54AM +0200, Gevik Babakhani wrote:
>>> Magnitude comparison on the GUID as a whole makes no sense to me.
>> I agree. Any kind of comparison except equality has no meaning for a GUID. (And this has been discussed before.) I would rather have the option to sort and group for the sake of consistency and compatibility.
> Thomas: the ability to sort/compare is required for use with a B-tree index. I prefer a fast comparison over one with more meaning. memcmp() is fine with me, and it is how I implement it in my UUID PostgreSQL library.

Fair enough. Although the magnitudes as such make little sense, the ability to order makes it possible to compare results from different queries, etc. That would be very difficult to do with a random order.

Regards, Thomas Hallgren
Re: [HACKERS] Proposal for GUID datatype
Jan de Visser wrote:
> On Friday 08 September 2006 15:18, Gevik Babakhani wrote:
>> 2a) Three input formats are supported. Example:
>> insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce');
>> insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}');
>> insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce');
> Please extend your list. Java's RMI package generates UIDs with the following format:
>
> [head order 21:19]$ bsh.sh
> BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED])
> bsh % print(new java.rmi.server.UID());
> 3b732da7:10d9029b3eb:-8000
> bsh %
>
> So forms that use colons instead of dashes seem appropriate. Or better still, make it configurable.
>
> jan

RMI UIDs have nothing in common with UUIDs, so I fail to see how they have any relevance here. I've never seen any representation of UUIDs other than the two that Gevik wants to support. Please note that UUID is a very well known concept and not an arbitrary 128-bit store. http://en.wikipedia.org/wiki/UUID is a good source of information. The referenced RFC actually contains source code.

Kind Regards, Thomas Hallgren
Re: [HACKERS] GUC with units, details
Simon Riggs wrote:
> We don't ever need to say that K = 1000, AFAICS. I think we are safe to assume that
>
> kB = KB = kb = Kb = 1024 bytes
> mB = MB = mb = Mb = 1024 * 1024 bytes
> gB = GB = gb = Gb = 1024 * 1024 * 1024 bytes
>
> There's no value in forcing the use of a specific case, and it will just be confusing for people.

It's fairly common to use 'b' for 'bits' and 'B' for 'bytes'. My suggestion would be to be much more restrictive and avoid lowercase:

KB = 1024 bytes
MB = 1024 KB
GB = 1024 MB
TB = 1024 GB

Although I don't expect to see bit rates or fractions ('m' == 'milli') in GUC, it might be good to use consistent units everywhere.

Regards, Thomas Hallgren
Re: [HACKERS] plPHP and plRuby
Marc G. Fournier wrote:
>> Actually it would be nice to have the not-included PLs present in src/pl/ as their own directories, with a README.TXT containing fetch and build instructions. So we would have:
>>
>> src/pl/plphp/README.TXT
>> src/pl/pljava/README.TXT
>> src/pl/plj/README.TXT
>>
>> and anybody looking for PLs would find the info in a logical place.
> *That* idea I like ...

ISTM that a clear strategy for how to deal with core, contrib, add-ons, etc. is long overdue, and that's the reason why these discussions pop up over and over again. The question "What are the criteria for core inclusion?" has not yet been answered. I thought PL/Java fulfilled those criteria, but a new threshold for the number of lines of code and a concern about code in an unmaintainable language made it impossible. The result of an unclear strategy can be perceived as somewhat unjust.

There seems to be unanimous consensus that PL/pgSQL belongs in core. Large object support, free text search and some others also receive support from everyone. These add-ons clearly belong where they are. The historical reasons for continuing to include others are, IMHO, not so obvious, and the result undoubtedly creates first- and second-class citizens in the module flora. The split doesn't correlate very well with feature richness or popularity.

I have a suggestion that might help clear things up a bit :-) A couple of specialized teams need to be established (or rather formalized, since they already exist to some extent) that can be thought of as core subsidiaries. The idea is that such a team would take on the maintenance of one specialized area of PostgreSQL. Java, for instance, is such an area. PostgreSQL has a huge number of Java users. They all use the JDBC driver and a few use PL/Java. There's been talk about Eclipse tool support, and some will have an interest in XA compliance in order to gain JTA support, etc. Today, it's scattered all over the place.

Other subsidiary teams could be formed around ODBC (or .NET perhaps), PHP, Ruby, replication/clustering, etc. to take control over those areas. A very important part of my suggestion is that, for the normal user, it would appear that what a core subsidiary team contributes really is *part of* the database proper and not something maintained by a third-party contributor or commercial vendor. The team would maintain its own website (although the layout would be centralized), source code control system, mailing list, etc., but it would share a lot more of the PostgreSQL infrastructure than what is shared today. Important things would be:

- Documentation. Inclusion of a subsidiary module should mean that some chapters are added (automatically) to the user manual.
- Build farm support.
- Packaging and downloads.
- Server infrastructure.
- Seamless navigation from the PostgreSQL main website.

PgFoundry would live on like today, targeted at third-party modules and serving as an incubator for modules that aim to be included in core or into one of its subsidiaries.

Kind Regards, Thomas Hallgren
Re: [HACKERS] monolithic distro
Andrew Dunstan wrote:
> The topic here is NOT what features are missing from postgres.

Of course it is ;-)

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
[EMAIL PROTECTED] wrote:
> On the subject of 38K lines of code, much of which isn't C (going by memory, I apologize if this is wrong): how many of these lines could be/should be shared between PL/Java and PL/J? It seems to me that the general concepts should be in common, and that it is only how the Java interfaces with the backend that changes. Could they not be one PL, with two mechanisms for speaking to the backend?

By all means. An embedded JVM solution should share as much as possible with one that uses a remote JVM. From the user's perspective there should be no difference at all. PL/Java is designed with this in mind. The class loader and the utility commands are based on JDBC; the security manager that enables the choice of trusted/untrusted execution is standard Java. A set of interfaces for non-standard access (PostgreSQL TriggerData in particular) was abstracted in order to allow different implementations. Etc.

That said, there is also code that deals with tight backend integration and is highly specialized to fit the embedded solution. This code is designed around the fact that function calls to the backend are very cheap. As an example, PL/Java contains a JDBC driver that is written directly on top of the SPI API. The involved C structures are rarely copied or streamed; they are accessed directly using JNI functions.

I've spent some time lately investigating what it would take to complement PL/Java with a remote JVM option. The major challenge lies in the impedance mismatch between the concerns one must consider when using RPC (limit the number of calls) and the current design (avoid copying and streaming).

Kind regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Joshua D. Drake wrote:
>> JDBC is different, in that it doesn't require the PostgreSQL core to build. It's 100% native Java, and as such, I see benefit to it being distributed separately.
> PL/Java does not need the PostgreSQL core to build either. It needs pgxs + the PostgreSQL libs + the PostgreSQL headers. In essence, the PostgreSQL SDK. If I read what Thomas wrote (late) last night correctly.

You did.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Jonah H. Harris wrote:
>> But I can't find anything there to download ... just a pointer to a wiki, which, I'm sorry, would definitely not be my first thought of where to look for downloads ...
> Hmm, yes... just saw that, and it is a bit odd. Thomas, I like the layout of the wiki... but could we move the project files to pgFoundry for hosting and set the project's home page as the wiki?

Yes, that sounds reasonable. I'll look into it. What I really would like is to move the whole project (aside from the wiki) from gborg to pgFoundry.

Kind regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Marc G. Fournier wrote:
> So, let's try ftp ... ftp.postgresql.org:/pub/projects/gborg/pljava/stable: Nothing there newer than November 2005:
>
> ftp> ls -lt
> 227 Entering Passive Mode (66,98,251,159,248,251)
> 150 Opening ASCII mode data connection for /bin/ls.
> total 23026
> -rw-r--r--  1 80  1009  206134 Nov 20  2005 pljava-src-1.2.0.tar.gz
> -rw-r--r--  1 80  1009  522895 Nov 20  2005 pljava-i686-pc-mingw32-pg8.1-1.2.0.tar.gz
> -rw-r--r--  1 80  1009  522955 Nov 20  2005 pljava-i686-pc-mingw32-pg8.0-1.2.0.tar.gz
> -rw-r--r--  1 80  1009  421717 Nov 20  2005 pljava-i686-pc-linux-gnu-pg8.1-1.2.0.tar.gz
> -rw-r--r--  1 80  1009  421999 Nov 20  2005 pljava-i686-pc-linux-gnu-pg8.0-1.2.0.tar.gz
>
> So, if there is a newer version (I actually eventually went to the wiki, so I know there is a 1.3.0), it's not taking advantage of the PostgreSQL file distribution network that has been developed over the years ...

How would I go about taking advantage of that? And who did the 1.2.0 upload? I certainly didn't.

Kind Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Marc G. Fournier wrote:
>> How would I go about taking advantage of that? And who did the 1.2.0 upload? I certainly didn't.
> There is a lot more than just 1.2.0 ... check out the FTP site ... As for taking advantage of that ... upload files to the file section in *either* gborg or pgfoundry, and they get auto-included as part of the ftp network ...

The PL/Java 1.3.0 release has been on gborg for several weeks but only available through the wiki (the gborg-generated 'download' page is messy and I'm not able to remove stuff that shouldn't be there). I guess that's why it was not mirrored. And yes, I agree wholeheartedly: a wiki is not the most intuitive place for downloads. Per Jonah's suggestion I've just uploaded everything to pgFoundry too. Thanks for uploading 1.3.0 to the ftp.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Marc G. Fournier wrote:
> I'm confused here ... "has been on gborg for several weeks, but only available through the wiki" ... On http://gborg.postgresql.org/project/pljava/projdisplay.php I can't find any way of downloading 1.3.0 (or even older releases) ... have you been uploading, but nobody activated the Files section for download? :(

Yes, I've been uploading to gborg, and the links provided on the wiki point to those files. I don't *want* to activate the downloads section, since it exposes a page with a lot of files that don't belong there. Unfortunately, there's no way to remove them. The Files section on pgFoundry looks a lot better :-)

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Marc G. Fournier wrote:
> ... the only reason 'NetBSD doesn't offer pl/java now' is because nobody (a) is using it under NetBSD or (b) submitted a port to their system.

Should be fairly straightforward if the PostgreSQL SDK and gcj 4.0 or later are installed. Download the PL/Java source tarball, make sure pg_ctl is in your path, and type:

make USE_GCJ=1 release

Alternatively, set JAVA_HOME to point at some other JRE and just type:

make release

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Marc G. Fournier wrote:
> But Thomas, that means finding someone willing to do the work to build the port ... :)

PL/Java should be very easy to port. In fact, I'm not sure any specific porting is needed. There might be some minor makefile quirk (that is what has bitten me on other platforms). I don't have access to a FreeBSD machine, so I can't try it.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Hi Dave,

Sorry I missed you at the Summit. I would've liked to discuss PL/J versus PL/Java with you.

What is the status of PL/J? I haven't seen much activity there over the last 10 months. Does it run on Windows yet? Are you planning a first release anytime soon? Do you have any active users? Does the project still have over 40 dependencies on other components? The last time I looked (August last year) a beta-0.1.1 was planned. I didn't manage to build it, and it didn't seem anywhere close to production readiness.

Perhaps it's no surprise that I disagree when you say PL/J could be considered in the same light as PL/Java. Then again, I'm fairly biased ;-)

Regards, Thomas Hallgren

Dave Cramer wrote:
> Absolutely PL/J should be considered in the same light as PL/Java. Consider this a request for PL/J to be included in the core.
>
> Dave
>
> On 11-Jul-06, at 12:50 PM, Josh Berkus wrote:
>> David,
>>> It's good to integrate things with the core as needed. What plans do we have to integrate PL/J?
>> None, if the PL/J team doesn't speak up. So far I have yet to see a request for PL/J or even a release notice.
>>
>> --Josh
Re: [HACKERS] Three weeks left until feature freeze
Andrew Dunstan wrote:
> There is in effect no API at all, other than what is available to all backend modules. If someone wants to create an API which will be both sufficiently stable and sufficiently complete to meet the needs of the various PLs (especially, as Hannu rightly observes, any new PLs that come along), then we can revisit this question. Until then I suggest that it is at best premature. I am not even sure such a thing is actually possible.

I concur with this. The needs of a module like PL/Java are very different from the needs of PL/Perl, so let's get some more PLs in before we start a refactoring effort to create common APIs. Personally, I'm not sure what would be included. The call handler APIs together with the SPI APIs are in essence what you need. The rest is fairly specialized anyway.

> Also there is this: speaking as someone who actually does some work in this area, I very much appreciate having the eagle eyes of people like Tom, Neil and Joe on what's going on, and keeping things on the straight and narrow. I at least would feel lots less comfortable about maintaining things without such help.

This is partly why I'd like to get PL/Java included. Not that I expect any of them to devote resources to PL/Java, but I think that they will, from time to time, visit the code. If not for anything else, then to see why some other change caused build failures. It's always easier to have discussions around code that you know they all have on disk.

> The Postgres hacker community is small. I am not sure there is an adequate pool of people who will maintain the momentum of each sub-project that we might choose to orphan. If we had thousands of eager code cutters it might be different, but we don't, really.

As the project grows, for various reasons the number of hackers in the community will grow as well. PL/Java, for instance, does not come without resources :-)

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Joshua D. Drake wrote:
> Well, I know it isn't an API per se, but one interesting tidbit as an example is that PL/php does not need the PostgreSQL source to compile. It only needs pgxs and the relevant headers etc... Perhaps that is one way to go... All PLs use pgxs?

PL/Java does. No source needed. So yes, there's already a fairly good API that assists in the module build process. It does, however, still include all header files needed by the backend and thus leaves the backend wide open (in a manner of speaking). If a refactoring effort were to start later on, that would be a good place to begin, i.e. divide the headers into those available to external modules and those for internal backend use only.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Dave Cramer wrote:
> I expect to see a new release shortly.

Dave, I tried to obtain the source, but whenever I try I get:

[thhal]$ cvs -d :pserver:[EMAIL PROTECTED]:/home/projects/plj/scm login
Logging in to :pserver:[EMAIL PROTECTED]:2401/home/projects/plj/scm
CVS password:
/home/projects/plj/scm: no such repository

I can browse the source using the web interface though. Judging from that, there's been no CVS activity since I last tried, i.e. August last year. Is the source being maintained somewhere else? How do I obtain the latest CVS? Judging from your statement, a lot must have happened that would be interesting to look at.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Tom Lane wrote:
> ... equal claim to inclusion in core. Perhaps more, because it gives us an extra layer of insulation from JVM licensing questions.

Tom, why do you persist in talking about licensing issues with PL/Java? There are none. PL/Java builds and runs just fine with gcj, and the above statement is completely false.

Dave, what JVM requirements does PL/J currently have? What license implications are imposed by the components that it depends upon?

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Josh Berkus wrote:
>> Perhaps it's no surprise that I disagree when you say PL/J could be considered in the same light as PL/Java. Then again, I'm fairly biased ;-)
> This attitude does you no credit, Thomas.

My diplomatic skills are somewhat limited :-) I might be blunt at times, and I'm sure there are other, more subtle ways to get the message through. But I'm trying to be honest and up-front. IMO, that should count for something.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Tom Lane wrote:
> Thomas Hallgren [EMAIL PROTECTED] writes:
>> Why do you persist in talking about licensing issues with PL/Java? There are none. PL/Java builds and runs just fine with gcj, and the above statement is completely false.
> Um ... if you use it with gcj, there may or may not be any licensing problems (please recall we are trying to be a BSD-only project, not a BSD-and-LGPL project),

You have no problems using gcc, gnu-make, etc. What's the difference?

> but what of people who use some other JVM? It's not like gcj works for everyone yet.

What of them? If they decide to use another JVM, well, then let them. I don't see how that becomes a licensing problem for PostgreSQL.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Joshua D. Drake wrote:
> What happens when the FSF inevitably removes the license clause and makes it pure GPL?

I'm sorry, but I don't follow. You're saying that it's inevitable that the FSF will remove the 'libgcc' exception from libgcj? Why on earth would they do that? My guess is that it will go the other way (i.e. LGPL). What's the logic in having different licenses on libg++ and libgcj?

> Now, all of this being said, I doubt there is actually an issue here, because it doesn't HAVE TO BE BUILT; it is not a derivative product.

Well, assume that the FSF indeed did remove the exception. It would take me 30 minutes or so to create a substitute BSD-licensed dummy JNI library with associated headers that would allow PL/Java to be built without any external modules at all. It's then completely up to the user what he/she wants to slot in as a replacement.

Regards, Thomas Hallgren
Re: [HACKERS] Three weeks left until feature freeze
Josh Berkus wrote: Thomas, I'm starting to have second thoughts about this suggestion. I was enthusiastic about it at the summit, but I was unaware of the sheer size of PL/Java. 38,000 lines of code is 8% of the total size of PostgreSQL ... for *one* PL. Dave Cramer acquainted me with some of the difficulties of doing a Java PL today, and I understand why it needs to be that large. However, 38,000 lines of code -- much of it in a non-C language -- presents a possible major debugging/maintenance headache, especially if you someday left the project for some reason. OK. You're the one who suggested this submission attempt. There's not much point in pursuing it if you have second thoughts. Maybe we do need to look at a plug-in build tool, instead. Right, something that would allow PL/Java to participate in a build farm. That would be cool and would also resolve some of the issues. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Three weeks left until feature freeze
Joshua D. Drake wrote: Well, assume that FSF indeed did remove the exception. It would take me 30 minutes or so to create a substitute BSD licensed dummy JNI library with associated headers that would allow PL/Java to be built without any external modules at all. It's then completely up to the user what he/she wants to slot in as a replacement. Do we want to do that? I mean (and I am not saying it is, I am asking) is that a bit grey? I would prefer it be black and white. The JNI API is an open standard so I have every right to create a BSD licensed dummy for it. The user may choose a JVM from IBM, Sun, BEA, or other (like GCJ) to run. That's the essence of having a standardized API. What can possibly be 'grey' about that? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Three weeks left until feature freeze
I'd like to submit PL/Java into core for 8.2 if possible. Personally, I see the following action items to make it happen:

1. A hackers discussion to resolve any issues with the submission.

Provided that #1 has a positive outcome:

2. The PL/Java CVS must be moved from gborg and become part of the PostgreSQL CVS (can this be done with version history intact?).
3. The regression tests need some work in order to fit in with the build farm.
4. Documentation must be ripped from the PL/Java Wiki and transformed into the format used by PostgreSQL.
5. I'll need committer rights to the PL/Java part in order to maintain it.
6. The pljava-dev mailing list, currently at gborg, must (perhaps) be moved also. An alternative is to remove it and instead refer to jdbc, general, and hackers.

Given guidance, I'll do steps #3 and #4. External dependencies: Platforms where PL/Java is ported must either support GCJ 4.0 or higher or have a Java Runtime Environment 1.4.2 or higher installed. Regards, Thomas Hallgren Bruce Momjian wrote: There are roughly three weeks left until the feature freeze on August 1. If people are working on items, they should be announced before August 1, and the patches submitted by August 1. If the patch is large, it should be discussed now and an intermediate patch posted to the lists soon. FYI, we don't have many major features ready for 8.2. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Three weeks left until feature freeze
Josh Berkus wrote: Tom, What about licensing issues? Does PL/Java work with any entirely-open-source JVMs? If not, what is the legal situation for distributing PG+PL/Java? Actually, Sun has re-licensed the JRE to make it OSS-compatible (it's now available for Debian, for example). They're doing a Java licensing session at OSCON if you have any specific questions, or I can ping the Java Licensing Guru directly. But even if other JRE's aren't supported, licensing shouldn't be an obstacle. I don't see any license issue at all regardless. PL/Java is satisfied with GCJ 4.0 or higher and compiling with that doesn't affect the binary more than using gcc does. No JVM is required when using GCJ. I'm also a bit concerned about size. By my count, lines of source code:

plpgsql        19890
plperl          4902
plpython        4163
pltcl           4498
pljava 1.3.0   38711

IOW pljava is (already) bigger than the other four PLs put together. That is odd. Thomas? It's not that odd really:

1. The mapping is strongly typed, i.e. each scalar type in PostgreSQL has a set of functions that maps it to the correct primitive in Java (int4 is a Java int, double precision is a double, etc.). PL/Java will resort to string coercion only when no other option is left.
2. A type mapping is provided for *all* types: scalar, composite, pseudo, and array types, and result sets.
3. New Java mappings can be created on the fly, both for scalar and composite types.
4. You can create new scalar types in PostgreSQL that use I/O functions written in Java.
5. The Java code contains its own API documentation (standard javadoc comments on classes and methods).
6. The code is written to conform to standard interfaces such as the JDBC interfaces (from a #lines perspective, perhaps not always the most optimal way of doing it, but it does bring a bunch of other advantages).
7. Extensive error handling is included that allows try/catch semantics when checkpoints are used.
8. Extreme measures have been taken to ensure that the backend is never exposed to more than one thread at a time.

... (from the top of my head, there are probably more reasons) IMHO, this is yet another reason to actually include it in core. I'm not an expert on the other PL's but my guess is that PL/Java is far more sensitive to API changes in the backend core. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
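The strongly typed mapping in point 1 can be sketched with a hypothetical example (the class and method names are invented here, and the deployment declaration is simplified; this illustrates the idea, not PL/Java's actual registration mechanism): a function declared over int4 and double precision arrives in Java as primitive parameters, with no string coercion involved.

```java
// Hypothetical sketch of the strongly typed mapping: a function declared
// in SQL over (int4, double precision) maps to a plain static method with
// primitive parameters -- int4 becomes int, double precision becomes
// double. Class and method names are invented for illustration.
public class MappingExample {
    // Roughly corresponds to a declaration along the lines of:
    //   CREATE FUNCTION scale(int4, double precision)
    //     RETURNS double precision
    //     AS 'MappingExample.scale' LANGUAGE java;
    public static double scale(int count, double factor) {
        return count * factor;
    }
}
```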
Re: [HACKERS] Three weeks left until feature freeze
Hi Hannu, Hannu Krosing wrote: Maybe this functionality could be lifted out of PL/Java and made available to all PL-s? At least at some API level. I think that what could be shared are the ideas and the semantics. The APIs that the backend currently exposes will give you what's needed to do the specialized implementations. The actual PL mappings are all different because their respective executors are different. Some languages are typed, others are not. Some languages support classes and objects, others don't. Other, more esoteric details like the use of a garbage collector or traditional alloc/free semantics also affect the actual implementation. I'm afraid there's not much in the PL/Java type system that could be generalized and shared. Perhaps if we had other languages with very similar capabilities (like C# for instance), but even then I have some doubts. The good news in my opinion is that if PL/Java made it into core it could serve as a good reference implementation for other equally advanced language mappings. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] UUID's as primary keys
Greg Stark wrote: In answer to your question, though my opinion carries no special weight at all, I would suggest adding a bare bones 16-byte data type to core and a second binary-compatible data type based on it that parsed/output as uuids. The extended uuid libraries should only go in pgfoundry/contrib. I second that. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
Josh Berkus wrote: Jim, I agree about splitting the utilities, except that I think the database should be able to generate UUIDs somehow. There is a GUID add-in, and someone is working on a 2nd one. UUIDs are not part of the SQL standard, and we've only seen sporadic demand for them (and different types each time), so I can't imagine one making it further than contrib real soon. Also, one could argue that UUIDs are a foot gun, so they're not exactly the type of thing we want to advocate in advance of demand. Martijn van Oosterhout wrote: It seems to me that maybe the backend should include a 16-byte fixed length object (after all, we've got 1, 2, 4 and 8 bytes already) and then people can use that to build whatever they like, using domains, for example... So how about the split? I.e. just add a 16-byte data type and forget all about UUIDs for now. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
[EMAIL PROTECTED] wrote: On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote: On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote: Personally I don't buy the misuse objection - we already have plenty of things that can be misused. As long as there is a reasonable valid use and we can make it portable enough, I think there is a good case for including it. Well, since Mark has one, how about we consider adding it in? If nothing else, can you please put your stuff on pgFoundry so others can find it, Mark? It was written by Nathan Wagner [EMAIL PROTECTED] and myself, and is based on the OSSP ( http://www.ossp.org/ ) UUID implementation. I'm not an expert on the license, but it seems acceptable to me: Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. I haven't tested to see how portable the OSSP UUID implementation is. In their words: OSSP uuid was already written with maximum portability in mind, so there should be no great effort required to get it running on any Unix platform with a reasonable POSIX API. Additionally, the portability was tested by successfully building and running it on the following particular Unix platforms (syntax is cpu-os (compiler)):

alpha-tru64 4.0 (cc)
alpha-tru64 5.1 (gcc, cc)
hppa-hpux 11.11 (cc)
ia64-hpux 11.23 (cc)
ix86-debian 2.2 (gcc, icc)
ix86-debian 3.0 (gcc)
ix86-debian 3.1 (gcc)
ix86-freebsd 4.9 (gcc)
ix86-freebsd 5.2 (gcc, icc)
ix86-netbsd 1.6 (gcc)
ix86-qnx 6.2 (gcc)
ix86-solaris 10 (gcc)
ix86-unixware 7.1.3 (cc)
mips64-irix 6.5 (gcc)
sparc64-solaris 8 (gcc, forte)
sparc64-solaris 9 (gcc)

I've put it through a fair amount of testing, including using it within compound indexes, expecting the index to be used for at least '=', constructing many UUIDs quickly, in a sequence, and converting it to and from string form.
We chose to implement our own encode / decode routines for performance reasons. With the exception of testing it on a wider range of platforms, I would call the module stable. If there is interest - I'm sure Nathan and I would be willing to put it on pgfoundry, and at some point give it up for inclusion into PostgreSQL. One requirement would be that it runs on Windows. Is that something you have tested? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
[EMAIL PROTECTED] wrote: On Thu, Jun 29, 2006 at 02:02:32AM -0400, [EMAIL PROTECTED] wrote: It was written by Nathan Wagner [EMAIL PROTECTED] and myself, and is based on the OSSP ( http://www.ossp.org/ ) UUID implementation. I'm not an expert on the license, but it seems acceptable to me: ... If there is interest - I'm sure Nathan and I would be willing to put it on pgfoundry, and at some point give it up for inclusion into PostgreSQL. This might require a little bit of research. It appears that the development version of OSSP UUID may provide its own PostgreSQL 'bindings'. I may try and contact the author of OSSP UUID and see whether any changes we have that he does not can be rolled into his version... Cheers, mark I'm thinking ahead on possible objections to inclusion in core. One objection might be that a fully blown UUID implementation is a lot of code - code that needs to be maintained, and it increases the size of the binary, etc. A solution to that might be to break the whole thing up in two:

1. The actual type. A plain scalar type that stores 16 bytes, complete with standard operators for comparison (natural order); the text representation would be a 32-character hexadecimal string. This type should make no interpretation whatsoever of what it stores, and its only association with UUIDs is the storage size.

2. UUID utilities. Various ways of representing, generating, and extracting partial information from UUIDs. Should have support for variants #0, #1, and #2 (the OSSP based code sounds like a good candidate).

The split makes sense since clients often have powerful UUID utilities handy and hence have limited or no use for such utilities in the database (true for all .NET and Java clients). Some PL's will also enable such packages out of the box. The actual type would be extremely generic, lightweight, and easy to implement. No portability issues whatsoever. The only difficulty that I see is naming it :-).
Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
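The bare type's proposed text representation -- 32 hexadecimal digits with no interpretation of the contents -- can be sketched as follows (a hypothetical Java helper for illustration only; the backend's input/output functions would of course be written in C, and the name Raw16 is invented here):

```java
// Sketch of the proposed text representation for a bare 16-byte type:
// 32 hexadecimal digits, round-tripping without interpreting the bytes.
// Raw16 is a hypothetical name used only for this example.
public class Raw16 {
    public static String toHex(byte[] raw) {
        if (raw.length != 16)
            throw new IllegalArgumentException("expected 16 bytes");
        StringBuilder sb = new StringBuilder(32);
        for (byte b : raw)
            sb.append(String.format("%02x", b & 0xff)); // each byte as two hex digits
        return sb.toString();
    }

    public static byte[] fromHex(String hex) {
        if (hex.length() != 32)
            throw new IllegalArgumentException("expected 32 hex digits");
        byte[] raw = new byte[16];
        for (int i = 0; i < 16; ++i)
            raw[i] = (byte) Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        return raw;
    }
}
```

Comparison in natural byte order plus this text form would then be the type's entire semantics.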
Re: [HACKERS] [GENERAL] UUID's as primary keys
Greg Stark wrote: Martijn van Oosterhout kleptog@svana.org writes: To be honest, it seems like a lot of work to save the four bytes of overhead for the varlena structure on disk if you're going to need it in memory anyway. And anything like RAW(16) which people want for UUIDs, if it's going to have a lot of functions associated with it, may as well just be a new type. For large databases storage density leads directly to speed. Saving four bytes of overhead on a 16-byte data structure would mean a 20% speed increase. Even if that's only helpful on a tenth of the columns you're still talking about a 2% speed increase for all queries on the table. A lot of databases use CHAR(1) for flags. The overhead is even worse there. I have to concur with this. Assume you use a bytea for a UUID that in turn is used as a primary key. The extra overhead will be reflected in all indexes, all foreign keys, etc. In a normalized database some tables may consist of UUID columns only. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
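The arithmetic behind Greg's density figure is straightforward: a 16-byte payload stored with a 4-byte varlena length header occupies 20 bytes, of which the header is 4/20 = 20%. A trivial sketch of that calculation (the numbers come from the discussion above, not from a measurement; the class name is invented):

```java
// Back-of-the-envelope arithmetic for the storage-density argument:
// fraction of on-disk bytes spent on the varlena length header when a
// fixed-size payload is stored as a variable-length datum.
public class OverheadCalc {
    public static double varlenaOverhead(int payloadBytes, int headerBytes) {
        return (double) headerBytes / (payloadBytes + headerBytes);
    }
}
```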
Re: [HACKERS] [GENERAL] UUID's as primary keys
Martijn van Oosterhout wrote: On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: I have to concur with this. Assume you use a bytea for a UUID that in turn is used as a primary key. The extra overhead will be reflected in all indexes, all foreign keys, etc. In a normalized database some tables may consist of UUID columns only. So you create a UUID type. It's cheap enough to create new types after all; that's one of PostgreSQL's strengths. It would be a whole lot easier if I could use a domain. What I'm saying is that it's easier to create new fixed length types for the cases that need it than it is to redo the entire type handling of the backend. Of course. But it's a matter of who does what. Your reasoning pushes the burden onto the users. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as primary keys
Martijn van Oosterhout wrote: On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote: A user that is trusted with installing a C-function in the backend is free to scan the process memory anyway, so in what way did that increase the security? IMHO, the only relevant security in that context is to have trusted people install trusted modules. I'm surprised that something like that made you remove significant functionality. You're missing the point. The type output function is not generally a privileged function. Think bpcharout, text_out, numeric_out, etc... These can be called by users directly and the input to those functions cannot be trusted. Ah, OK, that makes sense. An alternative solution when the signature was changed could perhaps have been to pass one single argument, a structure referencing the data and its associated type. My idea would work if the data and its type lived together always from the moment it's instantiated (read from disk or otherwise) until death do them part (or the data is stored on disk, in which case the tupledesc knows what it is). I guess that would imply a major rewrite and that my desire to have a RAW fixed length type isn't enough motivation to do that :-) Instead, I would like to humbly request the inclusion of a UUID datatype (or an opaque 128-bit datatype) in the core package. It's increasingly common and some databases (MS SQL Server) already have built-in support for it. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] refcount leak warnings
I have a PL/Java user that performs some lengthy operations. Eventually, he gets warnings like: WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, flags=0x27, refcount=1 2) I traced this to the function PrintBufferLeakWarning. AFAICS, it's only called from the function ResourceOwnerReleaseInternal under the following comment:

* During a commit, there shouldn't be any remaining pins --- that
* would indicate failure to clean up the executor correctly --- so
* issue warnings. In the abort case, just clean up quietly.

I have no idea where to go from here. What should I look for when trying to find the cause of such warnings? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject SetString]
There's an inconsistency between the handling of trailing whitespace in query parameters in the client JDBC driver compared to the PL/Java SPI based driver. According to Jean-Pierre, the former apparently trims the trailing spaces before passing the query (see below). What is the correct behavior? Regards, Thomas Hallgren Original Message Subject: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject SetString Date: Tue, 27 Jun 2006 12:07:19 -0400 From: JEAN-PIERRE PELLETIER [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: [EMAIL PROTECTED] Hi Thomas, There are very few char columns on my system, I can easily live with explicit trim in my application code, I only wanted to let you guys know. I am not sure which of the two JDBC implementations is right. psql and pgadmin would both handle char as expected. As for JDBC, you might want to know how other DBMSs are handling this. Thanks for your reply. Jean-Pierre Pelletier From: Thomas Hallgren [EMAIL PROTECTED] To: JEAN-PIERRE PELLETIER [EMAIL PROTECTED] CC: [EMAIL PROTECTED] Subject: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject SetString Date: Tue, 27 Jun 2006 17:47:24 +0200 Hi Jean-Pierre, I'm not sure this is incorrect behavior. There's nothing in the spec that indicates that String values should be trimmed by setString and setObject. On the contrary: some datatypes (CHAR in particular) are sensitive to whitespace according to the SQL standard. Perhaps the client JDBC driver is doing something wrong here? Regards, Thomas Hallgren JEAN-PIERRE PELLETIER wrote: Hi, Trailing spaces are not handled properly by setObject/setString.
PreparedStatement pstmt = connection.prepareStatement(
    "select * from mytable where mycharcolumn = ?");
String myString = "abc ";
pstmt.setObject(1, myString); // or setObject(1, myString, Types.CHAR) or setString(1, myString)

No rows are returned, but using trim works fine, as in: pstmt.setObject(1, myString.trim()); My environment is PL/Java 1.3, Sun JDK 1.5.0_07, PostgreSQL 8.1.4, Windows XP SP2. With PostgreSQL's own (non-PL/Java) JDBC driver, setObject on char works fine without the trim. Thanks, Jean-Pierre Pelletier ___ Pljava-dev mailing list [EMAIL PROTECTED] http://gborg.postgresql.org/mailman/listinfo/pljava-dev ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
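For reference, the SQL standard compares CHAR values under PAD SPACE semantics: the shorter operand is blank-padded to the longer one's length before comparing, so 'abc' and 'abc ' compare equal. Whether a bound parameter actually gets those semantics depends on the type the driver binds it as, which is the crux of the disagreement above. A minimal sketch of the padding rule (hypothetical helper class, for illustration only):

```java
// Sketch of the SQL standard's PAD SPACE comparison for CHAR values:
// the shorter operand is blank-padded to the longer one's length before
// comparison, so trailing spaces are insignificant. Illustration only;
// not how any particular driver or backend implements it.
public class CharCompare {
    public static int comparePadded(String a, String b) {
        int n = Math.max(a.length(), b.length());
        return pad(a, n).compareTo(pad(b, n));
    }

    private static String pad(String s, int n) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < n)
            sb.append(' '); // blank-pad to the common length
        return sb.toString();
    }
}
```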
Re: [HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject
Sorry, wrong list... I reposted this on pgsql-jdbc instead. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Shared library conflicts
There was a discussion some time back concerning the linking of the postgres backend. Some libraries were linked although they were not needed, mainly because it was convenient. I had a problem with PL/Java since a Sun JVM ships with its own version of libz.so (they call it libzip.so). Sun fixed the problem, simply by adding a namespace to all functions in their own library, so I didn't stress the issue any further. Now this bites me again. Sun forgot to remove the old functions in their libzip.so on the amd64 platform. I have of course reported this to Sun and I expect it to be fixed in future releases of Java, but shouldn't this be fixed for PostgreSQL too? Perhaps we should consider this a more generic problem. The more libraries that are linked with the backend, the greater the risk that add-on modules will run into conflicts. What effort would be involved to fix this once and for all? Another related question. What happens when I use --without-zlib? Does it have any effect besides disabling compression for the dump/restore utilities? Is there any way to make it affect the backend only? Kind Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Shared library conflicts
That's good news. Any chance of getting this fix backported to 8.1? Or at least, the libz part of it? Regards, Thomas Hallgren Martijn van Oosterhout wrote: On Fri, Jun 23, 2006 at 10:45:29AM +0200, Thomas Hallgren wrote: There was a discussion some time back concerning the linking of the postgres backend. Some libraries were linked although they were not needed, mainly because it was convenient. AIUI, this was fixed in -HEAD. In the Makefile there's a filter command to strip libraries not needed. One of them is libz. From src/backend/Makefile:

# The backend doesn't need everything that's in LIBS, however
LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))

Have a nice day, ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Unable to initdb using HEAD on Windows XP
I just compiled a fresh copy from CVS head. I then tried to do an initdb as user 'postgres' (non admin user on my system). I get the following error:

C:\Tada\Workspace>initdb -D data
The files belonging to this database system will be owned by user postgres.
This user must also own the server process.
The database cluster will be initialized with locale English_United States.1252.
creating directory data ... ok
creating subdirectories ... initdb: could not create directory data/global: Permission denied
initdb: removing data directory data
could not open directory data: No such file or directory
initdb: failed to remove data directory

AFAICS, no data directory is ever created so the 'creating directory data ... ok' message is probably incorrect. I even tried to change the permissions on the parent directory so that user 'postgres' has full control. It doesn't help. I didn't think it would since I am able to create a database in this directory if I'm using version 8.1.4. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Unable to initdb using HEAD on Windows XP
Some more info. If I manually create the data directory first, the output is different:

C:\Tada\Workspace>mkdir data
C:\Tada\Workspace>initdb -D data
The files belonging to this database system will be owned by user postgres.
This user must also own the server process.
The database cluster will be initialized with locale English_United States.1252.
fixing permissions on existing directory data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 4000/20
creating configuration files ... ok
creating template1 database in data/base/1 ... ok
initializing pg_authid ... child process was terminated by signal 5
initdb: removing contents of data directory data

Thomas Hallgren wrote: I just compiled a fresh copy from CVS head. I then tried to do an initdb as user 'postgres' (non admin user on my system). I get the following error: C:\Tada\Workspace>initdb -D data The files belonging to this database system will be owned by user postgres. This user must also own the server process. The database cluster will be initialized with locale English_United States.1252. creating directory data ... ok creating subdirectories ... initdb: could not create directory data/global: Permission denied initdb: removing data directory data could not open directory data: No such file or directory initdb: failed to remove data directory AFAICS, no data directory is ever created so the 'creating directory data ... ok' message is probably incorrect. I even tried to change the permissions on the parent directory so that user 'postgres' has full control. It doesn't help. I didn't think it would since I am able to create a database in this directory if I'm using version 8.1.4. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Unable to initdb using HEAD on Windows XP
Never mind. I scrubbed my folders and obtained a new fresh copy from CVS. Now it works. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] PG_MODULE_MAGIC
A module magic patch was added recently and I'm a bit uncertain what the implications are for the external PL modules. Does it affect them at all? Will I need to provide separate binaries for each bug fix release even though the API's do not change? Exactly how is the magic determined? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] PG_MODULE_MAGIC
Tom Lane wrote: No, each major release (8.2, 8.3, etc). There are hardly ever any major releases where you wouldn't need a new compilation anyway ... True. I'm all in favor of a magic used this way. It will save me some grief. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Proposal for debugging of server-side stored procedures
Mark Cave-Ayland wrote: I think that Java is quite unusual in that the design of JPDA is inherently client/server based to the point where they have defined the platform to allow you to interact with the JVM via a socket. Unfortunately the same can't be said for Perl/Python - as you suggest, passing the parameters into the interpreter is a trivial exercise, but because the debugging classes in both languages are console interactive, the only thing you could do is redirect the console output to a socket (see http://search.cpan.org/dist/perl/lib/perl5db.pl and the RemotePort option in the case of Perl). Obviously I'm neither a Perl nor a Python hacker. I just find it hard to believe that languages that have matured over a decade don't have a remote debug option that can be used. Sockets, shared memory, pipes, or whatever. You will be able to attach to it somehow, given it's been launched with the correct flags. I think you're wrong in thinking that client/server based debugging is in any way unusual. Googling for perl+debug+remote tells me that client/server based Perl debugging is common. There are a number of solutions. The same is true for Python. What I would like to see is some form of IDE that our developers can use, probably under Windows, that would enable them to step through and debug stored procedures on a remote server on another network. Using simple console redirection would involve parsing text output, which strikes me as something that would break easily with new releases. Good IDEs already exist for Perl, Python, and others (you mention Komodo further down). In addition to that you have well-tested Emacs mappings and command-line clients that people like to use. The more prominent ones will always provide upgrades when there are protocol changes. And then of course, there is the problem of security whereby anyone could connect to the socket.
For example, imagine a group of developers all debugging different functions simultaneously; if one of them connected to the wrong console socket then it could be confusing as the developer wonders why their code never stops at a breakpoint. In my experience you have two use-cases. 1. You debug during development and either have your own database instance to play around with or a shared sandbox database where the security is very low. 2. You debug a running instance on a live server and the sysadmin is well aware of what you're doing. You will be given required privileges as needed. I would argue that the times when security becomes an issue when debugging are extremely rare and not worth spending lots of time and effort on. It is enough to prevent anyone but a super-user (or even a sysadmin) from starting a remote debug session. ... the only thing that PL developers would have to do is provide an API for things like step, set breakpoint, read variable, and eval. Well, the API isn't worth much without an implementation. So in essence, you're saying that the only thing the PL developers would have to do is to provide a full blown debug server ;-) A promising option at the moment would be to implement the DBGP protocol for Perl/Python/Tcl as suggested by Lukas, mainly because it appears ActiveState have already written the modules to do this (see http://aspn.activestate.com/ASPN/Downloads/Komodo/RemoteDebugging). There you go! Perl, PHP, Python, and Tcl all taken care of. IDE and all! The only issue I can see with this is again related to security in that the debugger would not respect the ACLs within PostgreSQL, which could potentially allow a user to break inside a function that wasn't his/her own. I'd use the Komodo IDE and implement the ability to start the PL using a GUC setting per my original suggestion (with super-user requirement). Trivial solution, minimum effort, and very useful. KISS principle.
It would be great if we could agree on a GUC flag (or flags) that would control debugging for all PL's. At present, all PL implementors must use their own (module specific) flags. Kind regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Proposal for debugging of server-side stored procedures
Some thoughts from another Tom... Mark Cave-Ayland wrote: ... debugging in Perl is initiated with perl -d somefile.pl which then becomes a special interactive interpreter session. The same is also true with Python which is launched in a similar way, python -m pdb somefile.py. All PL's are launched one way or another. It would probably be very simple to add the '-d' flag to the PL/Perl launcher and the '-m' flag to the PL/Python launcher and control it with module-specific GUC settings. PL/Java already does this. The SQL to initialize debugging looks like this: SET pljava.vmoptions TO '-agentlib:jdwp=transport=dt_shmem,server=y,suspend=y'; This tells the VM to load in-process debugging libraries and specifies the kind of connection to be made. As soon as the first Java function is called, the process is suspended and waits for a client debugger to attach. The amount of work needed to create similar solutions for Perl, Python, Tcl, etc. is probably limited and fairly trivial. However, both Perl and Python offer the ability to hook into the interpreter to detect events. For example, Python offers a C API here [1] that allows you to supply a callback when particular events occur; this would allow us to execute a callback after the interpreter executes each line. And it would force you to write your own proprietary debugger backend for each language. That's a major thing to undertake. Have you considered the maintenance aspect of it? Not something that would make the author of a PL module scream with joy. It might be wise to also consider what debugger preferences a skilled perl/python/whatever language developer might have. A home-brewed debugger in the form of a PostgreSQL add-on might not be a natural choice. Perl seems a little more messy in that I can't find a documented C API to hook into the interpreter, but it looks as if it may be possible to cook something up by writing a new DB package [2] which uses XS to call a C callback.
The other issue is that unlike Python, the debug capability must be specified when creating the instance of the interpreter rather than being able to enable/disable debugging on the fly, so it may mean that two instances of the Perl interpreter need to be maintained in memory - a standard instance and a debugging instance. Plpgsql isn't really a concern as we can simply code up whichever model we eventually decide to use. The only bad news I can see is that it appears Tcl may need to have its source patched in order to add debug hooks into the interpreter [3]. You'll find more bad news as you go along. I have sincere doubts that inventing your own multi-language debugger is the right way to go. Also, I'm wondering how to design the mechanism to be extendable in the future beyond debugging server-side functions, such as hooking into more of the executor mechanisms. For example, I could see a use case for allowing a profiler to use the debug API to attach to the postmaster to receive notifications whenever a new SQL query starts and ends; this would allow someone to write a rather neat app that could rank the most popular queries in terms of processing time really easily, but then maybe this could be considered treading on the toes of the existing stats process... SQL debugging and hooking into the executor sounds really interesting and something that would really be worth the effort. I doubt there's a gain in mixing that with debugging of PL's in general. Having multiple debugger clients, one for each language, and one for SQL, might be a good thing. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Inefficient bytea escaping?
Marko Kreen wrote: On 5/28/06, Martijn van Oosterhout kleptog@svana.org wrote: With -lpthread lock.enabled 323s lock.disabled 50s lock.unlocked 36s I forgot to test with -lpthread, my bad. Indeed, by default something less expensive than full locking is going on. The crux of the matter is though, if you're calling something a million times, you're better off trying to find an alternative anyway. There is a certain amount of overhead to calling shared libraries and no amount of optimisation of the library is going to save you that. The crux of the matter was whether it's possible to use fwrite as an easy string-combining mechanism, and the answer is no, because it's not lightweight enough. IIRC the Windows port makes use of multi-threading to simulate signals and it's likely that some add-on modules will bring in libs like pthread. It would be unfortunate if PostgreSQL were designed to take a significant performance hit when that happens. Especially if a viable alternative exists. Regards, Thomas Hallgren
Re: [HACKERS] Proposal for debugging of server-side stored procedures
Tom Lane wrote: Mark Cave-Ayland [EMAIL PROTECTED] writes: ... So basically yeah, what we need is a debug subchannel in the FE/BE protocol. I'd suggest inventing a single Debug message type (sendable in both directions) with the contents being specified by a separate protocol definition. Or perhaps invert that and imagine the FE/BE protocol as embedded in a debug protocol. I think this is a bad idea. PL/Java will use either shared memory or a socket to attach and, as you already mentioned, when using C, gdb will attach directly using the pid. I wouldn't be too surprised if Perl, Python, and PHP all have a similar solution and thus gain no benefit from additions to the FE/BE protocol. IMO, debugging should be language specific and take place in a separate channel. There's no gain whatsoever in mixing it with the FE/BE protocol. Regards, Thomas Hallgren
Re: [HACKERS] Proposal for debugging of server-side stored procedures
Tom Lane wrote: Thomas Hallgren [EMAIL PROTECTED] writes: I think this is a bad idea. PL/Java will use either shared memory or a socket to attach and as you already mentioned, when using C, a gdb will attach directly using the pid. I wouldn't be too surprised if Perl, Python, and PHP all have a similar solution and thus have no benefit from additions to the FE/BE protocol. IMO, debugging should be language specific and take place in a separate channel. There's no gain whatsoever mixing it with the FE/BE protocol. It may well be that for plperl and friends we can kick the problem off to language-specific debuggers --- indeed, one of the first things we need to do is look at those to see what we can avoid reinventing. But what of plpgsql? Ideally, all PL's should use the same protocol. It should be language agnostic and allow different regions of the code to originate from different languages. That way, it would be possible to single-step a plpgsql function that in turn calls a function in PL/Java. Incidentally, the JDWP (Java Debug Wire Protocol) was designed to do just that. But I think it would be very complicated to cross language boundaries even if we did use that. The JDWP and the architecture that surrounds it might be a good source of inspiration though. See: http://java.sun.com/j2se/1.5.0/docs/guide/jpda/architecture.html. Also, any solution of this type probably requires that the person doing debugging have database superuser access (in fact, be logged directly into the server machine as the postgres user). It'd be nice to have an approach that could be used by non-superusers to debug their trusted-PL functions. Indeed. In my case, it's a matter of who starts the VM and what options are passed to it (given certain options, the JVM will listen to a port or a semaphore that controls a region of shared memory). That in turn is controlled using GUC settings, so for PL/Java I think it would be possible to set it up that way. 
Regards, Thomas Hallgren
[HACKERS] API changes in patch release
The world is not perfect and I know that you are normally very restrictive in what is back-patched from head into bug-fix branches. The 8.1.4 release, however, did introduce a problem. You changed the API function inv_open() with the comment "Revise large-object access routines to avoid running with CurrentMemoryContext". This change will force me to a) introduce patch-level sensitive conditionals in the code, and b) have two PostgreSQL 8.1.n compatible releases of PL/Java: one where n < 4 and another where n >= 4. I would like to avoid this in the future if possible. APIs should remain stable during patch releases. Having said that, I've been in the game long enough to know that Utopia doesn't exist. You probably had a very good reason to break the compatibility. Kind Regards, Thomas Hallgren
Re: [HACKERS] Porting MSSQL to PGSQL (Was: [OT] MySQL is bad, but THIS bad?)
Tom Lane wrote: I think the hard part of this task is designing the API for access to the rowsets from triggers. My preference would be something similar to two Portal instances (one each for NEW and OLD). I could then map it in the same way that I map the result of a query. If the API actually used two real Portal instances, the PL/Java implementation would take minutes. Regards, Thomas Hallgren
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Christopher Kings-Lynne wrote: If you want to get users to switch to your software from your competitors, you have to eliminate barriers, and a big one for any database is getting locked into a specific one. People aren't going to take the time to try switching to postgresql if they can't easily make it back to their former database. It's one of the reasons why PostgreSQL's standards compliance is so important; if you want to switch to a new database, your best bet is to give PostgreSQL a shot, because even if you don't like it, we're not going to try and trap you into our software with bunches of non-standard knobs. Low barrier to exit == low barrier to entry. Another reason why a tool to export from pgsql to mysql is just as important as the vice versa... If that's really true, then let's create a bidirectional compatibility layer as a joint venture with people from the MySQL camp. Should be a win-win situation. I somehow doubt that is the case. Important yes. But just as important? No way. We would lose big time on the export side, since the vendor lock-in aspect is seriously outweighed by current levels of standards compliance. On the other hand, we'd win by an order of magnitude on the import side. I bet the MySQL people would be utterly uninterested in such a venture. I think that if anything should be done, we should concentrate on import and let the MySQL people worry about going the other way. Once it becomes just as important, they will. Regards, Thomas Hallgren
[HACKERS] Question about casts
Just out of curiosity (and most likely, ignorance). Why can't I cast an array of strings into a string? I.e. thhal=# select ('{a,b}'::varchar[])::varchar; ERROR: cannot cast type character varying[] to character varying or a cstring into a varchar, i.e. thhal=# select array_out('{a,b}'::varchar[])::varchar; ERROR: cannot cast type cstring to character varying ISTM the implementation of such casts should be fairly simple and straightforward, and sometimes even useful. Every data type comes with string coercion routines anyway, right? Regards, Thomas Hallgren
Re: [HACKERS] Question about casts
Martijn van Oosterhout wrote: On Thu, May 18, 2006 at 05:41:14PM +0200, Thomas Hallgren wrote: Just out of curiosity (and most likely, ignorance). Why can't I cast an array of strings into a string? I.e. thhal=# select ('{a,b}'::varchar[])::varchar; ERROR: cannot cast type character varying[] to character varying Why would you need to? What would you expect to happen? Joined with a separator, no separator, with parentheses? Well, let's assume I use JDBC. I write code like: ResultSet rs = stmt.executeQuery("SELECT arrValue ..."); while (rs.next()) { String v = rs.getString(1); } The tuples received by the result set contain String[]. If I let PL/Java convert it (I don't currently), it will be according to Java semantics. I'd like to convert it using PostgreSQL semantics instead. So I change my statement to: SELECT array_out(arrValue) ... that works of course. What baffles me is that I cannot write SELECT arrValue::varchar What's the benefit of a cast over a function call? None whatsoever. But PostgreSQL enables a lot of casts for some reason or another, right? Why not this one? Every cast costs space and lookup time. Any user can add their own casts if they want, but the system generally only includes the ones useful to many people or those required for standards compliance. OK. I can live with that. I would have thought that casting into the string types was something that could be hardwired since the backing functions are mandatory. Regards, Thomas Hallgren
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
John DeSoi wrote: Right, you'll definitely need to hack the C source code to force PostgreSQL to accept invalid dates ;) http://sql-info.de/mysql/gotchas.html#1_14 Couldn't we just install something that replaced invalid dates with randomly generated but otherwise correct dates? That way they would become completely invisible. No one could even tell that the date was invalid to start with. Regards, Thomas Hallgren
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Lukas Smith wrote: .. but spouting outdated FUD is really making you two look foolish. Wow. On a scale from 1 to 10 measuring seriousness, I'd put my posting at -4. I'd advise you to take a step back and get some distance if you consider it outdated FUD. Regards, Thomas Hallgren
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Chris Browne wrote: [EMAIL PROTECTED] (Marc G. Fournier) writes: To give someone a running chance at migrating it to PostgreSQL, a 'MySQL compatibility module' would allow them to just plug the existing DB in, and then work at improving sections of the code over time ... Hell, if done well, the module should be able to dump appropriately 'clean' PgSQL schemas ... as in your example below about the domains ... You can't have that because you essentially need to throw out four aspects of fairly vital data validation functionality: 1. Dates cease to be validatable. 2. NULL and 0 and '' are all roughly equivalent, even though they aren't. 3. Foreign key constraints have to be ignored. 4. You have to fabricate a locale offering a case-insensitive sort order. I suppose #4 isn't vital data validation... But after you gut the PostgreSQL-based system of those four aspects of data integrity, I'm not sure there's any remaining benefit to having PostgreSQL in play... Assuming the objective with a transition would be to improve on things, an alternative approach could be to offer a three-step migration path: 1. A dump/restore utility that dumps a MySQL database and restores it into a PostgreSQL database. This utility must have plugin capabilities where logic can be added that deals with cases #1, #2, and #3 above. It might be as simple as just logging incorrect records to a file and skipping them. A pre-defined set of generic plugins could be supplied that did just that. A user would have the chance to replace them with customized plugins to cover for special cases in his own app. Perl or PHP would probably be good candidates for the plugin language. 2. Provide an add-on to the PostgreSQL parser that would make it accept MySQL syntax. The database would still run untainted underneath, so from this point on, no more invalid dates or foreign keys can be entered. Some other add-ons are needed as well to cater for some sane but non-standard MySQL behavior that PostgreSQL is lacking. 3. 
A good user guide that helps the user to, over time, move away from the non-standard, MySQL-specific expressions. The objective being to at some point skip the MySQL syntax layer altogether. Regards, Thomas Hallgren (dead serious this time)
Re: [HACKERS] audit table containing Select statements submitted
Some users of PL/Java make use of a non-default connection from within a Trigger in order to do this. In essence, they load the client JDBC package into the backend to let the backend as such become a client. The second connection is then maintained for the lifetime of the first. Perhaps not the most efficient way of doing it but it works. Regards, Thomas Hallgren Gurjeet Singh wrote: Do we have any plans of introducing 'AUTONOMOUS TRANSACTION' like feature? Again, it might not be a part of the standard but it is very helpful in situations like these!!! You can run a trigger with an autonomous transaction attached to it, which guarantees that the work done by trigger persists even though the calling transaction rolls back (potentially a hacker trying to cover his tracks)!!! (http://asktom.oracle.com/~tkyte/autonomous/index.html) Gurjeet. On 5/16/06, Josh Berkus josh@agliodbs.com wrote: Doug, But what if the user calls the access function, sees the data, then issues a ROLLBACK? The audit record would be rolled back as well (as Tom pointed out earlier).
Re: [HACKERS] psql feature thought
Tom Lane wrote: Quite aside from the compatibility and how-useful-is-it-really arguments, I think this'd be a bad idea in the abstract. SQL is not one of those languages that assigns semantic significance to the shape of whitespace [1]. We should NOT introduce any such concept into psql, because it'd fundamentally break the lexical structure of the language. +1 Regards, Thomas Hallgren
[HACKERS] Creating a SHELL type [Was: User Defined Types in Java]
I just discovered that it is now possible to create a SHELL type using the 'CREATE TYPE typename' syntax in CVS HEAD. Will this come in 8.2 only or will it also be included in 8.1.4? Kind Regards, Thomas Hallgren
Re: [HACKERS] Inheritance, Primary Keys and Foreign Keys
Albert Cervera Areny wrote: ...What do you mean with triggers that maintain cascade behavior? It ties in with how references are handled. Since they currently ignore the inheritance aspect, you need triggers that enforce 'on cascade delete/update'. They will become obsolete if that changes (i.e. SELECT FROM instead of SELECT FROM ONLY). Regards, Thomas Hallgren
Re: [HACKERS] Inheritance, Primary Keys and Foreign Keys
Albert Cervera Areny wrote: Of course, that's an option for my case. Just wanted to know if this solution could be useful for PostgreSQL in general. Mainly because I'll add some triggers to check what maybe PostgreSQL should do itself but it's unimplemented. If that's not interesting or a proper solution for PostgreSQL I'll add it using the existing DDL in my application and that's all. What do you think? I think that if you want the database to improve its current inheritance behavior, then this trigger set is too limited. You need triggers that maintain both unique and primary keys and triggers that maintain cascade behavior. In order to make it really good, you would also need to add some functionality to the mechanisms that maintain references. Today, they don't recognize inheritance at all. Personally, I use Hibernate. It tries to compensate for the lack of these features but since it is a middle-tier (or client) solution, it's not ideal. Another client can still violate the rules, and maintaining integrity in the client is negative from a performance standpoint. I think it would be great if PostgreSQL could provide a more complete set of features that would enable inheritance. A good start would be to extend it with the functionality needed to maintain references, cascade actions, and enforce unique constraints. On the other hand, inheritance is a tricky business and a good OO-RDB mapper will give you several choices of how it should be mapped. There's no one-size-fits-all. The best solution is probably if someone (you perhaps?) writes an external OO-RDB mapper module that executes in the backend. The author of such a tool would of course need some new nifty backend APIs in order to do what's needed with references etc. I actually wrote something similar using Oracle a couple of years ago. It was based on type inheritance and views rather than tables and used 'instead of' actions on all views (Oracle's own mechanisms were far too limited). 
In some respect, I think that is a better solution. Inheritance and all that comes with it is more a 'type' thing than a 'table' thing in my world. A view is then used to _map_ the types to persistent storage, i.e. the 'tables'. Regards, Thomas Hallgren
[HACKERS] any, anyelement, and anyarray
I have some questions regarding any, anyelement, and anyarray. - Why does PostgreSQL declare three different generic types? Isn't one enough? ISTM it would be far simpler to use constructs like 'any' and 'any[]' but neither of them is permitted. - Why isn't the 'anyarray' declared as an array using the element type 'anyelement' in pg_type? - Why can't I write 'anyelement[]'? Shouldn't that be the same thing as 'anyarray'? - The 'any' is listed as a type but I get a syntax error whenever I try to use it. If I could use it, what would be the difference between 'any' and 'anyelement'? The only thing I can think of is if 'anyelement' were restricted to non-arrays, but apparently it isn't. I've tried, and there's nothing stopping me from passing an 'int[]' to a function that takes an 'anyelement'. Regards, Thomas Hallgren
Re: [HACKERS] any, anyelement, and anyarray
Tom Lane wrote: Thomas Hallgren [EMAIL PROTECTED] writes: - Why does PostgreSQL declare three different generic types? Isn't one enough? ISTM it would be far simpler to use constructs like 'any' and 'any[]' but neither of them are permitted. any isn't the same as anyelement, because it doesn't have the property of constraining different argument positions to be the same type. For instance, compare(any,any) and compare(anyelement,anyelement) would accept different sets of input types. I've read that anyelement will constrain all parameters and the return type to be of the same type, or an array of that type. I understand that concept since it gives the executor the ability to infer the return type by looking at the parameters. If any doesn't do that then I understand the difference. I've been trying to use any with no luck. How do I declare a function that takes an any as a parameter? I know how it would be implemented internally but the SQL for it eludes me. I realize that it would be trickier to return an any since the expected return type must somehow be derived from the context where the function was called. But similar things are done with record already, right? - Why can't I write 'anyelement[]'. Shouldn't that be the same thing as 'anyarray'? No, you're confusing these with actual datatypes. They are pseudotypes, which means they're only allowed as function argument/result type placeholders. I understand that. But I'd consider it an implementation detail and I think the average SQL user finds it a bit confusing. Wouldn't it be a good idea to let the SQL parser recognize the anyelement[] construct as a synonym for anyarray? I.e. allow me to write: CREATE FUNCTION makeSingleElementArray(anyelement) RETURNS anyelement[] AS ... Regards, Thomas Hallgren PS. I'm happy to announce that PL/Java now handles records, domains, user-defined types, anyelement, and anyarray without problems.
[HACKERS] Number of dimensions of an array parameter
I can create a function that takes a two-dimensional int array: CREATE FUNCTION twodims(int[][]) RETURNS void AS ... but there's nothing stopping me from calling this function with an arbitrary number of dimensions on the array. I'd like to map a parameter like the one above to a corresponding representation in Java (it would be int[][] there too). As it turns out, I can't do that. PostgreSQL will not store any information that can tell me how many dimensions were used in the declaration, i.e. it's impossible to write a language VALIDATOR that, based on the information in pg_proc, builds a signature where the number of dimensions is reflected. This leaves me with two choices: Only allow arrays with one dimension unless the parameter is of a domain type (domains are apparently stored with the actual number of dimensions). Any call that uses an array parameter with more than one dimension will yield an exception. --OR-- Always map to Object[] instead of mapping to the correct type. This will work since an array in Java is also an Object and all primitive types can be represented as objects (i.e. an int can be a java.lang.Integer). The strong typing and the ability to use primitives are lost, however. I'm leaning towards #1 and hoping that PostgreSQL will enhance the parameter type declarations to include the dimensions in future releases. ... After some more testing ... Unfortunately, I run into problems even when I use domains. Consider the following: thhal=# CREATE DOMAIN twodims as int[][]; CREATE DOMAIN thhal=# SELECT typndims FROM pg_type WHERE typname = 'twodims'; typndims -- 2 (1 row) thhal=# SELECT array_dims('{{{1,2,3},{3,4,3}},{{5,3,2},{9,9,9}}}'::twodims); array_dims - [1:2][1:2][1:3] (1 row) IMO, there is something seriously wrong here. Clearly the number of dimensions is a property of the type. Any array with a different number of dimensions should yield an error or at least be coerced into the right number of dimensions. 
Kind Regards, Thomas Hallgren
Re: [HACKERS] Number of dimensions of an array parameter
Stefan Kaltenbrunner wrote: while it would be nice to improve that - it is actually documented quite clearly. http://www.postgresql.org/docs/current/static/arrays.html has: However, the current implementation does not enforce the array size limits — the behavior is the same as for arrays of unspecified length. Actually, the current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring number of dimensions or sizes in CREATE TABLE is simply documentation, it does not affect run-time behavior. A documented flaw is much better than an undocumented one, but it's still a flaw, and a pretty bad one at that. It's like having a compiler that doesn't complain when you define a C function that takes an int** and you then pass it an int*. Would it be hard to enforce a real check? The implementation could use GUC settings like 'enforce_array_dimensions' and 'enforce_array_lengths' that could be set to false for the legacy implementations that rely on the current behavior. I know Tom added the ability to have NULL values in the arrays. Perhaps now is the time to improve the type semantics as well? Regards, Thomas Hallgren
Re: [HACKERS] Number of dimensions of an array parameter
Tom Lane wrote: Thomas Hallgren [EMAIL PROTECTED] writes: Would it be hard to enforce a real check? The implementation could use GUC settings like 'enforce_array_dimensions' and 'enforce_array_lengths' that could be set to false for the legacy implementations that rely on the current behavior. The fact that it doesn't exactly match Java semantics does not make it legacy behavior. I don't agree that it's a bug; I think it's a feature, precisely because many functions can work on arrays of different dimensions. Why should we change to make PL/Java happier, when it will move us further away from the semantics of, say, PL/R? Would it really? The way I see it, the choice of language is irrelevant. Either you support dimensions or you don't. The way PostgreSQL does it, you get the impression that it is supported while in fact it's not. I can't see how anyone would consider that a feature. If you want the ability to use an arbitrary number of dimensions, then you should have a syntax that supports that particular use-case. An int[][] cannot be anything but a two-dimensional int array. Not in my book anyway. That opinion has nothing to do with Java. I think reasonable choices for PL/Java would be to reject multidimensional array arguments, or to silently ignore the dimensionality and treat the data as 1-D in storage order (as I think plperl for instance already does). I agree. That's the way I'll do it. Regards, Thomas Hallgren
Re: [HACKERS] Number of dimensions of an array parameter
Martijn van Oosterhout wrote: On Mon, May 08, 2006 at 07:31:14PM +0200, Thomas Hallgren wrote: Would it be hard to enforce a real check? The implementation could use GUC settings like 'enforce_array_dimensions' and 'enforce_array_lengths' that could be set to false for the legacy implementations that rely on the current behavior. I know Tom added the ability to have NULL values in the arrays. Perhaps now is the time to improve the type semantics as well? The big problem is where do you store the number of declared dimensions? It's not stored anywhere, so there's nowhere to check against either. If we can fix that first we might get to the checking part. test=# create function foo(int[][]) returns int4 as 'select 1' language sql; CREATE FUNCTION test=# \df foo List of functions Result data type | Schema | Name | Argument data types --++--+- integer | public | foo | integer[] (1 row) Let each type have its own entry in pg_type. I.e. let int[] and int[][] be two distinct types (like int and int[] already are). In addition, perhaps introduce a new syntax that denotes 'arbitrary number of dimensions' and let that too be a distinct type. Regards, Thomas Hallgren
Re: [HACKERS] Fwd: [pgsql-hackers-win32] Build with Visual Studio MSVC
Gurjeet Singh wrote: Thomas, I love the idea of eclipse; any platform, any language, one IDE. I am downloading it right now. Can you please send in the steps that you perform to set up the environment, including the mingw toolkit; I will try to grow on that. My setup is pretty basic. I can't give you an exact step-by-step instruction since it's been a while since I last did it. I installed msys, mingw, and Eclipse, and then, using the Eclipse update manager and the 'Callisto' site, I installed the C/C++ plugin. I also made sure that the PATH in effect for the Eclipse IDE contains entries for %MSYS_HOME%\bin and %MINGW_HOME%\bin. That's it basically. The idea of this effort is to have a GUI IDE, with a slew of features that MSVC offers: Memory window that allows you to edit memory in place, call-stack window, watches, quick-expression evaluator with class/struct support, etc. If Eclipse can offer all these, then I don't think anyone would mind using it instead of MSVC. Eclipse won't offer all of these. Not yet anyway. What you get is a fair C/C++ editor and parsers for your make output that will annotate your files with errors and warnings. There's said to be some debugging support too on top of gdb, but to be honest, I've never tried it on Windows. I do my C debugging using gdb on Linux. My attempts to use gdb on Windows have been quite futile so far. Then again, I'm not using the latest MinGW version so perhaps there's still hope. All in all, Eclipse C/C++ has some way to go before it can match up with MSVC. My point was that you can do Windows development without MSVC and you can do it fairly well. If you are a Linux hacker, you might even prefer doing it that way. So as a platform, Windows is not by any means left alone. I really think that what you and others are trying to accomplish is very valuable. If not for me (since I'm mixing Java and C and work on multiple platforms) then certainly for many others. 
Personally, I'd rather see a Visual Studio port than one for VC++ 6.0. I wish you the best of luck. Regards, Thomas Hallgren
Re: [HACKERS] pseudo-type record arguments for PL-functions
David Fetter wrote: On Thu, May 04, 2006 at 09:02:02PM +0200, Thomas Hallgren wrote: Tom Lane wrote: Why can PLs not handle pseudo-types? No one's done the work to figure out which ones are sensible to support and then add the logic needed to support them. PL/Java will handle the RECORD type correctly. I'm just finalizing a new, more flexible, type mapping implementation for PL/Java and it would be easy to add support for more pseudo-types too. But what others would make sense? Ideally, some way to get all kinds of user-defined types. DOMAINs, too. :) OK, got them covered as well. The only thing that remains now is arrays. I have a hard time figuring out how to manage them. I'm looking at arrayutils.c. The thing that makes me a bit confused is the ArrayMetaState. The functions obtain it using: my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra; which is fine if there's only one array parameter. What happens if I have two? And how do I declare a function that takes, say, an array of ints as a parameter (in SQL, that is)? I find very little information about how to write functions that deal with arrays. My only source of information right now is arrayutils.c. Other pointers to docs and code are greatly appreciated. Kind Regards, Thomas Hallgren
Re: [HACKERS] pseudo-type record arguments for PL-functions
Tom Lane wrote: Make a struct that can hold two ArrayMetaStates. Or whatever else you need. What a C function keeps in fn_extra is its own affair. Yes, of course. I see that now. I was unaware that a function had associated user data. What are the semantics of fn_extra? Does it retain its setting throughout a session (i.e. the lifetime of the backend process)? PL/Java associates a structure with a function using a hash map lookup on the function Oid. It seems I could use fn_extra and remove that map altogether. Then again, there are times when I need to invalidate the associated structure of all Java functions due to a reload of jar files. Is there any way that I can list all functions for a specific language and get hold of their current setting of fn_extra? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pseudo-type record arguments for PL-functions
Martijn van Oosterhout wrote: Looking at contrib/intarray/_int_op.c might help. It does something like this: ArrayType *a = (ArrayType *) DatumGetPointer(PG_DETOAST_DATUM_COPY(PG_GETARG_DATUM(0))); The file src/include/utils/array.h also seems to have many useful functions. Hope this helps, Yes, the intarray stuff was very helpful but also somewhat confusing. Why are there two ways of representing some of the array types? I mean, why is there an _int4 when you could just as well write int4[]? I'm probably missing the point altogether. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [pgsql-hackers-win32] Build with Visual Studio MSVC
Gurjeet Singh wrote: My main grudge is that if we are supporting almost all flavours of *nixes and compilers (close to 34 according to the official website), then why are we leaving the Windows platform alone? This will bring in quite a lot more developers. You should look at MinGW as a development toolkit, not a platform. PostgreSQL builds and runs just fine on the Windows platform. Personally, I use Eclipse C/C++ with MinGW since it brings me a number of advantages. The most prominent one is that I only need to master one IDE regardless of platform. I am sure it's not going to be easy, but I am sure that with this great community support, we can achieve it. It seems some people have done a lot of work to get things working with VC++ already. Search for the word MSVC on this list. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pseudo-type record arguments for PL-functions
Tom Lane wrote: Why can PLs not handle pseudo-types? No one's done the work to figure out which ones are sensible to support and then add the logic needed to support them. PL/Java will handle the RECORD type correctly. I'm just finalizing a new, more flexible, type mapping implementation for PL/Java and it would be easy to add support for more pseudo types too. But what others would make sense? Kind Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Is a SERIAL column a black box, or not?
Rod Taylor wrote: If SERIAL is going to be kept long term, then it should be the macro version so it doesn't appear too duplicated. I concur with this. But to really break out from the current middle ground, you must implement the IDENTITY and also document the SERIAL macro as deprecated. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Finding the correct type
I'm writing a UDT that takes a varchar argument that represents the name of a type. The caller may optionally qualify it with a namespace, e.g. pg_catalog.varchar or public.address. Is there a C function somewhere that will return the pg_type that corresponds to the name (with respect to the current setting of search_path in case the name is unqualified)? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Finding the correct type
Martijn van Oosterhout wrote: On Sun, Apr 30, 2006 at 12:50:23PM +0200, Thomas Hallgren wrote: I'm writing a UDT that takes a varchar argument that represents the name of a type. The caller may optionally qualify it with a namespace, e.g. pg_catalog.varchar or public.address. Is there a C function somewhere that will return the pg_type that corresponds to the name (with respect to the current setting of search_path in case the name is unqualified)? If you want it as a C string, something like format_type_be() would suffice. Note it's designed for use in error messages, so it makes no particular effort to clean up after itself. Another possibility is the output function for the regtype type, i.e. regtypeout. Hope this helps, Well, regtypeout led me to regtypein, which in turn led me to parseTypeString, which seems to be exactly what I want. Thanks, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] stdout when running as a Windows service
I'm having problems with crashes whenever I attempt to write something on stdout or stderr from a UDT whenever the postmaster is running as a windows service. Not sure what I'm doing wrong. Is this a known problem? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Get explain output of postgresql in Tables
Richard Huxton wrote: Tom Lane wrote: I dislike the thought of encouraging people to post stuff in a not-easily-readable format. They won't do it anyway, if it's not the default; look how we still can't get people to send EXPLAIN ANALYZE output the first time. It certainly needs to be one format for both purposes. One idea that comes to mind is to work up some trivial little script that undoes the more common forms of cut-and-paste damage. I wonder if it would help much just to change EXPLAIN to indent with something other than spaces? Maybe instead of

    Nested Loop  (cost=1.06..40.43 rows=5 width=244)
      Join Filter: (public.tenk1.unique2 = int4_tbl.f1)
      ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)

print

    Nested Loop  (cost=1.06..40.43 rows=5 width=244)
    --Join Filter: (public.tenk1.unique2 = int4_tbl.f1)
    --HashAggregate  (cost=1.06..1.11 rows=5 width=4)

Not sure what would look nice, but this would at least remove the hazard from stuff that thinks whitespace isn't significant. That's the sort of thing I was thinking of, or even something like:

    1 Nested Loop ...
    1.1 Join Filter...
    1.1.1 HashAggregate...
    1.2 etc

Why not go all the way? Here's the above using Satoshi's suggestion:

    <NestedLoop cost=1.06..40.43 rows=5 width=244>
      <JoinFilter public.tenk1.unique2=int4_tbl.f1>
        <HashAggregate cost=1.06..1.11 rows=5 width=4/>
      </JoinFilter>
    </NestedLoop>

Easy to copy/paste, and whitespace doesn't matter. Easy to read (well, to some at least) and can be made even easier if you have access to an XML viewer. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] plpgsql by default
Tom Lane wrote: In the end it's only one small component of security, but any security expert will tell you that you take all the layers of security that you can get. If you don't need a given bit of functionality, it shouldn't get installed. I think any security expert would say that if you let untrustworthy people get far enough to create their own SQL statements, you're in big trouble, plpgsql or not. I fail to see what the real issue is here. Your argument is analogous to saying don't install bash on a Linux system by default; people might do bad things with it. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_class catalog question...
Jim C. Nasby wrote: On Sat, Apr 01, 2006 at 05:42:34PM +0200, Thomas Hallgren wrote: Why not simply a fixed number of bytes, i.e. byte(16) or octet(16)? Hexadecimal is just a convenient human-readable representation. Well, hex is much easier to deal with in many regards than raw bytes, though. But yes, the idea is that you'd just store raw bytes on disk. byte or octet would work fine if they existed. IIRC, Oracle actually uses the term RAW. It makes sense I think. No conversion applied, no nothing. Just simple raw data. - thomas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Remote PL/Java, Summary
Hi all, And thanks for the very good input regarding a remote alternative to PL/Java (thread titled Shared Memory). I'm convinced that such an alternative would be a great addition to PL/Java and increase the number of users. The work to create such a platform with the stability and quality of today's PL/Java is significant (I really do think it is a production-grade product today). So significant, in fact, that I'm beginning to think of a third alternative. An alternative that would combine the performance of using in-process calls with the benefits of sharing a JVM. The answer is of course to make the backend multi-threaded. This question has been debated before and always promptly rejected. One major reason is of course that it will not bring any benefits over the current multi-process approach on a majority of the platforms where PostgreSQL is used. A process switch is just as fast as a thread switch on Linux-based systems. Over the last year, however, something has happened that certainly speaks in favor of multi-threading: PostgreSQL is getting widely adopted on Windows. On Windows, a process switch is at least 5 times more expensive than a thread switch. In order to do appropriate locking, PostgreSQL is forced to do a fair amount of switching during transaction processing, so the gain in using a multi-threaded approach on Windows is probably significant. The same is true for other OSes where process switching is relatively expensive. There are other benefits as well. PostgreSQL would no longer need shared memory and semaphores, and a lot more resources could be shared between backend processes. The one major drawback of a multi-threaded approach (the one that's been the main argument for the defenders of the current approach) is vulnerability. If one thread is messing things up, then the whole system will be brought to a halt (on the other hand, that can be said about the current shared-memory approach as well).
The cure for this is to have a system that, to the extent possible, prevents this from happening. How would that be possible? Well, such systems are widely used today. Huge companies use them in mission-critical applications all over the world. They are called Virtual Machines. Two types in particular are gaining more and more ground: the .NET-based CLR and the Java VM. Although there's an Open Source initiative called Mono that implements the CLR, I still don't see it as a viable alternative for creating a production-grade multi-platform database. Microsoft's CLR is of course confined to Microsoft platforms. The Java VMs are however a different matter altogether. And with the java.nio.channels package that was introduced in Java 1.4 and the java.util.concurrent package from Java 5.0, Java has taken a major step forward in being a very feasible platform for a database implementation. There's actually nothing stopping you from doing a high-performance MVCC system using Java today. A SQL parser could be based on JavaCC technology (the grammar is already written, although it needs small adjustments to comply with the PostgreSQL dialect). Lots of technology is there out of the box, such as regular expressions, hash maps, linked lists, etc. Not to forget an exceptionally good threading system, now providing atomic operations, semaphores, copy-on-write arrays, etc. In short, everything that a database implementor could ever wish for. The third alternative for PL/Java, an approach that gets more viable every minute I think about it, is to implement the PostgreSQL backend completely in Java. I'm involved in the development of one of the commercial JVMs. I know that an enormous amount of resources is constantly devoted to performance optimizations. The days when a complex system written in C or C++ could outperform a JVM have passed. A static optimizer can only do so well.
A JVM, that collects heuristics, communicates with the CPU about cache usage etc., can be a great deal smarter on how the final machine code will be optimized, and re-optimized should the conditions change. It would be great if PostgreSQL could benefit from all this research. If a commercial JVM is perceived as a problem, then combine^h^h^hpile the code with GNU gcj instead of gcc like today. The list of advantages can be made a mile long. There's no point in listing everything here. From my own standpoint, I'm of course thinking first and foremost about the advantages with PL/Java. It will become the absolute most efficient PL of them all. Other languages, for which no good Java implementation exists (I'm thinking Jython for Python, etc.), can be implemented using JNI. The most common functions used by say, PL/Perl could probably be implemented as callbacks into the Java domain in order to make the changes in the respective PL minimal. Opinions? Suggestions? Kind Regards, Thomas Hallgren ---(end of broadcast
Re: [HACKERS] pg_class catalog question...
Jim C. Nasby wrote: On Fri, Mar 31, 2006 at 11:29:15AM -0500, Tom Lane wrote: This argument falls flat when you consider that the width of a CHAR entry is measured in characters, not bytes, and therefore its physical size is not fixed even if its logical width is. True, but in every case I've used char it was to store something that would never be multi-byte, like a GUID or a SHA1. Though I guess, in retrospect, what would really be handy is a 'hex' datatype that stores a hex string (possibly with a custom format, such as a GUID) in its native binary format. Why not simply a fixed number of bytes, i.e. byte(16) or octet(16)? Hexadecimal is just a convenient human-readable representation. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Remote PL/Java, Summary
Andrew Dunstan wrote: and a happy April 1 to you too, btw. ;-) - thomas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Shared memory
Hi Simon, Thanks for your input. All good points. I actually did some work using Java stored procedures on DB2 a while back, but I had managed to forget (or repress :-) ) all about the FENCED/NOT FENCED stuff. The current discussion definitely puts it in a different perspective. I think PL/Java has a pretty good 'NOT FENCED' implementation, as do many other PLs, but no PL has yet come up with a FENCED solution. This FENCED/NOT FENCED terminology would be a good way to differentiate between the two approaches. Any chance of that syntax making it into the PostgreSQL grammar, should the need arise? Some more comments inline: Simon Riggs wrote: Just some thoughts from afar: DB2 supports in-process and out-of-process external function calls (UDFs) that it refers to as UNFENCED and FENCED procedures. For Java only, IBM have moved to supporting *only* FENCED procedures for Java functions, i.e. having a single JVM for all connections. Are you sure about this? As I recall it, a FENCED stored procedure executed in a remote JVM of its own. A parameter could be used that either caused a new JVM to be instantiated for each stored procedure call or kept it for the duration of the session. The former would yield really horrible performance but keep memory utilization at a minimum. The latter would get more acceptable performance but waste more memory (on par with PL/Java today). Each connection's Java function runs as a thread on a single dedicated JVM-only process. If that were true, then different threads could share dirty session data. I wanted to do that using DB2 but found it impossible. That was a while back though. That approach definitely does increase the invocation time, but it significantly reduces the resources associated with the JVM, as well as allowing memory management to be more controllable (bliss...). So the overall picture could be more CPU and memory resources for each connection in the connection pool.
My very crude measurements indicate that the overhead of using a separate JVM is between 6 and 15MB of real memory per connection. Today, you get about 10MB/$, and servers configured with 4GB RAM or more are not uncommon. I'm not saying that the overhead doesn't matter. Of course it does. But the time when you needed to be extremely conservative with memory usage has passed. It might be far less expensive to buy some extra memory than to invest in SMP architectures to minimize IPC overhead. My point is, even fairly large app-servers (using connection pools with up to 200 simultaneous connections) can run using relatively inexpensive boxes such as an AMD64-based server with 4GB RAM and show very good throughput with the current implementation. If you have a few small Java functions, centralisation would not be good, but if you have a whole application architecture with many connections executing reasonable chunks of code then this can be a win. One thing to remember is that a 'chunk of code' that executes in a remote JVM and uses JDBC will be hit by the IPC overhead on each interaction over the JDBC connection. I.e., the overhead is not just limited to the actual call of the UDF; it's also imposed on all database accesses that the UDF makes in turn. In that environment we used Java for major database functions, with SQL functions for small extensions. My guess is that those major database functions did a fair amount of JDBC. Am I right? Also, the Java invocation time we should be celebrating is that by having Java in the database, the Java-DB time is much less than it would be if we had a Java stack sitting on another server. I think the cases where you have a Tomcat or JBoss sitting on the same physical server as the actual database are very common. One major reason being that you don't want network overhead between the middle tier and the backend.
Moving logic into the database instead of keeping it in the middle tier is often done to get rid of the last hurdle, the overhead of IPC. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Shared memory
Tom Lane wrote: Thomas Hallgren [EMAIL PROTECTED] writes: This FENCED/NOT FENCED terminology would be a good way to differentiate between the two approaches. Any chance of that syntax making it into the PostgreSQL grammar, should the need arise? Of what value would it be to have it in the grammar? The behavior would be entirely internal to any particular PL in any case. Not necessarily, but perhaps the term FENCED is incorrect for the concept that I have in mind. All languages that are implemented using a VM could benefit from the same remote UDF protocol: Java, C#, perhaps even Perl or Ruby. The flag that I'd like to have would control 'in-process' versus 'remote'. I'm not too keen on the term FENCED, since it, in the PL/Java case, will lead to poorer isolation. Multiple threads running in the same JVM will be able to share data, and a JVM crash will affect all connected sessions. Then again, perhaps it's a bad idea to have this in the function declaration in the first place. A custom GUC parameter might be a better choice. It will not be possible to have some functions use the in-process approach and others execute remotely, but I doubt that will matter that much. I'm still eager to hear what it is in the current PL/Java that you consider fundamentally unresolvable problems. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Shared memory
Dave Cramer wrote: What exactly is a FENCED solution ? If it is simply a remote connection to a single JVM then pl-j already does that. Last time I tried to use pl-j (in order to build a mutual test platform), I didn't manage to make it compile due to missing artifacts and it wasn't ported to Windows. Lazslo filed a JIRA bug on that but since then (August last year) I've seen no activity in the project. Is it still alive? Is anyone using it? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Shared memory
Dave Cramer wrote: I'm not too keen on the term FENCED, since it, in the PL/Java case, will lead to poorer isolation. Multiple threads running in the same JVM will be able to share data and a JVM crash will affect all connected sessions. When was the last time you saw a JVM crash? These are very rare now. I think that's somewhat dependent on what JVM you're using. For the commercial ones, BEA, IBM, and Sun, I fully agree. In any case, if it does fail, it's a JVM bug and can happen to any code running, and take the server down if it is in-process. Crash is perhaps not the right word. My point concerned the level of isolation. Code that is badly written may have a serious impact on other threads in the same JVM. Let's say you cause an OutOfMemoryError or an endless loop. The former will render the JVM completely useless and the latter will cause low scheduling priority. If the same thing happens using an in-process JVM, the problem is isolated to that one session. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Shared memory
Martijn, I tried a Socket approach. Using the new IO stuff that arrived with Java 1.4 (SocketChannel etc.), the performance is really good. Especially on Linux, where an SMP machine shows a 1 to 1.5 ratio between one process doing ping-pong between two threads and two processes doing ping-pong using a socket. That's acceptable overhead indeed, and I don't think I'll be able to trim it much using a shared memory approach (the thread scenario uses Java monitor locks; that's the most efficient lightweight locking implementation I've come across). One downside is that on a Windows box, the ratio between the thread and the process scenario seems to be 1 to 5, which is a bit worse. I've heard that Solaris too is less efficient than Linux in this respect. The real downside is that a call from SQL to PL/Java using the current in-process approach is really fast. It takes about 5 microseconds on my 2.8GHz i386 box. The overhead of an IPC call on that box is about 18 microseconds on Linux and 64 microseconds on Windows. That's an overhead of between 440% and 1300% due to context switching alone. Yet, for some applications, perhaps that overhead is acceptable? It should be compared to the high memory consumption that the in-process approach undoubtedly results in (which in turn might lead to less optimal use of CPU caches and, if memory is insufficient, more time spent swapping). Given those numbers, it would be interesting to hear what the community as a whole thinks about this. Kind Regards, Thomas Hallgren Martijn van Oosterhout wrote: On Fri, Mar 24, 2006 at 11:51:30AM +0100, Thomas Hallgren wrote: Hi, I'm currently investigating the feasibility of an alternative PL/Java implementation that would use shared memory to communicate between a JVM and the backend processes. I would very much like to make use of the routines provided in shmem.c but I'm a bit uncertain how to add a segment for my own use.
I'm wondering if a better way to do it would be similar to the way X does it. The client connects to the X server via a pipe (tcp/ip or unix domain). This is handy because you can block on a pipe. The client then allocates a shared memory segment and sends a message to the server, who can then also connect to it. The neat thing about this is that the client can put data in the shared memory segment and send one byte through the pipe and then block on a read. The JVM which has a thread waiting on the other end wakes up, processes the data, puts the result back and writes a byte to the pipe and waits. This wakes up the client who can then read the result. No locking, no semaphores, the standard UNIX semantics on pipes and sockets make sure everything works. In practice you'd probably end up sending small responses exclusively via the pipe and only use the shared memory for larger blocks of data but that's your choice. In X this is mostly used for image data and such. My questions are: 1. Do you see something right away that invalidates this approach? Nothing direct, though a single segment just for finding the JVM seems a lot. A socket approach would work better I think. 2. Is using the shared memory functionality that the backend provides a good idea (I'm thinking shmem functions, critical sections, semaphores, etc.). I'd rather depend on them then having conditional code for different operating systems. That I don't know. However, ISTM a lock-free approach is better wherever possible. If you can avoid the semaphores altogether... 3. Would it be better if the Postmaster allocated the global segment and started the JVM (based on some config parameter)? I don't know about the segment but the postmaster should start. I thought the tsearch guys had an approach using a co-process. I don't know how they start it up but they connected via pipes. 
Hope this helps, ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Shared memory
Martijn van Oosterhout wrote: On Mon, Mar 27, 2006 at 10:57:21AM +0200, Thomas Hallgren wrote: Martijn, I tried a Socket approach. Using the new IO stuff that arrived with Java 1.4 (SocketChannel etc.), the performance is really good. Especially on Linux, where an SMP machine shows a 1 to 1.5 ratio between one process doing ping-pong between two threads and two processes doing ping-pong using a socket. That's acceptable overhead indeed, and I don't think I'll be able to trim it much using a shared memory approach (the thread scenario uses Java monitor locks; that's the most efficient lightweight locking implementation I've come across). Yeah, it's fairly well known that the distinction between processes and threads on Linux is much smaller than on other OSes. Windows is pretty bad, which is why threading is much more popular there. The real downside is that a call from SQL to PL/Java using the current in-process approach is really fast. It takes about 5 microseconds on my 2.8GHz i386 box. The overhead of an IPC call on that box is about 18 microseconds on Linux and 64 microseconds on Windows. That's an overhead of between 440% and 1300% due to context switching alone. Yet, for some applications, snip This might take some more measurements, but AIUI the main difference between in-process and inter-process is that one has a JVM per connection, the other one JVM shared. In that case my thoughts are as follows:

- Overhead of starting the JVM. If you can start the JVM in the postmaster you might be able to avoid this. However, if you have to restart the JVM for each process, that's a cost.
- JIT overhead. For often-used classes, JIT compiling can help a lot with speed. But if every class needs to be reinterpreted each time, maybe that costs more than your IPC.
- Memory overhead. You mentioned this already.
- Are you optimising for many short-lived connections or a few long-lived connections?
My gut feeling is that if someone creates a huge number of server-side Java functions, performance will be better with one always-running JVM with highly JIT-optimised code than with each JVM doing it from scratch. But this will obviously need to be tested. The use case with a huge number of short-lived connections is not feasible at all with PL/Java as it stands today. This is partly the reason for my current research. Another reason is that it's sometimes desirable to share resources between your connections. Dangerous perhaps, but an API that encourages separation and allows sharing in a controlled way might prove very beneficial. The ideal use case for PL/Java is a client that utilizes a connection pool, and most servlet containers and EJB servers do. Scenarios where you have just a few and fairly long-lived clients are OK too. One other thing is that separate processes give you the ability to parallelize. For example, if a Java function does an SPI query, it can receive and process results in parallel with the backend generating them. This may not be easy to achieve with an in-process JVM. It is fairly easy to achieve using threads. Only one thread at a time may of course execute an SPI query, but that's true when multiple processes are in place too, since the backend is single-threaded, and since the logical thread in PL/Java must utilize the same backend as where the call originated (to maintain the transaction boundaries). Any result must also sooner or later be delivered using that same backend, which further limits the ability to parallelize. Incidentally, there are compilers these days that can compile Java to native. Is this Java stuff set up in such a way that you can compile your classes to native and load them directly, for the real speed-freaks? PL/Java can be used with GCJ, although I don't think the GCJ compiler outranks the JIT compiler in a modern JVM.
It can only do static optimizations, whereas the JIT has runtime heuristics to base its optimizations on. In the test results I've seen so far, the GCJ compiler only gets the upper hand in very simple tests. The JIT-generated code is faster when things are more complicated. GCJ is great if you're using short-lived connections (less startup time and everything is optimized from the very start) but the native code that it produces still needs a JVM of some sort. No interpreter of course, but classes must be initialized, a garbage collector must be running, etc. The shared native code results in some gain in memory consumption, but it's not as significant as one might think. In that case, maybe you should concentrate on reliability and flexibility and still have a way out for functions that *must* be high-performance. Given time and enough resources, I'd like to provide the best of two worlds and give the user a choice whether or not the JVM should be external. Ideally, this should be controlled using configuration parameters so that it's easy to test which scenario works best. It's a lot of work though.
Re: [HACKERS] Shared memory
Tom Lane wrote: It's only that much difference? Given all the other advantages of separating the JVM from the backends, I'd say you should gladly pay that price. If I'm right, and the most common scenario is clients using connection pools, then it's very likely that you don't get any advantages at all. Paying for nothing with a 440% increase in calling time (at best) seems expensive :-) Regards, Thomas Hallgren ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Shared memory
Tom Lane wrote: Thomas Hallgren [EMAIL PROTECTED] writes: Tom Lane wrote: It's only that much difference? Given all the other advantages of separating the JVM from the backends, I'd say you should gladly pay that price. If I'm right, and the most common scenario is clients using connection pools, then it's very likely that you don't get any advantages at all. Paying for nothing with a 440% increase in calling time (at best) seems expensive :-) You are focused too narrowly on a few performance numbers. In my mind the primary advantage is that it will *work*. I do not actually believe that you'll ever get the embedded-JVM approach to production-grade reliability, because of the fundamental problems with threading, error processing, etc. My focus with PL/Java over the last year has been to make it a production-grade product and I think I've succeeded pretty well. The current list of open bugs is second to none. What fundamental problems are you thinking of that hasn't been solved already? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings