Re: [HACKERS] Patch queue - wiki (was varadic patch)
On Thu, Apr 3, 2008 at 12:35 AM, Bruce Momjian [EMAIL PROTECTED] wrote: It is not clear to me how a wiki can be easily created for 2k emails and then maintained in a reasonable way, or how emails can be added to it easily. That seems like a *really* odd thing for one of the founders of the world's most advanced OSS DBMS project to say. It's all relational (which we do do pretty well) - we can add links to the wiki to threads in the archives, and anything posted from then on is self-maintaining (except when new threads are started - but even if each patch gets 5 threads that's not a huge chore). I see no reason to go manually copying all 2k emails to the wiki. -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
I had some thoughts about similar issues when looking at what it would take to make pl/java yum-installable. The end goal was to be able to say e.g. yum install pljava; echo create language pljava; | psql mydb. Currently there's a non-trivial install process involving running an sql script and java class. My idea was to have a createlang_init kind of function that could be called when installing a language to set up appropriate functions, tables etc. There would be a similar function to clean up when dropping the lang. On Thu, Apr 3, 2008 at 6:12 AM, Ron Mayer [EMAIL PROTECTED] wrote: Agreed. Such a mechanism would only really apply for things that are installed in the database. But from an end user's point of view, installing functions, index types, languages, data types, etc all see to fit the pg_install postgis -d mydb, pg_install pl_ruby -d mydb, etc. pattern pretty well. Well, there are a couple of major differences. Firstly cpan, gem etc are able to install all required dependencies themselves, at least where no native compilation is required, because they are basically their own platform. PG libs more or less require a build environment. Secondly, and more importantly, module installation for those environments happens once and is global; installation of native libs for pgsql is different to instllation in a database. What happens in the above scenario when the postgis libs are already installed? And what about cleanup? Also, it would seem that such an install process requires the server to be running - so much for packaging as RPMs/debs/win32 installer etc. I think a better solution would be to have a pg_install be a distribution mechanism capable of installing binaries / scripts / other resources, but have pgsql itself handle module installation into a particular database. I'm thinking a CREATE MODULE foo; kind of thing that would be capable of finding either a module install script or a foo_init() function in libfoo.so/foo.dll. Similarly for cleanup, so cleanup isn't dependent on pg_install lying around or the version that was install still being the latest when pg_install looks for an uninstall script. This would allow modules to be installed site-wide but optionally created / dropped from specific databases in a much saner manner, and standard pgsql permissions could apply to installation of modules. It would also allow creation of rpms etc that can be shipped by a distribution, and then enabled by the user by calling the appropriate command. Finally, setting up modules so they can be built for Windows, especially using MSVC, will probably be quite a challenge. Indeed. Seems ruby gems give you the option of installing a ruby version or a windows version that I'm guessing has pre-compiled object files. Yeah, setting up Cygwin to build postgres is a pain (or was when I last did so). If we're serious about setting up a central repository, we should consider having a virtualized windows machine capable of building binaries for the modules that people upload. Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
The closest analogy to what I'm thinking is the perl CPAN or ruby gems. I think this is more a developer thing. I don't think an ISP would want all that automagic (and certainly does not do that for joe user). One thing that might be worth looking at is an install command at the SQL level, so the INSTALL foo would run the install script for the foo module in the current database, assuming it's in the standard location. Yes. We don't have a central repository of non-standard modules, like CPAN, and so of course no facility for fetching / building / installing them. I think that is not a problem, since the service providers would rather want readily fetched built and regression tested modules, not anything fancy or magic. The readily built modules would simply be part of their binary distibution. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
Zeugswetter Andreas OSB SD wrote: The closest analogy to what I'm thinking is the perl CPAN or ruby gems. I think this is more a developer thing. I don't think an ISP would want all that automagic (and certainly does not do that for joe user). I think you are missing an essential part of the vision. This is not just targetted at developers. Binary distro authors typically include a huge number of CPAN modules as well as core Perl, and ISPs typically install them. We want to have something CPAN-like so we can get the same effect. At any rate, that's a bit blue sky right now. I haven't seen any disagreement with our kissing contrib goodbye as a name, so let's work on that. Unfortunately, that's going to involve a bit of pain, including in the buildfarm, whose client relies on the name. I'll go to work on fixing that, and we can get a new version out so when we make the switch the buildfarm doesn't go dark. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Hi! I have sent a patch to pgsql-patches: http://archives.postgresql.org/pgsql-patches/2008-04/msg00050.php ...which adds \G command to psql client. The idea of \G command is to perform the query, but with printing query results using extended table output format. For example: postgres=# SELECT * FROM pg_stat_activity; datid | datname | procpid | usesysid | usename | current_query | waiting | xact_start | query_start | backend_start | client_addr | client_port ---+--+-+--+--+-+-+---+---+---+-+- 11511 | postgres | 11729 | 10 | postgres | SELECT * FROM pg_stat_activity; | f | 2008-04-03 14:40:15.277272+02 | 2008-04-03 14:40:15.277272+02 | 2008-04-03 14:39:50.050512+02 | | -1 (1 row) postgres=# SELECT * FROM pg_stat_activity\G -[ RECORD 1 ]-+--- datid | 11511 datname | postgres procpid | 11729 usesysid | 10 usename | postgres current_query | SELECT * FROM pg_stat_activity waiting | f xact_start| 2008-04-03 14:41:47.533763+02 query_start | 2008-04-03 14:41:47.533763+02 backend_start | 2008-04-03 14:39:50.050512+02 client_addr | client_port | -1 postgres=# SELECT * FROM pg_stat_activity\g datid | datname | procpid | usesysid | usename | current_query | waiting | xact_start | query_start | backend_start | client_addr | client_port ---+--+-+--+--++-+---+---+---+-+- 11511 | postgres | 11729 | 10 | postgres | SELECT * FROM pg_stat_activity | f | 2008-04-03 14:42:09.940897+02 | 2008-04-03 14:42:09.940897+02 | 2008-04-03 14:39:50.050512+02 | | -1 (1 row) Comments anyone? Regards, Dawid -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] COPY Transform support
Hi, Here's a proposal for COPY to support the T part of an ETL, that is adding the capability for COPY FROM to Transform the data it gets. The idea is quite simple: adding to COPY FROM the option to run a function on the data before to call datatype_in functions. This needs some syntax addition to be worked out at the COPY side, then the COPY code will have to run the given function on the read data and consider giving the output of it to current COPY code (datatype input function). The function could either get the data as text or bytea, and would have to return either text or bytea. bytea seems the more sensible choice, as long as we don't lose encoding information there, which I'm not sure about. The syntax could be something like: COPY mytable FROM '/my/file.txt' WITH COLUMN x CONVERT USING myfunc; I tried to only add keywords already present in [1], while getting something meaningfull... and x is intended to be the column number, counting from 1. [1] http://www.postgresql.org/docs/8.3/static/sql-keywords-appendix.html Comments? -- dim signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 At any rate, that's a bit blue sky right now. I haven't seen any disagreement with our kissing contrib goodbye as a name, so let's work on that. Unfortunately, that's going to involve a bit of pain, Yes, I'm not sure I see the point of it. It's got a bad name, but changing it is just putting lipstick on a pig. End users don't know, and don't care, about contrib. Sysadmins and casual DBAs only care what they can yum install. That only leaves packagers and hard-core developers, both of whom already know how contrib works. Not that I wouldn't want to see some of the good ideas raised in this thread explored. In particular, I'd love to see some of the more standard contrib things installable as simple as: postgres=# INSTALL earthdistance; Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. - -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8 200804030953 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAkf04VUACgkQvJuQZxSWSsjmPACeMoaDTXgjqXBKlthPad6D3sWV qooAn2y0cwnafYwnGonGBEq/6IAbXzlF =SO7r -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Hi, On Thu, Apr 3, 2008 at 6:47 PM, Dimitri Fontaine [EMAIL PROTECTED] wrote: Here's a proposal for COPY to support the T part of an ETL, that is adding the capability for COPY FROM to Transform the data it gets. The idea is quite simple: adding to COPY FROM the option to run a function on the data before to call datatype_in functions. This needs some syntax addition to be worked out at the COPY side, then the COPY code will have to run the given function on the read data and consider giving the output of it to current COPY code (datatype input function). The function could either get the data as text or bytea, and would have to return either text or bytea. bytea seems the more sensible choice, as long as we don't lose encoding information there, which I'm not sure about. The syntax could be something like: COPY mytable FROM '/my/file.txt' WITH COLUMN x CONVERT USING myfunc; I tried to only add keywords already present in [1], while getting something meaningfull... and x is intended to be the column number, counting from 1. [1] http://www.postgresql.org/docs/8.3/static/sql-keywords-appendix.html Comments? -- dim +1 Data transformation while doing a data load is a requirement now and then. Considering that users will have to do mass updates *after* the load completes to mend the data to their liking should be reason enough to do this while the loading is happening. I think to go about it the right way we should support the following: * The ability to provide per-column transformation expressions COPY mytable (col1 transform to col1 + 10, col2 transform to 'Post' || 'greSQL', col3...) FROM .. * The ability to use any kind of expressions while doing the transformation The transformation expression should be any expression (basically ExecEvalExpr) that can be evaluated to give a resulting value and obviously a corresponding is_null value too. It should and could be system in-built functions (e.g. UPPER, TRIM, TO_CHAR, TO_NUMBER etc.) or user defined functions too * The transformation expression can refer to other columns involved in the load. So that when the current row is extracted from the input file, the current values should be used to generate the new resultant values before doing a heap_form_tuple. E.g. (col1 transform col1 + 10, col2 transform col1 * col2, col3 transform UPPER(col1 || col3),...) I have spent some thoughts on how to do this and will be happy to share the same if the list is interested. Personally, I think data transformation using such expressions is a pretty powerful and important activity while doing the data load itself. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] modules
* Greg Sabino Mullane [EMAIL PROTECTED] [080403 09:54]: Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. But that begs the question of *why* it's a kiss of death? For instance, in perl land, having something in CPAN and not in perl core is most certainly *not* a kiss of death? Why is it so different for PostgreSQL? Is it because the infrastructure behind CPAN is much better than that behind pgfoundry? Or is it because CPAN is better vetted and organized than pgfoundry? Or is it because the projects that go into CPAN are better quality and projects in pgroundry? Or is it something else? I'm pretty sure the answers to the above questions aren't all yes... a. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave. signature.asc Description: Digital signature
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 That line of argument could be used to justify putting anything and everything in core. I think that our extensible architecture is an important feature and one we should not hesitate to use to the fullest. I agree, but part of the problem here is that pgcrypto is extraordinary overkill for people who just want a better hash function than md5. Our extensible architecture is a feature, but our contrib/packaging/gborg/pgfoundry situation is a mess. It's only the efforts of the distro package maintainers that's kept things from being even worse. Here's what it boils down to for me: 1) Postgres has the md5() function, which is not ever getting removed. 2) Since it exists, people are using it. 3) Not having a builtin sha1() means we are less compatible with other databases. Fair? Perhaps not. But requiring an installation of pgcrypto, or plperl, is another hurdle to be cleared by people porting and using applications with Postgres as a backend. 4) We're also encouraging the use of md5() by making it the only option. Yes, we can talk about why people *shouldn't* use it for this purpose or that, but they will. 5) It seems unwise to go through the trouble of just adding sha1(), when we could easily add some better hashes, which has the nice side effect of making us stand out more and push the envelope, rather than play follow the leader, as was mentioned at PGCon East. - -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8 200804031020 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAkf06AIACgkQvJuQZxSWSshJGACcDlE/sUBTJNx36zMW7C9G2FqE n0QAoLOj50gGura/g2JCk+3sFxR0cLb1 =K8sl -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch queue - wiki (was varadic patch)
The one concern I have with the way the last commitfest went (and I say this as strictly an observer), there was no discussion on anything. Now, I know that discussion happened, but it happened somewhere, in some web-forum, in a community that seems to generally promote mailing lists as the preferred method of discussion. As an observer, who generally doesn't have much input code wise, but occasionally might have an observation as a user, *I* would love to see the commitfest patch-queue be something pretty simple, along the lines of a big list of: 1) item name, submission date, author 2a) item intention (maybe a see $MSGID) 2b) item (see $MSGID) 3) status summary (in discussion, applied, needs $improvements, rejected, see $MSGID Note I said item because it appears as if the consensus is that the commit-fest has to deal with more than just patches, but also proposals, and fork-in-the-road details. And no, I don't think it should included the 2K emails. It should can the $N items needing to be dealt with, and a list of pointers to messages (which generally lead to threads), with a simple status list/summary for each one (again with pointers to $MSGID where specific information might be needed). Basically, I would like to see the patch queue be more a summary/pointer of/to discussion, then some web forum where the discussion happens. And I would like the mailling lists be where the discussion of items in the patch queue happens. But all this is the opinion of an observing devellopper, not involved in any of the heavy-lifting, but as someone who would like to keep an eye on what patches are presented, and their strengths/deficiencies, so that when I present my first patch/proposal, hopefully I can avoid most of the pitfalls. But don't cater to me. Cater to Tom and Bruce, who are the ones who actually use whatever is in place. Since they are the ones doing the work, I have to accept (or ignore) whatever system they use. a. * Bruce Momjian [EMAIL PROTECTED] [080402 19:36]: It is not clear to me how a wiki can be easily created for 2k emails and then maintained in a reasonable way, or how emails can be added to it easily. There are several steps: o getting those 2k emails to start the commit fest o getting them into a wiki in a way that is fast/efficient o updating the wiki for changes efficiently Keep in mind the patch emails are pretty dynamic. As you get closer to the end of the commit fest, the wiki is easier because the list of open items becomes more stable. I am able to give others the ability to add, move, and delete emails in my patch queue, if desired. If people want to use the wiki, go ahead --- this would be one less job for me to do. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave. signature.asc Description: Digital signature
Re: [HACKERS] psql \G command -- send query and output using extended format
Dawid Kuroczko [EMAIL PROTECTED] writes: The idea of \G command is to perform the query, but with printing query results using extended table output format. Seems a bit useless --- if you prefer \x format, wouldn't you prefer it all the time? Or at least often enough that the toggling command is fine? I'm dubious that this is worth eating up a command letter for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Data transformation while doing a data load is a requirement now and then. Considering that users will have to do mass updates *after* the load completes to mend the data to their liking should be reason enough to do this while the loading is happening. I think to go about it the right way we should support the following: * The ability to provide per-column transformation expressions * The ability to use any kind of expressions while doing the transformation The transformation expression should be any expression (basically ExecEvalExpr) that can be evaluated to give a resulting value and obviously a corresponding is_null value too. It should and could be system in-built functions (e.g. UPPER, TRIM, TO_CHAR, TO_NUMBER etc.) or user defined functions too * The transformation expression can refer to other columns involved in the load. So that when the current row is extracted from the input file, the current values should be used to generate the new resultant values before doing a heap_form_tuple. E.g. (col1 transform col1 + 10, col2 transform col1 * col2, col3 transform UPPER(col1 || col3),...) I have spent some thoughts on how to do this and will be happy to share the same if the list is interested. Personally, I think data transformation using such expressions is a pretty powerful and important activity while doing the data load itself. Well, since COPY is about as fast as INSERT INTO ... SELECT plus the parsing overead, I suggest adding a special SELECT form that can read from a file instead of a table, which returns tuples, and which therefore can be used and abused to the user's liking. This is a much more powerful feature because : - there is almost no new syntax - it is much simpler for the user - lots of existing stuff can be leveraged EXAMPLE : Suppose I want to import a MySQL dump file (gasp !) which obviously contains lots of crap like -00-00 dates, '' instead of NULL, borken foreign keys, etc. Let's have a new command : CREATE FLATFILE READER mydump ( id INTEGER, dateTEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syntax identical to COPY, etc) ; This command would create a set-returning function which is basically a wrapper around the existing parser in COPY. Column definition gives a name and type to the fields in the text file, and tells the parser what to expect and what to return. It looks like a table definition, and this is actually pretty normal : it is, after all, very close to a table. INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '-00-00' ), ... FROM mydump WHERE (FKs check and drop the borken records); Now I can import data and transform it at will using a simple SELECT. The advantage is that everybody will know what to do without learning a new command, no awkward syntax (transform...), you can combine columns in expressions, JOIN to ckeck FKs, use ORDER to get a clustered table, anything you want, without any extension to the Postgres engine besides the creation of this file-parsing set-returning function, which should be pretty simple. Or, if I have a few gigabytes of logs, but I am absolutely not interested in inserting them into a table, instead I want to make some statistics, or perhaps I want to insert into my table some aggregate computation from this data, I would just : CREATE FLATFILE READER accesses_dump ( dateTEXT, ip INET, ... ) FROM file 'web_server_logtxt'; And I can do some stats without even loading the data : SELECT ip, count(*) FROM accesses_dump GROUP BY ip ORDER BY count(*) HAVING count(*) 1000; Much better than having to load those gigabytes just to make a query on them... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch queue - wiki (was varadic patch)
Aidan Van Dyk [EMAIL PROTECTED] writes: The one concern I have with the way the last commitfest went (and I say this as strictly an observer), there was no discussion on anything. Umm ... in the first place, the fest isn't over yet. In the second place, the reason you haven't seen much discussion is that we've been working primarily on the stuff that could be committed without much discussion. That underbrush has mostly been cleared away at this point, and we're starting to get down to the stuff that actually will need extended discussion. That should definitely happen on this list. The remaining open issues are listed here: http://momjian.us/cgi-bin/pgpatches Feel free to start talking about any of them ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
Greg Sabino Mullane wrote: At any rate, that's a bit blue sky right now. I haven't seen any disagreement with our kissing contrib goodbye as a name, so let's work on that. Unfortunately, that's going to involve a bit of pain, Yes, I'm not sure I see the point of it. It's got a bad name, but changing it is just putting lipstick on a pig. End users don't know, and don't care, about contrib. Sysadmins and casual DBAs only care what they can yum install. That only leaves packagers and hard-core developers, both of whom already know how contrib works. If this were at all true we would not not have seen the complaints from people along the lines of My ISP won't install contrib. But we have, and quite a number of times. We have concrete evidence that calling it contrib actually works against us. It's also worth pointing out that WE HAVE HAD THIS DISCUSSION BEFORE. Sometimes I get rather frustrated by our habit of turning time into a circle and running Groundhog Day. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
On Thu, Apr 3, 2008 at 4:35 PM, Tom Lane [EMAIL PROTECTED] wrote: Dawid Kuroczko [EMAIL PROTECTED] writes: The idea of \G command is to perform the query, but with printing query results using extended table output format. Seems a bit useless --- if you prefer \x format, wouldn't you prefer it all the time? Or at least often enough that the toggling command is fine? I'm dubious that this is worth eating up a command letter for. No, the point is that I usually have mixed queries -- ones which are most comfortably viewed in normal format (many not-so-long rows), and ones which are best viewed expanded (little rows, many columns). Alternating between formats using \x is, at least for me, a bit cumbersome: usually _after_ I wrote a query I realize it would look more readable in expanded format, which is a bit too late. So I run the query, ctrl+c, \x, rerun the query... and forget to turn expanded mode off afterwards. I think that ability to decide about the format after the query, not before, can be quite useful especially when writing ad-hoc queries. Incidentally \g and \G is also used more or less similarily by our dolphin-loving friends -- which doesn't help using \G for other things. Regards, Dawid -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Dimitri Fontaine [EMAIL PROTECTED] writes: Here's a proposal for COPY to support the T part of an ETL, that is adding the capability for COPY FROM to Transform the data it gets. The idea is quite simple: adding to COPY FROM the option to run a function on the data before to call datatype_in functions. The major concern I have about this is to ensure that no detectable overhead is added to COPY when the feature isn't being used. I am not actually convinced that the column-by-column design you seem to have in mind is worth anything. The examples that I remember seeing often involve removing columns, generating one column from multiple ones or vice versa, dealing with nonstandard column delimiters, etc. What would makes sense in my mind is a single function taking and returning text, which is invoked once on each complete input line before it is broken into fields. This is, of course, just a substitute for running a sed or perl or similar script over the data before feeding it to COPY --- and probably not an amazingly good substitute at that. For instance, assuming you like perl for text-wrangling, I'd fully expect the function approach to be slower than an external script because of the large overhead of getting into and out of libperl for each line, In situations where it's actually useful to apply SQL functions rather than text-mangling operations to the data, you always have the option to COPY into a temp table and then do INSERT/SELECT from there. So the whole thing seems just marginally attractive to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Le jeudi 03 avril 2008, PFC a écrit : CREATE FLATFILE READER mydump ( id INTEGER, dateTEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syntax identical to COPY, etc) ; [...] INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '-00-00' ), ... FROM mydump WHERE (FKs check and drop the borken records); What do we gain against current way of doing it, which is: COPY loadtable FROM 'dump.txt' WITH ... INSERT INTO destination_table(...) SELECT ... FROM loadtable; -- dim signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] COPY Transform support
On Thu, 2008-04-03 at 16:44 +0200, PFC wrote: CREATE FLATFILE READER mydump ( id INTEGER, dateTEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syntax identical to COPY, etc) ; Very cool idea, but why would you need to create a reader object first ? You should be able to use COPY directly with the target table being omitted, meaning the copy will not pump it's result in the target but be equivalent to a select... and use it in any place where a select can be used. This would have absolutely no new syntax, just the rules changed... Now that I had a second look you actually need the field definitions to meaningfully interpret the file, but then why not use a record specification instead of the table in the normal COPY command ? I'm not sure if there's any existing syntax for that but I would guess yes... In any case, such a feature would help a lot in processing input files based also on other existing data in the DB. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '-00-00' ), ... FROM mydump WHERE (FKs check and drop the borken records); What do we gain against current way of doing it, which is: COPY loadtable FROM 'dump.txt' WITH ... INSERT INTO destination_table(...) SELECT ... FROM loadtable; You read and write the data only once instead of twice (faster) if you want to import all of it. If you just want to compute some aggregates and store the results in a table, you just read the data once and don't write it at all. The advantages are the same than your proposed transformations to COPY, except I feel this way of doing it opens more options (like, you can combine columns, check FKs at load, do queries on data without loading it, don't necessarily have to insert the data in a table, don't have to invent a new syntax to express the transformations, etc). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
On Thu, 03 Apr 2008 16:57:53 +0200, Csaba Nagy [EMAIL PROTECTED] wrote: On Thu, 2008-04-03 at 16:44 +0200, PFC wrote: CREATE FLATFILE READER mydump ( id INTEGER, dateTEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syntax identical to COPY, etc) ; Very cool idea, but why would you need to create a reader object first ? You should be able to use COPY directly with the target table being omitted, meaning the copy will not pump it's result in the target but be equivalent to a select... and use it in any place where a select can be used. This would have absolutely no new syntax, just the rules changed... Now that I had a second look you actually need the field definitions to meaningfully interpret the file, Yeah, you need to tell Postgres the field names, types, and NULLness before it can parse them... or else it's just a plain flat text file which makes no sense... but then why not use a record specification instead of the table in the normal COPY command ? I'm not sure if there's any existing syntax for that but I would guess yes... Hm, yeah, that's even simpler, just create a type for the row (or just use table%ROWTYPE if you have a table that fits the description), and tell COPY to parse according to the row type definition... smart... Like : CREATE TYPE import_rowtype AS (id INTEGER, date TEXT); INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '-00-00' )::DATE FROM (COPY AS import_rowtype FROM 'mysql_trash.txt') AS foo WHERE (FKs check and drop the borken records); Looks clean... Obviously, in this case (and also in my proposal's case) you must use COPY and not \copy since it is the database server which will be reading the file. This could probably be hacked so the client sends the file via the \copy interface, too... In any case, such a feature would help a lot in processing input files based also on other existing data in the DB. Yeah, it would be cool. Also, since COPY TO can use a SELECT as a data source, you could use postgres to read from a file/pipe, process data, and write to a file/pipe (kinda better than sed, lol) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Greg Sabino Mullane wrote: 4) We're also encouraging the use of md5() by making it the only option. Yes, we can talk about why people *shouldn't* use it for this purpose or that, but they will. There is always the Java route - internal classes have package-scope constructors to specifically prevent them from being accidentally used (and relied on). I prefer the let them use it, but warn them not to have expectations route, which is what PostgreSQL is doing today. The above is not a legitimate reason to provide additional functions in the core. 5) It seems unwise to go through the trouble of just adding sha1(), when we could easily add some better hashes, which has the nice side effect of making us stand out more and push the envelope, rather than play follow the leader, as was mentioned at PGCon East This presumes that better hashes truly exist. It is basic math to show that all hashes will include collisions. Ignoring the possibility that one hash has theoretical better distribution for real documents, the real benefit of SHA-1 over MD5, is that it has more bits. The ultimate solution here, is to store the original using the full copy hash technique, with 0 chance of collision. This extreme defeats the purpose of a hash to start with. Why does PostgreSQL need something better than md5 as part of core? Bragging rights? Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, Apr 3, 2008 at 8:25 PM, Andrew Dunstan [EMAIL PROTECTED] wrote: If this were at all true we would not not have seen the complaints from people along the lines of My ISP won't install contrib. But we have, and quite a number of times. We have concrete evidence that calling it contrib actually works against us. It's hard to see ISPs who won't install contrib from installing ${random module} from the big bad internet as has been discussed in this thread, but who knows? If we go with a solution that allows users to say install mymodule; or whatever into their own database, is there any reason not to install (as in make install) all modules currently called contrib by default? Are there any security issues with modules in there? I seem to remember something coming up involving dblink a while back... Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 21:03:05 +0530 Tom Dunstan [EMAIL PROTECTED] wrote: On Thu, Apr 3, 2008 at 8:25 PM, Andrew Dunstan [EMAIL PROTECTED] wrote: If this were at all true we would not not have seen the complaints from people along the lines of My ISP won't install contrib. But we have, and quite a number of times. We have concrete evidence that calling it contrib actually works against us. It's hard to see ISPs who won't install contrib from installing ${random module} from the big bad internet as has been discussed in this thread, but who knows? Sure it is. The very word contrib brings about ideas of things like: Unstable, Cooker, unofficial. modules is completely different (from a perception perspective). IMO the core modules should be compiled via configure with something like: ./configure --enable-module=ALL or ./configure --enable-module=pgcrypto --enable-module=cube This would install all the modules but not enable them in the database itself (of course). This could also be extended to the pls so that we have exactly one mechanism to control those options as well. ./configure --enable-module=pgcrypto --enable-module=plperl Sincerely, Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH9PwCATb/zqfZUUQRAoWtAKCdbdcv4KdOIdiF8gcjebWTIrub1gCgg8RU QaatCVhlETRkA6+5wyYNdRM= =z1gI -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Le jeudi 03 avril 2008, Tom Lane a écrit : The major concern I have about this is to ensure that no detectable overhead is added to COPY when the feature isn't being used. Well, when COLUMN x CONVERT USING or whatever syntax we choose is not used, we default to current code path, that is we do not mess with data content at all before to consider it's valid input syntax for target table datatypes. And the syntax check is done only once, before beginning to read the data lines from the file. I am not actually convinced that the column-by-column design you seem to have in mind is worth anything. The examples that I remember seeing often involve removing columns, generating one column from multiple ones or vice versa, dealing with nonstandard column delimiters, etc. Yes, this is another need, but actually better solved, in my opinion, with loading data into a (temp) loadtable then process it with SQL: INSERT INTO destination_table SELECT whatever FROM loadtable; The problem I'm trying to solve is not this one, I'm trying to have COPY able to load data into a table when the representation of it we have into the file does not match what datatype input function expects. An example might help us talking about the same thing. mysqldump CSV outputs timestamp sometimes (depending on server version) as '20041002152952' when PostgreSQL expects '2004-10-02 15:29:52'. I'd like COPY to be able to cope with this situation. Now, another syntax proposal could have both the needs solved. We basically need to be able to transform input fields and process them into input columns, in a way that N input fields (found in the data file) will get us M input columns: COPY destination_table(col1, col2, col3, col4) USING (field1, field2 || field3, myfunc(field4, field5)) FROM 'file.txt' WITH ... This could get better than preprocessing then COPY then INSERT INTO ... SELECT because we don't need a temp table (don't need to care about its name being unique, nor to mess up with temp_buffers), etc. You're the one able to tell why it'll be better to have one COPY command instead of a two table steps load, I'm just guessing ;) And if it's better for the user to preprocess in perl then COPY, he still has the option. -- dim signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] Patch queue - wiki (was varadic patch)
Dave Page wrote: On Thu, Apr 3, 2008 at 12:35 AM, Bruce Momjian [EMAIL PROTECTED] wrote: It is not clear to me how a wiki can be easily created for 2k emails and then maintained in a reasonable way, or how emails can be added to it easily. That seems like a *really* odd thing for one of the founders of the world's most advanced OSS DBMS project to say. It's all relational (which we do do pretty well) - we can add links to the wiki to threads in the archives, and anything posted from then on is self-maintaining (except when new threads are started - but even if each patch gets 5 threads that's not a huge chore). I see no reason to go manually copying all 2k emails to the wiki. Well, I am waiting for someone to show me how it is done because I can't figure out a way. Do it and I will gladly stop doing what I am doing. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Small TRUNCATE glitch
Just noticed that TRUNCATE fails to clear the stats collector's counts for the table. I am not sure if it should reset the event counts or not (any thoughts?) but surely it is wrong to not zero the live/dead tuple counts. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
Am Donnerstag, 3. April 2008 schrieb Andrew Dunstan: If this were at all true we would not not have seen the complaints from people along the lines of My ISP won't install contrib. But we have, and quite a number of times. We have concrete evidence that calling it contrib actually works against us. ISPs also won't install additional Perl modules, for example. Yet, CPAN does exist successfully. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Dawid Kuroczko wrote: On Thu, Apr 3, 2008 at 4:35 PM, Tom Lane [EMAIL PROTECTED] wrote: Dawid Kuroczko [EMAIL PROTECTED] writes: The idea of \G command is to perform the query, but with printing query results using extended table output format. Seems a bit useless --- if you prefer \x format, wouldn't you prefer it all the time? Or at least often enough that the toggling command is fine? I'm dubious that this is worth eating up a command letter for. No, the point is that I usually have mixed queries -- ones which are most comfortably viewed in normal format (many not-so-long rows), and ones which are best viewed expanded (little rows, many columns). Alternating between formats using \x is, at least for me, a bit cumbersome: usually _after_ I wrote a query I realize it would look more readable in expanded format, which is a bit too late. So I run the query, ctrl+c, \x, rerun the query... and forget to turn expanded mode off afterwards. I think that ability to decide about the format after the query, not before, can be quite useful especially when writing ad-hoc queries. Incidentally \g and \G is also used more or less similarily by our dolphin-loving friends -- which doesn't help using \G for other things. It seems more helpful if there were \x option to use extended format only when the output is too wide. TODO already has: o Add auto-expanded mode so expanded output is used if the row length is wider than the screen width. Consider using auto-expanded mode for backslash commands like \df+. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] printTable API (was: Show INHERIT in \du)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 31/03/2008, Tom Lane wrote: Brendan Jurd writes: 1. describe malloc's the cells to zero, but print just does a local calloc without any initialisation. There isn't any functional difference there. I am not sure, but I think the reason print.c has its own malloc wrappers instead of depending on common.c's is that we use print.c in some bin/scripts/ programs that do not want common.c too. Yeah, it looks like createlang and droplang use print.c to emit a list of installed languages. 2. describe only does an mbvalidate for WIN32, but print does it in all cases. I don't know why describe only does that for WIN32; it looks inconsistent to me too. Possibly some trolling in the CVS history would give a clue about this. Well, mbvalidate was originally added to print.c in 2001, as part of a big patch to add multibyte support to psql [1]. However, it was only added to describe much later (2003) in response to a bug report about 8-bit characters not displaying correctly on the Windows console [2]. I think that because the bug was only observed in Windows, the patch was added #ifdef WIN32, even though print.c was already using mbvalidate for all content. This nicely illustrates the nuisance inherent to duplication of code! Based on this, I'm going to go ahead with using mbvalidate in all cases. Cheers, BJ [1] http://repo.or.cz/w/PostgreSQL.git?a=commit;h=a428cef1 [2] http://repo.or.cz/w/PostgreSQL.git?a=commit;h=e6a16c17 -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFH9QFt5YBsbHkuyV0RAv2ZAJ4/rfyjgFOh8XZo6aJo68dz5NsovQCgmf40 fCXMlsHdg1r4oTpfZD5DH+0= =PrN1 -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong
On Thu, Apr 3, 2008 at 4:34 AM, Tom Lane [EMAIL PROTECTED] wrote: The right way seems to be to treat our own insertions as live during ANALYZE, but then subtract off our own pending insertions from the live-tuples count sent to the stats collector. pgstat_report_analyze() can handle the latter part by groveling through the backend's pending statistics data. Seems like a right approach to me. I assume we shall do the same for DELETE_IN_PROGRESS tuples. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, Apr 3, 2008 at 9:17 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: It's hard to see ISPs who won't install contrib from installing ${random module} from the big bad internet as has been discussed in this thread, but who knows? Sure it is. The very word contrib brings about ideas of things like: Unstable, Cooker, unofficial. Point taken, and I completely agree. Part of the problem is that we have explicitly encouraged this perception, ie it's in contrib so the barrier to entry is lower. That may not be the case anymore, or it may just be that the bar is really really high for non-contrib stuff vs other projects. Whatever the actual case is, I agree that the name is unfortunate. When I wrote the above I was thinking about it from the other way around: doing a cpan or gem install of some random module seems even less safe to me, but maybe I'm just revealing confidence in pgsql or fear of some cpan code etc that ISPs don't share. This would install all the modules but not enable them in the database itself (of course). This could also be extended to the pls so that we have exactly one mechanism to control those options as well. ./configure --enable-module=pgcrypto --enable-module=plperl That's basically where I was heading, although I took it a step further: why not build and install all possible modules by default, if we think they're up to quality? One answer is: what do you do if some required library isn't available? Do you fail with an error message or just don't build that module? I don't like the idea of e.g. accidentally and silently not installing pl/perl just because the sysadmin hadn't installed their perl-devel package or whatever. --enable-module=ALL could be pretty good, though, especially if it build pl/perl etc that most sysadmins will want to install but do so in less configure args. :) Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 21:45:52 +0530 Tom Dunstan [EMAIL PROTECTED] wrote: This would install all the modules but not enable them in the database itself (of course). This could also be extended to the pls so that we have exactly one mechanism to control those options as well. ./configure --enable-module=pgcrypto --enable-module=plperl That's basically where I was heading, although I took it a step further: why not build and install all possible modules by default, if we think they're up to quality? Good point. One answer is: what do you do if some required library isn't available? If we build by default, then when a library isn't found the configure output tells you: Looking for Perl Development packages: No , disabling plperl build. --enable-module=ALL could be pretty good, though, especially if it build pl/perl etc that most sysadmins will want to install but do so in less configure args. :) Right. I am using the Apache model here. Sincerely, Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH9QQ2ATb/zqfZUUQRAsD9AJ9b9/12ZtaJ/CpnQ3y0xH7U3a0EYACfVeUJ FKUyEmuuw9nx3F+sk4mL2eQ= =sA7I -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
Aidan Van Dyk wrote: * Greg Sabino Mullane [EMAIL PROTECTED] [080403 09:54]: I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. But that begs the question of *why* it's a kiss of death? For instance, in perl land, having something in CPAN and not in perl core is most certainly *not* a kiss of death? Why is it so different for PostgreSQL? Is it because the infrastructure behind CPAN is much better than that behind pgfoundry? I wouldn't say one is better than the other. PGFoundry and CPAN have totally disjoint feature sets. PgFoundry's like SoruceForge + Bugtrackers + Discussion Forums + Surveys + Mailing Lists -- pretty much everything except installable packages. CPAN and RubyGems is very much focused on installable packages. Or is it because CPAN is better vetted and organized than pgfoundry? Or is it because the projects that go into CPAN are better quality and projects in pgroundry? To simplify those two: CPAN contains installers that mostly just work. PGFoundry contains mostly works-in-progress without installers. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Mark Mielke wrote: This presumes that better hashes truly exist. It is basic math to show that all hashes will include collisions. Ignoring the possibility that one hash has theoretical better distribution for real documents, the real benefit of SHA-1 over MD5, is that it has more bits. The ultimate solution here, is to store the original using the full copy hash technique, with 0 chance of collision. This extreme defeats the purpose of a hash to start with. Why does PostgreSQL need something better than md5 as part of core? Bragging rights? Having more than one hash algorithm significantly decreases the risk of (common) collisions. As a non-developer (who does track most messages on the list anyways), I surely find the SHA* functions will add significantly value and they should be easy to install (well-defined functions) with no maintainance afterwards. Hashes are an absolute minimum for keeping passwords stored somehat safely in a database. More two or even three different hashes with different collion-points will strongly increase the security. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Svenne Krap wrote: Mark Mielke wrote: This presumes that better hashes truly exist. It is basic math to show that all hashes will include collisions. Ignoring the possibility that one hash has theoretical better distribution for real documents, the real benefit of SHA-1 over MD5, is that it has more bits. The ultimate solution here, is to store the original using the full copy hash technique, with 0 chance of collision. This extreme defeats the purpose of a hash to start with. Why does PostgreSQL need something better than md5 as part of core? Bragging rights? Having more than one hash algorithm significantly decreases the risk of (common) collisions. No it doesn't. More bits reduces risk of collisions. Additional algorithms just muddy the waters. As a non-developer (who does track most messages on the list anyways), I surely find the SHA* functions will add significantly value and they should be easy to install (well-defined functions) with no maintainance afterwards. Hashes are an absolute minimum for keeping passwords stored somehat safely in a database. It has yet to be proven that MD5 is insufficient for this purpose. Significant value being what? More two or even three different hashes with different collion-points will strongly increase the security. No it doesn't unless you are thinking about a security through obscurity argument. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
D'Arcy J.M. Cain wrote: Check out NetBSD pkgsrc as a model. It is very flexible. One nice thing would be the ability to specify where the packages are rather than always insisting that they be on pgfoundry. Yup - a feature shared by RubyGems: gem install rails –source http://gems.rubyonrails.org Many of the most popular modules seem to live outside of pgfoundry anyway (postgis, the contrib ones, etc); so I'd think even if we maintain a central repository we want to make sure it can install from other sites. Perl and Ruby are languages - Postgres is a very different animal. ...Overall though I don't think that what is being installed to changes much. The basics remain the same - define the package with latest version, download if necessary,check that the source package is the correct, tested one, build, install, register. +1. From the end user I think he cares that the software is installed with the required dependencies and passes any included regression tests. Bonus points if it also registers itself in his database. And in the ruby/gems world the Windows guys seem not to have liked the check...source packages...build so they include precompiled windows libraries for those guys in many Ruby Gems. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong
Pavan Deolasee [EMAIL PROTECTED] writes: Please see the attached patch. One change I made is to hold the SHARE lock on the page while ANALYZE is reading tuples from it. I thought it would be a right thing to do instead of repeatedly acquiring/releasing the lock. I've applied a modified/extended form of this patch for 8.3.2. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, 3 Apr 2008 13:54:11 - Greg Sabino Mullane [EMAIL PROTECTED] wrote: Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. Yes! I have plenty of FTP servers to put up my own open source projects. It would annoy me if I was forced to use someone else's development environment. Whatever we do should allow for packages to be picked up from anywhere. We can use MD5 checksums to assure users that no one has changed the file since it was tested and packaged. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/04/2008, Joshua D. Drake wrote: Tom Dunstan wrote: One answer is: what do you do if some required library isn't available? If we build by default, then when a library isn't found the configure output tells you: Looking for Perl Development packages: No , disabling plperl build. That might easily go unnoticed in amongst all the other configure output. It would only be effective if the messages were repeated again at the end of the configure, or configure somehow draws attention to the fact that there was a problem. Another approach I've come across is to fail with an error message like Perl development files not found, required to build module plperl. Install these files or configure with --disable-module=plperl Cheers, BJ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFH9Qfu5YBsbHkuyV0RAmKIAJ9eBkAGaw5kBmahk4CzJ4JbrkmitACff9DB eYYSl1SiANAaAyky/3QBSIs= =Fg12 -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 12:35:31 -0400 D'Arcy J.M. Cain [EMAIL PROTECTED] wrote: On Thu, 3 Apr 2008 13:54:11 - Greg Sabino Mullane [EMAIL PROTECTED] wrote: Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. Pgfoundry is not a kiss of death except that you spread falsehoods like that. PGfoundry is a very alive project that is constantly adding content and has continuing and very active projects. Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH9QjVATb/zqfZUUQRAmuMAKCR/+mgHqB9TTsdI0G3Ax2Y5ry4SQCfQMNt d7+jcUa3pDirWo34n7dqg2o= =p4Oq -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
Andrew Dunstan wrote: Ron Mayer wrote: Andrew Dunstan wrote: Tom Lane wrote: as having better system support for packages or modules or whatever you want to call them; and maybe we also need some marketing-type ...re-raise the question of getting rid of contrib... The PostgreSQL Standard Modules. While renaming, could we go one step further and come up with a clear definition of what it takes for something to qualify as a module? In particular I think standardizing the installation would go a long way to letting packagers automate the installation of modules from pgfoundry. I think it'd be especially cool if one could one-day have a command pg_install_module [modulename] -d [databasename] and it would magically get (or verify that it had) the latest version from pgfoundry; compile it (if needed) and install it in the specified database. The closest analogy to what I'm thinking is the perl CPAN or ruby gems. Yes, and the CPAN analogy that has been in several minds, but it only goes so far. Perl and Ruby are languages - Postgres is a very different animal. We do in fact have some support for building / installing some modules in a standard way. It's called pgxs and it is used by quite a number of existing modules. On Windows we also have the StackBuilder application which is used for installation of binary modules. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Bruce Momjian escribió: It seems more helpful if there were \x option to use extended format only when the output is too wide. TODO already has: o Add auto-expanded mode so expanded output is used if the row length is wider than the screen width. Consider using auto-expanded mode for backslash commands like \df+. Some sort of \x auto? Sounds interesting ... -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, 3 Apr 2008 09:41:57 -0700 Joshua D. Drake [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 12:35:31 -0400 D'Arcy J.M. Cain [EMAIL PROTECTED] wrote: On Thu, 3 Apr 2008 13:54:11 - Greg Sabino Mullane [EMAIL PROTECTED] wrote: Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. Pgfoundry is not a kiss of death except that you spread falsehoods like that. PGfoundry is a very alive project that is constantly adding content and has continuing and very active projects. Eep! Careful with attributions. There is not a single word of mine in what you included. I know it technically says that but since your comments were directed at Greg you really should have replied to his email and not to mine that included his. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
On Thu, Apr 03, 2008 at 12:07:54PM -0400, Bruce Momjian wrote: Alternating between formats using \x is, at least for me, a bit cumbersome: usually _after_ I wrote a query I realize it would look more readable in expanded format, which is a bit too late. So I run the query, ctrl+c, \x, rerun the query... and forget to turn expanded mode off afterwards. It seems more helpful if there were \x option to use extended format only when the output is too wide. TODO already has: I was thinking that maybe \x should have a one-shot mode, i.e. \x query does it only for this one statement. It would solve the OPs problem. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] modules
On Apr 3, 2008, at 7:01 AM, Aidan Van Dyk wrote: * Greg Sabino Mullane [EMAIL PROTECTED] [080403 09:54]: Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. But that begs the question of *why* it's a kiss of death? For instance, in perl land, having something in CPAN and not in perl core is most certainly *not* a kiss of death? Why is it so different for PostgreSQL? Is it because the infrastructure behind CPAN is much better than that behind pgfoundry? Yes. I can install a package from a CPAN mirror with a one-line incantation and be sufficiently sure it works that on the very rare occasions it doesn't I'm really surprised. On the Windows end of things I can usually get pre-built binaries of those same packages installed, in the cases where a compiler is needed to build them. The exact process is a bit different, but it's consistent across most packages and uses the same namespace. Or is it because CPAN is better vetted and organized than pgfoundry? Partly. Vetted is partly self-vetting - you're expected to pass your self tests and install cleanly before you publish to CPAN. The naming hierarchy helps with the CPAN organization, and makes it easier to use than the trove approach, once you're familiar with the perl namespace habits. Some of that is applicable to a postgresql package distribution method, but the neat organization is a perl thing, not a CPAN thing, so that idea doesn't really transfer. Or is it because the projects that go into CPAN are better quality and projects in pgroundry? Partly. There are some dubious packages on CPAN but they're finished, and with extremely few exceptions download, pass their self tests and do what it says on the box (the main flaws are packages going stale and occasionally dependency problems). Pgfoundry is a development site with a search engine and has projects in various stages of completion from vaporware to production tested usable code. Or is it something else? Projects vs Packages sums up the differences. Cheers, Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong
On Thu, Apr 3, 2008 at 10:02 PM, Tom Lane [EMAIL PROTECTED] wrote: I've applied a modified/extended form of this patch for 8.3.2. Thanks. I had another concern about VACUUM not reporting DEAD line pointers (please see up thread). Any comments on that ? Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
On Thu, Apr 03, 2008 at 06:14:17PM +0200, Svenne Krap wrote: Hashes are an absolute minimum for keeping passwords stored somehat safely in a database. More two or even three different hashes with different collion-points will strongly increase the security. Not only that, but they also increase the complexity of the system. Increases in complexity tend to mean decreases in reliability and, by implication, security. As an example, someone may do some fancy cryptanalysis and discover that having lots of hashes will actually make it easier. As another point, most passwords have significantly less state than a 128bit hash allowing attacks like rainbow tables become viable. Sam -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 12:46:30 -0400 D'Arcy J.M. Cain [EMAIL PROTECTED] wrote: On Thu, 3 Apr 2008 09:41:57 -0700 Joshua D. Drake [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 12:35:31 -0400 D'Arcy J.M. Cain [EMAIL PROTECTED] wrote: On Thu, 3 Apr 2008 13:54:11 - Greg Sabino Mullane [EMAIL PROTECTED] wrote: Right now contrib is a real catch-all of various things; it would be nice to categorize them somehow. And by categorize, I emphatically do NOT mean move to pgfoundry, which is pretty much a kiss of death. Pgfoundry is not a kiss of death except that you spread falsehoods like that. PGfoundry is a very alive project that is constantly adding content and has continuing and very active projects. Eep! Careful with attributions. There is not a single word of mine in what you included. I know it technically says that but since your comments were directed at Greg you really should have replied to his email and not to mine that included his. Sorry Darcy :). Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH9Qv2ATb/zqfZUUQRAtBCAJ4yRvm6IydAstjb06G2mM8XhkVfPACfdmCy oa3KN6PmkXzZgFlFOSHseVk= =T5sZ -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, 03 Apr 2008 09:31:01 -0700 Ron Mayer [EMAIL PROTECTED] wrote: D'Arcy J.M. Cain wrote: Check out NetBSD pkgsrc as a model. It is very flexible. One nice thing would be the ability to specify where the packages are rather than always insisting that they be on pgfoundry. Yup - a feature shared by RubyGems: gem install rails ?source http://gems.rubyonrails.org Yes but what I am suggesting goes beyond that. My idea is that there is a modules directory that contains a file for each installable module. This file would contain all the information about the module such as name, version, where to get the actual package, an MD5 checksum of the package, minimum and maximum PostgreSQL versions required, etc. Naturally we should allow for people to define their own local packages as well. In fact, this may be the way to deprecate contrib. Start building modules and move the contrib packages to it one at a time. That way people using contrib have some time to switch and we can point people to modules if they are just starting out. Is there support for this idea? I would like to start exploring this if so. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small TRUNCATE glitch
On Thu, Apr 03, 2008 at 11:58:11AM -0400, Tom Lane wrote: Just noticed that TRUNCATE fails to clear the stats collector's counts for the table. I am not sure if it should reset the event counts or not (any thoughts?) but surely it is wrong to not zero the live/dead tuple counts. Wern't there complaints from people regularly truncating and refilling tables getting bad plans because they lost the statistics? Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] psql \G command -- send query and output using extended format
Alvaro Herrera wrote: Bruce Momjian escribi?: It seems more helpful if there were \x option to use extended format only when the output is too wide. TODO already has: o Add auto-expanded mode so expanded output is used if the row length is wider than the screen width. Consider using auto-expanded mode for backslash commands like \df+. Some sort of \x auto? Sounds interesting ... Yep. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
D'Arcy J.M. Cain wrote: In fact, this may be the way to deprecate contrib. Start building modules and move the contrib packages to it one at a time. That way people using contrib have some time to switch and we can point people to modules if they are just starting out. Is there support for this idea? I would like to start exploring this if so. No. I don't want to deprecate it, I want to get rid of it, lock, stock and barrel. If you think that we need more than renaming then we can discuss it, but I don't want a long death, I want one that is certain and swift. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Martijn van Oosterhout wrote: -- Start of PGP signed section. On Thu, Apr 03, 2008 at 12:07:54PM -0400, Bruce Momjian wrote: Alternating between formats using \x is, at least for me, a bit cumbersome: usually _after_ I wrote a query I realize it would look more readable in expanded format, which is a bit too late. So I run the query, ctrl+c, \x, rerun the query... and forget to turn expanded mode off afterwards. It seems more helpful if there were \x option to use extended format only when the output is too wide. TODO already has: I was thinking that maybe \x should have a one-shot mode, i.e. \x query does it only for this one statement. It would solve the OPs problem. But break for others who want all output \x. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small TRUNCATE glitch
Martijn van Oosterhout [EMAIL PROTECTED] writes: On Thu, Apr 03, 2008 at 11:58:11AM -0400, Tom Lane wrote: Just noticed that TRUNCATE fails to clear the stats collector's counts for the table. I am not sure if it should reset the event counts or not (any thoughts?) but surely it is wrong to not zero the live/dead tuple counts. Wern't there complaints from people regularly truncating and refilling tables getting bad plans because they lost the statistics? Not related --- the planner doesn't look at pgstats data. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Mark Mielke wrote: More two or even three different hashes with different collion-points will strongly increase the security. No it doesn't unless you are thinking about a security through obscurity argument. It is really the same argument on all your questions If I have a simple table now ID serial Username varchar Password varchar I currently save only md5(id || username || 'password')* into password, if I had access to sha1 (for example) i would add another password column so, having for example ID serial Username varchar Password_md5 varchar Password_sha1 varchar No matter how you see it, I get more bits of hash to check against. I would drop md5 totally and use sha1 and ripemd-160 if possible.. but currently i use only md5 as it is the only available one.. Loading pgcrypto is overkill for something as simple as hash-functions. Svenne * I prepend the id and the username to guard users with weak passwords against known hashvalues (rainbow tables) should the box ever get comprised ... if you are in doubt about the value of this, try google for 40e94aa51dc5c0ccc5aad4e6aefdde2a and guess the secret password... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong
Pavan Deolasee [EMAIL PROTECTED] writes: Thanks. I had another concern about VACUUM not reporting DEAD line pointers (please see up thread). Any comments on that ? If you want to work on that, go ahead, but I wanted it separate because I didn't think it merited back-patching. It's strictly cosmetic in terms of being about what VACUUM VERBOSE prints, no? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Tom Lane [EMAIL PROTECTED] writes: Dimitri Fontaine [EMAIL PROTECTED] writes: Here's a proposal for COPY to support the T part of an ETL, that is adding the capability for COPY FROM to Transform the data it gets. The idea is quite simple: adding to COPY FROM the option to run a function on the data before to call datatype_in functions. The major concern I have about this is to ensure that no detectable overhead is added to COPY when the feature isn't being used. I am not actually convinced that the column-by-column design you seem to have in mind is worth anything. The examples that I remember seeing often involve removing columns, generating one column from multiple ones or vice versa, dealing with nonstandard column delimiters, etc. What would makes sense in my mind is a single function taking and returning text, which is invoked once on each complete input line before it is broken into fields. I think not having to deal with separating fields is actually one of the few reasons to do this within COPY. If you can separate out yourself or need to do something more clever than COPY is capable of to split the columns then you're better off preprocessing it with perl or something anyways. To that end all the other use cases you describe could be handled with his plan. There's nothing stopping you from doing CREATE READER foo (a integer, b integer) INSERT INTO b (SELECT a+b FROM foo); or INSERT INTO b (SELECT 1, a, b, greatest(a,b) FROM foo) However I'm not sure we even need new syntax for CREATE READER. I would think something like this would make more sense: CREATE FUNCTION transform(integer, integer) RETURNS SETOF b; COPY b FROM 'foo' USING transform(integer,integer); So the whole thing seems just marginally attractive to me. Everything about ETL is only marginally attractive, but it's something people spend a lot of time doing. Nobody's come up with any particularly clean solutions I think. AFAIK the state of the art is actually to load the data into a table which closely matches the source material, sometimes just columns of text. Then copy it all to another table doing transformations. Not impressed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, 03 Apr 2008 13:06:25 -0400 Andrew Dunstan [EMAIL PROTECTED] wrote: D'Arcy J.M. Cain wrote: In fact, this may be the way to deprecate contrib. Start building modules and move the contrib packages to it one at a time. That way people using contrib have some time to switch and we can point people to modules if they are just starting out. Is there support for this idea? I would like to start exploring this if so. No. I don't want to deprecate it, I want to get rid of it, lock, stock and barrel. If you think that we need more than renaming then we can discuss it, but I don't want a long death, I want one that is certain and swift. Well, OK, but given that this is a huge public project with lots of users expecting things to be in certain places, how fast do you think we could make such a change. It seems to me that we are going to have to make things look the same for some time at least otherwise we are going to have lots of complaints. How swift is swift? To me, swift means add the alternate functionality to the next release and remove the old in the release after. Do you see things happening any faster? -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thu, Apr 3, 2008 at 10:36 PM, Andrew Dunstan [EMAIL PROTECTED] wrote: No. I don't want to deprecate it, I want to get rid of it, lock, stock and barrel. If you think that we need more than renaming then we can discuss it, but I don't want a long death, I want one that is certain and swift. I'll admit that I had thought that moving contrib modules over to a modules dir as they were, uh, modularized would be the way forward. Anything that doesn't fit the database-owner-installable pattern (pgbench? start-scripts? others?) could end up in a utils dir, and anything left in contrib shows us what's left to do before e.g. 8.4. The end goal would be no more contrib dir by the next major release. As a side note, how were you intending to rename contrib? Directory shenanigans in CVS are horrible, particularly if you want all your old branches to still work. Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Svenne Krap wrote: If I have a simple table now ID serial Username varchar Password varchar I currently save only md5(id || username || 'password')* into password, if I had access to sha1 (for example) i would add another password column so, having for example ID serial Username varchar Password_md5 varchar Password_sha1 varchar No matter how you see it, I get more bits of hash to check against. Really? Why stop at two, then? How many hash functions is enough? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] best way for export gram.y symbols
Hello I would to use main scanner from plpgsql. I need some values from parser/parse.h #define SELECT 543 #define PARAM 642 and YYSTYPE any ideas how to do it? I would not copy it by hand. Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 13:27:03 -0400 D'Arcy J.M. Cain [EMAIL PROTECTED] wrote: Well, OK, but given that this is a huge public project with lots of users expecting things to be in certain places, how fast do you think we could make such a change. 8.4. Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD4DBQFH9RY/ATb/zqfZUUQRAu0SAJ9+bnPyHmVIRb/QgbD8plEmGBRC2gCY0uS2 L+stcsM5h97QAzT23VD8zw== =z+FW -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Mark Mielke wrote: Svenne Krap wrote: Mark Mielke wrote: Svenne Krap wrote: More two or even three different hashes with different collion-points will strongly increase the security. No it doesn't unless you are thinking about a security through obscurity argument Your logic is invalid - the best quality would be to not use a hash at all, and store in plain text, or ROT-13. Then you will have no collisions. If you truly believe more bits are better, don't use a hash to start with. Ooops, went offlist by a wrong click. Putting it back onliste I am aware that plain text (or any 1:1 mapping) has no chance of collision, but on the other hand if the box is compromised it gives an easy target for stealing passwords (and a lot of users use the same passwords a lot of places). I believe that hashing through one hash function is an acceptable compromise between collisions (i.e. people get in with the wrong password) and password safety (evil hacker cannot read passwords) given you deploy anti rainbow table meassures. I would still prefer two hash functions as they do add a better safeguard towards collisions (the gentoo distribtion actually hashes the files by three different algorithms SHA1, SHA256 and RMD160) - i would be inclined to use three hashes too, if they were instantly available. Svenne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
D'Arcy J.M. Cain wrote: On Thu, 03 Apr 2008 13:06:25 -0400 Andrew Dunstan [EMAIL PROTECTED] wrote: D'Arcy J.M. Cain wrote: In fact, this may be the way to deprecate contrib. Start building modules and move the contrib packages to it one at a time. That way people using contrib have some time to switch and we can point people to modules if they are just starting out. Is there support for this idea? I would like to start exploring this if so. No. I don't want to deprecate it, I want to get rid of it, lock, stock and barrel. If you think that we need more than renaming then we can discuss it, but I don't want a long death, I want one that is certain and swift. Well, OK, but given that this is a huge public project with lots of users expecting things to be in certain places, how fast do you think we could make such a change. It seems to me that we are going to have to make things look the same for some time at least otherwise we are going to have lots of complaints. How swift is swift? To me, swift means add the alternate functionality to the next release and remove the old in the release after. Do you see things happening any faster? I don't understand this at all. We are talking about directory and package organisation here. How do you do that with transition arrangements? I guess we could put in a symlink from contrib, but I just don't see the point. I don't think we are under any obligation to preserve the way we package or split packages between releases. And doing this reorganisation now, fairly early in the release cycle, would let us give people like packagers plenty of advance notice. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
* Tom Dunstan [EMAIL PROTECTED] [080403 13:30]: As a side note, how were you intending to rename contrib? Directory shenanigans in CVS are horrible, particularly if you want all your old branches to still work. Well, please, anybody doing this, just simply copy and use cvs remove and cvs add... We're using CVS, so we live with disjoint history on renames... As long as the commit comment is clear, the history isn't lost, just another command away. a. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave. signature.asc Description: Digital signature
[HACKERS] Locale, Collation, ICU patch
Regarding the ICU patch in the commitfest here's my plan. IMHO the idea of making ICU a hard dependency which Postgres will have to use forevermore on all systems is a non-starter. I'm not entirely against having ICU as a supported collation system which packagers on systems where the system locale support is weak can choose to make a dependency of their binary packages though, assuming the issues raised elsewhere about ICU are resolved. As long as this bogeyman is scaring us though it's preventing us from having the SQL standard collation syntax and the accompanying catalog and planner changes. And as long as we don't have that support -- which is a big job -- nobody who's interested in implementing ICU or strcoll_l() or any other interfaces for a new platform will get around to it. The actual porting glue to call those functions on each platform is fairly lightweight and could easily be done by experts on that platform who aren't catalog and planner mavens. So we have a bit of a chicken and egg problem. We aren't getting the planner and syntax changes because we aren't sure the support would be good on every platform and we aren't getting the platform support because we don't have the planner and catalog changes. What I want to do is focus on adding the planner and catalog changes somehow. We implement a kind of baseline locale support something only slightly better than what we have now using setlocale before every comparison. This is clearly not the recommended configuration but as long as it handles what we handle today without a performance hit and a bit more besides it would be a big start. I'm assuming we would check if the desired locale is the current locale and skip the assignment. So if only one locale is *actually* in use then basically no additional overhead is incurred. Moreover if the desired locale is C then we can skip the assignment and use strcmp directly. So actually as long as only one non-C locale is in use then no additional overhead would be incurred. The big gotcha is what collation to use when comparing with data in the system tables, especially the shared system tables. I think we do need to define a database-wide encoding and collation to use for system tables. (Unless we can get by with varchar_pattern_ops indexes on system tables?) So the following use cases arise: a) They're actually using only one collation for both the system tables and their own data. This is well handled by our existing setup and would be basically unchanged in the new setup. b) They're using multiple collations for their data but only one at a time. Either one per database or one per session. In which case they don't incur any overhead c) They're using multiple collations for their data but only one collation in a given application unit of work. This is probably the most common case for OLTP application since each unit of work represents some particular user's operation. In this case as long as the system tables are set up to use the C locale then this would require at most one setlocale() call per unit of work though. d) They're actively using multiple collations in a single query, possibly even within a single sort (something like ORDER BY a COLLATION en_US, b COLLATION es_US). This would perform passably on glibc but abysmally on most other libc's. From that point forward we would go about adding support for strcoll_l() and other interfaces to handle case (d) on various platforms. For platforms with no reasonable interface we could add a --enable-ICU users or packagers could choose to use. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Separate psql commands from arguments (was: psql command aliases support)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Moving to -hackers ... -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFH9RwN5YBsbHkuyV0RAr9ZAKD+XwNYYw3ugsTvowvKImOlKMZzPQCfTHkQ u9jLkEIAWI/0MbNzzxBt0ok= =So1n -END PGP SIGNATURE- On Fri, Apr 4, 2008 at 4:19 AM, Tom Lane [EMAIL PROTECTED] wrote: Yeah, the fundamental difference between the backslash command situation and aliases in shells and suchlike is that, because we've historically allowed no space between command name and argument, it's not that easy to tell what string ought to be compared against alias names. I think that an alias facility would only be acceptably safe if we disallowed that syntax (ie, start to *require* a space between command and args). Are we ready to do that? As far as i know, this behaviour isn't documented anywhere. In fact, the manual denies its existence: The format of a psql command is the backslash, followed immediately by a command verb, then any arguments. The arguments are separated from the command verb and each other by any number of whitespace characters. (unless you interpret any number to include zero, but that's quite a stretch in this context) For what it's worth, I've been using Postgres actively for about five years, and I've not once suspected that it was possible to omit the space between a psql command and its argument. The idea of writing the command and its arguments in one word is so completely bizarre to me that I can't imagine anyone even trying it casually to see if it works. Although it is likely that some people have stumbled upon it accidentally via typos, it's hard to imagine them wanting to use it in any kind of pratical application. +1 for dropping this quirk. And, if there are no objections (or other takers), I volunteer to write a patch. Regards, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
On Thu, Apr 03, 2008 at 07:07:56PM +0200, Svenne Krap wrote: I currently save only md5(id || username || 'password')* into password, if I had access to sha1 (for example) i would add another password column so, having for example ID serial Username varchar Password_md5 varchar Password_sha1 varchar No matter how you see it, I get more bits of hash to check against. Are you a cryptanalyst and are you sure that this doesn't actually make things worse? I'm sure it gives you a warm fuzzy feeling that it's *got* to be better, but unless someone has done some hard maths I'm not sure how you can be so sure. Why not just use SHA-512, you get many more quality bits that way. I would drop md5 totally and use sha1 and ripemd-160 if possible.. but currently i use only md5 as it is the only available one.. Loading pgcrypto is overkill for something as simple as hash-functions. Sounds like a good reason for moving the current md5 function out into pgcrypto as well! :) * I prepend the id and the username to guard users with weak passwords against known hashvalues (rainbow tables) should the box ever get comprised ... I take it your threat model doesn't include the attacker logging incoming queries to look for the clear-text password. Sam -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
On Thu, Apr 03, 2008 at 01:06:26PM -0400, Bruce Momjian wrote: Alvaro Herrera wrote: Bruce Momjian escribi?: It seems more helpful if there were \x option to use extended format only when the output is too wide. TODO already has: o Add auto-expanded mode so expanded output is used if the row length is wider than the screen width. Consider using auto-expanded mode for backslash commands like \df+. Some sort of \x auto? Sounds interesting ... Yep. Having \df+ go to \x automatically sounds like a really great idea :) Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] best way for export gram.y symbols
Pavel Stehule [EMAIL PROTECTED] writes: I would to use main scanner from plpgsql. I need some values from parser/parse.h #define SELECT 543 #define PARAM 642 No, you don't. Whatever you think you need those for, there's probably a better way to do it. We got out of the business of letting anything but scan.c and gram.c depend on Bison symbol numbers years ago, and I don't much want to re-introduce that dependency. What exactly are you trying to accomplish? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] About numeric division again
One of the items on the commit-fest list is my patch from last year to rewrite the numeric division operator using schoolbook division: http://archives.postgresql.org/pgsql-patches/2007-06/msg00173.php The code that's currently in there sometimes has to propagate rounding to the left, meaning that you can never be certain whether all of the digits you have so far are good, and that means that it can sometimes generate an incorrect truncated output. This leads to the bugs cited in the above message. The reason I didn't just commit it last year is that I was dissatisfied with the speed penalty --- on very long inputs (dozens or hundreds of digits) division is about 4X slower than with our existing code. However, no one has come up with a better answer; and as a wise man once said, I can make this program arbitrarily fast, if it doesn't have to give the right answer. Correctness has to trump speed. One thing that occurs to me is that we could keep the existing approximate division code in there too, and use it internally in the transcendental function implementations. Those are not particularly interested in getting exact truncated results, and they are the worst case for the speed penalty because they do lots of divisions on values that are likely to be long. However this idea could fairly be charged with being code bloat. Comments? Also, there was some discussion of providing a SQL-level numeric integer division operator or function, that is the equivalent of trunc(x/y) except faster (since it'd not need to compute fractional digits that would then be thrown away). Is this worth doing, and if so what should we call it exactly? The amount of new code needed should be pretty small (just an interface function), so I'm willing to take care of it if we want one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Locale, Collation, ICU patch
Gregory Stark [EMAIL PROTECTED] writes: The big gotcha is what collation to use when comparing with data in the system tables, especially the shared system tables. I think we do need to define a database-wide encoding and collation to use for system tables. You mean cluster-wide? If we can get away with that, it'd solve a lot of problems. Note that the stuff in the system tables is mostly type name not text, and the comparison semantics for that have always been strcmp(), so the question of collation doesn't really apply. Name in itself doesn't care about encoding either, but I think we have to restrict encoding to avoid the problem of injecting data that's invalidly encoded into one database from another via the shared catalogs. The other issue that'd have to be resolved is the problem of system log output. I think we'd wish that log messages are written in a uniform encoding (CSV output in particular is going to have a hard time otherwise) but what do you do when you need to report something that includes a character not present in that encoding? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
On Thursday 03 April 2008 08:47:12 Joshua D. Drake wrote: On Thu, 3 Apr 2008 21:03:05 +0530 Tom Dunstan [EMAIL PROTECTED] wrote: On Thu, Apr 3, 2008 at 8:25 PM, Andrew Dunstan [EMAIL PROTECTED] wrote: If this were at all true we would not not have seen the complaints from people along the lines of My ISP won't install contrib. But we have, and quite a number of times. We have concrete evidence that calling it contrib actually works against us. It's hard to see ISPs who won't install contrib from installing ${random module} from the big bad internet as has been discussed in this thread, but who knows? Sure it is. The very word contrib brings about ideas of things like: Unstable, Cooker, unofficial. modules is completely different (from a perception perspective). IMO the core modules should be compiled via configure with something like: ./configure --enable-module=ALL or ./configure --enable-module=pgcrypto --enable-module=cube This would install all the modules but not enable them in the database itself (of course). This could also be extended to the pls so that we have exactly one mechanism to control those options as well. ./configure --enable-module=pgcrypto --enable-module=plperl I think --enable-module might be the wrong term here, since you specificaly state we are not enabling them in the database. I think --with-module=... might be a better way to go. Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -- Darcy Buskermolen Command Prompt, Inc. +1.503.667.4564 X 102 http://www.commandprompt.com/ PostgreSQL solutions since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modules
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 3 Apr 2008 12:03:43 -0700 Darcy Buskermolen [EMAIL PROTECTED] wrote: This would install all the modules but not enable them in the database itself (of course). This could also be extended to the pls so that we have exactly one mechanism to control those options as well. ./configure --enable-module=pgcrypto --enable-module=plperl I think --enable-module might be the wrong term here, since you specificaly state we are not enabling them in the database. I think --with-module=... might be a better way to go. That would work. Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH9S8HATb/zqfZUUQRAjAFAJ0dsH4Cwr3WuiLXVKw9tReOarhKSQCeNuKL GkaxyLV8eC/YhUzgfd4YTEI= =6C6r -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Locale, Collation, ICU patch
Tom Lane wrote: The other issue that'd have to be resolved is the problem of system log output. I think we'd wish that log messages are written in a uniform encoding (CSV output in particular is going to have a hard time otherwise) but what do you do when you need to report something that includes a character not present in that encoding? I think the only problem with CSV logs would be in trying to read them back into Postgres (which I agree is the main point of having them). We need to be more aggressive about dealing with these problems, or else how will we ever get to per-column charsets/collations? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Separate psql commands from arguments (was: psql command aliases support)
--On Freitag, April 04, 2008 05:04:04 +1100 Brendan Jurd [EMAIL PROTECTED] wrote: For everyone else who hasn't read the original discussion on -patches, here's a link into the archives: http://archives.postgresql.org/pgsql-patches/2008-04/msg5.php On Fri, Apr 4, 2008 at 4:19 AM, Tom Lane [EMAIL PROTECTED] wrote: Yeah, the fundamental difference between the backslash command situation and aliases in shells and suchlike is that, because we've historically allowed no space between command name and argument, it's not that easy to tell what string ought to be compared against alias names. I think that an alias facility would only be acceptably safe if we disallowed that syntax (ie, start to *require* a space between command and args). Are we ready to do that? Okay, the more we talked about that, the more i got that feeling, too. As far as i know, this behaviour isn't documented anywhere. In fact, the manual denies its existence: The format of a psql command is the backslash, followed immediately by a command verb, then any arguments. The arguments are separated from the command verb and each other by any number of whitespace characters. Not only there, the code itself doesn't encourage the use of this syntax (src/bin/psql/command.c), too: /* * If the command was not recognized, try to parse it as a one-letter * command with immediately following argument (a still-supported, but * no longer encouraged, syntax). */ (unless you interpret any number to include zero, but that's quite a stretch in this context) For what it's worth, I've been using Postgres actively for about five years, and I've not once suspected that it was possible to omit the space between a psql command and its argument. The idea of writing the command and its arguments in one word is so completely bizarre to me that I can't imagine anyone even trying it casually to see if it works. Although it is likely that some people have stumbled upon it accidentally via typos, it's hard to imagine them wanting to use it in any kind of pratical application. +1 for dropping this quirk. And, if there are no objections (or other takers), I volunteer to write a patch. Here's a quick and dirty patch which removes the responsible code from psql (maybe not enough, but short testing shows it's working). Sorry for the unified diff +1, too. I advised people not to use that syntax for years now, maybe we're in luck and everyone else was doing the same ;) -- Thanks Bernddiff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 1392972..c1194c2 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -90,30 +90,6 @@ HandleSlashCmds(PsqlScanState scan_state, /* And try to execute it */ status = exec_command(cmd, scan_state, query_buf); - if (status == PSQL_CMD_UNKNOWN strlen(cmd) 1) - { - /* - * If the command was not recognized, try to parse it as a one-letter - * command with immediately following argument (a still-supported, but - * no longer encouraged, syntax). - */ - char new_cmd[2]; - - /* don't change cmd until we know it's okay */ - new_cmd[0] = cmd[0]; - new_cmd[1] = '\0'; - - psql_scan_slash_pushback(scan_state, cmd + 1); - - status = exec_command(new_cmd, scan_state, query_buf); - - if (status != PSQL_CMD_UNKNOWN) - { - /* adjust cmd for possible messages below */ - cmd[1] = '\0'; - } - } - if (status == PSQL_CMD_UNKNOWN) { if (pset.cur_cmd_interactive) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
On Thu, Apr 3, 2008 at 2:43 PM, David Fetter [EMAIL PROTECTED] wrote: On Thu, Apr 03, 2008 at 01:06:26PM -0400, Bruce Momjian wrote: Some sort of \x auto? Sounds interesting ... Yep. Having \df+ go to \x automatically sounds like a really great idea :) you can get pretty good resultsr currently for \df+ if you set up your 'less' pager a particular way. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate psql commands from arguments (was: psql command aliases support)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Fri, Apr 4, 2008 at 6:35 AM, Bernd Helmle wrote: Here's a quick and dirty patch which removes the responsible code from psql (maybe not enough, but short testing shows it's working). Sorry for the unified diff I didn't realise it would be that straightforward! You could probably also do away with psql_scan_slash_pushback() (psqlcan.h, psqlcan.l) as it is only used by the code you removed. Cheers, BJ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFH9TV85YBsbHkuyV0RAq2aAKC5ay/QJQu9De9Ivq1kcgUFOGUa9gCgyB0c wt1IjpIZH26O7DiWxf0nXE0= =7Ybj -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Gregory Stark [EMAIL PROTECTED] writes: AFAIK the state of the art is actually to load the data into a table which closely matches the source material, sometimes just columns of text. Then copy it all to another table doing transformations. Not impressed. I liked the idea of allowing COPY FROM to act as a table source in a larger SELECT or INSERT...SELECT. Not at all sure what would be involved to implement that, but it seems a lot more flexible than any other approach. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
On Thu, Apr 03, 2008 at 03:43:50PM -0400, Merlin Moncure wrote: On Thu, Apr 3, 2008 at 2:43 PM, David Fetter [EMAIL PROTECTED] wrote: On Thu, Apr 03, 2008 at 01:06:26PM -0400, Bruce Momjian wrote: Some sort of \x auto? Sounds interesting ... Yep. Having \df+ go to \x automatically sounds like a really great idea :) you can get pretty good resultsr currently for \df+ if you set up your 'less' pager a particular way. Does 'less' have a way to re-arrange columns?!? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Separate psql commands from arguments (was: psql command aliases support)
--On Freitag, April 04, 2008 06:52:37 +1100 Brendan Jurd [EMAIL PROTECTED] wrote: I didn't realise it would be that straightforward! Stumbled across it during hacking... You could probably also do away with psql_scan_slash_pushback() (psqlcan.h, psqlcan.l) as it is only used by the code you removed. It's not clean yet, but i thought we need something quick so people can try and comment on it. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] best way for export gram.y symbols
No, you don't. Whatever you think you need those for, there's probably a better way to do it. We got out of the business of letting anything but scan.c and gram.c depend on Bison symbol numbers years ago, and I don't much want to re-introduce that dependency. What exactly are you trying to accomplish? when I build CASE expression, I have to merge some PLpgSQL_expr together. Then I have to reparse expr-query and I have to find params and actualize it. I found some else. I can't include parser/parse.h in gram.y file, because there is name's conflict. But I can do it in other file. It's better, because is less risk of wrong preproces. So I have function: #include parser/parse.h #include parser/gramparse.h extern char *base_yytext; int plpgsql_querylex(int *param, char **ttext) { int tok = base_yylex(); if (tok == 0) return PLPGSQL_QUERYLEX_DONE; *ttext = base_yytext; switch (tok) { case SELECT: return PLPGSQL_QUERYLEX_SELECT; case PARAM: *param = base_yylval.ival; return PLPGSQL_QUERYLEX_PARAM; default: return PLPGSQL_QUERYLEX_NONPARAM; } } and then I can merge queries in function: /* * This function joins an PLpgSQL_expr to expression stack. It's used * for CASE statement where from some expr is created one expression. * Reparsing is necessary for detecting parameters in SQL query. */ static void add_expr(PLpgSQL_expr *expr, PLpgSQL_dstring *ds, int *nparams, int *params) { charbuff[32]; int lex; int pnum; char*yytext; scanner_init(expr-query); /* First lexem have to be SELECT */ if (plpgsql_querylex(pnum, yytext) != PLPGSQL_QUERYLEX_SELECT) { plpgsql_error_lineno = plpgsql_scanner_lineno(); /* internal error */ elog(ERROR, expected \SELECT \, got \%s\, yytext); } while((lex = plpgsql_querylex(pnum, yytext)) != PLPGSQL_QUERYLEX_DONE) { if (lex == PLPGSQL_QUERYLEX_PARAM) { int dno; int i; if (pnum 1 || pnum = MAX_EXPR_PARAMS) elog(ERROR, parsing query failure, wrong param $%d, pnum); dno = expr-params[pnum-1]; for (i = 0; i *nparams; i++) if (params[i] == dno) break; snprintf(buff, sizeof(buff), $%d, i+1); /* when not found variable */ if (i = *nparams) { if (*nparams = MAX_EXPR_PARAMS) { plpgsql_error_lineno = plpgsql_scanner_lineno(); ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg(too many variables specified in SQL statement))); } params[*nparams] = dno; (*nparams)++; } plpgsql_dstring_append(ds, buff); } else plpgsql_dstring_append(ds, yytext); } scanner_finish(); } Regards Pavel Stehule regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Svenne Krap wrote: I would still prefer two hash functions as they do add a better safeguard towards collisions (the gentoo distribtion actually hashes the files by three different algorithms SHA1, SHA256 and RMD160) - i would be inclined to use three hashes too, if they were instantly available. Technically MD5 (128 bits) + SHA1 (160 bits) gives better strength than MD5 on its own, or SHA1 on its own, in that finding one collision is likely to be insufficient to break in, however, I doubt you could mathematically prove that it would perform equal to a 128 + 160 = 288 bit equivalent strength hash function. At the bare minimum, I refer you to the fact that each component on its own represents a self-contained hash of the entire document, and a small change in a real-life document will presumedly affect both values (this is how hash functions are designed), therefore, for real-life documents, there are bit patterns that will never coincide, therefore, not all 288 bits are being used. For real-life documents, it is highly likely that the MD5 and the SHA1 will always be a ~1:1 mapping (how many collision have truly been found in practice?), and the effective bit strength of the total for real documents approaches the highest of the two - which is 160 bits. I suggest that MD5 + SHA1 is not 128 bits better than SHA1 on its own. In any case, this is all irrelevant, because md5 passwords are still very useful, and the argument that more = better is a never ending infinite resource trap. More is not better. Better is better. If you can prove md5 is insufficient for PostgreSQL passwords, the correct decision would be to switch to something better, and deprecate md5 from the core. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
On Thu, Apr 3, 2008 at 4:08 PM, David Fetter [EMAIL PROTECTED] wrote: On Thu, Apr 03, 2008 at 03:43:50PM -0400, Merlin Moncure wrote: On Thu, Apr 3, 2008 at 2:43 PM, David Fetter [EMAIL PROTECTED] wrote: On Thu, Apr 03, 2008 at 01:06:26PM -0400, Bruce Momjian wrote: Some sort of \x auto? Sounds interesting ... Yep. Having \df+ go to \x automatically sounds like a really great idea :) you can get pretty good resultsr currently for \df+ if you set up your 'less' pager a particular way. Does 'less' have a way to re-arrange columns?!? no, but being able to scroll left/right with the arrow keys is (usually) just as good. that said, \G is pretty neat...I use it once in a while on the 'other' open source database. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Bruce Momjian escribió: Martijn van Oosterhout wrote: I was thinking that maybe \x should have a one-shot mode, i.e. \x query does it only for this one statement. It would solve the OPs problem. But break for others who want all output \x. I think Martijn is proposing using it as some sort of prefix which would take effect only on the current query. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Row estimation for var const and for NOT (...) queries
Hi, I have a table table1 with ~100k rows, the table having flag1 column. The value of flag1 is NULL in 85k+ rows, and it's TRUE in 7k+ rows, and FALSE in 6k rows. I use EXPLAIN to get apprx. number of rows for simple SELECT queries. But in case of ...WHERE NOT flag1 the optimizer is completely wrong: -- it's OK here, the estimation is fine test=# EXPLAIN ANALYZE SELECT * FROM table1 WHERE flag1; QUERY PLAN - Seq Scan on table1 (cost=0.00..9962.84 rows=7875 width=532) (actual time=0.107..134.729 rows=7652 loops=1) Filter: flag1 Total runtime: 139.460 ms (3 rows) -- here optimizer thinks that we have 90k+ rows with flag1 = FALSE, while the real number of rows is 6k+ test=# EXPLAIN ANALYZE SELECT * FROM table1 WHERE NOT flag1; QUERY PLAN -- Seq Scan on table1 (cost=0.00..9962.84 rows=91809 width=532) (actual time=0.087..110.596 rows=6243 loops=1) Filter: (NOT flag1) Total runtime: 114.414 ms (3 rows) I've checked statistics available and have found that Postgres actually knows how many FALSE values are present (approximately) in the table: test=# SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats WHERE tablename='table1' AND attname='flag1'; null_frac | n_distinct | most_common_vals | most_common_freqs ---++--+--- 0.864667 | 2 | {t,f}| {0.079,0.056} (1 row) So, I've started to think that this is a shortcoming of the optimizer code, which makes Postgres count both FALSEs and NULLs when estimating var const expressions. 1) backend/utils/adt/selfuncs.c, in neqsel() we have: ... result = DatumGetFloat8(DirectFunctionCall4(eqsel, ... ... result = 1.0 - result; PG_RETURN_FLOAT8(result); ... -- so, there is a wrong assumption that for var const expressions we may just use estimation for var = const and subtract it from 1. In fact, NULLs are ignored here. According to ternary logic, in this case we must subtract the number of NULLs also. This will improve row estimation for var const queries (but not in case when we deal with boolean datatype, look at (2)!). If there are no objections, I'll send the patch, which is straightforward. 2). In case of WHERE flag1 = FALSE or WHERE flag1 TRUE the planner rewrites the query to WHERE NOT flag1 and then uses the logic defined in backend/optimizer/path/clausesel.c, where, again, we see the wrong approach which ignores NULLs: ... else if (not_clause(clause)) { /* inverse of the selectivity of the underlying clause */ s1 = 1.0 - clause_selectivity(root, (Node *) get_notclausearg((Expr *) clause), varRelid, jointype); ... I have no idea how to improve this. AFAIKS, at this level we have no knowledge about the data we're dealing with (am I right?) -- so, I'm afraid that for booleans there is no way to improve the optimizer. If my thoughts described in (1) are correct and we improve the estimation for , we will have a situation where using booleans might decrease the performance due to wrong rows count estimation. I'll appreciate any help and ideas that will allow to improve the situation. P. S. I use current HEAD version of Postgres; before running queries the statistic was updated with ANALYZE -- Best regards, Nikolay -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Mark Mielke wrote: In any case, this is all irrelevant, because md5 passwords are still very useful, and the argument that more = better is a never ending infinite resource trap. More is not better. Better is better. If you can prove md5 is insufficient for PostgreSQL passwords, the correct decision would be to switch to something better, and deprecate md5 from the core. Agreed. One must also remember that if you use two hashes, if *either* one of them is broken in the future so that you can reconstruct the password from the hash, you're screwed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Alvaro Herrera [EMAIL PROTECTED] writes: Bruce Momjian escribió: Martijn van Oosterhout wrote: I was thinking that maybe \x should have a one-shot mode, i.e. \x query does it only for this one statement. It would solve the OPs problem. But break for others who want all output \x. I think Martijn is proposing using it as some sort of prefix which would take effect only on the current query. A bigger problem is that it doesn't play nicely at all with multi-line queries. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Heikki Linnakangas wrote: Mark Mielke wrote: One must also remember that if you use two hashes, if *either* one of them is broken in the future so that you can reconstruct the password from the hash, you're screwed. That is quite a good argument actually :) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: AFAIK the state of the art is actually to load the data into a table which closely matches the source material, sometimes just columns of text. Then copy it all to another table doing transformations. Not impressed. I liked the idea of allowing COPY FROM to act as a table source in a larger SELECT or INSERT...SELECT. Not at all sure what would be involved to implement that, but it seems a lot more flexible than any other approach. Several years ago Bruce and I discussed the then theoretical use of a SELECT query as the source for COPY TO, and we agreed that the sane analog would be to have an INSERT query as the target of COPY FROM. This idea seems to take that rather further. If doable I think it would be cool, as long as people don't try using it as an alternative storage engine. I can just imagine people creating views over such SELECT statements ... cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row estimation for var const and for NOT (...) queries
Nikolay Samokhvalov [EMAIL PROTECTED] writes: I have a table table1 with ~100k rows, the table having flag1 column. The value of flag1 is NULL in 85k+ rows, and it's TRUE in 7k+ rows, and FALSE in 6k rows. Yeah, you're going to have some problems with so many NULLs, I'm sure. -- so, there is a wrong assumption that for var const expressions we may just use estimation for var = const and subtract it from 1. In fact, NULLs are ignored here. According to ternary logic, in this case we must subtract the number of NULLs also. This will improve row estimation for var const queries (but not in case when we deal with boolean datatype, look at (2)!). If there are no objections, I'll send the patch, which is straightforward. It doesn't seem all that straightforward to me, unless your intent is to copy-and-paste all of eqsel(), which I wouldn't regard as a very acceptable solution. Otherwise you're going to need some refactoring. 2). In case of WHERE flag1 = FALSE or WHERE flag1 TRUE the planner rewrites the query to WHERE NOT flag1 and then uses the logic defined in backend/optimizer/path/clausesel.c, where, again, we see the wrong approach which ignores NULLs: I think the only case where we could hope to improve that is where the argument is a simple bool variable --- but of course that's also the only case where we could've done much with the flag1 = FALSE form, so the rewriting isn't really hurting here. I'd suggest pushing the work into selfuncs.c and seeing if examine_variable can do anything with the argument. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] best way for export gram.y symbols
Pavel Stehule [EMAIL PROTECTED] writes: What exactly are you trying to accomplish? when I build CASE expression, I have to merge some PLpgSQL_expr together. Then I have to reparse expr-query and I have to find params and actualize it. There has to be a better way than that. What CASE syntax are you trying to implement, anyhow? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
Sam Mason wrote: Are you a cryptanalyst and are you sure that this doesn't actually make things worse? I'm sure it gives you a warm fuzzy feeling that it's *got* to be better, but unless someone has done some hard maths I'm not sure how you can be so sure. No sadly I am no cryptoanalyst. Why not just use SHA-512, you get many more quality bits that way. I would, if it was available in core. I would drop md5 totally and use sha1 and ripemd-160 if possible.. but currently i use only md5 as it is the only available one.. Loading pgcrypto is overkill for something as simple as hash-functions. Sounds like a good reason for moving the current md5 function out into pgcrypto as well! :) I am not sure how I am to understand that comment. But again I am just a user... * I prepend the id and the username to guard users with weak passwords against known hashvalues (rainbow tables) should the box ever get comprised ... I take it your threat model doesn't include the attacker logging incoming queries to look for the clear-text password. No it doesn't, I am mostly concerned with the grab and run scenario. I am still convinced having more (and better) hash-functions in core is a gain for some users. And it is fairly un-intrusive as the hash functions are well-defined and never going to change (new ones can be added and old ones deleted, but SHA256 for example will never change). I think I will drop the issue as I cannot present formal proof of my case, sorry to have wasted your time. Svenne
Re: [HACKERS] psql \G command -- send query and output using extended format
Tom Lane escribió: Alvaro Herrera [EMAIL PROTECTED] writes: I think Martijn is proposing using it as some sort of prefix which would take effect only on the current query. A bigger problem is that it doesn't play nicely at all with multi-line queries. Hmm, why wouldn't it? I assume it would only be recognized if the query buffer is empty. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small TRUNCATE glitch
Tom Lane wrote: Just noticed that TRUNCATE fails to clear the stats collector's counts for the table. I am not sure if it should reset the event counts or not (any thoughts?) but surely it is wrong to not zero the live/dead tuple counts. Agreed, the live/dead counters should be reset. Regarding event counts, my take is that we should have a separate statement count for truncate (obviously not a tuple count), and the others should be left alone. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane escribió: A bigger problem is that it doesn't play nicely at all with multi-line queries. Hmm, why wouldn't it? I assume it would only be recognized if the query buffer is empty. Huh? The proposed syntax was \x query... What do you do when you'd like the query to extend over multiple lines? Backslash commands can't cross lines. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small TRUNCATE glitch
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: Just noticed that TRUNCATE fails to clear the stats collector's counts for the table. I am not sure if it should reset the event counts or not (any thoughts?) but surely it is wrong to not zero the live/dead tuple counts. Agreed, the live/dead counters should be reset. Regarding event counts, my take is that we should have a separate statement count for truncate (obviously not a tuple count), and the others should be left alone. I thought some more about how to do it, and stumbled over how to cope with TRUNCATE being rolled back. That nixed my first idea of just having TRUNCATE send a zero-the-counters-now message. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Tom Lane escribió: Huh? The proposed syntax was \x query... What do you do when you'd like the query to extend over multiple lines? Backslash commands can't cross lines. Save the fact that the current query is extended, until query end? I haven't actually looked at what the implementation would look like. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \G command -- send query and output using extended format
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane escribió: Huh? The proposed syntax was \x query... What do you do when you'd like the query to extend over multiple lines? Backslash commands can't cross lines. Save the fact that the current query is extended, until query end? Yech. To name just a couple of problems, what if you decide after typing another line or two that you didn't want \x after all? Action-at-a-distance commands suck. You'd also find that this didn't play very nicely with history recall, since the history stuff assumes that a line starting with a backslash command is separate from those around it. If we want this at all, we should do it the way first proposed (\G). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers