[HACKERS] Fwd: Starting off with the development

2010-11-02 Thread Vaibhav Kaushal
I wanted to know what type of data structures are created by the parser
/ planner and how they are read by the executor.

I know that it is some sort of tree. But how many child nodes does a node
in the tree have, and what is the structure of the node itself?

Which files should I look into to get these details?

Thanks

-Vaibhav


Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons

2010-11-02 Thread Peter Eisentraut
On sön, 2010-10-31 at 22:41 +0100, Andres Freund wrote:
 * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names
 confusing. Not sure if it's worth it.

Given that there are widely established functions atoi() and atol(),
naming the reverse itoa() and ltoa() makes a lot of sense.  The changed
versions read like "string to ASCII".


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-02 Thread Heikki Linnakangas

On 02.11.2010 00:47, Tom Lane wrote:

Greg Stark gsst...@mit.edu writes:

On Mon, Nov 1, 2010 at 12:37 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Yes, indeed there is a corner-case bug when you try to stream the very first
WAL segment, with log==seg==0.



This smells very much like
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/137052



I wonder if there's some defensive programming way to avoid bugs of this sort.


It strikes me that it's not good if there isn't a recognizable invalid
value for WAL locations.  These bits of code show that there is reason
to have one.  Maybe we should teach initdb to start the WAL one segment
later, and then 0/0 *would* mean invalid, and we could revert these
other hacks.


Good idea. That can even be back-patched to 9.0; it should have no
effect on existing installations.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] B-tree parent pointer and checkpoints

2010-11-02 Thread Heikki Linnakangas
We have the rm_safe_restartpoint mechanism to ensure that we don't use a 
checkpoint that splits a multi-level B-tree insertion as a restart 
point. But to my surprise, we don't have anything to protect against the 
analogous case during normal operation. This is possible:


1. Split child page. Write WAL records for the child pages.
2. Begin and finish a checkpoint
3. Crash, before writing the WAL record of inserting the child pointer 
in the parent B-tree page.
4. Recovery begins at the new checkpoint, never sees the incomplete 
split, so it stays incomplete.


In practice that's pretty hard to hit, because a checkpoint takes some 
time, while locking the parent page and writing the child pointer is 
usually very quick. But it's possible.


It surprises me that we thought of this when we introduced 
restartpoints, but this more obvious case during normal operation seems 
to have been there forever. Nothing very bad happens if you lose the 
parent update, but this would be nice to fix nevertheless.


I bumped into this while thinking about archive recovery - the above can 
happen at archive recovery too if the checkpoint is caused by 
pg_start_backup().


I think we can fix this by requiring that any multi-WAL-record actions 
that are in-progress when a checkpoint starts (at the REDO-pointer) must 
finish before the checkpoint record is written. That will close the 
issue with restartpoints, archive recovery etc. as well, so we no longer 
need to worry about this anywhere else than while performing an online 
checkpoint.


I'm thinking of using the inCommit flag for this, to delay writing the 
checkpoint record until all incomplete splits are finished. inCommit 
protects against a similar race condition between writing the commit 
record and flushing the clog page. It will obviously need to be renamed, 
and we'll need to double-check that it's safe: B-tree splits take longer, 
and there's no critical section there like there is in the commit codepath.


Comments?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Intelligent RDBMS

2010-11-02 Thread ghatpande
Hello All,
I am Vijay Ghatpande from Pune, India. I have been in the IT field for the 
last 30 years, mainly in ERP design, development, and implementation. I have 
worked on JD Edwards, Oracle Apps, and SAP. I was involved in the design and 
development of an ERP package called ISP – Integrated Software for 
Production. Now I am independent and have started my own consultancy, AMS 
ERP Solutions, and work for SME industries in and around Pune. I am 
technical and have worked with everything from COBOL and DGL to .NET and 
C++. I have also worked as an Oracle and DB2 DBA for many years.

I have a dream project in mind, concerning relational database management. 
Please read the paragraph below from the point of view of users of 
applications developed on an RDBMS.
Any application contains many tables, and they are related to each other; 
the RDBMS keeps the relations. Developers and users have to know these 
relations to extract the proper data and convert it into information. If 
multiple tables are involved, the query can become complex. The main 
problem with an RDBMS is writing SQL. End users avoid SQL and think of it 
as technical work. This is a limitation of RDBMS. If SQL becomes free of 
tables and relations, I am sure everyone will like it.

For example, MS Dynamics has many tables to store its data. We could keep 
this data in such a way that, to the developer, MS Dynamics is one table, 
say MSD, and the user can extract any data from that table. He need not 
know the internal table relations. If this happens, life will be easier 
for many. Application development will be very easy and cost effective. 
This can be achieved by creating views. I am thinking of an intelligent 
database where the relations are built in. If this is available in the 
product itself, then the domain knowledge will be in the RDBMS. My 
intention is not to hide the relations from users, but that will be one 
of the advantages of an intelligent RDBMS. This will be the next 
generation of RDBMS, and it is going to change the RDBMS and ERP world.

I am requesting you to give me feedback.

With Warm Regards,




[HACKERS] timestamp of the last replayed transaction

2010-11-02 Thread Fujii Masao
Hi,

After the 9.0 release, I've often heard that some people want to know
how far transactions have been replayed in the standby in terms of
timestamp rather than LSN. So I'm thinking of including in core a
function that returns the timestamp of the last applied transaction
(i.e., a commit/abort WAL record).

Name: pg_last_replay_xact_timestamp (better name?)
Return Type: timestamp with time zone

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] timestamp of the last replayed transaction

2010-11-02 Thread Dimitri Fontaine
Fujii Masao masao.fu...@gmail.com writes:
 After the 9.0 release, I've often heard that some people want to know
 how far transactions have been replayed in the standby in terms of
 timestamp rather than LSN. So I'm thinking of including in core a
 function that returns the timestamp of the last applied transaction
 (i.e., a commit/abort WAL record).

 Name: pg_last_replay_xact_timestamp (better name?)
 Return Type: timestamp with time zone

 Thoughts?

How do you want to implement the tracking? Will it look like the
proposal in this thread:

  http://archives.postgresql.org/pgsql-hackers/2010-05/msg01209.php

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons

2010-11-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On sön, 2010-10-31 at 22:41 +0100, Andres Freund wrote:
 * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names
 confusing. Not sure if it's worth it.

 Given that there are widely established functions atoi() and atol(),
 naming the reverse itoa() and ltoa() makes a lot of sense.  The changed
 versions read like "string to ASCII".

Yeah, and s32 makes no sense at all.  I think we should either leave
well enough alone (to avoid introducing a cross-version backpatch
hazard) or use pg_i32toa etc.

regards, tom lane



Re: [HACKERS] improved parallel make support

2010-11-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 This patch requires GNU make 3.80, because of the above | feature and
 the $(eval) function.  Version 3.80 is dated October 2002, so it should
 be no problem, but I do occasionally read of make 3.79 around here;
 maybe it's time to get rid of that.  I did put in a check that makes the
 build fail right away if a wrong version of make is used.

Do we have a handle on how many buildfarm members this will break?

(fwiw, my hpux box is running 3.79.1)

regards, tom lane



Re: [HACKERS] B-tree parent pointer and checkpoints

2010-11-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I think we can fix this by requiring that any multi-WAL-record actions 
 that are in-progress when a checkpoint starts (at the REDO-pointer) must 
 finish before the checkpoint record is written.

What happens if someone wants to start a new split while the checkpoint
is hanging fire?

regards, tom lane



Re: [HACKERS] B-tree parent pointer and checkpoints

2010-11-02 Thread Heikki Linnakangas

On 02.11.2010 16:30, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

I think we can fix this by requiring that any multi-WAL-record actions
that are in-progress when a checkpoint starts (at the REDO-pointer) must
finish before the checkpoint record is written.


What happens if someone wants to start a new split while the checkpoint
is hanging fire?


You mean after CreateCheckPoint has determined the redo pointer, but 
before it has written the checkpoint record? The new split can go ahead, 
and the checkpoint doesn't need to care about it. Recovery will start at 
the redo pointer, so it will see the split record, and will know to 
finish the incomplete split if necessary.


The logic is the same as with inCommit. Checkpoint will fetch the list 
of in-progress splits some time after determining the redo-pointer. It 
will then wait until all of those splits have finished. Any new splits 
that begin after fetching the list don't affect the checkpoint.


inCommit can't be used as is, because it's tied to the Xid, but 
something similar should work.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Starting off with the development

2010-11-02 Thread Kevin Grittner
Vaibhav Kaushal vaibhavkaushal...@gmail.com wrote:
 
 What version should I start from? I guess postgresql 9.1 alpha
 would be good.
 
The HEAD of the master branch of the git repository is where
development normally takes place.  You should start with the developer
section of the main PostgreSQL web site and the pages to which it
links -- especially the Developer's FAQ page:
 
http://www.postgresql.org/developer/
 
http://wiki.postgresql.org/wiki/Developer_FAQ
 
You also might want to browse the README files in the source
directories.
 
-Kevin



Re: [HACKERS] Tracking latest timeline in standby mode

2010-11-02 Thread Heikki Linnakangas

On 02.11.2010 07:15, Fujii Masao wrote:

On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Yeah, that's one approach. Another is to validate the TLI in the xlog page
header; it should always match the current timeline we're on. That would
feel more robust to me.


Yeah, that seems better.


We're a bit fuzzy about what TLI is written in the page header when the
timeline changing checkpoint record is written, though. If the checkpoint
record fits in the previous page, the page will carry the old TLI, but if
the checkpoint record begins a new WAL page, the new page is initialized
with the new TLI. I think we should rearrange that so that the page header
will always carry the old TLI.


Or after rescanning the timeline history files, what about refetching the last
applied record and checking whether the TLI in the xlog page header is the
same as the previous TLI? IOW, what about using the header of the xlog page
including the last applied record instead of the following checkpoint record?


I guess that would work too, but it seems problematic to move backwards 
during recovery.



Anyway ISTM we should also check that the min recovery point is not ahead
of the TLI switch location. So we need to fetch the record in the min recovery
point and validate the TLI of the xlog page header. Otherwise, the database
might get corrupted. This can happen, for example, when you remove all the
WAL files in pg_xlog directory and restart the standby.


Yes, that's another problem. We don't know which timeline the min 
recovery point refers to. We should store TLI along with 
minRecoveryPoint, then we can at least check that we're on the right 
timeline when we reach minRecoveryPoint and throw an error.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] create custom collation from case insensitive portuguese

2010-11-02 Thread Euler Taveira de Oliveira
Alexandre Riveira wrote:
 I've achieved some success in changing the operating system (Linux)
 collation to sort the way Brazilian Portuguese expects, by adding the
 following code in LC_COLLATE
 
This was already discussed; search the archives [1] [2].

 So far, I have understood the mechanism of changing the collation and
 reproducing it in PostgreSQL, but I could not generate a case-insensitive
 search. I believe that would be changed within the LC_COLLATE variable,
 but I could not go any further than that.
 
PostgreSQL doesn't support case-insensitive searches specifying the
collation per column yet. Look at [3]. But you could use ILIKE or a
regular expression to achieve that.


[1] http://pgfoundry.org/pipermail/brasil-usuarios/20060330/001667.html
[2] http://www.mail-archive.com/brasil-usuar...@pgfoundry.org/msg00895.html
[3] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00512.php


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/



[HACKERS] ALTER TYPE recursion to typed tables

2010-11-02 Thread Peter Eisentraut
I'm working on propagating ALTER TYPE commands to typed tables.  This is
currently prohibited.  For example, take these regression test cases:

CREATE TYPE test_type2 AS (a int, b text);
CREATE TABLE test_tbl2 OF test_type2;
ALTER TYPE test_type2 ADD ATTRIBUTE c text; -- fails
ALTER TYPE test_type2 ALTER ATTRIBUTE b TYPE varchar; -- fails
ALTER TYPE test_type2 DROP ATTRIBUTE b; -- fails
ALTER TYPE test_type2 RENAME ATTRIBUTE b TO bb; -- fails

The actual implementation isn't very difficult, because the ALTER TABLE
code already knows everything about recursion.

Now I'm wondering what kind of syntax should be used to control this.  I
think you don't want to automatically propagate such innocent looking
operations to tables in a potentially data-destroying manner.  The
natural idea would be RESTRICT/CASCADE.  This is currently only
associated with DROP operations, but I suppose ADD/ALTER/RENAME
ATTRIBUTE x ... CASCADE doesn't sound too odd.

Comments, other ideas?





Re: [HACKERS] Comparison with true in source code

2010-11-02 Thread Michael Meskes
On Mon, Nov 01, 2010 at 12:17:02PM +0900, Itagaki Takahiro wrote:
 There are some == true in the codes, but they might not be safe
 because all non-zero values are true in C. Is it worth cleaning up them?
 ...
 src/interfaces/ecpg/ecpglib/connect.c(168):   if (con-autocommit ==
 true  strncmp(mode, off, strlen(off)) == 0)
 src/interfaces/ecpg/preproc/ecpg.addons(356): if (compat ==
 ECPG_COMPAT_INFORMIX_SE  autocommit == true)
 src/interfaces/ecpg/preproc/ecpg.c(310):  
 ptr2ext[3] = (header_mode
 == true) ? 'h' : 'c';
 src/interfaces/ecpg/preproc/ecpg.c(327):  
 ptr2ext[1] = (header_mode
 == true) ? 'h' : 'c';

I actually see no reason why these variables are not defined as bool instead of
int, so I changed this. Hopefully I found all of them.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-11-02 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 
 * You won't see an error if the next command after the IDLE in
 transaction is a COMMIT/ROLLBACK. I don't see any sensible way
 around that.
 
Well, on a ROLLBACK I'm not sure it's a problem.  On a COMMIT,
couldn't you call a function to check for it in CommitTransaction
and PrepareTransaction?
 
-Kevin



Re: [HACKERS] ALTER TYPE recursion to typed tables

2010-11-02 Thread Robert Haas
On Tue, Nov 2, 2010 at 9:15 AM, Peter Eisentraut pete...@gmx.net wrote:
 I'm working on propagating ALTER TYPE commands to typed tables.  This is
 currently prohibited.  For example, take these regression test cases:

 CREATE TYPE test_type2 AS (a int, b text);
 CREATE TABLE test_tbl2 OF test_type2;
 ALTER TYPE test_type2 ADD ATTRIBUTE c text; -- fails
 ALTER TYPE test_type2 ALTER ATTRIBUTE b TYPE varchar; -- fails
 ALTER TYPE test_type2 DROP ATTRIBUTE b; -- fails
 ALTER TYPE test_type2 RENAME ATTRIBUTE b TO bb; -- fails

 The actual implementation isn't very difficult, because the ALTER TABLE
 code already knows everything about recursion.

 Now I'm wondering what kind of syntax should be used to control this.  I
 think you don't want to automatically propagate such innocent looking
 operations to tables in a potentially data-destroying manner.  The
 natural idea would be RESTRICT/CASCADE.  This is currently only
 associated with DROP operations, but I suppose ADD/ALTER/RENAME
 ATTRIBUTE x ... CASCADE doesn't sound too odd.

 Comments, other ideas?

That seems reasonable.  What do you plan to do about this case?

CREATE TYPE test_type AS (a int, b text);
CREATE TABLE test_tbl (x test_type);
ALTER TYPE test_type ADD ATTRIBUTE c text;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] create custom collation from case insensitive portuguese

2010-11-02 Thread Robert Haas
On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira
eu...@timbira.com wrote:
 Alexandre Riveira wrote:
 I've achieved some success in changing the operating system (Linux)
 collation to sort the way Brazilian Portuguese expects, by adding the
 following code in LC_COLLATE

 This was already discussed; search the archives [1] [2].

 So far, I have understood the mechanism of changing the collation and
 reproducing it in PostgreSQL, but I could not generate a case-insensitive
 search. I believe that would be changed within the LC_COLLATE variable,
 but I could not go any further than that.

 PostgreSQL doesn't support case-insensitive searches specifying the
 collation per column yet. Look at [3]. But you could use ILIKE or a
 regular expression to achieve that.

Is citext also useful for this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Robert Haas
On Sat, Oct 30, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 marcin mank marcin.m...@gmail.com writes:
 On Sat, Oct 30, 2010 at 6:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 3. To hash, apply the element type's hash function to each array
 element.  Combine these values by rotating the accumulator left
 one bit at each step and xor'ing in the next element's hash value.

 Thoughts?  In particular, is anyone aware of a better method
 for combining the element hash values?

 This would make the hash the same for arrays with elements 32 apart swapped.

 Well, there are *always* going to be cases where you get the same hash
 value for two different inputs; it's unavoidable given that you have to
 combine N 32-bit hash values into only one 32-bit output.

Sure.  The goal is to make those hard to predict, though.  I think
multiply by 31 and add the next value is a fairly standard way of
getting that behavior.  It mixes up the bits a lot more than just
left-shifting by a variable offset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TYPE recursion to typed tables

2010-11-02 Thread Peter Eisentraut
On tis, 2010-11-02 at 10:54 -0700, Robert Haas wrote:
 What do you plan to do about this case?
 
 CREATE TYPE test_type AS (a int, b text);
 CREATE TABLE test_tbl (x test_type);
 ALTER TYPE test_type ADD ATTRIBUTE c text;

This is currently prohibited, and I'm not planning to do anything about
that.




Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Oct 30, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 marcin mank marcin.m...@gmail.com writes:
 This would make the hash the same for arrays with elements 32 apart swapped.
 
 Well, there are *always* going to be cases where you get the same hash
 value for two different inputs; it's unavoidable given that you have to
 combine N 32-bit hash values into only one 32-bit output.

 Sure.  The goal is to make those hard to predict, though.

Really?  I think "I don't understand when this fails" isn't obviously
better than being able to predict when it fails ...

 I think
 multiply by 31 and add the next value is a fairly standard way of
 getting that behavior.  It mixes up the bits a lot more than just
 left-shifting by a variable offset.

What concerns me about that is that it tends to push the bits to the
left --- I think the effects of the earlier inputs are going to overflow
out as you incorporate a lot of newer inputs.  What you want is a scheme
where every input item affects all the bits of the final result about
equally, and my gut feeling is this doesn't provide that.

I'd be happier about this approach if there were some actual theory
behind it ... maybe there's some analysis out there, but the one link
that was given was just about entirely unconvincing.

regards, tom lane



Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar nov 02 15:21:31 -0300 2010:

 What concerns me about that is that it tends to push the bits to the
 left --- I think the effects of the earlier inputs are going to overflow
 out as you incorporate a lot of newer inputs.  What you want is a scheme
 where every input item affects all the bits of the final result about
 equally, and my gut feeling is this doesn't provide that.

Ahh, now I understand what you meant by "rotating" the bits in the
original email in this thread.  You're not simply shifting.  Clever.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Robert Haas
On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Oct 30, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 marcin mank marcin.m...@gmail.com writes:
 This would make the hash the same for arrays with elements 32 apart 
 swapped.

 Well, there are *always* going to be cases where you get the same hash
 value for two different inputs; it's unavoidable given that you have to
 combine N 32-bit hash values into only one 32-bit output.

 Sure.  The goal is to make those hard to predict, though.

 Really?  I think "I don't understand when this fails" isn't obviously
 better than being able to predict when it fails ...

Isn't that the whole point of hash functions?  Collisions are
inevitable, but you want them to be unpredictable.  If you want a hash
function with predictable collision behavior, just hash the first
element.  It'll be highly predictable AND wicked fast.

 I think
 multiply by 31 and add the next value is a fairly standard way of
 getting that behavior.  It mixes up the bits a lot more than just
 left-shifting by a variable offset.

 What concerns me about that is that it tends to push the bits to the
 left --- I think the effects of the earlier inputs are going to overflow
 out as you incorporate a lot of newer inputs.  What you want is a scheme
 where every input item affects all the bits of the final result about
 equally, and my gut feeling is this doesn't provide that.

I don't think so.  That would be a problem if you multiplied by an
even number.  You can see that you don't get dups if you just add in
the same value over and over:

#include <stdio.h>

int
main(int argc, char **argv)
{
    unsigned hv = 1;
    unsigned i;

    for (i = 0; i < 100; ++i)
    {
        hv = hv * 31 + 1;
        printf("%08x\n", hv);
    }
    return 0;
}

No dups.  Also, if you change that first "hv = 1" line to "hv = 2" and
rerun, that sequence also has no dups, and no collisions with the
"hv = 1" sequence.

 I'd be happier about this approach if there were some actual theory
 behind it ... maybe there's some analysis out there, but the one link
 that was given was just about entirely unconvincing.

I think it's from Knuth, though unfortunately I don't have a copy to
check.  There are probably better algorithms out there, but this one's
pretty simple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 
 The goal is to make those hard to predict, though.

 Really?  I think "I don't understand when this fails" isn't
 obviously better than being able to predict when it fails ...
 
 Isn't that the whole point of hash functions?  Collisions are
 inevitable, but you want them to be unpredictable.
 
Are you sure you're not confusing attributes of a good random number
generator with hash generation?  (They have much in common, but I've
only seen concern with hard to predict when working on random
number algorithms, as for financial audits or jury selection.)
 
 If you want a hash function with predictable collision behavior,
 just hash the first element.  It'll be highly predictable AND
 wicked fast.
 
You're confusing unpredictable with widely distributed, I think.
There's no reason that the hash value of an integer of the same size
as the hash shouldn't be the integer itself, for example.  It's hard
to get more predictable than that, yet it works well for hash
lookups.
 
I know that for a hash of a character string, "total = (total *
31) + nextcharacter" has been shown to be effective.  That's hardly
random or hard to predict, but it does tend to spread out the hash
values well.  Whether it is as good for arrays, I don't know.  It
seems like a reasonable place to start, though, since a character
string is rather like an array of characters.
 
 What concerns me about that is that it tends to push the bits to
 the left --- I think the effects of the earlier inputs are going
 to overflow out as you incorporate a lot of newer inputs.  What
 you want is a scheme where every input item affects all the bits
 of the final result about equally, and my gut feeling is this
 doesn't provide that.
 
 I don't think so.  That would be a problem if you multiplied by an
 even number.  You can see that you don't get dups if you just add
 in the same value over and over:
 
Yeah, that 1 in the low order position ensures that the impact of
any value which is ever added into the total is never entirely lost.
 
-Kevin



Re: [HACKERS] ALTER TYPE recursion to typed tables

2010-11-02 Thread Robert Haas
On Nov 2, 2010, at 11:17 AM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2010-11-02 at 10:54 -0700, Robert Haas wrote:
 What do you plan to do about this case?
 
 CREATE TYPE test_type AS (a int, b text);
 CREATE TABLE test_tbl (x test_type);
 ALTER TYPE test_type ADD ATTRIBUTE c text;
 
 This is currently prohibited, and I'm not planning to do anything about
 that.

OK.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Robert Haas
On Tue, Nov 2, 2010 at 12:34 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:

 The goal is to make those hard to predict, though.

 Really?  I think "I don't understand when this fails" isn't obviously
 better than "being able to predict when it fails" ...

 Isn't that the whole point of hash functions?  Collisions are
 inevitable, but you want them to be unpredictable.

 Are you sure you're not confusing attributes of a good random number
 generator with hash generation?  (They have much in common, but I've
 only seen concern with "hard to predict" when working on random
 number algorithms, as for financial audits or jury selection.)

 If you want a hash function with predictable collision behavior,
 just hash the first element.  It'll be highly predictable AND
 wicked fast.

 You're confusing "unpredictable" with "widely distributed", I think.
 There's no reason that the hash value of an integer of the same size
 as the hash shouldn't be the integer itself, for example.  It's hard
 to get more predictable than that, yet it works well for hash
 lookups.

Well, no, not really.  For example, it may be that you have a hash
table with N buckets and values of the form N+k, 2N+k, 3N+k, ...,
and everything will collide.

If you do some mixing of the bits, that case is far less likely to
occur in practice.

See for example http://www.azillionmonkeys.com/qed/hash.html

 I know that for a hash of a character string, the total = (total *
 31) + nextcharacter has been shown to be effective.  That's hardly
 random or hard to predict, but it does tend to spread out the hash
 values well.  Whether it is as good for arrays, I don't know.  It
 seems like a reasonable place to start, though, since a character
 string is rather like an array of characters.

That was my thought.

 What concerns me about that is that it tends to push the bits to
 the left --- I think the effects of the earlier inputs are going
 to overflow out as you incorporate a lot of newer inputs.  What
 you want is a scheme where every input item affects all the bits
 of the final result about equally, and my gut feeling is this
 doesn't provide that.

 I don't think so.  That would be a problem if you multiplied by an
 even number.  You can see that you don't get dups if you just add
 in the same value over and over:

 Yeah, that 1 in the low order position ensures that the impact of
 any value which is ever added into the total is never entirely lost.

Right...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 There's no reason that the hash value of an integer of the same
 size as the hash shouldn't be the integer itself, for example. 
 It's hard to get more predictable than that, yet it works well
 for hash lookups.
 
 Well, no, not really.  For example, it may be that you have a hash
 table with N buckets and values of the form N+k, 2N+k, 3N+k, ...,
 and everything will collide.
 
That hardly seems convincing if the integer is a sequentially
assigned number, as with many ID columns; but if you want an
algorithm which will work well with numbers which might be assigned
in a pattern with a high correlation with some unpredictable number
of buckets -- sure, it won't work well without some scrambling.  For
sequentially assigned numbers, that scrambling would have a cost and
would be of no value.  I guess it depends a bit on your use case and
goals.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Really?  I think "I don't understand when this fails" isn't obviously
 better than "being able to predict when it fails" ...

 Isn't that the whole point of hash functions?  Collisions are
 inevitable, but you want them to be unpredictable.  If you want a hash
 function with predictable collision behavior, just hash the first
 element.  It'll be highly predictable AND wicked fast.

That seems like a rather poor straw man, since it suffers from exactly
the defect I'm complaining about, namely failing to consider all the
input values equally.

 I'd be happier about this approach if there were some actual theory
 behind it ... maybe there's some analysis out there, but the one link
 that was given was just about entirely unconvincing.

 I think it's from Knuth, though unfortunately I don't have a copy to
 check.  There are probably better algorithms out there, but this one's
 pretty simple.

I don't see anything in Knuth suggesting a multiplier of 31.  His
recommendation for a multiplier, if you're going to use multiplicative
hashing, is wordsize/phi (phi being the golden ratio) ... and he also
wants you to keep the high order not the low order bits of the product.

However, this is largely beside the point, because that theory, as well
as the Java code you're arguing from, has to do with the initial hashing
of a raw sequence of input items.  Not with combining some existing hash
values.  The rotate-and-xor method I suggested for that is borrowed
exactly from section 6.4 of Knuth (page 512, in the first edition of
volume 3).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Kenneth Marshall
On Tue, Nov 02, 2010 at 04:42:19PM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Really?  I think "I don't understand when this fails" isn't obviously
  better than "being able to predict when it fails" ...
 
  Isn't that the whole point of hash functions?  Collisions are
  inevitable, but you want them to be unpredictable.  If you want a hash
  function with predictable collision behavior, just hash the first
  element.  It'll be highly predictable AND wicked fast.
 
 That seems like a rather poor straw man, since it suffers from exactly
 the defect I'm complaining about, namely failing to consider all the
 input values equally.
 
  I'd be happier about this approach if there were some actual theory
  behind it ... maybe there's some analysis out there, but the one link
  that was given was just about entirely unconvincing.
 
  I think it's from Knuth, though unfortunately I don't have a copy to
  check.  There are probably better algorithms out there, but this one's
  pretty simple.
 
 I don't see anything in Knuth suggesting a multiplier of 31.  His
 recommendation for a multiplier, if you're going to use multiplicative
 hashing, is wordsize/phi (phi being the golden ratio) ... and he also
 wants you to keep the high order not the low order bits of the product.
 
 However, this is largely beside the point, because that theory, as well
 as the Java code you're arguing from, has to do with the initial hashing
 of a raw sequence of input items.  Not with combining some existing hash
 values.  The rotate-and-xor method I suggested for that is borrowed
 exactly from section 6.4 of Knuth (page 512, in the first edition of
 volume 3).
 
   regards, tom lane
 

Given that our hash implementation mixes the input data well (It does.
I tested it.) then a simple rotate-and-xor method is all that should
be needed to maintain all of the needed information. The original
hash function has done the heavy lifting in this case.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Robert Haas
On Nov 2, 2010, at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, this is largely beside the point, because that theory, as well
 as the Java code you're arguing from, has to do with the initial hashing
 of a raw sequence of input items.  Not with combining some existing hash
 values.  The rotate-and-xor method I suggested for that is borrowed
 exactly from section 6.4 of Knuth (page 512, in the first edition of
 volume 3).

It seems undesirable to me to have a situation where transposing two array 
elements doesn't change the hash value.  But I just work here.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-11-02 Thread Andres Freund
On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote:
 Andres Freund and...@anarazel.de wrote:
  * You won't see an error if the next command after the IDLE in
  transaction is a COMMIT/ROLLBACK. I don't see any sensible way
  around that.
 Well, on a ROLLBACK I'm not sure it's a problem.  On a COMMIT,
 couldn't you call a function to check for it in CommitTransaction
 and PrepareTransaction?
Sure, throwing an error somewhere wouldn't be that hard. But at the moment a 
COMMIT is always successful (and just reporting a ROLLBACK in a failed txn). 
Doesn't seem to be something to change lightly.

Does anybody have any idea why COMMIT is allowed there? Seems pretty strange 
to me.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Nov 2, 2010, at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, this is largely beside the point, because that theory, as well
 as the Java code you're arguing from, has to do with the initial hashing
 of a raw sequence of input items.  Not with combining some existing hash
 values.  The rotate-and-xor method I suggested for that is borrowed
 exactly from section 6.4 of Knuth (page 512, in the first edition of
 volume 3).

 It seems undesirable to me to have a situation where transposing two array 
 elements doesn't change the hash value.  But I just work here.

[ shrug... ]  There are always going to be collisions, and you're
overstating the importance of this one (only some transpositions will
result in a duplicate hash, not any transposition).

What's actually *important*, for our purposes, is that all bits of the
final hash value depend in a reasonably uniform way on all bits of the
input hash values.  If we don't have that property, then bucket numbers
(which we form by taking the low-order N bits of the final hash, where
N isn't known in advance) won't be as well distributed as we'd like.

It's possible that the multiply-by-31 method is as good as the
rotate-and-xor method by that measure, or even better; but it's far from
obvious that it's better.  And I'm not convinced that the multiply
method has a pedigree that should encourage me to take it on faith.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-11-02 Thread Alvaro Herrera
Excerpts from Andres Freund's message of mar nov 02 18:36:19 -0300 2010:
 On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote:
  Andres Freund and...@anarazel.de wrote:
   * You won't see an error if the next command after the IDLE in
   transaction is a COMMIT/ROLLBACK. I don't see any sensible way
   around that.
  Well, on a ROLLBACK I'm not sure it's a problem.  On a COMMIT,
  couldn't you call a function to check for it in CommitTransaction
  and PrepareTransaction?
 Sure, throwing an error somewhere wouldn't be that hard. But at the moment a 
 COMMIT is always successful (and just reporting a ROLLBACK in a failed txn). 
 Doesn't seem to be something to change lightly.

If the user calls COMMIT, it calls EndTransactionBlock, which ends up
calling AbortTransaction.  Can't you do it at that point?  (Says he who
hasn't looked at the patch at all)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-11-02 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote:
 
  Well, on a ROLLBACK I'm not sure it's a problem.  On a COMMIT,
 couldn't you call a function to check for it in CommitTransaction
 and PrepareTransaction?
 Sure, throwing an error somewhere wouldn't be that hard. But at the
 moment a COMMIT is always successful (and just reporting a
 ROLLBACK in a failed txn). 
 Doesn't seem to be something to change lightly.
 
Well, I'm looking at using this with the Serializable Snapshot
Isolation (SSI) patch, which can throw an error from a COMMIT, if
the commit completes the conditions necessary for an anomaly to
occur (i.e., the committing transaction is on the rw-conflict *out*
side of a pivot, and it is the first in the set to commit). If we
succeed in enhancing SSI to use lists of rw-conflicted transactions
rather than its current techniques, we might be able to (and find it
desirable to) always commit in that circumstance and roll back some
*other* transaction which is part of the problem.  Of course, that
other transaction might be idle at the time, and the next thing *it*
tries to execute *might* be a COMMIT.
 
So if the SSI patch goes in, there will always be some chance that a
COMMIT can fail, but doing it through the mechanism your patch
provides could improve performance, because we could guarantee that
nobody ever has a serialization failure without first successfully
committing a transaction which wrote to one or more tables.  If a
commit fails due to SSI, it is highly desirable that the error use
the serialization failure SQLSTATE, so that an application framework
can know that it is reasonable to retry the transaction.
 
 Does anybody have any idea why COMMIT is allowed there? Seems
 pretty strange to me.
 
So that the failed transaction state can be cleared.  The
transaction as a whole has failed, but you don't want the connection
to become useless.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Sam Mason
On Sat, Oct 30, 2010 at 01:01:44PM -0400, Tom Lane wrote:
 marcin mank marcin.m...@gmail.com writes:
  This is what boost does:
  http://www.systomath.com/include/Boost-1_34/doc/html/boost/hash_combine.html
 
 Hmm.  I am reminded of Knuth's famous dictum: never generate random
 numbers with a method chosen at random.  Is there any actual theory
 behind that algorithm, and if so what is it?  The combination of
 shifting with addition (not xor) seems more likely to lead to weird
 cancellations than any improvement in the hash behavior.

As far as I can tell the boost combiner comes from:

  http://goanna.cs.rmit.edu.au/~jz/fulltext/jasist-tch.pdf

Which seems to be solving a completely different problem and I can't see
how relevant it is to the design of a hash combiner.  The fact that they
get trivial details wrong like 32bit integers going up to 2^32 rather
than 2^32-1 doesn't inspire confidence.

One subtle point I noticed on the boost mailing list was that they
didn't want [[1],2] hashing to the same value as [1,2].  I'm pretty sure
that this sort of construction isn't possible to express in PG, but
thought I should mention it just in case.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-11-02 Thread Andres Freund
On Tuesday 02 November 2010 22:59:15 Kevin Grittner wrote:
  Does anybody have any idea why COMMIT is allowed there? Seems
  pretty strange to me.
 
  
 So that the failed transaction state can be cleared.  The
 transaction as a whole has failed, but you don't want the connection
 to become useless.
Well, allowing ROLLBACK is enough there, isn't it?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create custom collation from case insensitive portuguese

2010-11-02 Thread Alexandre Riveira

Thank you all for your help!

When we mention case-insensitive Portuguese, in fact we are also 
talking about accent-insensitive matching.


A common example is that the name José can also be written Jose; 
citext or ILIKE alone do not solve the problem.


My progress is ...

Edit the file /usr/share/i18n/locales/i18n and alter the tolower section, 
for example:

change (<U00C9>,<U00E9>) to (<U00C9>,<U0065>)

LOWER then reproduces: LOWER('ITAPAGÉ') = 'itapage'.

A successful example:

SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%')

this behavior is reproduced in citext (logradouro is a citext column)

SELECT * FROM endereco WHERE logradouro = 'itapage'
or
SELECT * FROM endereco WHERE logradouro LIKE 'itapage%'

All examples return the desired value ITAPAGÉ
Issue:

SELECT * FROM endereco WHERE logradouro LIKE 'itapage%' does NOT use an index.
I tried:

   CREATE INDEX cep_ik_logradouro ON cep USING btree (logradouro);
   CREATE INDEX like_index ON cep(logradouro varchar_pattern_ops);
   CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops);

I've had success using an index with
SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%')
and CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops).

But I want to solve this using only citext.


Thanks,

Alexandre Riveira
Brazil




Robert Haas wrote:

On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira
eu...@timbira.com wrote:

Alexandre Riveira wrote:


I've achieved some success in changing the operating system (Linux) collation
to sort in the Brazilian Portuguese way, by adding the following code in
LC_COLLATE:

This was already discussed; search the archives [1] [2].



So far, I have understood the mechanism of changing the collation and
reproducing it in PostgreSQL, but I could not generate a case-insensitive
search. I believe that change would be made within the LC_COLLATE variable,
but I could not go any further than that.

PostgreSQL doesn't support case-insensitive searches by specifying the
collation per column yet. Look at [3]. But you could use ILIKE or regular
expressions to achieve that.



Is citext also useful for this?



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create custom collation from case insensitive portuguese

2010-11-02 Thread Dimitri Fontaine
Alexandre Riveira alexan...@objectdata.com.br writes:
 When we mention case-insensitive Portuguese, in fact we are also talking
 about accent-insensitive matching.

See unaccent dictionary, but don't use only this one in your text search
configuration, IIRC.

  http://www.postgresql.org/docs/9/static/unaccent.html

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-11-02 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Tuesday 02 November 2010 22:59:15 Kevin Grittner wrote:
 Does anybody have any idea why COMMIT is allowed there? Seems
 pretty strange to me.

 So that the failed transaction state can be cleared.  The
 transaction as a whole has failed, but you don't want the connection
 to become useless.

 Well, allowing ROLLBACK is enough there, isn't it?

The client has no reason to think the transaction has failed, so what
it's going to send is COMMIT, not ROLLBACK.  From the point of view of
the client, this should look like its COMMIT failed (or in general its
next command failed); not like there was some magic action-at-a-distance
on the state of its transaction.

Now you could argue that if you send COMMIT, and that fails, you should
have to send ROLLBACK to get to an idle state ... but that's not how
this ever worked before, and I don't think it's what the SQL standard
expects either.  After a COMMIT, you're out of the transaction either
way.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for arrays

2010-11-02 Thread Ron Mayer
Tom Lane wrote:
 It's possible that the multiply-by-31 method is as good as the
 rotate-and-xor method by that measure, or even better; but it's far from
 obvious that it's better.  And I'm not convinced that the multiply
 method has a pedigree that should encourage me to take it on faith.

Short summary:
  * some history / trivia of it follows
  * (nothing here suggests it's better - just old and common and cheap)


Longer - some trivia regarding its pedigree:

It (or at least a *31 variant) seems to have a history of advocacy
going back to Chris Torek in 1990:
http://groups.google.com/group/comp.lang.c/browse_thread/thread/28c2095282f0c1b5/193be99e9836791b?q=#193be99e9836791b

  #define HASH(str, h, p) \
          for (p = str, h = 0; *p;) h = (h << 5) - h + *p++

and gets referred to in Usenix papers in the early 90's as well:
http://www.usenix.com/publications/library/proceedings/sa92/salz.pdf


Regarding why the magic number 31 [or 33, which also often comes
up]: apparently the only thing magic about it is that it's an
odd number != 1.  The rest of the odd numbers work about as well,
according to this guy who tried to explain it:

http://svn.eu.apache.org/repos/asf/apr/apr/trunk/tables/apr_hash.c
 * The magic of number 33, i.e. why it works better than many other
 * constants, prime or not, has never been adequately explained by
 * anyone. So I try an explanation: if one experimentally tests all
 * multipliers between 1 and 256 (as I did while writing a low-level
 * data structure library some time ago) one detects that even
 * numbers are not useable at all. The remaining 128 odd numbers
 * (except for the number 1) work more or less all equally well.
 * They all distribute in an acceptable way and this way fill a hash
 * table with an average percent of approx. 86%.
 *
 * If one compares the chi^2 values of the variants (see
 * Bob Jenkins ``Hashing Frequently Asked Questions'' at
 * http://burtleburtle.net/bob/hash/hashfaq.html for a description
 * of chi^2), the number 33 not even has the best value. But the
 * number 33 and a few other equally good numbers like 17, 31, 63,
 * 127 and 129 have nevertheless a great advantage to the remaining
 * numbers in the large set of possible multipliers: their multiply
 * operation can be replaced by a faster operation based on just one
 * shift plus either a single addition or subtraction operation. And
 * because a hash function has to both distribute good _and_ has to
 * be very fast to compute, those few numbers should be preferred.
 ...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers