[HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-23 Thread Alexander Korotkov

The attached patch adds conversion from a pg_wchar string to a multibyte string.
This functionality is needed for my patch on index support for regular
expression search:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
Analyzing the conversion from multibyte to pg_wchar, I found the following
types of conversion:
1) Trivial conversion for single-byte encodings. It just adds leading zeros
to each byte.
2) Conversion from UTF-8 to Unicode code points.
3) Conversions from euc* encodings. They write the bytes of a character to the
pg_wchar in inverse order, starting from the lower byte (this explanation
assumes a little-endian system).
4) Conversion from the mule encoding. This conversion is unclear to me and
also seems to be lossy.

It was easy to write the inverse conversion for 1-3. I've changed conversion 4
to behave like 3. I'm not sure my change is OK, because I didn't understand
the original conversion.
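For scheme 3 the inverse is mechanical: peel the bytes back out of each
pg_wchar, most significant non-zero byte first. A minimal standalone sketch of
that idea (hypothetical code, not the attached patch; PostgreSQL's actual
conversion routines live in wchar.c):

```c
#include <stdint.h>

typedef uint32_t pg_wchar;      /* stand-in for PostgreSQL's pg_wchar */

/*
 * Inverse of the euc*-style packing: the bytes of one multibyte
 * character are packed into a pg_wchar with the first byte in the
 * most significant position, so we emit the highest non-zero byte
 * first.  Returns the number of bytes written to 'to'.
 */
int
wchar_to_mb_char(pg_wchar c, unsigned char *to)
{
    int         len = 0;
    int         shift;

    /* find the highest non-zero byte (at least one byte is emitted) */
    for (shift = 24; shift > 0; shift -= 8)
        if ((c >> shift) != 0)
            break;

    for (; shift >= 0; shift -= 8)
        to[len++] = (unsigned char) ((c >> shift) & 0xFF);

    return len;
}
```

For a single-byte encoding (scheme 1) this degenerates to emitting one byte,
which is exactly the inverse of "adding leading zeros".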

With best regards,
Alexander Korotkov.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-04-23 Thread Marc Cousin
On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote:
 On 2012-04-10 09:02, Boszormenyi Zoltan wrote:
  On 2012-04-06 14:47, Cousin Marc wrote:
  On 05/04/12 08:02, Boszormenyi Zoltan wrote:
  On 2012-04-04 21:30, Alvaro Herrera wrote:
  I think this patch is doing two things: first touching infrastructure
  stuff and then adding lock_timeout on top of that.  Would it work to
  split the patch in two pieces?
  Sure. Attached is the split version.
  Best regards,
  Zoltán Böszörményi
  I've started looking at and testing both patches.
  Technically speaking, I think the source looks much better than the
  first version of lock timeout, and may help adding other timeouts in the
  future. I haven't tested it in depth though, because I encountered the
  following problem:
  While testing the patch, I found a way to crash PG. But what's weird is
  that it crashes also with an unpatched git version.
  Here is the way to reproduce it (I have done it with a pgbench schema):
  - Set a small statement_timeout (just to save time during the tests)
  Session 1:
  =# lock TABLE pgbench_accounts ;
  Session 2:
  =#lock TABLE pgbench_accounts ;
  ERROR:  canceling statement due to statement timeout
  =# lock TABLE pgbench_accounts ;
  I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be
  done with a rollback to savepoint of course.
  Session 2 crashes with this : TRAP : FailedAssertion(«
  !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne :
  It can also be done without a statement_timeout, and a control-C on the
  second lock table.
  I didn't touch anything but this. It occurs every time when asserts are
  enabled.
  I tried it on 9.1.3, and I couldn't make it crash with the same sequence
  of events. So maybe it's something introduced since then? Or is the assert
  still valid?
  Attached are the new patches. I rebased them to current GIT and
  they are expected to be applied after Robert Haas' patch in the
  bug in fast-path locking thread.
  Now it survives the above scenario.
  Best regards,
  Zoltán Böszörményi
 New patch attached, rebased to today's GIT.
 Best regards,
 Zoltán Böszörményi

Ok, I've done what was missing from the review (from when I had a bug in
locking the other day), so here is the full review. By the way, this
patch doesn't belong to the current commitfest, but to the next one.

Is the patch in context diff format?

Does it apply cleanly to the current git master?

Does it include reasonable tests, necessary doc patches, etc?
The new lock_timeout GUC is documented. There are regression tests.

Read what the patch is supposed to do, and consider:
Does the patch actually implement that?

Do we want that?
I do. Mostly for administrative jobs which could lock the whole
application. It would be much easier to run reindexes, VACUUM FULL, etc.
without worrying about bringing the application down because of lock
contention.
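A sketch of the intended usage (illustrative values; exact syntax per the
patch's documentation):

```sql
-- Fail fast instead of queueing behind a long-lived lock:
SET lock_timeout = '2s';
LOCK TABLE pgbench_accounts IN ACCESS EXCLUSIVE MODE;
-- errors out if the lock cannot be acquired within 2 seconds,
-- instead of blocking the application indefinitely
```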

Do we already have it?

Does it follow SQL spec, or the community-agreed behavior?
I don't know if there is a consensus on this new GUC. statement_timeout
is obviously not in the SQL spec.

Does it include pg_dump support (if applicable)?
Not applicable

Are there dangers?
Yes, as it rewrites all the timeout code. I feel it is much cleaner this
way, as there is a generic set of functions managing all the sigalarm code,
but it heavily touches this part.

Have all the bases been covered?
I tried all SQL statements I could think of (select, insert, update,
delete, truncate, drop, create index, adding constraint, lock).

I tried having statement_timeout, lock_timeout and deadlock_timeout at
very short and close or equal values. It worked too.
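The timeout-interplay test above can be sketched like this (illustrative
values, not the exact ones used in the review):

```sql
SET statement_timeout = '150ms';
SET lock_timeout = '100ms';
SET deadlock_timeout = '100ms';
-- run conflicting statements from two sessions and verify that the
-- expected timeout (and its error message) fires first
```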

Rollback to savepoint while holding locks doesn't crash PostgreSQL anymore.

Other timeouts such as archive_timeout and checkpoint_timeout still work.

Does the feature work as advertised?

Are there corner cases the author has failed to consider?
I didn't find any.

Are there any assertion failures or crashes?

Does the patch slow down simple tests?

If it claims to improve performance, does it?
Not applicable

Does it slow down other things?

Does it follow the project coding guidelines?
I think so

Are there portability issues?
No, all the portable code (acquiring locks and manipulating sigalarm) is
the same as before.

Will it work on Windows/BSD etc?
It should. I couldn't test it though.

Are the comments sufficient and accurate?

Does it do what it says, correctly?

Does it produce compiler warnings?

Can you make it crash?
Not anymore

Is everything done in a way that fits together coherently with other
features/modules?
Yes, I think so. The new way of handling sigalarm seems more robust to me.

Are there interdependencies that can cause problems?
I don't see any.




Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-23 Thread Sandro Santilli
On Sat, Apr 21, 2012 at 02:28:52PM +0800, Qi Huang wrote:
 Hi, Heikki
  Another idea that Robert Haas suggested was to add support for doing a TID
  scan for a query like WHERE ctid > '(501,1)'. That's not enough work
  for a GSoC project on its own, but could certainly be a part of it.
 the first one and the last one are still not clear. 

The last one was the TID scan on filters like ctid > '(501,1)'.
TID scans are the fastest access method, as they directly access
explicitly referenced addresses. Starting from this observation, a sampling
function may select random pages and tuples within pages and directly
access them, optimizing accesses by grouping tuples within the same
page so as to fetch them all together.

This is what the ANALYZE command already does when providing samples
for the type analyzers.

Unfortunately it looks like at the SQL level only the equality operator triggers
a TID scan, so things like WHERE ctid > '(501,1)' won't be as fast as
fetching all visible tuples in the first 501 pages.
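The difference is easy to see in a plan (a sketch; exact plan output depends
on server version and table statistics, and some_table is a placeholder name):

```sql
-- Equality on ctid can use the fast Tid Scan:
EXPLAIN SELECT * FROM some_table WHERE ctid = '(501,1)';

-- A range condition on ctid currently degrades to a Seq Scan with a filter:
EXPLAIN SELECT * FROM some_table WHERE ctid > '(501,1)';
```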

I think that's what Heikki was referring to.

I'd love to see enhanced CTID operators, to fetch all visible tuples in a page
using a tidscan.  Something like: WHERE ctid =~ '(501,*)' or a ctidrange.


Sandro Santilli
Delivering high quality PostGIS 2.0 !
http://strk.keybit.net - http://vizzuality.com


[HACKERS] Namespace of array of user defined types is confused by the parser in insert?

2012-04-23 Thread Krzysztof Nienartowicz
Sorry for re-posting - I initially posted this in pgsql.sql - probably
this group is more appropriate.

I have a bizarre problem that started to manifest itself after
addition of a field being an array of compound UDTs to a table
declared in multiple schemas.
It is clearly related to how the type namespace is resolved, and it shows
up for the JDBC client (probably related to the parameterized query, as
the static query works without problems).

given the types:

drop type if exists periodSearchResults cascade;
create type periodSearchResults as (
  period float8[], -- [1] - 1st value, kth... value for the same ts
  periodError float8[],
  probability float8[],
  amplitude float8[]
);

drop type if exists periodTSResult cascade;
create type periodTSResult as (
   method text,
   periods periodSearchResults[] -- 1st raw ts, 2...kth - residual
);

and a table defined as:

CREATE TABLE timeseriesresult (
   id bigint NOT NULL,
--- ...
   fperiodsearchresults periodTSResult[]
);

when the type periodTSResult is defined in more than one schema, I get an
error on insert using the JDBC client (or a prepared statement - it works
with a non-parametrized query) in any schema other than the first one the
type was defined in - in this case the first schema it was defined in is
cu7user_test, and the schema of execution is cu7user_test_2:
ERROR:  column fperiodsearchresults is of type periodtsresult[] but
expression is of type cu7user_test.periodtsresult[] at character 1416
HINT:  You will need to rewrite or cast the expression.
STATEMENT:  INSERT INTO cu7user_test_2.timeseriesresult (runid,
catalogid, sourceid, ftimeseriestype, fstate,
fminimizedfunctionminval, freducedchi2, fstopcriteria,
fweightedmodeling, fhomoscedasticitytest, fkurtosis,
fnumpointsobstime, fljungboxrandomnesstest, fmedianabsolutedeviation,
fmax, fmeanobstime, fmean, fmeanerror, fmedian, fmedianerror, fmin,
frange, frobustweightedstddev, fskewness, fstddev, fsymmetrytest,
ftrimmedweightedmean, ftrimmedweightedrange, fvariabilityflag,
fstatvariabilityflag, fweightedkurtosis, fweightedmean,
fweightedmeanconfidenceinterval, fweightedmeanobstime,
fweightednormalizedp2pscatter, fweightedskewness, fweightedstddevdf,
fweightedstddevwdf, fabbe, fchi2, fiqr, foutliermedian, fpstetson,
fpabbe, fpchi2, fpiqr, fpkurtosis, fpoutliermedian, fpskew,
fpweightedkurtosis, fpweightedskew, fstetson, referencetime,
cadencevalue, cadenceerror, cadencename, fperiodsearchmethod,
fweightedpercentileranks, fweightedpercentiles, fotherparameters,
ffundamentalfrequencies, mapfperiodsearchresults,
fperiodsearchresults, fpolynomialcoefficients, ffouriercoefficients,
derivedtschanges) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10,
$11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24,
$25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38,
$39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52,
$53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66)
DEBUG:  parse S_10: ROLLBACK

I debugged a little to find out that indeed there are two types
defined in separate schemas:

SELECT t.oid, t.typname,(select nspname from pg_namespace n where
n.oid=typnamespace) nspname ,typarray  FROM pg_catalog.pg_type t where
typname like 'periodts%';


  oid  |    typname     |    nspname     | typarray
-------+----------------+----------------+----------
 16646 | periodtsresult | cu7user_test   |    16645
 17123 | periodtsresult | cu7user_test_2 |    17122

but for user/schema cu7user_test_2

if (!can_coerce_type(1, exprtype, targettype, ccontext))
    return NULL;

the function returns false and subsequently coerce_to_target_type
returns NULL, resulting in the above error.
This is caused by the expr pointer at parse_coerce.c:421 returning the
type_id for the type defined in the '1st' schema:

type_id = exprType((Node *) expr); /* type_id returned is 16645, should be 17122 */

The expr pointer value is already wrong in the parsed list, in the loop
starting at analyze.c:799:

foreach(lc, exprlist)  /* type_id is already wrong for the array UDT */

Is there any way to avoid this error other than having a single
type defined for all schemas?
Any hints appreciated.
Postgres 9.1.3, latest JDBC driver.

Best regards,


Re: [HACKERS] ECPG FETCH readahead

2012-04-23 Thread Boszormenyi Zoltan


On 2012-04-17 06:48, Michael Meskes wrote:

On Tue, Apr 17, 2012 at 06:02:34AM +0200, Boszormenyi Zoltan wrote:

I listed two scenarios.
1. occasional bump of the readahead window for large requests,
for smaller requests it uses the originally set size
2. permanent bump of the readahead window for large requests
(larger than previously seen), all subsequent requests use
the new size

Both can be implemented easily, which one do you prefer?
If you always use very large requests, 1) behaves like 2)

I'd say let's go for #2. #1 is probably more efficient but not what the
programmer asked us to do. After all, it's easy to increase the window size
accordingly if you want to as a programmer.


OK, I will implement #2. Another question popped up: what to do
with FETCH ALL? Either the current readahead window size or a temporary
bump to, say, some tens of thousands of rows can be used. We may not
know in advance how many rows FETCH ALL will return. The former, although
it lowers performance, saves memory.

Please, don't apply this patch yet. I discovered a rather big hole
that can confuse the cursor position tracking if you do this:

DECLARE mycur;

If (n+m) is greater, but (n-m) is smaller than the number
of rows in the cursor, the backend's and the caching code's
ideas about where the cursor is will differ. I need to fix this
before it can be applied.

That will also need a new round of review. Sorry for that.

Best regards,
Zoltán Böszörményi

Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


Re: [HACKERS] B-tree page deletion boundary cases

2012-04-23 Thread Noah Misch
On Sun, Apr 22, 2012 at 12:13:34AM +0530, Nikhil Sontakke wrote:
 Was wondering if there's a similar bug which gets triggered while using
 VACUUM FULL. See for instance this thread:
 This issue has been reported on and off from time to time, and in most cases
 VACUUM or VACUUM FULL appears to be involved. We have usually attributed it
 to hardware issues, and reindex has been recommended by default as a
 solution/workaround.

I do not perceive much similarity.  The bug I've raised can produce wrong
query results transiently.  It might permit injecting a tuple into the wrong
spot in the tree, yielding persistent wrong results.  It would not introduce
tree-structural anomalies like sibling pointers directed at zeroed pages or
internal pages in a 1-level tree.  Given the symptoms you reported, I share
Robert's suspicion of WAL replay in your scenario.


[HACKERS] psql omits row count under \x auto

2012-04-23 Thread Noah Misch
I've been enjoying \x auto in .psqlrc, but I noticed the row count footer
missing when it chooses ordinary output:

[local] test=# \x off
Expanded display is off.
[local] test=# select 1;
 ?column?
----------
        1
(1 row)

[local] test=# \x auto
Expanded display is used automatically.
[local] test=# select 1;
 ?column?
----------
        1

[local] test=#

Looks like the logic in printQuery() needs further treatment.



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-23 Thread Ants Aasma
On Mon, Apr 23, 2012 at 4:37 PM, Sandro Santilli s...@keybit.net wrote:
 I'd love to see enhanced CTID operators, to fetch all visible tuples in a page
 using a tidscan.  Something like: WHERE ctid =~ '(501,*)' or a ctidrange.

Among other things, this would enable user-space implementation of
tablesample. Given the operator =~(tid, int) that matches the page
number and planner/executor integration so that it results in a TID
scan, you would need the following functions:

random_pages(tbl regclass, samples int) returns int[]
aggregate function:
reservoir_sample(item anyelement, samples int) returns anyarray

Implementations for both of the functions could be adapted from analyze.c.

Then tablesample could be implemented with the following query:
SELECT (SELECT reservoir_sample(some_table, 50) AS samples
   FROM some_table WHERE ctid =~ ANY (rnd_pgtids))
FROM random_pages('some_table', 50) AS rnd_pgtids;

Actually, now that I think about it, it could be implemented
without any modifications to core, at some cost to efficiency:
random_pages would have to return a tid[] that contains, for each
generated page number, all possible tids on that page.
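That helper might look like this (a hypothetical sketch; random_page_tids, the
8 kB block size, and the 291 tuples-per-page bound are my assumptions, not
code from this thread):

```sql
-- For each randomly chosen page, emit every possible tid on it, so a
-- query like WHERE ctid = ANY (random_page_tids(...)) can use a TID scan.
CREATE FUNCTION random_page_tids(tbl regclass, samples int)
RETURNS tid[] LANGUAGE sql AS
$$
  SELECT array_agg(('(' || pageno || ',' || tupno || ')')::tid)
  FROM (SELECT floor(random() * (pg_relation_size($1) / 8192))::int AS pageno
        FROM generate_series(1, $2)) AS pages,
       generate_series(1, 291) AS tupno  -- upper bound on heap tuples per 8 kB page
$$;
```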

By making the building blocks available users get more flexibility.
The downside would be that we can't automatically make better sampling
methods available.

Ants Aasma
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-23 Thread Robert Haas
On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The internal representation doesn't have to be (and certainly
 shouldn't be) numeric.  But if you translate to numeric before
 returning the data to the user, then you have the freedom, in the
 future, to whack around the internal representation however you like,
 without breaking backward compatibility.  Choosing float8 for the
 external representation is fine as long as we're sure we're not ever
 going to want more than 16 significant digits, but I see no particular
 value in baking in that assumption.  But perhaps, as the saying goes,
 16 digits ought to be enough for anyone.

 There's no particular reason to think that Moore's Law is going to
 result in an increase in the fractional precision of timing data.
 It hasn't done so in the past, for sure.

Perhaps, but nobody's explained what we gain out of NOT using numeric.
"It's slow" doesn't impress me; selecting from a system view doesn't
need to be lightning-fast.

However, the main thing here is that we need to do *something* here...

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Aggressive memory consumption in {ts,array}_typanalyze

2012-04-23 Thread Robert Haas
On Wed, Apr 18, 2012 at 11:09 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Apr 16, 2012 at 4:58 PM, Noah Misch n...@leadboat.com wrote:
 The size hint I chose is fairly arbitrary.  Any suggestions for a principled choice?

 Based on your test results, it doesn't seem like it matters very much
 what you put in there, so I'm inclined to think that num_mcelem is
 fine.  I thought about maybe allowing for a little slop, like
 num_mcelem * 10, but maybe the way you did it is better.  It's
 possible that people will set ridiculously overblown stats targets on
 some columns, and that's certainly going to cost something no matter
 what we do, but there's no point in making that worse than it has to
 be without some clear reason for doing so.

Hearing no further comments, I have committed your patch.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
