Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode

2014-03-07 Thread Michael Paquier
On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 On 03/05/2014 09:11 AM, Michael Paquier wrote:
 After testing this feature, I noticed that FORCE_NULL and
 FORCE_NOT_NULL can both be specified with COPY on the same column.

 Strictly they are not actually contradictory, since FORCE NULL relates
 to quoted null strings and FORCE NOT NULL relates to unquoted null
 strings. Arguably the docs are slightly loose on this point. Still,
 applying both FORCE NULL and FORCE NOT NULL to the same column would be
 rather perverse, since it would result in a quoted null string becoming
 null and an unquoted null string becoming not null.
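As a rough illustrative sketch (not the actual COPY implementation), the combined semantics Andrew describes could be modeled like this in Python, where `null_string` stands for the CSV NULL marker (empty string by default) and `None` stands for SQL NULL:

```python
def interpret_field(raw, quoted, null_string="",
                    force_null=False, force_not_null=False):
    """Model how a CSV field might be interpreted (None == SQL NULL)."""
    if quoted:
        # Quoted fields are normally never NULL; FORCE_NULL makes a
        # *quoted* null string become NULL anyway.
        if force_null and raw == null_string:
            return None
        return raw
    # An unquoted field matching the null string is normally NULL;
    # FORCE_NOT_NULL keeps it as a literal (not-null) value.
    if raw == null_string and not force_not_null:
        return None
    return raw

# Both options on the same column: quoted null string -> NULL,
# unquoted null string -> literal empty string (not null).
assert interpret_field("", quoted=True, force_null=True, force_not_null=True) is None
assert interpret_field("", quoted=False, force_null=True, force_not_null=True) == ""
```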

 Given the remarkable lack of standardization of CSV output, who's
 to say that there might not be data sources out there for which this
 is the desired behavior?  It's weird, I agree, but I think throwing
 an error for the combination is not going to be helpful.  It's not
 like somebody might accidentally write both on the same column.

 +1 for clarifying the docs, though, more or less in the words you
 used above.
Following that, I have hacked up the attached patch to update the docs,
with an additional regression test (it actually replaces a test that
duplicated an existing one in copy2).

I am attaching as well a second patch for file_fdw, to allow the use
of force_null and force_not_null on the same column, to be consistent
with COPY.
Regards,
-- 
Michael
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 7fb1dbc..97a35d0 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -267,11 +267,6 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		(errcode(ERRCODE_SYNTAX_ERROR),
 		 errmsg("conflicting or redundant options"),
 		 errhint("option \"force_not_null\" supplied more than once for a column")));
-			if (force_null)
-				ereport(ERROR,
-						(errcode(ERRCODE_SYNTAX_ERROR),
-						 errmsg("conflicting or redundant options"),
-						 errhint("option \"force_not_null\" cannot be used together with \"force_null\"")));
 			force_not_null = def;
 			/* Don't care what the value is, as long as it's a legal boolean */
 			(void) defGetBoolean(def);
@@ -284,11 +279,6 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		(errcode(ERRCODE_SYNTAX_ERROR),
 		 errmsg("conflicting or redundant options"),
 		 errhint("option \"force_null\" supplied more than once for a column")));
-			if (force_not_null)
-				ereport(ERROR,
-						(errcode(ERRCODE_SYNTAX_ERROR),
-						 errmsg("conflicting or redundant options"),
-						 errhint("option \"force_null\" cannot be used together with \"force_not_null\"")));
 			force_null = def;
 			(void) defGetBoolean(def);
 		}
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 0c278aa..b608372 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -91,24 +91,22 @@ ALTER FOREIGN TABLE text_csv OPTIONS (SET format 'csv');
 \pset null _null_
 SELECT * FROM text_csv;
 
+-- force_not_null and force_null can be used together on the same column
+ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true');
+ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true');
+
 -- force_not_null is not allowed to be specified at any foreign object level:
 ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_not_null '*'); -- ERROR
 ALTER SERVER file_server OPTIONS (ADD force_not_null '*'); -- ERROR
 CREATE USER MAPPING FOR public SERVER file_server OPTIONS (force_not_null '*'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_not_null '*'); -- ERROR
 
--- force_not_null cannot be specified together with force_null
-ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); --ERROR
-
 -- force_null is not allowed to be specified at any foreign object level:
 ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_null '*'); -- ERROR
 ALTER SERVER file_server OPTIONS (ADD force_null '*'); -- ERROR
 CREATE USER MAPPING FOR public SERVER file_server OPTIONS (force_null '*'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_null '*'); -- ERROR
 
--- force_null cannot be specified together with force_not_null
-ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true'); --ERROR
-
 -- basic query tests
 SELECT * FROM agg_text WHERE b < 10.0 ORDER BY a;
 SELECT * FROM agg_csv ORDER BY a;
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 2bec160..bc183b8 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -115,6 +115,9 @@ SELECT * FROM text_csv;
  ABC   | abc|| 
 (5 rows)
 
+-- force_not_null and force_null can be used together on the same column
+ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true');
+ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS 

Re: [HACKERS] syslog_ident mentioned as syslog_identify in the docs

2014-03-07 Thread Heikki Linnakangas

On 03/07/2014 08:24 AM, Michael Paquier wrote:

In the documentation, particularly the doc index, syslog_ident is
incorrectly mentioned as syslog_identify. The attached patch fixes
that. This error is in the docs since 8.0.


Thanks, fixed.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-07 Thread Simon Riggs
On 6 March 2014 22:43, Noah Misch n...@leadboat.com wrote:
 On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote:
 On 2014-03-04 11:40:10 -0500, Tom Lane wrote:
  Robert Haas robertmh...@gmail.com writes:  I think this is all too
  late for 9.4, though.
 
  I agree with the feeling that a meaningful fix for pg_dump isn't going
  to get done for 9.4.  So that leaves us with the alternatives of (1)
  put off the lock-strength-reduction patch for another year; (2) push
  it anyway and accept a reduction in pg_dump reliability.
 
  I don't care for (2).  I'd like to have lock strength reduction as
  much as anybody, but it can't come at the price of reduction of
  reliability.

 I am sorry, but I think this is vastly overstating the scope of the
 pg_dump problem. CREATE INDEX *already* doesn't require an AEL, and the
 amount of problems that has caused in the past is surprisingly low. If
 such a frequently used command doesn't cause problems, why are you
 assuming other commands to be that problematic? And I think it's hard to
 argue that the proposed changes are more likely to cause problems.

 Let's try to go at this a bit more methodically. The commands that -
 afaics - change their locklevel due to latest patch (v21) are:
 [snip]

 Good analysis.  The hazards arise when pg_dump uses one of the ruleutils.c
 deparse worker functions.  As a cross-check to your study, I looked briefly at
 the use of those functions in pg_dump and how this patch might affect them:

 -- pg_get_constraintdef()

 pg_dump reads the constraint OID with its transaction snapshot, so we will
 never see a too-new constraint.  Dropping a constraint still requires
 AccessExclusiveLock.

Agreed

 Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its
 transaction snapshot and uses that to decide whether to dump the CHECK
 constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD
 CONSTRAINT following the data load.  However, pg_get_constraintdef() reads the
 latest convalidated to decide whether to emit NOT VALID.  Consequently, one
 can get a dump in which the dumped table data did not yet conform to the
 constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails.
 (Suppose you deleted the last invalid rows just before executing the VALIDATE
 CONSTRAINT.  I tested this by committing the DELETE + VALIDATE CONSTRAINT with
 pg_dump stopped at getTableAttrs().)

 One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT
 problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did
 not do so.  It's, conveniently, the last part of the definition.  I would tend
 to choose this.  We could also just decide this isn't bad enough to worry
 about.  The consequence is that an ALTER TABLE ADD CONSTRAINT fails.  Assuming
 no --single-transaction for the original restoral, you just add NOT VALID to
 the command and rerun.  Like most of the potential new pg_dump problems, this
 can already happen today if the relevant database changes happen between
 taking the pg_dump transaction snapshot and locking the tables.
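That tack-on could be sketched as follows (a hypothetical helper, not actual pg_dump code): append NOT VALID whenever pg_dump's snapshot says the constraint was not yet validated but the deparsed definition omits it.

```python
def ensure_not_valid(constraint_def, convalidated_at_snapshot):
    # If pg_dump's snapshot says the constraint was not yet validated,
    # but pg_get_constraintdef() (which reads the *latest* convalidated)
    # omitted NOT VALID, tack it on so restoring old data cannot fail.
    if not convalidated_at_snapshot and not constraint_def.rstrip().endswith("NOT VALID"):
        return constraint_def.rstrip() + " NOT VALID"
    return constraint_def

assert ensure_not_valid("CHECK (b > 0)", convalidated_at_snapshot=False) \
    == "CHECK (b > 0) NOT VALID"
assert ensure_not_valid("CHECK (b > 0) NOT VALID", convalidated_at_snapshot=False) \
    == "CHECK (b > 0) NOT VALID"
assert ensure_not_valid("CHECK (b > 0)", convalidated_at_snapshot=True) \
    == "CHECK (b > 0)"
```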

Too hacky for me, but some good thinking. My proposed solution is below.

 -- pg_get_expr() for default expressions

 pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will
 never see a too-new default.  This does allow us to read a dropped default.
 That's not a problem directly.  However, suppose the default references a
 function dropped at the same time as the default.  pg_dump could fail in
 pg_get_expr().

 -- pg_get_indexdef()

 As you explained elsewhere, new indexes are no problem.  DROP INDEX still
 requires AccessExclusiveLock.  Overall, no problems here.

 -- pg_get_ruledef()

 The patch changes lock requirements for enabling and disabling of rules, but
 that is all separate from the rule expression handling.  No problems.

 -- pg_get_triggerdef()

 The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock.
 The implications for pg_dump are similar to those for pg_get_expr().

These are certainly concerning. What surprises me the most is that
pg_dump has been so happy to randomly mix SQL using the transaction
snapshot with sys cache access code using a different snapshot. If
that was the intention, there is no documentation in the code or in the
docs to explain it.

 -- pg_get_viewdef()

 Untamed: pg_dump does not lock views at all.

OMG, it's really a wonder pg_dump works at all.

 One thing not to forget is that you can always get the old mutual exclusion
 back by issuing LOCK TABLE just before a DDL operation.  If some unlucky user
 regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a
 workaround.  There's no comparable way for someone who would not experience
 that problem to weaken the now-hardcoded AccessExclusiveLock.  Many
 consequences of insufficient locking are too severe for that workaround to
 bring comfort, but the pg_dump failure scenarios around pg_get_expr() and
 pg_get_triggerdef() 

Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-07 Thread Heikki Linnakangas

On 03/06/2014 09:34 PM, Robert Haas wrote:

On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I don't think it's necessary to improve concurrency just to get WAL-logging.
Better concurrency is a worthy goal of its own, of course, but it's a
separate concern.


To some extent, I agree, but only to some extent.  To make hash
indexes generally usable, we really need to solve both problems.  When
I got rid of just some of the heavyweight locking in commit
76837c1507cb5a5a0048046233568447729e66dd, the results were pretty
dramatic at higher levels of concurrency:

http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com

But there was still an awful lot of contention inside the heavyweight
lock manager, and I don't think hash indexes are going to be ready for
prime time until we solve that problem.


Hmm. You suggested ensuring that a scan always has at least a pin, and 
split takes a vacuum-lock. That ought to work. There's no need for the 
more complicated maneuvers you described, ISTM that you can just replace 
the heavy-weight share lock with holding a pin on the primary page of 
the bucket, and an exclusive lock with a vacuum-lock. Note that 
_hash_expandtable already takes the exclusive lock conditionally, ie. if 
it doesn't get the lock immediately it just gives up. We could do the 
same with the cleanup lock.


Vacuum could also be enhanced. It currently takes an exclusive lock on 
the bucket, then removes any dead tuples and finally squeezes the 
bucket by moving tuples to earlier pages. But you only really need the 
exclusive lock for the squeeze-phase. You could do the dead-tuple 
removal without the bucket-lock, and only grab it for the squeeze-phase.
And the squeezing is optional, so you could just skip that if you can't 
get the lock. But that's a separate patch as well.


One more thing we could do to make hash indexes more scalable, 
independent of the above: Cache the information in the metapage in 
backend-private memory. Then you (usually) wouldn't need to access the 
metapage at all when scanning. Store a copy of the bitmask for that 
bucket in the primary page, and when scanning, check that it matches the 
cached value. If not, refresh the cached metapage and try again.
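A rough sketch of that cache-check-retry loop (hypothetical names and data layout, not actual hash AM code; the bucket computation mirrors the mask-and-compare scheme used by hash indexes):

```python
def bucket_for_key(hashkey, cached_meta, read_metapage, read_primary_page):
    """Locate a bucket using a backend-local metapage cache.

    cached_meta: dict with the bucket-mapping state (maxbucket, masks).
    read_metapage / read_primary_page: callables standing in for buffer reads.
    """
    while True:
        # Compute the bucket from the cached metapage contents.
        mask = cached_meta["highmask"]
        bucket = hashkey & mask
        if bucket > cached_meta["maxbucket"]:
            mask = cached_meta["lowmask"]
            bucket = hashkey & mask
        page = read_primary_page(bucket)
        # The primary page stores a copy of the bitmask for the bucket;
        # if it matches the mask we used, the cached metapage was current.
        if page["bitmask"] == mask:
            return bucket, page
        # Stale cache: refresh from the real metapage and try again.
        cached_meta = read_metapage()
```

With a stale cache, the first probe lands on the wrong bucket, the bitmask check fails, and the loop refreshes the cache and retries — so the metapage is only touched when the index has actually changed shape.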



So, there seem to be a few fairly simple and independent improvements 
to be made:


1. Replace the heavy-weight lock with pin & vacuum-lock.
2. Make it crash-safe, by adding WAL-logging
3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM 
for the squeeze phase.

4. Cache the metapage.

We still don't know if it's going to be any better than B-tree after all 
that's done, but the only way to find out is to go ahead and implement it.


This seems like a great GSoC project to me. We have a pretty good idea 
of what we want to accomplish. It's uncontroversial: I don't think 
anyone is going to object to improving hash indexes (one could argue 
that it's a waste of time, but that's different from objecting to the 
idea). And it consists of a few mostly independent parts, so it's 
possible to do incrementally which makes it easier to track progress, 
and we'll probably have something useful at the end of the summer even 
if it doesn't all get finished.


- Heikki




Re: [HACKERS] missing RelationCloseSmgr in FreeFakeRelcacheEntry?

2014-03-07 Thread Heikki Linnakangas

On 03/06/2014 02:18 AM, Bruce Momjian wrote:

On Tue, Nov  5, 2013 at 08:36:32PM +0100, Andres Freund wrote:

On 2013-11-04 13:48:32 +0100, Andres Freund wrote:

What about just unowning the smgr entry with
if (rel->rd_smgr != NULL)
    smgrsetowner(NULL, rel->rd_smgr)
when closing the fake relcache entry?

That shouldn't require any further changes than changing the comment in
smgrsetowner() that this isn't something expected to frequently happen.


Attached is a patch doing it like that; it required slightly more
invasive changes than that. With the patch applied we survive replay of
a primary's make check run under valgrind without warnings.


Where are we on this patch?


Committed now, with small changes. I made the new smgrclearowner 
function to check that the SmgrRelation object is indeed owned by the 
owner we expect, so that it doesn't unown it if it's actually owned by 
someone else. That shouldn't happen, but it seemed prudent to check.


Thanks Andres. I tried to reproduce the valgrind message you reported, 
but couldn't. How did you do it? Did this commit fix it?


And thanks for the nudge, Bruce.

- Heikki




Re: [HACKERS] extension_control_path

2014-03-07 Thread Peter Eisentraut
On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
 What about allowing a control file like this:
 
# hstore extension
comment = 'data type for storing sets of (key, value) pairs'
default_version = '1.3'
directory = 'local/hstore-new'
module_pathname = '$directory/hstore'
relocatable = true

I think your previously proposed patch to add extension_control_path
plus my suggestion to update existing de facto best practices to not
include $libdir into the module path name (thus allowing the use of
dynamic_library_path) will address all desired use cases just fine.

Moreover, going that way would reuse existing facilities and concepts,
remove indirections and reduce overall complexity.  This new proposal,
on the other hand, would go the other way, introducing new concepts,
adding more indirections, and increasing overall complexity, while
actually achieving less.

I see an analogy here.  What we are currently doing is similar to
hardcoding absolute rpaths into all libraries.  Your proposal is
effectively to (1) add the $ORIGIN mechanism and (2) make people use
chrpath when they want to install somewhere else.  My proposal is to get
rid of all rpaths and just set a search path.  Yes, on technical level,
this is less powerful, but it's simpler and gets the job done and is
harder to misuse.

A problem with features like these is that they get rarely used but
offer infinite flexibility, so they are not used consistently and you
can't rely on anything.  This is already the case for the
module_pathname setting in the control file.  It has, AFAICT, no actual
use, and because of that no one uses it, and because of that, there is
no guarantee that extensions use it sensibly, and because of that no one
can ever make sensible use of it in the future, because there is no
guarantee that extensions have it set sensibly.  In fact, I would
propose deprecating module_pathname.





Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-07 Thread Kyotaro HORIGUCHI
Hello,

  After all, I have confirmed that this fixes the problem on crash
  recovery of hot standby both for 9.3 and HEAD, and no problem was
  found except unreadability :(
 
 Ok, committed. Thanks!

Thank you.

 Any concrete suggestions about the readability? Is there some
 particular spot that needs clarifying?

Well, although I came to see no problem there once I understood how it
works :-), I'll write about the two points where I had difficulty
understanding.

|  * (or the checkpoint record itself, if it's a shutdown checkpoint).
|  */
| if (checkPoint.redo < RecPtr)

First, it was a bit of tough work to confirm the equivalence between
(redo == RecPtr) and the checkpoint being a shutdown checkpoint.
Although I was finally convinced that it surely holds, that is
actually not the point. The point here is in the first half of the
phrase. The comment might be less perplexing if it read as follows,
even if only a shutdown checkpoint satisfies the condition. But that
would raise another question in readers' minds.

|  * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).

Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the next record. It
would be better for understandability and stability (against future
code modification) to explicitly declare the precondition, like this:

| Here RecPtr points to the checkpoint record and EndRecPtr points to
| the place for the record just after it.


  Surely this is the consequence of an illegal operation, but I think
  it is also not an issue for an assertion - which fires on something
  wrong in the design or in quite rare cases (this case?).
 
 Ah, I see. Yes, that's definitely a bug. If you don't hit the
 assertion, because the oldestActiveXID is set in the checkpoint record
 even though wal_level is 'archive', or if you simply have assertions
 disabled, the system will start up in hot standby mode even though
 it's not safe.
 
  So it might be
  better to show message as below on the case.
 
  | FATAL: Checkpoint doesn't have a valid oldest active transaction id
  | HINT: The WAL being read might have been written under insufficient
  | wal_level.

Agreed.

 Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
 wal_level is 'archive'. So the above patch doesn't fix the whole
 problem.
 
 The real bug here is that CheckRequiredParameterValues() tests for
 InArchiveRecovery, when it should be testing for
 ArchiveRecoveryRequested. Otherwise, the checks are not performed when
 going through the crash recovery followed by archive recovery. I
 should've changed that as part of the commit that added the crash
 recovery then archive recovery behavior.
 
 Fixed, thanks for pointing it out!

It's my pleasure.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Changeset Extraction v7.9.1

2014-03-07 Thread Andres Freund
Hi,

On 2014-03-05 23:20:57 +0100, Andres Freund wrote:
 On 2014-03-05 17:05:24 -0500, Robert Haas wrote:
   I very much dislike having the three different event loops, but it's
   pretty much forced by the design of the xlogreader. My xlogreader
   version didn't block when it neeeded to wait for WAL but just returned
   need input/output, but with the eventually committed version you're
   pretty much forced to block inside the read_page callback.
  
   I don't really have an idea how we could sensibly unify them atm.
  
  WalSndLoop(void (*gutsfn)())?
 
 The problem is that they are actually different. In the WalSndLoop we're
 also maintaining the walsender's state, in WalSndWriteData() we're just
 waiting for writes to be flushed, in WalSndWaitForWal we're primarily
 waiting for the flush pointer to pass some LSN. And the timing of the
 individual checks isn't trivial (just added some more comments about
 it).
 
 I'll simplify it by pulling out more common code, maybe it'll become
 apparent how it should look.

I've attached a new version of the walsender patch. It's been rebased
ontop of Heikki's latest commit to walsender.c. I've changed a fair bit
of stuff:
* The sleeptime is now computed to sleep until we either need to send a
  keepalive or kill ourselves, as Heikki suggested.
* Sleep time computation, sending pings, checking timeouts is now done
  in separate functions.
* Comment and codestyle improvements.

Although they are shorter and simpler now, I have not managed to unify
the three loops however. They seem to be too different to unify them
inside one. I tried a common function with a 'wait_for' bitmask
argument, but that turned out to be fairly illegible. The checks in
WalSndWaitForWal() and WalSndLoop() just seem to be too different.

I'd be grateful if you (or somebody else!) could have a quick look at
body of the loops in WalSndWriteData(), WalSndWaitForWal() and
WalSndLoop(). Maybe I am just staring at it the wrong way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ca84cd3d2966f0e7297d1c7270b094122eec2f18 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 5 Mar 2014 00:15:38 +0100
Subject: [PATCH] Add walsender interface for the logical decoding
 functionality.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This exposes the changeset extraction feature via walsenders which
allows the data to be received in a streaming fashion and supports
synchronous replication.

To do this, walsenders need to be able to connect to a specific
database. For that, we extend the existing 'replication' parameter to
allow not only a boolean value but also 'database'. If the latter is
specified, we connect to the database specified in 'dbname'.

Andres Freund, with contributions from Álvaro Herrera, reviewed by
Robert Haas.
---
 doc/src/sgml/protocol.sgml |  24 +-
 src/backend/postmaster/postmaster.c|  28 +-
 .../libpqwalreceiver/libpqwalreceiver.c|   4 +-
 src/backend/replication/repl_gram.y|  81 +-
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c| 914 ++---
 src/backend/utils/init/postinit.c  |  15 +-
 src/bin/pg_basebackup/pg_basebackup.c  |   6 +-
 src/bin/pg_basebackup/pg_receivexlog.c |   6 +-
 src/bin/pg_basebackup/receivelog.c |   6 +-
 src/include/replication/walsender.h|   1 +
 src/tools/pgindent/typedefs.list   |   1 +
 12 files changed, 918 insertions(+), 169 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index d36f2f3..510bf9a 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1302,10 +1302,13 @@
 
 <para>
 To initiate streaming replication, the frontend sends the
-<literal>replication</> parameter in the startup message. This tells the
-backend to go into walsender mode, wherein a small set of replication commands
-can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+<literal>replication</> parameter in the startup message. A boolean value
+of <literal>true</> tells the backend to go into walsender mode, wherein a
+small set of replication commands can be issued instead of SQL statements. Only
+the simple query protocol can be used in walsender mode.
+Passing <literal>database</> as the value instructs walsender to connect to
+the database specified in the <literal>dbname</> parameter, which will in future
+allow some additional commands to the ones specified below to be run.
 
 The commands accepted in walsender mode are:
 
@@ -1315,7 +1318,7 @@ The commands accepted in walsender mode are:
 <listitem>
  <para>
   Requests the server to identify itself. Server replies with a result
-  

Re: [HACKERS] Changeset Extraction v7.9.1

2014-03-07 Thread Alvaro Herrera
Andres Freund escribió:

 	fprintf(stderr,
 -			_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
 -			progname, PQntuples(res), PQnfields(res), 1, 3);
 +			_("%s: could not identify system: got %d rows and %d fields, expected 1 row and 3 or more fields\n"),
 +			progname, PQntuples(res), PQnfields(res));

Please don't change this.  The reason these messages use %d and an extra
printf argument is to avoid giving translators extra work when the
number of rows or fields is changed.  In these cases I suggest this:

 -			_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
 -			progname, PQntuples(res), PQnfields(res), 1, 3);
 +			_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
 +			progname, PQntuples(res), PQnfields(res), 1, 3);

(Yes, I know the expected 1 rows output looks a bit silly.  Since this
is an unexpected error message anyway, I don't think that's worth
fixing.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-07 Thread Robert Haas
On Thu, Mar 6, 2014 at 7:07 PM, Greg Stark st...@mit.edu wrote:
 On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas robertmh...@gmail.com wrote:
 I've been tempted to implement a new type of hash index that allows both WAL
 and high concurrency, simply by disallowing bucket splits.  At the index
 creation time you use a storage parameter to specify the number of buckets,
 and that is that. If you mis-planned, build a new index with more buckets,
 possibly concurrently, and drop the too-small one.

 Yeah, we could certainly do something like that.  It sort of sucks,
 though.  I mean, it's probably pretty easy to know that starting with
 the default 2 buckets is not going to be enough; most people will at
 least be smart enough to start with, say, 1024.  But are you going to
 know whether you need 32768 or 1048576 or 33554432?  A lot of people
 won't, and we have more than enough reasons for performance to degrade
 over time as it is.

 The other thought I had was that you can do things lazily in vacuum.
 So when you probe you need to check multiple pages until vacuum comes
 along and rehashes everything.

I was thinking about this, too.  Cleaning up the old bucket after the
split is pretty similar to vacuuming.  And lo and behold, vacuum also
needs to lock the entire bucket.  AFAICT, there are two reasons for
this.  First, when we resume a suspended scan, we assume that the next
match on the page, if any, will occur on the same page at an offset
greater than the offset where we found the previous match.  The code
copes with the possibility of current insertions by refinding the last
item we returned and scanning forward from there, but assumes the
pointer we're refinding can't have moved to a lower offset.  I think
this could be easily fixed - at essentially no cost - by changing
hashgettuple so that, if the forward scan errors out, it tries
scanning backward rather than just giving up.  Second, vacuum compacts
each bucket that it modifies using _hash_squeezebucket, which scans
from the two ends of the index in toward the middle, filling any free
space on earlier pages by pulling tuples from the end of the bucket
chain.  This is a little thornier.  We could change vacuum so that it
only removes TIDs from the individual pages, without actually trying
to free up pages, but that seems undesirable.

However, I think we might be able to get by with making bucket
compaction less aggressive, without eliminating it altogether.
Suppose that whenever we remove items from a page, we consider
consolidating the page with its immediate predecessor and successor in
the bucket chain.  This means our utilization will be a little over
50% in the worst case where we have a full page, a page with one item,
another full page, another page with one item, and so on.  But that's
not all bad, because compacting a bucket chain to the minimum possible
size when it may well be about to suffer more inserts isn't
necessarily a good thing anyway.  Also, doing this means that we don't
need to lock out scans from the entire bucket chain in order to
compact.  It's sufficient to make sure that nobody's in the middle of
scanning the two pages we want to merge.

That turns out not to be possible right now.  When a scan is
suspended, we still hold a pin on the page we're scanning.  But when
_hash_readnext or _hash_readprev walks the bucket chain, it drops both
lock and pin on the current buffer and then pins and locks the new
buffer.  That, however, seems like it could easily be changed: drop
lock on current buffer, acquire pin on new buffer, drop pin on current
buffer, lock new buffer.  Then, if we want to merge two buffers, it's
enough to lock both of them for cleanup.  To avoid any risk of
deadlock, and also to avoid waiting for a long-running suspended scan
to wake up and complete, we can do this conditionally; if we fail to
get either cleanup lock, then just don't merge the pages; take an
exclusive lock only and remove whatever you need to remove, leaving
the rest.  Merging pages is only a performance optimization, so if it
fails now and then, no real harm done.
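That conditional, best-effort merge could be sketched like so (hypothetical lock API, not the real buffer manager):

```python
def try_merge(page_a, page_b, cond_cleanup_lock, unlock):
    """Attempt to merge two adjacent bucket-chain pages.

    cond_cleanup_lock(p) -> bool: conditionally take a cleanup lock;
    fails if anyone holds a pin on p (e.g. a suspended scan).
    Merging is only an optimization, so failing is always safe.
    """
    if not cond_cleanup_lock(page_a):
        return False
    if not cond_cleanup_lock(page_b):
        unlock(page_a)  # never wait for the second lock: avoids deadlock
        return False
    page_a["items"] += page_b["items"]  # pull tuples onto one page
    page_b["items"] = []
    unlock(page_a)
    unlock(page_b)
    return True

# A suspended scan pinning page_b makes the merge back off harmlessly.
a = {"items": [1, 2], "pinned": False}
b = {"items": [3], "pinned": True}
cond = lambda p: not p["pinned"]
assert try_merge(a, b, cond, lambda p: None) is False
b["pinned"] = False
assert try_merge(a, b, cond, lambda p: None) is True
assert a["items"] == [1, 2, 3] and b["items"] == []
```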

(A side benefit of this approach is that we could opportunistically
attempt to compact pages containing dead items any time we can manage
a ConditionalLockBufferForCleanup() on the page, sort of like what HOT
pruning does for heap blocks.  We could even, after pruning away dead
items, attempt to merge the page with siblings, so that even without
vacuum the bucket chains can gradually shrink if the index tuples are
discovered to be pointing to dead heap tuples.)

With the above changes, vacuuming a bucket can happen without taking
the heavyweight lock in exclusive mode, leaving bucket splitting as
the only operation that requires it.  And we could perhaps use
basically the same algorithm to clean the tuples out of the old bucket
after the split.  The problem is that, when we're vacuuming, we know
that no scan currently in progress can still care about any of the
tuples we're removing.  

Re: [HACKERS] Changeset Extraction v7.9.1

2014-03-07 Thread Andres Freund
On 2014-03-07 10:17:21 -0300, Alvaro Herrera wrote:
 Andres Freund escribió:
 
  fprintf(stderr,
  -   _("%s: could not identify system: got %d rows and %d fields, "
  -       "expected %d rows and %d fields\n"),
  -   progname, PQntuples(res), PQnfields(res), 1, 3);
  +   _("%s: could not identify system: got %d rows and %d fields, "
  +       "expected 1 row and 3 or more fields\n"),
  +   progname, PQntuples(res), PQnfields(res));
 
 Please don't change this.  The reason these messages use %d and an extra
 printf argument is to avoid giving translators extra work when the
 number of rows or fields is changed.  In these cases I suggest this:
 
  -   _("%s: could not identify system: got %d rows and %d fields, "
  -       "expected %d rows and %d fields\n"),
  -   progname, PQntuples(res), PQnfields(res), 1, 3);
  +   _("%s: could not identify system: got %d rows and %d fields, "
  +       "expected %d rows and %d or more fields\n"),
  +   progname, PQntuples(res), PQnfields(res), 1, 3);
 
 (Yes, I know the "expected 1 rows" output looks a bit silly.  Since this
 is an unexpected error message anyway, I don't think that's worth
 fixing.)

I changed it to not use placeholders because I thought "or more" was
specific enough to be unlikely to be used in other places, but I don't
have a problem with continuing to use them.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-03-07 Thread Craig Ringer
On 03/05/2014 11:02 AM, Craig Ringer wrote:
 The main known issue remaining is plan invalidation.

I've pushed a version with a plan invalidation implementation. It's tagged:

  rls-9.4-upd-sb-views-v9

in

  g...@github.com:ringerc/postgres.git

The invalidation implementation does not yet handle foreign key checks;
that will require additional changes. I'll push an update to the
rls-9.4-upd-sb-views and post an update later, at which time I'll rebase
the changes back into the history.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_ctl status with nonexistent data directory

2014-03-07 Thread Amit Kapila
On Fri, Mar 7, 2014 at 9:41 AM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Mar  7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:
 One reason could be that, as we are already returning a special exit code
 for the 'status' option of pg_ctl (exit(3)), this seems to be in line with it,
 as this also gets called during the status command.

 Also, in the link you sent in another mail, I could see a statement
 indicating it is more important to return correct codes in the status
 case, which sounds like a good reason to me.

 Link and statement
 http://www.postgresql.org/message-id/51d1e482.5090...@gmx.net

 The status case is different, because there the exit code
 can be passed out by the init script directly.

 If we just want to handle correct exit codes for the status option, then
 maybe we need to refactor the code a bit to ensure the same.

 OK, done with the attached patch.  Three is returned for status, but one
 for other cases.

I think it would have been better if it returned status 4 for the cases where
the directory/file is not accessible (the new cases added by this patch).

I think status 3 should be returned only when the data directory is valid
and the pid file is missing, because after getting this code the next thing
a script will probably do is start the server, which doesn't seem a good
fit for the newly added cases.

Other than that, the patch seems fine to me.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-07 Thread Robert Haas
[ I just sent a reply to Greg Stark's email which touches on some of
the same points you mentioned here; that was mostly written last
night, and finished this morning before seeing this.  ]

On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. You suggested ensuring that a scan always has at least a pin, and split
 takes a vacuum-lock. That ought to work. There's no need for the more
 complicated maneuvers you described, ISTM that you can just replace the
 heavy-weight share lock with holding a pin on the primary page of the
 bucket, and an exclusive lock with a vacuum-lock. Note that
 _hash_expandtable already takes the exclusive lock conditionally, ie. if it
 doesn't get the lock immediately it just gives up. We could do the same with
 the cleanup lock.

We could try that.  I assume you mean do *just* what you describe
here, without the split-in-progress or moved-by-split flags I
suggested.  The only issue I see with that is that instead of everyone
piling up on the heavyweight lock, a wait which is interruptible,
they'd all pile up on the buffer content lwlock, a wait which isn't.
And splitting a bucket can involve an arbitrary number of I/O
operations, so that's kind of unappealing.  Even checkpoints would be
blocked until the bucket split completed, which seems unfortunate.

But I like the idea of teaching each scan to retain a pin on the
primary buffer page.  If we then do the split the way I proposed
before, we can implement the "outwait concurrent scans" step by taking
a cleanup lock on the primary buffer page.  In this design, we only
need to acquire and release the cleanup lock.  Once we get the cleanup
lock on the primary buffer page, even for an instant, we know that
there are no remaining scans in the bucket that need the pre-split
tuples that have now been moved to the new bucket.  We can then remove
tuples with a lesser lock (or keep the stronger lock if we wish to
re-pack).

 Vacuum could also be enhanced. It currently takes an exclusive lock on the
 bucket, then removes any dead tuples and finally squeezes the bucket by
 moving tuples to earlier pages. But you only really need the exclusive lock
 for the squeeze-phase. You could do the dead-tuple removal without the
 bucket-lock, and only grab for the squeeze-phase. And the squeezing is
 optional, so you could just skip that if you can't get the lock. But that's
 a separate patch as well.

Yeah, I think this would be a useful improvement.

 One more thing we could do to make hash indexes more scalable, independent
 of the above: Cache the information in the metapage in backend-private
 memory. Then you (usually) wouldn't need to access the metapage at all when
 scanning. Store a copy of the bitmask for that bucket in the primary page,
 and when scanning, check that it matches the cached value. If not, refresh
 the cached metapage and try again.

I don't know whether this would be a useful improvement.
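
For concreteness, the caching scheme described above might look roughly like this in pseudocode; every name here is invented for illustration, this is just the control flow:

```
/* Pseudocode sketch of the proposed backend-private metapage cache. */
if (cached_metapage == NULL)
    cached_metapage = read_and_copy_metapage();   /* one-time cost */

for (;;)
{
    bucket = bucket_for_key(hashkey, cached_metapage);
    buf    = read_primary_page(bucket);

    /* the primary page carries a copy of the bucket's bitmask */
    if (bitmask_on_page(buf) == bitmask_in_cache(cached_metapage, bucket))
        break;          /* cache still valid: proceed with the scan */

    /* a split invalidated our copy: refresh from the real metapage */
    release_buffer(buf);
    refresh_from_metapage(cached_metapage);
}
```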

 So, there seems to be a few fairly simple and independent improvements to be
 made:

 1. Replace the heavy-weight lock with pin & vacuum-lock.
 2. Make it crash-safe, by adding WAL-logging
 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for
 the squeeze phase.
 4. Cache the metapage.

 We still don't know if it's going to be any better than B-tree after all
 that's done, but the only way to find out is to go ahead and implement it.

I'm of the opinion that we ought to start by making splits and
vacuuming more concurrent, and then do that stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-07 Thread Heikki Linnakangas

On 03/07/2014 03:48 PM, Robert Haas wrote:

So, there seems to be a few fairly simple and independent improvements to be
made:

1. Replace the heavy-weight lock with pin & vacuum-lock.
2. Make it crash-safe, by adding WAL-logging
3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for
the squeeze phase.
4. Cache the metapage.

We still don't know if it's going to be any better than B-tree after all
that's done, but the only way to find out is to go ahead and implement it.

I'm of the opinion that we ought to start by making splits and
vacuuming more concurrent, and then do that stuff.


Priorities are a matter of opinion, but note that for many applications 
the concurrency of splits and vacuuming is pretty much irrelevant (at 
least after we do bullet point 3 above). Like, if the index is mostly 
read-only, or at least doesn't grow much. You'd still want it to be 
crash-safe (2), and you might care a lot about the performance of scans 
(1 and 4), but if splits and vacuum hold an exclusive lock for a few 
seconds, that's OK if you only need to do it once in a blue moon.


- Heikki




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-07 Thread k...@rice.edu
On Thu, Mar 06, 2014 at 06:14:21PM -0500, Robert Haas wrote:
 On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  I've been tempted to implement a new type of hash index that allows both WAL
  and high concurrency, simply by disallowing bucket splits.  At the index
  creation time you use a storage parameter to specify the number of buckets,
  and that is that. If you mis-planned, build a new index with more buckets,
  possibly concurrently, and drop the too-small one.
 
 Yeah, we could certainly do something like that.  It sort of sucks,
 though.  I mean, it's probably pretty easy to know that starting with
 the default 2 buckets is not going to be enough; most people will at
 least be smart enough to start with, say, 1024.  But are you going to
 know whether you need 32768 or 1048576 or 33554432?  A lot of people
 won't, and we have more than enough reasons for performance to degrade
 over time as it is.
 
It would be useful to have a storage parameter for the target size of
the index, even if it is not exact, to use in the initial index build
to avoid the flurry of I/O caused by bucket splits as the index grows.

Regards,
Ken




Re: [HACKERS] Row-security on updatable s.b. views

2014-03-07 Thread Craig Ringer
On 03/07/2014 09:33 PM, Craig Ringer wrote:
 On 03/05/2014 11:02 AM, Craig Ringer wrote:
 The main known issue remaining is plan invalidation.
 
 I've pushed a version with a plan invalidation implementation. It's tagged:
 
   rls-9.4-upd-sb-views-v9
 
 in
 
   g...@github.com:ringerc/postgres.git
 
 The invalidation implementation does not yet handle foreign key checks;
 that will require additional changes. I'll push an update to the
 rls-9.4-upd-sb-views and post an update later, at which time I'll rebase
 the changes back into the history.

Well, that was easy. Done.

  rls-9.4-upd-sb-views-v11

and rebased the rls-9.4-upd-sb-views branch to incorporate the changes.

The regression tests have further failures, but some are due to changes
in the inheritance semantics. I'm going through them now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-07 Thread Heikki Linnakangas

On 03/07/2014 03:48 PM, Robert Haas wrote:

On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

Hmm. You suggested ensuring that a scan always has at least a pin, and split
takes a vacuum-lock. That ought to work. There's no need for the more
complicated maneuvers you described, ISTM that you can just replace the
heavy-weight share lock with holding a pin on the primary page of the
bucket, and an exclusive lock with a vacuum-lock. Note that
_hash_expandtable already takes the exclusive lock conditionally, ie. if it
doesn't get the lock immediately it just gives up. We could do the same with
the cleanup lock.

We could try that.  I assume you mean do *just* what you describe
here, without the split-in-progress or moved-by-split flags I
suggested.


Yep.


The only issue I see with that is that instead of everyone
piling up on the heavyweight lock, a wait which is interruptible,
they'd all pile up on the buffer content lwlock, a wait which isn't.
And splitting a bucket can involve an arbitrary number of I/O
operations, so that's kind of unappealing.  Even checkpoints would be
blocked until the bucket split completed, which seems unfortunate.


Hmm. I doubt that's a big deal in practice, although I agree it's a bit 
ugly.


Once we solve the crash-safety of splits, we actually have the option of 
doing the split in many small steps, even when there's no crash 
involved. You could for example grab the vacuum-lock, move all the 
tuples in the first 5 pages, and then release the lock to give other 
backends that are queued up a chance to do their scans/insertions. Then 
re-acquire the lock, and continue where you left off. Or just bail out and 
let the next vacuum or insertion finish it later.
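
Roughly, in pseudocode (the helper names and the batch size are invented; crash-safe WAL logging of each batch is assumed, per point 2 above):

```
/* Pseudocode for an interruptible, batched bucket split. */
while (tuples_remain_in_old_bucket())
{
    take_vacuum_lock(old_bucket);
    move_one_batch(old_bucket, new_bucket);   /* e.g. five pages' worth */
    release_vacuum_lock(old_bucket);
    /* queued scans and insertions get a chance to run here; bailing out
     * entirely just leaves the split for a later vacuum or insertion */
}
```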


- Heikki




Re: [HACKERS] extension_control_path

2014-03-07 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
  What about allowing a control file like this:
  
 # hstore extension
 comment = 'data type for storing sets of (key, value) pairs'
 default_version = '1.3'
 directory = 'local/hstore-new'
 module_pathname = '$directory/hstore'
 relocatable = true
 
 I think your previously proposed patch to add extension_control_path
 plus my suggestion to update existing de facto best practices to not
 include $libdir into the module path name (thus allowing the use of
 dynamic_library_path) will address all desired use cases just fine.

You still haven't addressed how to deal with the case of multiple .so's
with the same name ending up in the dynamic_library_path.  I don't see
that as unlikely to end up happening either.

 Moreover, going that way would reuse existing facilities and concepts,
 remove indirections and reduce overall complexity.  This new proposal,
 on the other hand, would go the other way, introducing new concepts,
 adding more indirections, and increasing overall complexity, while
 actually achieving less.

Being able to have a self-contained module which requires a minimum of
modification to postgresql.conf is a reduction in complexity, imv.
Having to maintain two config options which will end up being overly
long and mostly duplicated doesn't make things easier for people.  This
has made me wonder if we could allow a control file to be explicitly
referred to from CREATE EXTENSION itself, dropping the need for any of
this postgresql.conf/GUC maintenance.  There are downsides to that
approach as well, of course, but it's definitely got a certain appeal.

 I see an analogy here.  What we are currently doing is similar to
 hardcoding absolute rpaths into all libraries.  Your proposal is
 effectively to (1) add the $ORIGIN mechanism and (2) make people use
 chrpath when they want to install somewhere else.  My proposal is to get
 rid of all rpaths and just set a search path.  Yes, on technical level,
 this is less powerful, but it's simpler and gets the job done and is
 harder to misuse.

I don't buy off on this analogy.  For starters, you can change the
control file without needing to rebuild the library, but the main
difference is that, in practice, there are no library transitions
happening and instead what we're likely to have are independent and
*incompatible* libraries living with the same name in our path.

 A problem with features like these is that they get rarely used but
 offer infinite flexibility, so they are not used consistently and you
 can't rely on anything.  This is already the case for the
 module_pathname setting in the control file.  It has, AFAICT, no actual
 use, and because of that no one uses it, and because of that, there is
 no guarantee that extensions use it sensibly, and because of that no one
 can ever make sensible use of it in the future, because there is no
 guarantee that extensions have it set sensibly.  In fact, I would
 propose deprecating module_pathname.

This makes sense when you have complete control over where things are
installed to and can drop the control file into the one true directory
of control files and similarly with the .so.  Indeed, that works
already today for certain platforms, but from what I understand, on the
OSX platform you don't really get to just dump files anywhere on the
filesystem that you want and instead end up forced into a specific
directory tree.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Florian Weimer

On 03/07/2014 06:03 AM, Tom Lane wrote:


In the bug thread I proposed making atexit_callback check whether getpid()
still matches MyProcPid.  If it doesn't, then presumably we inherited the
atexit callback list, along with the value of MyProcPid, from some parent
backend process whose elbow we should not joggle.  Can anyone see a flaw
in that?


There's the PID reuse problem.  Forking twice (with a delay) could end 
up with the same PID as MyProcPid.  Comparing the process start time 
would protect against that.  Checking getppid() would have the same 
theoretical problem.


--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] pg_ctl status with nonexistent data directory

2014-03-07 Thread Bruce Momjian
On Fri, Mar  7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
  OK, done with the attached patch.  Three is returned for status, but one
  for other cases.
 
 I think it would have been better if it returned status 4 for the cases where
 the directory/file is not accessible (the new cases added by this patch).
 
 I think status 3 should be returned only when the data directory is valid
 and the pid file is missing, because after getting this code the next thing
 a script will probably do is start the server, which doesn't seem a good
 fit for the newly added cases.

OK, I have updated the attached patch to reflect this, and added a C
comment.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index 0dbaa6e..56d238f
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** static bool allow_core_files = false;
*** 97,102 
--- 97,103 
  static time_t start_time;
  
  static char postopts_file[MAXPGPATH];
+ static char version_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char backup_file[MAXPGPATH];
  static char recovery_file[MAXPGPATH];
*** static void pgwin32_doRunAsService(void)
*** 152,158 
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
  
! static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
--- 153,159 
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
  
! static pgpid_t get_pgpid(bool is_status_request);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
*** print_msg(const char *msg)
*** 246,255 
  }
  
  static pgpid_t
! get_pgpid(void)
  {
  	FILE	   *pidf;
  	long		pid;
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
--- 247,280 
  }
  
  static pgpid_t
! get_pgpid(bool is_status_request)
  {
  	FILE	   *pidf;
  	long		pid;
+ 	struct stat statbuf;
+ 
+ 	if (stat(pg_data, &statbuf) != 0)
+ 	{
+ 		if (errno == ENOENT)
+ 			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+ 				   pg_data);
+ 		else
+ 			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+ 				   pg_data);
+ 		/*
+ 		 * The Linux Standard Base Core Specification 3.1 says this should return
+ 		 * '4, program or service status is unknown'
+ 		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+ 		 */
+ 		exit(is_status_request ? 4 : 1);
+ 	}
+ 
+ 	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+ 	{
+ 		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+ 			   progname, pg_data);
+ 		exit(is_status_request ? 4 : 1);
+ 	}
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
*** do_start(void)
*** 810,816 
  
  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid();
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
--- 835,841 
  
  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid(false);
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
*** do_stop(void)
*** 894,900 
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)/* no pid file */
  	{
--- 919,925 
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)/* no pid file */
  	{
*** do_stop(void)
*** 943,949 
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid()) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
--- 968,974 
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid(false)) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
*** do_restart(void)
*** 980,986 
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)/* no pid file */
  	{
--- 1005,1011 
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)/* no pid file */
  	{
*** do_restart(void)
*** 1033,1039 
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid()) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
--- 1058,1064 
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid(false)) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
*** do_reload(void)
*** 1071,1077 
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid();
  	if (pid == 0)/* 

Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Heikki Linnakangas

On 03/07/2014 04:23 PM, Florian Weimer wrote:

On 03/07/2014 06:03 AM, Tom Lane wrote:


In the bug thread I proposed making atexit_callback check whether getpid()
still matches MyProcPid.  If it doesn't, then presumably we inherited the
atexit callback list, along with the value of MyProcPid, from some parent
backend process whose elbow we should not joggle.  Can anyone see a flaw
in that?


There's the PID reuse problem.  Forking twice (with a delay) could end
up with the same PID as MyProcPid.


Not if the parent process is still running.

- Heikki




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Andres Freund
On 2014-03-07 00:03:48 -0500, Tom Lane wrote:
 In the bug thread I proposed making atexit_callback check whether getpid()
 still matches MyProcPid.

What are you proposing to do in that case? This is only one of the
failure cases of forking carelessly, right?
I think the only appropriate thing would be to make as much noise as
possible in that case, which is probably something like writing a
message to stderr, and then abort().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Row-security on updatable s.b. views

2014-03-07 Thread Craig Ringer
On 03/07/2014 10:07 PM, Craig Ringer wrote:
 On 03/07/2014 09:33 PM, Craig Ringer wrote:
 On 03/05/2014 11:02 AM, Craig Ringer wrote:
 The main known issue remaining is plan invalidation.

 I've pushed a version with a plan invalidation implementation. It's tagged:

   rls-9.4-upd-sb-views-v9

 in

   g...@github.com:ringerc/postgres.git

 The invalidation implementation does not yet handle foreign key checks;
 that will require additional changes. I'll push an update to the
 rls-9.4-upd-sb-views and post an update later, at which time I'll rebase
 the changes back into the history.
 
 Well, that was easy. Done.
 
   rls-9.4-upd-sb-views-v11
 
 and rebased the rls-9.4-upd-sb-views branch to incorporate the changes.
 
 The regression tests have further failures, but some are due to changes
 in the inheritance semantics. I'm going through them now.
 


Need a quick opinion.

KaiGai's original code produced a plan like this for an inheritance set:

  EXPLAIN (costs off) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
!            QUERY PLAN
! ---------------------------------
   LockRows
!    ->  Append
!          ->  Subquery Scan on t1
!                Filter: f_leak(t1.b)
!                ->  Seq Scan on t1 t1_1
!                      Filter: ((a % 2) = 0)
!          ->  Subquery Scan on t2
!                Filter: f_leak(t2.b)
!                ->  Seq Scan on t2 t2_1
!                      Filter: ((a % 2) = 1)
!          ->  Seq Scan on t3
!                Filter: f_leak(b)
  (12 rows)



The new code, using updatable s.b. views, instead produces:


  EXPLAIN (costs off) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
!                  QUERY PLAN
! ---------------------------------------------
   LockRows
!    ->  Subquery Scan on t1
!          Filter: f_leak(t1.b)
!          ->  LockRows
!                ->  Result
!                      ->  Append
!                            ->  Seq Scan on t1 t1_1
!                                  Filter: ((a % 2) = 0)
!                            ->  Seq Scan on t2
!                                  Filter: ((a % 2) = 0)
!                            ->  Seq Scan on t3
!                                  Filter: ((a % 2) = 0)
  (12 rows)



The different quals are expected, because of the change to the
definition of inheritance handling in row security.

What I'm concerned about is the locking. It looks to me like we're
causing the user to lock rows that they may not intend to lock, by
applying a LockRows step *before* the user supplied qual. (I'm going to
test that tomorrow, it's sleep time in Australia).

This seems to be related to RowMark handling in updatable security
barrier views. I need to check whether it happens with updates to
security barrier views, as well as with row security.

Dean, any thoughts?



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-07 00:03:48 -0500, Tom Lane wrote:
 In the bug thread I proposed making atexit_callback check whether getpid()
 still matches MyProcPid.

 What are you proposing to do in that case? This is only one of the
 failure cases of forking carelessly, right?

No, I think it should do nothing.  The coding pattern shown in bug #9464
seems perfectly reasonable and I think we should allow it.  No doubt it's
safer if the child process does an on_exit_reset; but right now, if the
child fails to do so, atexit_callback is actively breaking things.  And
I don't think we can rely on third-party libraries to call on_exit_reset
after forking.

regards, tom lane




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 03/07/2014 04:23 PM, Florian Weimer wrote:
 There's the PID reuse problem.  Forking twice (with a delay) could end
 up with the same PID as MyProcPid.

 Not if the parent process is still running.

If the original parent backend is *not* still running, then running
atexit_callback in the grandchild is just as dangerous if not more so;
it could be clobbering shared-memory state belonging to some other
session that has recycled the same PGPROC.

I think Florian's right that there's a risk there, but it seems pretty
remote, and I don't see any reliable way to detect the case anyhow.
(Process start time?  Where would you get that from portably?)
It's not a reason not to do something about the much larger chance of
this happening in a direct child process, which certainly won't have a
matching PID.

regards, tom lane




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Florian Weimer

On 03/07/2014 03:57 PM, Tom Lane wrote:


I think Florian's right that there's a risk there, but it seems pretty
remote, and I don't see any reliable way to detect the case anyhow.
(Process start time?  Where would you get that from portably?)


I don't think there's a portable source for that.  On Linux, you'd have 
to use /proc.



It's not a reason not to do something about the much larger chance of
this happening in a direct child process, which certainly won't have a
matching PID.


Indeed.  Checking getppid() in addition might narrow things down further.

On Linux, linking against pthread_atfork currently requires linking 
against pthread (although this is about to change), and it might incur 
the pthread-induced overhead on some configurations.


--
Florian Weimer / Red Hat Product Security Team




[HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Tom Lane
I just noticed that the DSM patch has introduced a whole new class of
failures related to the bug #9464 issue: to wit, any on_detach
actions registered in a parent process will also be performed when a
child process exits, because nothing has been added to on_exit_reset
to prevent that.  It seems likely that this is undesirable.

regards, tom lane




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Andres Freund
On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-03-07 00:03:48 -0500, Tom Lane wrote:
  In the bug thread I proposed making atexit_callback check whether getpid()
  still matches MyProcPid.
 
  What are you proposing to do in that case? This is only one of the
  failure cases of forking carelessly, right?
 
 No, I think it should do nothing.  The coding pattern shown in bug #9464
 seems perfectly reasonable and I think we should allow it.  No doubt it's
 safer if the child process does an on_exit_reset; but right now, if the
 child fails to do so, atexit_callback is actively breaking things.  And
 I don't think we can rely on third-party libraries to call on_exit_reset
 after forking.

I don't think it's a reasonable pattern to run background processes that
aren't managed by postgres with all shared memory still
accessible. You'll have to either also detach from shared memory and
related things, or you have to fork() and exec(). At the very least, not
integrating the child with the postmaster's lifetime will prevent
postgres from restarting because there's still a child attached to the
shared memory.
I don't see any way it'd be safe for a pg unaware library to start
forking around and doing similar random things inside normal
backends.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Tom Lane
Florian Weimer fwei...@redhat.com writes:
 On 03/07/2014 03:57 PM, Tom Lane wrote:
 It's not a reason not to do something about the much larger chance of
 this happening in a direct child process, which certainly won't have a
 matching PID.

 Indeed.  Checking getppid() in addition might narrow things down further.

I don't think getppid adds much to the party.  In particular, what to
do if it returns 1?  You can't tell if you're an orphaned backend (in
which case you should still do normal shutdown) or an orphaned
grandchild.  The standalone-backend case would also complicate matters.

regards, tom lane




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Florian Weimer

On 03/07/2014 04:10 PM, Tom Lane wrote:

Florian Weimer fwei...@redhat.com writes:

On 03/07/2014 03:57 PM, Tom Lane wrote:

It's not a reason not to do something about the much larger chance of
this happening in a direct child process, which certainly won't have a
matching PID.



Indeed.  Checking getppid() in addition might narrow things down further.


I don't think getppid adds much to the party.  In particular, what to
do if it returns 1?  You can't tell if you're an orphaned backend (in
which case you should still do normal shutdown)


Oh.  I didn't know that orphaned backends perform a normal shutdown.

--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
 No, I think it should do nothing.  The coding pattern shown in bug #9464
 seems perfectly reasonable and I think we should allow it.

 I don't think it's a reasonable pattern to run background processes that
 aren't managed by postgres with all shared memory still
 accessible. You'll have to either also detach from shared memory and
 related things, or you have to fork() and exec().

The code in question is trying to do that.  And what do you think will
happen if the exec() fails?

 At the very least, not
 integrating the child with the postmaster's lifetime will prevent
 postgres from restarting because there's still a child attached to the
 shared memory.

I think you're willfully missing the point.  The reason we added
atexit_callback was to try to defend ourselves against third-party code
that did things in a non-Postgres-aware way.  Arguing that such code
should do things in a Postgres-aware way is not helpful for the concerns
here, and it's not relevant to reality either, because people will load
stuff like libperl into backends.  Good luck getting a post-fork
on_exit_reset() call into libperl.

regards, tom lane




Re: [HACKERS] Unportable coding in reorderbuffer.h

2014-03-07 Thread Andres Freund
Hi,

On 2014-03-06 02:39:37 +0100, Andres Freund wrote:
 Unless somebody protests I'll try to get the remaining walsender and
 docs patches ready before sending in the patch fixing this as it's not
 disturbing the buildfarm. I'll be onsite/travelling tomorrow; so I am
 not sure I'll be done then, but friday it is.

A patch fixing this is attached. I've added some more local variables to
deal with the longer lines...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 875eda2d163e38e6533bf6cf8e6e6417aad0421b Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 6 Mar 2014 02:00:47 +0100
Subject: [PATCH] Remove unportable use of anonymous unions from
 reorderbuffer.h.

In b89e151054a I had assumed it was ok to use anonymous unions as
struct members, but while a longstanding extension in many compilers,
it's only been standardized in C11.

To fix, remove one of the anonymous unions which tried to hide some
implementation specific enum values and give the other a name. The
latter unfortunately requires changes in output plugins, but since the
feature has only been added a few days ago...

Per complaint from Tom Lane.
---
 contrib/test_decoding/test_decoding.c   |  18 +-
 src/backend/replication/logical/decode.c|  28 +--
 src/backend/replication/logical/reorderbuffer.c | 283 
 src/include/replication/reorderbuffer.h |  39 ++--
 4 files changed, 186 insertions(+), 182 deletions(-)

diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index ea463fb..e356c7c 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -358,43 +358,45 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	{
 		case REORDER_BUFFER_CHANGE_INSERT:
 			appendStringInfoString(ctx->out, " INSERT:");
-			if (change->tp.newtuple == NULL)
+			if (change->data.tp.newtuple == NULL)
 				appendStringInfoString(ctx->out, " (no-tuple-data)");
 			else
 				tuple_to_stringinfo(ctx->out, tupdesc,
-									&change->tp.newtuple->tuple,
+									&change->data.tp.newtuple->tuple,
 									false);
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
 			appendStringInfoString(ctx->out, " UPDATE:");
-			if (change->tp.oldtuple != NULL)
+			if (change->data.tp.oldtuple != NULL)
 			{
 				appendStringInfoString(ctx->out, " old-key:");
 				tuple_to_stringinfo(ctx->out, tupdesc,
-									&change->tp.oldtuple->tuple,
+									&change->data.tp.oldtuple->tuple,
 									true);
 				appendStringInfoString(ctx->out, " new-tuple:");
 			}
 
-			if (change->tp.newtuple == NULL)
+			if (change->data.tp.newtuple == NULL)
 				appendStringInfoString(ctx->out, " (no-tuple-data)");
 			else
 				tuple_to_stringinfo(ctx->out, tupdesc,
-									&change->tp.newtuple->tuple,
+									&change->data.tp.newtuple->tuple,
 									false);
 			break;
 		case REORDER_BUFFER_CHANGE_DELETE:
 			appendStringInfoString(ctx->out, " DELETE:");
 
 			/* if there was no PK, we only know that a delete happened */
-			if (change->tp.oldtuple == NULL)
+			if (change->data.tp.oldtuple == NULL)
 				appendStringInfoString(ctx->out, " (no-tuple-data)");
 			/* In DELETE, only the replica identity is present; display that */
 			else
 				tuple_to_stringinfo(ctx->out, tupdesc,
-									&change->tp.oldtuple->tuple,
+									&change->data.tp.oldtuple->tuple,
 									true);
 			break;
+		default:
+			Assert(false);
 	}
 
 	MemoryContextSwitchTo(old);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index af3948e..414cfa9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -586,17 +586,17 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	change = ReorderBufferGetChange(ctx->reorder);
 	change->action = REORDER_BUFFER_CHANGE_INSERT;
-	memcpy(&change->tp.relnode, &xlrec->target.node, sizeof(RelFileNode));
+	memcpy(&change->data.tp.relnode, &xlrec->target.node, sizeof(RelFileNode));
 
 	if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
 	{
 		Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader));
 
-		change->tp.newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+		change->data.tp.newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
 
 		DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert,
 						r->xl_len - SizeOfHeapInsert,
-						change->tp.newtuple);
+						change->data.tp.newtuple);
 	}
 
 	ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
@@ -626,7 +626,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	change = ReorderBufferGetChange(ctx->reorder);
 	change->action = REORDER_BUFFER_CHANGE_UPDATE;
-	memcpy(&change->tp.relnode, &xlrec->target.node, sizeof(RelFileNode));
+	memcpy(&change->data.tp.relnode, &xlrec->target.node, sizeof(RelFileNode));
 
 	data = (char *) &xlhdr->header;
 
@@ -634,11 +634,11 @@ DecodeUpdate(LogicalDecodingContext *ctx, 

Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Andres Freund
Hi,

On 2014-03-07 10:24:31 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
  No, I think it should do nothing.  The coding pattern shown in bug #9464
  seems perfectly reasonable and I think we should allow it.
 
  I don't think it's a reasonable pattern to run background processes that
  aren't managed by postgres with all shared memory still
  accessible. You'll have to either also detach from shared memory and
  related things, or you have to fork() and exec().
 
 The code in question is trying to do that.  And what do you think will
 happen if the exec() fails?

In code that's written properly, not much. It will use _exit() after the
exec() failure. That's pretty established best practice after forking,
especially after an exec() failure.

  At the very least, not
  integrating the child with the postmaster's lifetime will prevent
  postgres from restarting because there's still a child attached to the
  shared memory.
 
 I think you're willfully missing the point.  The reason we added
 atexit_callback was to try to defend ourselves against third-party code
 that did things in a non-Postgres-aware way.

Hey, I am not arguing that we should remove protection, I am saying that
we should make it more obvious that dangerous stuff happens. That way
people can fix stuff during development rather than finding out stuff
breaks in some corner cases on busy live systems.

 Arguing that such code
 should do things in a Postgres-aware way is not helpful for the concerns
 here, and it's not relevant to reality either, because people will load
 stuff like libperl into backends.  Good luck getting a post-fork
 on_exit_reset() call into libperl.

libperl won't fork on its own, without the user telling it to do so,
and code that forks can very well be careful to do POSIX::_exit(),
especially in case an exec fails. I don't have much problem telling
people that a fork() without a direct exec() afterwards simply isn't
supported from PLs.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Merlin Moncure
On Thu, Mar 6, 2014 at 10:33 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Mar  6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
 Hi there,

 Looks like consensus is done. I and Teodor are not happy with it, but
 what we can do :)   One thing I  want to do is to reserve our
 contribution to the flagship feature (jsonb), particularly, binary
 storage for nested structures and indexing. Their work was sponsored
 by Engine Yard.

 OK, if we are going with an unchanged hstore in contrib and a new JSONB,
 there is no reason to whack around JSONB to be binary compatible with the
 old hstore format.  What sacrifices did we need to make to have JSONB
 be binary compatible with hstore, can those sacrifices be removed, and
 can that be done in time for 9.4?

Also,
*) what hstore2 features (if any) not already reflected in
the jsonb type are going to be moved to jsonb for 9.4?
*) if the answer above is anything but 'nothing', what hstore-isms are
going to be adjusted in the process of doing so?  Presumably there
would be some function name changes to put them in the jsonb style but
also the hstore syntax ('=>') is going to be embedded in some of the
search operators and possibly other things.  Is that going to change?

merlin




Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Andrew Dunstan


On 03/06/2014 11:33 PM, Bruce Momjian wrote:

On Thu, Mar  6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:

Hi there,

Looks like consensus is done. I and Teodor are not happy with it, but
what we can do :)   One thing I  want to do is to reserve our
contribution to the flagship feature (jsonb), particularly, binary
storage for nested structures and indexing. Their work was sponsored
by Engine Yard.

OK, if we are going with an unchanged hstore in contrib and a new JSONB,
there is no reason to whack around JSONB to be binary compatible with the
old hstore format.  What sacrifices did we need to make to have JSONB
be binary compatible with hstore, can those sacrifices be removed, and
can that be done in time for 9.4?




IIRC The sacrifice was one bit in the header (i.e. in the first int 
after the varlena header). We could now repurpose that (for example if 
we ever decided to use a new format).


Oleg and Teodor made most of the adjustments on the hstore(2) side (e.g. 
providing for scalar roots, providing for json typing of scalars so 
everything isn't just a string).


Can the architecture be changed? No. If we think it's not good enough we 
would have to kiss jsonb goodbye for 9.4 and go back to the drawing 
board. But I haven't seen any such suggestion from anyone who has been 
reviewing it (e.g. Andres or Peter).


cheers

andrew




Re: [HACKERS] extension_control_path

2014-03-07 Thread Dimitri Fontaine
Hi,

Peter Eisentraut pete...@gmx.net writes:
 On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
directory = 'local/hstore-new'
module_pathname = '$directory/hstore'

 I think your previously proposed patch to add extension_control_path
 plus my suggestion to update existing de facto best practices to not
 include $libdir into the module path name (thus allowing the use of
 dynamic_library_path) will address all desired use cases just fine.

My opinion is that we have two choices: refactoring the current API or
incrementally improving it. In both cases we should make it possible for
the packager to control where any individual module file is loaded from,
with maybe options for the sysadmin to override the packager's choice.

In your proposal, the control moves away from the developer, and that's
a good thing, so you get a +1 from me.

Just please make sure that it's still possible to use full absolute path
for the module path name so that the packager can have control too.

 Moreover, going that way would reuse existing facilities and concepts,
 remove indirections and reduce overall complexity.  This new proposal,
 on the other hand, would go the other way, introducing new concepts,
 adding more indirections, and increasing overall complexity, while
 actually achieving less.

What the $directory proposal achieves is allowing a fully relocatable
extension layout, where you just have to drop a directory anywhere in
the file system and it just works (*).

It just works and makes it easy to control which module is loaded,
without having to set up LD_LIBRARY_PATH, ld.so.conf, or our own
dynamic_library_path.

  * providing that said directory is part of extension_control_path, or
that you copy or move the .control file to sharedir/extension.
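[A concrete sketch of that relocatable layout, under the proposal being discussed; the `$directory` substitution and `extension_control_path` are proposed features, not shipped ones, and every path below is made up for illustration:]

```shell
# Hypothetical relocatable extension tree that can be dropped anywhere
# the control path can see.
EXTDIR=/tmp/pgext/hstore-new
mkdir -p "$EXTDIR"

# The control file points the module at its own directory:
cat > "$EXTDIR/hstore.control" <<'EOF'
comment = 'data type for storing sets of (key, value) pairs'
default_version = '1.3'
directory = '/tmp/pgext/hstore-new'
module_pathname = '$directory/hstore'
EOF

# postgresql.conf would then only need (proposed GUC, not shipped):
#   extension_control_path = '$system:/tmp/pgext/hstore-new'
grep '^module_pathname' "$EXTDIR/hstore.control"
```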

That said, I don't intend to be using it myself, so I won't try and save
that patch in any ways. My position is that Stephen's concern is real
and his idea simple enough while effective, so worth pursuing.

 I see an analogy here.  What we are currently doing is similar to
 hardcoding absolute rpaths into all libraries.  Your proposal is
 effectively to (1) add the $ORIGIN mechanism and (2) make people use
 chrpath when they want to install somewhere else.  My proposal is to get
 rid of all rpaths and just set a search path.  Yes, on technical level,
 this is less powerful, but it's simpler and gets the job done and is
 harder to misuse.

What happens if you have more than one 'prefix.so' file in your path?

 A problem with features like these is that they get rarely used but
 offer infinite flexibility, so they are not used consistently and you
 can't rely on anything.  This is already the case for the
 module_pathname setting in the control file.  It has, AFAICT, no actual
 use, and because of that no one uses it, and because of that, there is
 no guarantee that extensions use it sensibly, and because of that no one
 can ever make sensible use of it in the future, because there is no
 guarantee that extensions have it set sensibly.  In fact, I would
 propose deprecating module_pathname.

The module_pathname facility allows the packager to decide where the
library module file gets installed and the extension author not to
concern himself with that choice.

I agree that using $libdir as the extension developer isn't the right
thing to do. Having to choose the installation path as a developer,
either in the SQL script or in the control file, is not the right thing.

Now, the practical answer I have to that point is to have the packager
rewrite the control file as part of its build system.

My vote goes against deprecating module_pathname, because I didn't see
in your proposal any ways to offer the control back to the packager,
only to the sysadmin, and I don't want to have the sysadmin involved if
we can avoid it (as you say, too much flexibility is a killer).

In practical terms, though, given the current situation, the build system
I'm working on already has to edit the SQL scripts and control files
anyway…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Bruce Momjian
On Fri, Mar  7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote:
 IIRC The sacrifice was one bit in the header (i.e. in the first int
 after the varlena header). We could now repurpose that (for example
 if we ever decided to use a new format).
 
 Oleg and Teodor made most of the adjustments on the hstore(2) side
 (e.g. providing for scalar roots, providing for json typing of
 scalars so everything isn't just a string).
 
 Can the architecture be changed? No. If we think it's not good
 enough we would have to kiss jsonb goodbye for 9.4 and go back to
 the drawing board. But I haven't seen any such suggestion from
 anyone who has been reviewing it (e.g. Andres or Peter).

We are going to be stuck with the JSONB binary format we ship in 9.4 so
I am asking if there are things we should do to improve it, now that we
know we don't need backward compatibility.

If they can be done for 9.4, great, if not, we have to decide if these
suboptimal cases are enough for us to delay the data type until 9.5.  I
don't know the answer, but I have to ask the question.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Andrew Dunstan


On 03/07/2014 11:45 AM, Bruce Momjian wrote:

On Fri, Mar  7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote:

IIRC The sacrifice was one bit in the header (i.e. in the first int
after the varlena header). We could now repurpose that (for example
if we ever decided to use a new format).

Oleg and Teodor made most of the adjustments on the hstore(2) side
(e.g. providing for scalar roots, providing for json typing of
scalars so everything isn't just a string).

Can the architecture be changed? No. If we think it's not good
enough we would have to kiss jsonb goodbye for 9.4 and go back to
the drawing board. But I haven't seen any such suggestion from
anyone who has been reviewing it (e.g. Andres or Peter).

We are going to be stuck with the JSONB binary format we ship in 9.4 so
I am asking if there are things we should do to improve it, now that we
know we don't need backward compatibility.

If they can be done for 9.4, great, if not, we have to decide if these
suboptimal cases are enough for us to delay the data type until 9.5.  I
don't know the answer, but I have to ask the question.



AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were 
essentially two ways of spelling the same data, the domains were 
virtually identical (hstore might have been a bit more liberal about 
numeric input).


cheers

andrew




Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Bruce Momjian
On Fri, Mar  7, 2014 at 11:57:48AM -0500, Andrew Dunstan wrote:
 If they can be done for 9.4, great, if not, we have to decide if these
 suboptimal cases are enough for us to delay the data type until 9.5.  I
 don't know the answer, but I have to ask the question.
 
 
 AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were
 essentially two ways of spelling the same data, the domains were
 virtually identical (hstore might have been a bit more liberal about
 numeric input).

OK, it sounds like the adjustments are minimal, like not using the
high-order bit.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-03-07 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 03/06/2014 02:58 AM, Tom Lane wrote:
 The PlanInvalItem could perfectly well be generated by the planner,
 no, if it has the user OID?  But I'm not real sure why you need it.
 I don't see the reason for an invalidation triggered by user ID.
 What exactly about the *user*, and not something else, would trigger
 plan invalidation?

 It's only that the plan depends on the user ID. There's no point keeping
 the plan around if the user no longer exists.

[ shrug... ]  Leaving such a plan cached would be harmless, though.
Furthermore, the user ID we'd be talking about is either the owner
of the current session, or the owner of some view or security-definer
function that the plan is already dependent on, so it's fairly hard
to credit that the plan would survive long enough for the issue to
arise.

Even if there is a scenario where invalidating by user ID is actually
useful, I think adding infrastructure to cause invalidation in such a case
is optimizing for the wrong thing.  You're adding cycles to every query to
benefit a case that is going to be quite infrequent in practice.

 What we do need is a notion that a plan cache entry might only be
 valid for a specific calling user ID; but that's a matter for cache
 entry lookup not for subsequent invalidation.

 Yes, that would be good, but is IMO more of a separate optimization. I'm
 currently using KaiGai's code to invalidate and re-plan when a user ID
 change is detected.

I'm unlikely to accept a patch that does that; wouldn't it be catastrophic
for performance in the presence of security-definer functions?  You can't
just trash the whole plan cache when a user ID switch occurs.

regards, tom lane




[HACKERS] What the heck is hobby.2ndquadrant.com?

2014-03-07 Thread Tom Lane
There are a few threads floating around currently in which some of
the cc'd addresses are various-people at hobby.2ndquadrant.com.
Addresses like that seem not to work, or at least not work reliably.
I believe the official addresses are just soandso at 2ndquadrant.com.

If you're replying to any such threads, you might want to clean
up the addresses so you don't get a bunch of bounce messages.

In any case, I thought the 2ndquadrant people would want to know that
there's something broken in their mail system.  I'm not sure how these
addresses got injected into our threads in the first place, but I bet
it wasn't done by anybody outside 2ndquadrant.  Perhaps an internal
mail server name was unintentionally exposed?

regards, tom lane




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Peter LaDow
Sorry for the bit of top-posting, but I wanted to make some things
clear.  Also, I wasn't subscribed to pgsql-hackers at the time this
thread began, so I apologize for the missing headers that might cause
threading issues.

I'm the submitter of bug #9464.  Here's the background on what we are
doing.  We are on a limited resource, embedded device.  We make use of
the database as an event driven system.  In the case of an incoming
event, such as a settings change, we make use of this fork/exec
procedure to spawn an asynchronous process to handle certain events.
Some of these external processes are long running, some need to run
outside the current transaction context, and some we don't care about
the result--we just want it to run.

Also, the on_exit_reset() does clear up the issue, and we have
implemented it as suggested in this thread (calling it immediately
after the fork() in the child).  And Andres' keen eye noted we were
calling exit() after an exec() failure, rather than _exit(). Thank
you, Andres, for pointing out this error.

Andres Freund andres at 2ndquadrant.com writes:
 On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
  Andres Freund andres at 2ndquadrant.com writes:
   What are you proposing to do in that case? This is only one of the
   failure cases of forking carelessly, right?

Just to be clear, we intended to fork careFULLy, not careLESSly.
Hence the reason for the double fork with an eventual exec().  We
intended to be clearly separated from the backend and executing in our
own clean, unrelated context.

 I don't think it's a reasonable pattern to run background processes that
 aren't managed by postgres with all shared memory still
 accessible. You'll have to either also detach from shared memory and

In this case we definitely did NOT want to be managed by postgres.
Hence the double fork.  The intent was that the first level child
would NOT be managed by postgres, hence the exit().

 related things, or you have to fork() and exec(). At the very least, not

Which is _exactly_ what we were trying to do.  The first fork was only
so that we could fork again and spawn a subprocess completely detached
from the postmaster.  And also to have something for the postmaster to
reap via waitpid and prevent a zombie from hanging around.  The
typical daemon stuff.

 integrating the child with the postmaster's lifetime will prevent
 postgres from restarting because there's still a child attached to the
 shared memory.

Which is _exactly_ what we were NOT trying to do.  We did not want to
integrate with postmaster.

 I don't see any way it'd be safe for a pg unaware library to start
 forking around and doing similar random things inside normal
 backends.

We aren't exactly forking around.  We were trying to spawn an
asynchronous process without any ties to the postmaster.  This was
expected to be well-defined, clean behavior.  A fork/exec isn't an
unusual thing to do, and we didn't expect that exiting a child would
invoke behavior that would cause problems.

Thanks,

Pete




Re: [HACKERS] Row-security on updatable s.b. views

2014-03-07 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 What I'm concerned about is the locking. It looks to me like we're
 causing the user to lock rows that they may not intend to lock, by
 applying a LockRows step *before* the user supplied qual. (I'm going to
 test that tomorrow, it's sleep time in Australia).

The fact that there are two LockRows nodes seems outright broken.
The one at the top of the plan is correctly placed, but how did the
other one get in there?

regards, tom lane




Re: [HACKERS] What the heck is hobby.2ndquadrant.com?

2014-03-07 Thread Simon Riggs
On 7 March 2014 17:46, Tom Lane t...@sss.pgh.pa.us wrote:
 There are a few threads floating around currently in which some of
 the cc'd addresses are various-people at hobby.2ndquadrant.com.
 Addresses like that seem not to work, or at least not work reliably.
 I believe the official addresses are just soandso at 2ndquadrant.com.

 If you're replying to any such threads, you might want to clean
 up the addresses so you don't get a bunch of bounce messages.

 In any case, I thought the 2ndquadrant people would want to know that
 there's something broken in their mail system.  I'm not sure how these
 addresses got injected into our threads in the first place, but I bet
 it wasn't done by anybody outside 2ndquadrant.  Perhaps an internal
 mail server name was unintentionally exposed?

Thanks for highlighting that. Will investigate.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] atexit_callback can be a net negative

2014-03-07 Thread Andres Freund
Hi,

On 2014-03-07 09:50:15 -0800, Peter LaDow wrote:
 Also, the on_exit_reset() does clear up the issue, and we have
 implemented it as suggested in this thread (calling it immediately
 after the fork() in the child).  And Andres' keen eye noted we were
 calling exit() after an exec() failure, rather than _exit(). Thank
 you, Andres, for pointing out this error.

I actually didn't see any source code of yours ;), just answered Tom's
question about what to do after exec() failed.

There are several other weird behaviours if you use exit() instead of
_exit() after a fork, most prominently double flushes leading to
repeated/corrupted output when you have stream FILEs open in the
parent. This is because the stream will be flushed in both the parent
(at some later write or exit) and the child (at exit). It's simply
something that won't work well.

  I don't see any way it'd be safe for a pg unaware library to start
  forking around and doing similar random things inside normal
  backends.
 
 We aren't exactly forking around.  We were trying to spawn an
 asynchronous process without any ties to the postmaster.

The important bit in the sentence you quote is "pg unaware library". My
point is just that there are some special considerations you have to be
aware of. fork() and exec() is a way to escape that environment, and
should be fine.

Greetings,

Andres Freund




[HACKERS] proposal (9.5) : psql unicode border line styles

2014-03-07 Thread Pavel Stehule
Hello

I am returning back to this topic. Last time I proposed styles:

http://www.postgresql.org/message-id/cafj8prclgoktryjpbtoncpgyftrcz-zgfowdc1jqulb+ede...@mail.gmail.com

http://postgres.cz/wiki/Pretty_borders_in_psql

That experiment failed, but there are some interesting tips in the discussion.

So here is a slightly different proposal - choose one predefined style
for each table line element. These styles are active only when the
linestyle is unicode.

So possible line elements are:

* border,
* header_separator,
* row_separator,
* column_separator,

Possible styles (for each element)

* none,
* single,
* double,
* thick,

It should have enough variability to define all the styles proposed
earlier. I hope this proposal is safe and simple to use. Styles would be
persistently saved in the .psqlrc file - and some examples can go in the
documentation.

Usage:

\pset linestyle_border double
\pset linestyle_header_separator single
\pset linestyle_row_separator single
\pset linestyle_column_separator single

\pset linestyle unicode

╔═══╤════════════╤═══════╗
║ a │     b      │   c   ║
╟───┼────────────┼───────╢
║ 1 │ 2012-05-24 │ Hello ║
╟───┼────────────┼───────╢
║ 2 │ 2012-05-25 │ Hello ║
║   │            │ World ║
╚═══╧════════════╧═══════╝
(2 rows)


Comments, ideas ?

Regards

Pavel


Re: [HACKERS] Unportable coding in reorderbuffer.h

2014-03-07 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 A patch fixing this is attached. I've added some more local variables to
 deal with the longer lines...

Since I've got a compiler in captivity that complains about this,
I'll take care of checking and committing this patch.

I still think it might be a good idea to spin up a buildfarm member
running gcc -ansi -pedantic, even if we don't see that as a particularly
useful case in practice.  Thoughts?

regards, tom lane




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Robert Haas
On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I just noticed that the DSM patch has introduced a whole new class of
 failures related to the bug #9464 issue: to wit, any on_detach
 actions registered in a parent process will also be performed when a
 child process exits, because nothing has been added to on_exit_reset
 to prevent that.  It seems likely that this is undesirable.

I don't think this can actually happen.  There are quite a number of
things that would go belly-up if you tried to use dynamic shared
memory from the postmaster, which is why dsm_create() and dsm_attach()
both Assert(IsUnderPostmaster).  Without calling those functions,
there's no way for code running in the postmaster to call
on_dsm_detach(), because it's got nothing to pass for the first
argument.

In case you're wondering, the major reason I disallowed this is that
the control segment tracks the number of backends attached to each
segment, and there's no provision for adjusting that value each time
the postmaster forks.  We could add such provision, but it seems like
there would be race conditions, and the postmaster might have to
participate in locking, so it might be pretty ugly, and a performance
suck for anyone who doesn't need this functionality.  And it doesn't
seem very useful anyway.  The postmaster really shouldn't be touching
any shared memory segment more than the absolutely minimal amount, so
the main possible benefit I can see is that you could have the mapping
already in place for each new backend rather than having to set it up
in every backend.  But I'm prepared to hope that isn't actually
important enough to be worth worrying about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unportable coding in reorderbuffer.h

2014-03-07 Thread Andres Freund
On 2014-03-07 13:27:28 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  A patch fixing this is attached. I've added some more local variables to
  deal with the longer lines...
 
 Since I've got a compiler in captivity that complains about this,
 I'll take care of checking and committing this patch.

Thanks.

The tests specific to the feature can be run with (cd
contrib/test_decoding && make -s check), as they require non-default
PGC_POSTMASTER settings.

 I still think it might be a good idea to spin up a buildfarm member
 running gcc -ansi -pedantic, even if we don't see that as a particularly
 useful case in practice.  Thoughts?

I tried to compile with both -ansi -pedantic -std=c90 to catch potential
further issues, but at least my gcc version makes the output completely
drown in various warnings. Some can be individually disabled, but lots
of them seem to be only be covered by pretty general warning categories.
-ansi -std=c90 -Wno-variadic-macros works reasonably well tho.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I just noticed that the DSM patch has introduced a whole new class of
 failures related to the bug #9464 issue: to wit, any on_detach
 actions registered in a parent process will also be performed when a
 child process exits, because nothing has been added to on_exit_reset
 to prevent that.  It seems likely that this is undesirable.

 I don't think this can actually happen.  There are quite a number of
 things that would go belly-up if you tried to use dynamic shared
 memory from the postmaster, which is why dsm_create() and dsm_attach()
 both Assert(IsUnderPostmaster).

Nonetheless it seems like a good idea to make on_exit_reset drop any
such queued actions.

The big picture here is that in the scenario being debated in the other
thread, exit() in a child process forked from a backend will execute that
backend's on_detach actions *even if the code had done on_exit_reset after
the fork*.  So whether or not you buy Andres' argument that it's not
necessary for atexit_callback to defend against this scenario, there's
actually no other defense possible given the way things work in HEAD.

regards, tom lane




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Andres Freund
Hi,

On 2014-03-07 13:54:42 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I don't think this can actually happen.  There are quite a number of
  things that would go belly-up if you tried to use dynamic shared
  memory from the postmaster, which is why dsm_create() and dsm_attach()
  both Assert(IsUnderPostmaster).

 Nonetheless it seems like a good idea to make on_exit_reset drop any
 such queued actions.

 The big picture here is that in the scenario being debated in the other
 thread, exit() in a child process forked from a backend will execute that
 backend's on_detach actions *even if the code had done on_exit_reset after
 the fork*.

Hm, aren't those actions called via shmem_exit() calling
dsm_backend_shutdown() et al? I think that should be cleared by
on_shmem_exit()?

 So whether or not you buy Andres' argument that it's not
 necessary for atexit_callback to defend against this scenario, there's
 actually no other defense possible given the way things work in HEAD.

I think you're misunderstanding me. I am saying we *should* defend
against it. Our opinions just seem to differ on what to do when the
scenario is detected. I am saying we should scream bloody murder (which
admittedly is hard in a child), you want to essentially declare it
supported.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-07 13:54:42 -0500, Tom Lane wrote:
 The big picture here is that in the scenario being debated in the other
 thread, exit() in a child process forked from a backend will execute that
 backend's on_detach actions *even if the code had done on_exit_reset after
 the fork*.

 Hm, aren't those actions called via shmem_exit() calling
 dsm_backend_shutdown() et al? I think that should be cleared by
 on_shmem_exit()?

But dsm_backend_shutdown gets called whether or not any shmem_exit
actions are registered.

 I think you're misunderstanding me. I am saying we *should* defend
 against it. Our opinions just seem to differ on what to do when the
 scenario is detected. I am saying we should scream bloody murder (which
 admittedly is hard in a child), you want to essentially declare it
 supported.

And if we scream bloody murder, what will happen?  Absolutely nothing
except we'll annoy our users.  They won't have control over the
third-party libraries that are doing what you want to complain about.

regards, tom lane




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Andres Freund
On 2014-03-07 14:14:17 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-03-07 13:54:42 -0500, Tom Lane wrote:
  The big picture here is that in the scenario being debated in the other
  thread, exit() in a child process forked from a backend will execute that
  backend's on_detach actions *even if the code had done on_exit_reset after
  the fork*.
 
  Hm, aren't those actions called via shmem_exit() calling
  dsm_backend_shutdown() et al? I think that should be cleared by
  on_shmem_exit()?
 
 But dsm_backend_shutdown gets called whether or not any shmem_exit
 actions are registered.

Yikes. I thought on_exit_reset() disarmed the atexit callback, but
indeed, it does not...

  I think you're misunderstanding me. I am saying we *should* defend
  against it. Our opinions just seem to differ on what to do when the
  scenario is detected. I am saying we should scream bloody murder (which
  admittedly is hard in a child), you want to essentially declare it
  supported.
 
 And if we scream bloody murder, what will happen?  Absolutely nothing
 except we'll annoy our users.  They won't have control over the
 third-party libraries that are doing what you want to complain about.

If the third party library is suitably careful it will only use fork()
and then exec() or _exit(), otherwise there are other things that'll be
broken (e.g. stdio). In that case everything was and is fine. If not,
the library's user can then fix or file a bug against the library.

Both perl and glibc seem to do just that in system() btw...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Robert Haas
On Fri, Mar 7, 2014 at 1:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I just noticed that the DSM patch has introduced a whole new class of
 failures related to the bug #9464 issue: to wit, any on_detach
 actions registered in a parent process will also be performed when a
 child process exits, because nothing has been added to on_exit_reset
 to prevent that.  It seems likely that this is undesirable.

 I don't think this can actually happen.  There are quite a number of
 things that would go belly-up if you tried to use dynamic shared
 memory from the postmaster, which is why dsm_create() and dsm_attach()
 both Assert(IsUnderPostmaster).

 Nonetheless it seems like a good idea to make on_exit_reset drop any
 such queued actions.

 The big picture here is that in the scenario being debated in the other
 thread, exit() in a child process forked from a backend will execute that
 backend's on_detach actions *even if the code had done on_exit_reset after
 the fork*.

Hmm.  So the problematic sequence of events is where a postmaster
child forks, and then exits without exec-ing, perhaps because e.g.
exec fails?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Peter LaDow
On Friday, March 7, 2014, Andres Freund and...@2ndquadrant.com wrote:

 If the third party library is suitably careful it will only use fork()
 and then exec() or _exit(), otherwise there are other things that'll be


But that is not possible* in our case of trying to spawn an asynchronous
background process. The goal here is for the extension to spawn the process
and return immediately. If we exec in the first level child, and
immediately return, who reaps the child when done?

This is why we fork twice and exit in the first level child so that the
extension can reap the first.

Unless Postgres happily reaps all children, perhaps through a SIGCHLD
handler?

(* I suppose we could exec a program that itself forks/execs a background
process. But that is essentially no different than what we are doing now,
other than trying to meet this fork/exec/_exit requirement. And that's fine
if that is to be the case. We just need to know what it is.)


 Both perl and glibc seem to do just that in system() btw...


I don't know about Perl, but system is blocking. What if you don't want to
wait for the child to exit?

Pete


Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Andres Freund
On 2014-03-07 12:09:45 -0800, Peter LaDow wrote:
 On Friday, March 7, 2014, Andres Freund and...@2ndquadrant.com wrote:
 
  If the third party library is suitably careful it will only use fork()
  and then exec() or _exit(), otherwise there are other things that'll be

 But that is not possible* in our case of trying to spawn an asynchronous
 background process.

Forking twice is ok as well, as long as you just use _exit() after the
fork. The thing is that you shouldn't run any nontrivial code in the
fork, as long as you're connected to the original environment (fd's,
memory mappings and so forth).

  Both perl and glibc seem to do just that in system() btw...

 I don't know about Perl, but system is blocking. What if you don't want to
 wait for the child to exit?

Man. The point is that the library code is carefully written to not
use exit() but _exit() after a fork failure, that's it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Peter LaDow
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund and...@2ndquadrant.com wrote:
 Man. The point is that that the library code is carefully written to not
 use exit() but _exit() after a fork failure, that's it.

I understand your point.  I understand that in the case of the
postmaster we don't want to invoke behavior that can cause problems by
calling exit(). But how does this jibe with Tom's point (on the bug
list) about other 3rd party libraries perhaps registering atexit
handlers?

If the convention is that only fork/exec/_exit is permissible after a
fork (what about on_exit_reset?), then third party libraries also need
to be aware of that convention and not depend on their atexit handlers
being called.

And I'm not advocating a certain solution.  I'm only trying to clarify
what the solution is so that we comply with the convention.  We
don't want to break postgres or any other well behaved third party
libraries.  I don't know the internals of postgres (hence original bug
report and this thread), so I of course defer to you and the other
experts here.  So, in our case we call on_exit_reset() after the fork
in the child, and then from there on only use fork, exec, or _exit, as
you mention above.

Pete




Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Peter LaDow
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund and...@2ndquadrant.com wrote:
 Forking twice is ok as well, as long as you just use _exit() after the
 fork. The thing is that you shouldn't run any nontrivial code in the
 fork, as long as you're connected to the original environment (fd's,
 memory mappings and so forth).

Just to be clear, what do you mean by nontrivial code?  Do you mean
C library calls, other than fork/exec/_exit?

Pete




Re: [HACKERS] [PATCH] Store Extension Options

2014-03-07 Thread Alvaro Herrera
I am reworking this patch, both to accommodate earlier review comments
and to fix the multiple verify step of namespaces, as noted by Tom in
4530.1390023...@sss.pgh.pa.us

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Static Code Analysis Exploits.

2014-03-07 Thread Patrick Curran

Hi,

We use Postgres in our product and we have a client that requires a 
static code analysis scan to detect vulnerabilities. They are concerned 
because the tool (Veracode) found several flaws in Postgres and they 
believe there might be a security risk. I'm sure there are lots of 
companies that use Postgres that have security policies like theirs in 
place, so I'm hoping someone has the experience to know that these are 
false positives or that they are not a security risk for some reason. 
Below is a description of the vulnerability and the location in the 
source code. Version 9.3.2.1 was scanned. Please let me know if there is 
a better place to ask this kind of question.


Thanks,
Patrick



Stack-based Buffer Overflow (CWE ID 121)(13 flaws):
There is a potential buffer overflow with these functions. If an 
attacker can control the data written into the buffer, the overflow may 
result in execution of arbitrary code.


btree_gist.dll .../btree_gist/btree_utils_num.c 115
btree_gist.dll .../btree_gist/btree_utils_num.c 123
pgcrypto.dll .../contrib/pgcrypto/crypt-md5.c 103
libpq.dll .../interfaces/libpq/fe-connect.c 3185
libpq.dll .../interfaces/libpq/fe-connect.c 3220
clusterdb.exe .../interfaces/libpq/fe-connect.c 3243
libpq.dll .../libpq/fe-protocol3.c 1661
libecpg_compat.dll .../ecpg/compatlib/informix.c 978
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 112
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 290
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 306
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 330
libpq.dll .../interfaces/libpq/pqexpbuffer.c 369

Use of Inherently Dangerous Function (CWE ID 242)(1 flaw):
These functions are inherently unsafe because they do not perform
bounds checking on the size of their input. An attacker can send overly 
long input and overflow the destination buffer, potentially resulting in 
execution of arbitrary code.

pg_isolation_regress.exe .../src/test/regress/pg_regress.c 2307

Integer Overflow or Wraparound (CWE ID 190)(1 flaw):
An integer overflow condition exists when an integer that has not been 
properly sanity checked is used in the determination of an offset or 
size for memory allocation, copying, concatenation, or similarly. If the 
integer in question is incremented past the maximum possible value, it 
may wrap to become a very small, negative number, therefore providing an 
unintended value. This occurs most commonly in arithmetic operations or 
loop iterations. Integer overflows can often result in buffer overflows 
or data corruption, both of which may be potentially exploited to 
execute arbitrary code.


dict_snowball.dll .../libstemmer/utilities.c 371

Process Control (CWE ID 114)(4 flaws)
A function call could result in a process control attack. An argument to 
a process control function is either derived from an untrusted source or 
is hard-coded, both of which may allow an attacker to execute malicious 
code under certain conditions. If an attacker is allowed to specify all 
or part of the filename, it may be possible to load arbitrary libraries. 
If the location is hard-coded and an attacker is able to place a 
malicious copy of the library higher in the search order than the file 
the application intends to load, then the application will load the 
malicious version.


psql.exe .../src/bin/psql/print.c 752
psql.exe .../src/bin/psql/print.c 791
psql.exe .../src/bin/psql/print.c 2209
psql.exe .../src/bin/psql/print.c 2500




Re: [HACKERS] Next CommitFest Deadlines

2014-03-07 Thread Robert Haas
On Thu, Mar 6, 2014 at 7:44 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Hi all,

 There are some place with the next commitfest deadlines (2014/06 and
 2014/09) ?

 Regards,

Those deadlines won't be finalized until after PGCon, but it seems
likely to me that we'll do about what we did last year.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread David E. Wheeler
On Mar 6, 2014, at 1:51 AM, Peter Geoghegan p...@heroku.com wrote:

 It's true for perl. Syntax of hstore is close to hash/array syntax and it's
 easy to serialize/deserialize hstore to/from perl. Syntax of hstore was
 inspired by perl.
 
 I understand that. There is a module on CPAN called Pg::hstore that
 will do this; it appears to have been around since 2011. I don't use
 Perl, so I don't know a lot about it. Perhaps David Wheeler has an
 opinion on the value of Perl-like syntax, as a long time Perl
 enthusiast?

HSTORE was inspired by the syntax of Perl hash declarations, but it is not 
compatible. Notably, an HSTORE can have the value `NULL`, while in Perl 
hashes it’s `undef`. So you cannot simply `eval` an HSTORE to get a Perl hash 
unless you are certain there are no NULLs.

Besides, string eval in Perl is considered unsafe. Parsing is *much* safer.

 In any case, Perl has excellent support for JSON, just like every
 other language - you are at no particular advantage in Perl by having
 a format that happens to more closely resemble the format of Perl
 hashes and arrays. I really feel that we should concentrate our
 efforts on one standardized format here. It makes the effort to
 integrate your good work, in a way that makes it available to everyone
 so much easier.

I agree. I like HSTORE, but now that JSON is so standard (in fact, as of this 
week, a *real* standard! http://rfc7159.net/rfc7159), and its support is so 
much better than that of HSTORE, including in Perl, I believe it should take 
priority over HSTORE. I’m happy if HSTORE has the same functionality as JSONB, 
but given the choice, all other things being equal, as a Perl hacker I will 
always choose JSONB.

Best,

David





Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions

2014-03-07 Thread Peter LaDow
On Fri, Mar 7, 2014 at 12:55 PM, Peter LaDow pet...@gocougs.wsu.edu wrote:
 Just to be clear, what do you mean by nontrivial code?  Do you mean
 C library calls, other than fork/exec/_exit?

I think I've answered my own question:

http://man7.org/linux/man-pages/man7/signal.7.html

The 'Async-signal-safe functions' listed there are the C calls that may safely be made after a fork.

Pete




Re: [HACKERS] [PATCH] Store Extension Options

2014-03-07 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 I am reworking this patch, both to accomodate earlier review comments
 and to fix the multiple verify step of namespaces, as noted by Tom in
 4530.1390023...@sss.pgh.pa.us


This link is broken...

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] [PATCH] Store Extension Options

2014-03-07 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 7:21 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-03-07 19:14:48 -0300, Fabrízio de Royes Mello wrote:
  On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:
 
   I am reworking this patch, both to accomodate earlier review comments
   and to fix the multiple verify step of namespaces, as noted by Tom in
   4530.1390023...@sss.pgh.pa.us
  
  
  This link is broken...

 It is a message id, and it seems to point to an appropriate thread. You
 can relatively easily look up message ids using URLs like
 http://www.postgresql.org/message-id/4530.1390023...@sss.pgh.pa.us


Sorry... my fault!! Thanks!

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Static Code Analysis Exploits.

2014-03-07 Thread Tom Lane
Patrick Curran pcur...@contentanalyst.com writes:
 We use Postgres in our product and we have a client that requires a 
 static code analysis scan to detect vulnerabilities. They are concerned 
 because the tool (Veracode) found several flaws in Postgres and they 
 believe there might be a security risk. I'm sure there are lots of 
 companies that use Postgres that have security policies like theirs in 
 place, so I'm hoping someone has the experience to know that these are 
 false positives or that they are not a security risk for some reason. 
 Below is a description of the vulnerability and the location in the 
 source code. Version 9.3.2.1 was scanned. Please let me know if there is 
 a better place to ask this kind of question.

TBH, I don't think anyone's going to bother with going through this list
in this form.  Line numbers in something that's not an official community
release are not very helpful, and the descriptions are far too vague for
someone looking at the list to be sure exactly what their tool is on
about.  I took one example at random:

 Stack-based Buffer Overflow (CWE ID 121)(13 flaws):
 There is a potential buffer overflow with these functions. If an 
 attacker can control the data written into the buffer, the overflow may 
 result in execution of arbitrary code.

 libpq.dll .../interfaces/libpq/pqexpbuffer.c 369

This seems to be complaining about the memcpy in appendBinaryPQExpBuffer.
Well, I don't see anything unsafe about it: the preceding call to
enlargePQExpBuffer should have made sure that the target buffer is big
enough.  And the reference to stack-based buffer overflow is completely
nonsensical, because no PQExpBuffer keeps its buffer on the stack.  It's
conceivable that their tool has spotted some unsafe pattern in some call
site, but this report is unhelpful about identifying what that would be.

I did look at a few of the other items, and none of the ones I looked at
were any more clear.

FWIW, we do have access to Coverity scans of the Postgres sources, and
we do make efforts to fix things Coverity complains about.  But we're
unlikely to take reports like this one seriously: there's simply not
enough information to know what the tool is unhappy about, nor any
clear reason to believe that it's spotted something that both human
readers and Coverity have missed.

Sorry if that's not the answer you wanted; but a more positive response
is going to require a substantially greater amount of work on your end.
In particular, given the very substantial amounts of work that have
already gone into hardening the Postgres code, I think the burden of
proof is on you or your client to show that these are issues, not on
us to disprove claims that are too vague to be disproven.

regards, tom lane




[HACKERS] Regression test errors

2014-03-07 Thread Martín Marqués
I was testing some builds I was doing and found that the regression
tests fail when run against a Hot Standby server:

$ make standbycheck
[...]
== running regression test queries==
test hs_standby_check ... ok
test hs_standby_allowed   ... FAILED
test hs_standby_disallowed... FAILED
test hs_standby_functions ... ok

==
 2 of 4 tests failed.
==

The differences that caused some tests to fail can be viewed in the
file /usr/local/postgresql-9.3.3/src/test/regress/regression.diffs.
A copy of the test summary that you see
above is saved in the file
/usr/local/postgresql-9.3.3/src/test/regress/regression.out.

The regression.diffs and the patch are attached.

I haven't checked how far back those go. I don't think it's even
important to back patch this, but it's nice for future testing.

Regards,

-- 
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


regression.diffs
Description: Binary data
From b6db8388e37f6afaa431e31239fd972d10140cc1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= mar...@2ndquadrant.com
Date: Fri, 7 Mar 2014 21:29:29 -0300
Subject: [PATCH] Standby regression checks failed.

Two Hot Standby regression tests failed, for two reasons.

- An error in src/test/regress/expected/hs_standby_disallowed.out
  made regression fail (VACUUM should be ANALYZE).
- Serializable transactions won't work on a Hot Standby.
---
 src/test/regress/expected/hs_standby_allowed.out| 2 +-
 src/test/regress/expected/hs_standby_disallowed.out | 2 +-
 src/test/regress/sql/hs_standby_allowed.sql | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test/regress/expected/hs_standby_allowed.out b/src/test/regress/expected/hs_standby_allowed.out
index 1abe5f6..9d18d77 100644
--- a/src/test/regress/expected/hs_standby_allowed.out
+++ b/src/test/regress/expected/hs_standby_allowed.out
@@ -49,7 +49,7 @@ select count(*)  as should_be_1 from hs1;
 (1 row)
 
 end;
-begin transaction isolation level serializable;
+begin transaction isolation level repeatable read;
 select count(*) as should_be_1 from hs1;
  should_be_1 
 -
diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out
index e7f4835..bc11741 100644
--- a/src/test/regress/expected/hs_standby_disallowed.out
+++ b/src/test/regress/expected/hs_standby_disallowed.out
@@ -124,7 +124,7 @@ unlisten *;
 ERROR:  cannot execute UNLISTEN during recovery
 -- disallowed commands
 ANALYZE hs1;
-ERROR:  cannot execute VACUUM during recovery
+ERROR:  cannot execute ANALYZE during recovery
 VACUUM hs2;
 ERROR:  cannot execute VACUUM during recovery
 CLUSTER hs2 using hs1_pkey;
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index 58e2c01..5cd450d 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -28,7 +28,7 @@ begin transaction read only;
 select count(*)  as should_be_1 from hs1;
 end;
 
-begin transaction isolation level serializable;
+begin transaction isolation level repeatable read;
 select count(*) as should_be_1 from hs1;
 select count(*) as should_be_1 from hs1;
 select count(*) as should_be_1 from hs1;
-- 
1.8.3.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Store Extension Options

2014-03-07 Thread Andres Freund
On 2014-03-07 19:14:48 -0300, Fabrízio de Royes Mello wrote:
 On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera 
 alvhe...@2ndquadrant.comwrote:
 
  I am reworking this patch, both to accomodate earlier review comments
  and to fix the multiple verify step of namespaces, as noted by Tom in
  4530.1390023...@sss.pgh.pa.us
 
 
 This link is broken...

It is a message id, and it seems to point to an appropriate thread. You
can relatively easily look up message ids using URLs like
http://www.postgresql.org/message-id/4530.1390023...@sss.pgh.pa.us

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLE not synonymous with SELECT * FROM?

2014-03-07 Thread Bruce Momjian
On Wed, Nov 13, 2013 at 10:28:07AM +0100, Colin 't Hart wrote:
 David et al,
 
 How about something like this?

I have applied a modified version of your patch.  I didn't like the
idea of putting SELECT inside an OR syntax clause --- the syntax is
already too complicated.  I also didn't like moving the TABLE mention up
in the file.  What I did do was to document the supported TABLE clauses,
and add some of your verbiage.  Thanks.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
new file mode 100644
index 7395754..f1bc158
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*** TABLE [ ONLY ] replaceable class=param
*** 214,220 
  subqueries that can be referenced by name in the primary query.
  The subqueries effectively act as temporary tables or views
  for the duration of the primary query.
! Each subquery can be a commandSELECT/command, commandVALUES/command,
  commandINSERT/command, commandUPDATE/command or
  commandDELETE/command statement.
  When writing a data-modifying statement (commandINSERT/command,
--- 214,220 
  subqueries that can be referenced by name in the primary query.
  The subqueries effectively act as temporary tables or views
  for the duration of the primary query.
! Each subquery can be a commandSELECT/command, commandTABLE/, commandVALUES/command,
  commandINSERT/command, commandUPDATE/command or
  commandDELETE/command statement.
  When writing a data-modifying statement (commandINSERT/command,
*** SELECT * FROM (SELECT * FROM mytable FOR
*** 1489,1500 
  programlisting
  TABLE replaceable class=parametername/replaceable
  /programlisting
! is completely equivalent to
  programlisting
  SELECT * FROM replaceable class=parametername/replaceable
  /programlisting
  It can be used as a top-level command or as a space-saving syntax
! variant in parts of complex queries.
 /para
/refsect2
   /refsect1
--- 1489,1505 
  programlisting
  TABLE replaceable class=parametername/replaceable
  /programlisting
! is equivalent to
  programlisting
  SELECT * FROM replaceable class=parametername/replaceable
  /programlisting
  It can be used as a top-level command or as a space-saving syntax
! variant in parts of complex queries. Only the literalWITH/,
! literalUNION/, literalINTERSECT/, literalEXCEPT/,
! literalORDER BY/, literalLIMIT/, literalOFFSET/,
! literalFETCH/ and locking clauses can be used with commandTABLE/;
! the literalWHERE/ clause and any form of aggregation cannot
! be used.
 /para
/refsect2
   /refsect1
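To make the documented restriction concrete, a short sketch of what the new
paragraph permits and forbids (table names hypothetical):

```sql
-- Top-level use; equivalent to SELECT * FROM mytable ORDER BY id LIMIT 10:
TABLE mytable ORDER BY id LIMIT 10;

-- Space-saving variant inside a set operation:
TABLE mytable UNION TABLE othertable;

-- Disallowed: there is no TABLE form of WHERE or aggregation:
-- TABLE mytable WHERE id > 10;      -- syntax error
-- SELECT count(*) FROM mytable;     -- must be written with SELECT
```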

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tcp_keepalives_idle

2014-03-07 Thread Bruce Momjian
On Thu, Nov 14, 2013 at 11:32:23AM +0100, Marko Tiikkaja wrote:
 On 11/14/13 7:08 AM, Tatsuo Ishii wrote:
 It means the connection is idle except for keepalive packets.
 We could perhaps just drop the word otherwise, if people find
 it confusing.
 
 Wah. I seem to have completely misunderstood what the phrase
 says. Thanks for the clarification. I agree to drop otherwise.
 
 I had some problem interpreting these explanations as well:
 http://www.postgresql.org/message-id/527a21f1.2000...@joh.to
 
 Compare that to the description in the libpq documentation:
 Controls the number of seconds of inactivity after which TCP should
 send a keepalive message to the server..

Good point. I have improved the server-side keepalive parameter
descriptions to use the superior libpq text, with adjustment.

Applied patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tcp_keepalives_idle

2014-03-07 Thread Bruce Momjian
On Fri, Mar  7, 2014 at 10:03:42PM -0500, Bruce Momjian wrote:
 On Thu, Nov 14, 2013 at 11:32:23AM +0100, Marko Tiikkaja wrote:
  On 11/14/13 7:08 AM, Tatsuo Ishii wrote:
  It means the connection is idle except for keepalive packets.
  We could perhaps just drop the word otherwise, if people find
  it confusing.
  
  Wah. I seem to have completely misunderstood what the phrase
  says. Thanks for the clarification. I agree to drop otherwise.
  
  I had some problem interpreting these explanations as well:
  http://www.postgresql.org/message-id/527a21f1.2000...@joh.to
  
  Compare that to the description in the libpq documentation:
  Controls the number of seconds of inactivity after which TCP should
  send a keepalive message to the server..
 
 Good point. I have improved the server-side keepalive parameter
 descriptions to use the superior libpq text, with adjustment.
 
 Applied patch attached.

Oops, now attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 86dbd0f..2811f11
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include 'filename'
*** 684,691 
/indexterm
listitem
 para
! Specifies the number of seconds before sending a keepalive packet on
! an otherwise idle connection.  A value of 0 uses the system default.
  This parameter is supported only on systems that support the
  symbolTCP_KEEPIDLE/ or symbolTCP_KEEPALIVE/ symbols, and on
  Windows; on other systems, it must be zero.
--- 684,692 
/indexterm
listitem
 para
! Specifies the number of seconds of inactivity after which TCP
! should send a keepalive message to the client.  A value of 0 uses
! the system default.
  This parameter is supported only on systems that support the
  symbolTCP_KEEPIDLE/ or symbolTCP_KEEPALIVE/ symbols, and on
  Windows; on other systems, it must be zero.
*** include 'filename'
*** 708,715 
/indexterm
listitem
 para
! Specifies the number of seconds between sending keepalives on an
! otherwise idle connection.  A value of 0 uses the system default.
  This parameter is supported only on systems that support the
  symbolTCP_KEEPINTVL/ symbol, and on Windows; on other systems, it
  must be zero.
--- 709,717 
/indexterm
listitem
 para
! Specifies the number of seconds after which a TCP keepalive message
! that is not acknowledged by the client should be retransmitted.
! A value of 0 uses the system default.
  This parameter is supported only on systems that support the
  symbolTCP_KEEPINTVL/ symbol, and on Windows; on other systems, it
  must be zero.
*** include 'filename'
*** 732,739 
/indexterm
listitem
 para
! Specifies the number of keepalive packets to send on an otherwise idle
! connection.  A value of 0 uses the system default.  This parameter is
  supported only on systems that support the symbolTCP_KEEPCNT/
  symbol; on other systems, it must be zero.
  In sessions connected via a Unix-domain socket, this parameter is
--- 734,742 
/indexterm
listitem
 para
! Specifies the number of TCP keepalives that can be lost before
! the server's connection to the client is considered dead.  A value of 0
! uses the system default.  This parameter is
  supported only on systems that support the symbolTCP_KEEPCNT/
  symbol; on other systems, it must be zero.
  In sessions connected via a Unix-domain socket, this parameter is

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-03-07 Thread Yeb Havinga

On 05/03/14 15:44, Craig Ringer wrote:

On 03/05/2014 05:25 PM, Yeb Havinga wrote:

Maybe a naive thought, but shouldn't all plans that include a table with
an RLS clause be invalidated when the session role switches, regardless
of which users it switches from and to?

Only if the plan is actually accessed when under a different user ID.
Consider SECURITY DEFINER functions; you don't want to flush all cached
plans just because you ran a SECURITY DEFINER function that doesn't even
share any statements with the outer transaction.

Hmm good point.


I've also got some concerns about the user visible API; I'm not sure it
makes sense to set row security policy for row reads per-command in
PostgreSQL, since we have the RETURNING clause. Read-side policy should
just be FOR READ.

Hmm but FOR READ would mean new keywords, and SELECT is also a concept
known to users. I didn't find the api problematic to understand, on the
contrary.

Would you expect that FOR SELECT also affects rows you can see to
UPDATE, INSERT, or DELETE?

Yes.

Because that's what it would have to mean, really. Otherwise, you could
just use `UPDATE thetable SET id = id RETURNING *` (or whatever) to read
the rows out if you had UPDATE rights. Or do the same with DELETE.

With RETURNING, it doesn't make much sense for different statements to
have different read access. Can you think of a case where it'd be
reasonable to deny SELECT, but allow someone to see the same rows with
`UPDATE ... RETURNING` ?


It might be an idea to add the SELECT RLS clause for DML
queries that contain a RETURNING clause.

That way lies madness: A DML statement that affects *a different set of
rows* depending on whether or not it has a RETURNING clause.
If you state it like that, it sounds like a POLA violation. But the 
complete story is: a user is allowed to UPDATE a set of rows in a 
table that is not a subset of the set of rows he can SELECT from the 
table; in other words, he can UPDATE rows he is not allowed to SELECT. 
This can lead to unexpected results: when the user issues an UPDATE of 
the table without a RETURNING clause, all rows the user may UPDATE are 
affected. When the user issues an UPDATE of the table with a RETURNING 
clause, all rows the user may UPDATE and SELECT are affected.


So the madness comes from the fact that it is allowed to define RLS 
policies that let you modify rows you cannot select. Either prevent these 
conditions (i.e. prove that every DML RLS qual implies the SELECT qual, 
and otherwise raise an error on DML with a RETURNING clause), or allow it 
without violating the RLS rules but accept that a DML with RETURNING is 
different from a DML without one.
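A sketch of the leak being discussed, with hypothetical quals (the policy
syntax here is only schematic; the patch's actual API may differ):

```sql
-- Suppose the (hypothetical) policies are:
--   FOR SELECT: owner = current_user          -- rows you may read
--   FOR UPDATE: team = 'mygroup'              -- rows you may write
-- If the writable set is not a subset of the readable set, then:
UPDATE thetable SET id = id RETURNING *;
-- either returns rows the user could never SELECT directly,
-- or (if the SELECT qual is also applied) silently updates a
-- different set of rows than the same UPDATE without RETURNING.
```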


regards,
Yeb



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-07 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Mar5, 2014, at 18:37 , Tom Lane t...@sss.pgh.pa.us wrote:
 My advice is to lose the EXPLAIN output entirely.  If the authors of
 the patch can't agree on what it means, what hope have everyday users
 got of making sense of it?

 The question isn't what the current output means, but whether it's a
 good metric to report or not.

If you can't agree, then it isn't.

 If we don't report anything, then how would a user check whether a query
 is slow because of O(n^2) behaviour of a windowed aggregate, or because
 of some other reasons?

[ shrug... ]  They can see whether the Window plan node is where the time
is going.  It's not apparent to me that the extra numbers you propose to
report will edify anybody.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-07 Thread Florian Pflug
On Mar5, 2014, at 23:49 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 On Mar5, 2014, at 18:37 , Tom Lane t...@sss.pgh.pa.us wrote:
 My advice is to lose the EXPLAIN output entirely.  If the authors of
 the patch can't agree on what it means, what hope have everyday users
 got of making sense of it?
 
 The question isn't what the current output means, but whether it's a
 good metric to report or not.
 
 If you can't agree, then it isn't.

Probably, yes, so let's find something that *is* a good metric.

(BTW, it's not the authors who disagree here. It was me who put the EXPLAIN
feature in, and Dean reviewed it and found it confusing. The original
author David seems to have run out of time to work on this, and AFAIK hasn't
weighed in on that particular feature at all.)

 If we don't report anything, then how would a user check whether a query
 is slow because of O(n^2) behaviour of a windowed aggregate, or because
 of some other reasons?
 
 [ shrug... ]  They can see whether the Window plan node is where the time
 is going.  It's not apparent to me that the extra numbers you propose to
 report will edify anybody.

By the same line of reasoning, every metric except execution time is
superfluous. Comparing execution times really is a horrible way to measure
this - not only does it include all kinds of things that have nothing to do
with the restart behaviour of aggregates, you'd also have to construct a
base-line query first which is guaranteed not to restart.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extension_control_path

2014-03-07 Thread Peter Eisentraut
On 3/7/14, 11:39 AM, Dimitri Fontaine wrote:
 Just please make sure that it's still possible to use full absolute path
 for the module path name so that the packager can have control too.

I can't think of any package system where absolute paths are part of a
recommended workflow.  There is always a search path; that search path
contains a series of system, non-system, and per-user
directories.  A packager or installer chooses one of those directories
as installation target, according to convention or user choice.

OK, you can install a C library in some obscure place and then to
#include /some/absolute/path/file.h, but that's not sane practice.
Similarly, you will still be able to hardcode paths into CREATE FUNCTION
statements or other places, but why would you want to?

 What the $directory proposal achieves is allowing a fully relocatable
 extension layout, where you just have to drop a directory anywhere in
 the file system and it just works (*).

If that's what you want (and it sounds attractive), why couldn't we just
locate libraries using extension_control_path as well (which presumably
contains the control file we are just processing).  No need for another
indirection.  Libraries and control files being in separate directory
hierarchies is a historical artifact, but it's not something that can't
be changed if it's not what we want.

The problem I have with this $directory proposal, if I understand it
correctly, is that it requires editing of the control file at
installation time.  I don't like this for three reasons: it's not clear
how this should actually be done, creating more broken extension build
scripts (a big problem already); control files are part of the code,
so to speak, not a configuration file, and so munging it in a
site-specific way creates a hybrid type of file that will be difficult
to properly manage; it doesn't allow running an extension before
installation (unless I overwrite the control file in my own source
tree), which is my main use case for this.

 What happens if you have more than one 'prefix.so' file in your path?

The same thing that happens if you have more than one prefix.control in
your path.  You take the first one you find.

Yes, if those are actually two different path settings, you need to keep
those aligned.  But that would be a one-time setting by the database
administrator, not a per-extension-and-packager setting, so it's
manageable.  If it still bothers you, put them both in the same path, as
I suggested above.

 The module_pathname facility allows the packager to decide where the
 library module file gets installed and the extension author not to
 concern himself with that choice.

Again, that would only work if they patch the control file during
installation, which is dubious.  That's like patching paths in include
files or shared libraries.  People do that, but it's not a preferred
method.  And secondly, why would a packager care?  Has any packager ever
cared to relocate the library file and no other file?

Aside from those details, it seems clear that any reasonably complete
move-extensions-elsewhere feature will need some kind of build system
support.  I have various ideas on that and would gladly contribute some
of them, but it's not going to happen within two weeks.

At this point I suggest that we work toward the minimum viable product:
the extension_control_path feature as originally proposed (plus the
crash fixes), and let the field work out best practices.  As you
describe, you can work around all the other issues by patching various
text files, but you currently cannot move the extension control file in
any way, and that's a real deficiency.  (I once experimented with bind
mounts to work around that -- a real mess ;-) )
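To make the minimum viable product concrete, the setup a site would tune might
look like this (the extension_control_path name and value syntax are as
proposed in the patch, not a committed GUC; dynamic_library_path already
exists):

```
# postgresql.conf -- hypothetical sketch of the proposed setup
extension_control_path = '/opt/pgext/share/extension:$system'
dynamic_library_path   = '/opt/pgext/lib:$libdir'
```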



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extension_control_path

2014-03-07 Thread Peter Eisentraut
[I answered most of these concerns in more detail in the reply to Dimitri.]

On 3/7/14, 9:16 AM, Stephen Frost wrote:
 Being able to have a self-contained module which requires a minimum of
 modification to postgresql.conf is a reduction in complexity, imv.
 Having to maintain two config options which will end up being overly
 long and mostly duplicated doesn't make things easier for people.

Then we can make it one path.

 This
 has made me wonder if we could allow a control file to be explicitly
 referred to from CREATE EXTENSION itself, dropping the need for any of
 this postgresql.conf/GUC maintenance.  There are downsides to that
 approach as well, of course, but it's definitely got a certain appeal.

That might be useful as a separate feature, but it reeks of #include
/absolute/path/file.h, which isn't a sane practice.  No programming
language other than ancient or poorly designed ones allows that sort of
thing.

 I don't buy off on this analogy.  For starters, you can change the
 control file without needing to rebuild the library,

(You can also change the rpath without rebuilding the library.)

 but the main
 difference is that, in practice, there are no library transistions
 happening and instead what we're likely to have are independent and
 *incompatible* libraries living with the same name in our path.

I understand this concern.  The question is, how big is it relative to
the other ones.

As side idea I just had, how about embedding the extension version
number into the library somehow?  Food for thought.

 This makes sense when you have complete control over where things are
 installed to and can drop the control file into the one true directory
 of control files and similairly with the .so.  Indeed, that works
 already today for certain platforms, but from what I understand, on the
 OSX platform you don't really get to just dump files anywhere on the
 filesystem that you want and instead end up forced into a specific
 directory tree.

That is incorrect.

If someone has a general use for module_pathname, I'd be interested to
hear it, but that's not one of them.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl status with nonexistent data directory

2014-03-07 Thread Amit Kapila
On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Mar  7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
  OK, done with the attached patch  Three is returned for status, but one
  for other cases.

 I think it would have been better if it return status as 4 for cases where
 directory/file is not accessible (current new cases added by this patch).

 I think status 3 should be only return only when the data directory is valid
 and pid file is missing, because in script after getting this code, the next
 thing they will probably do is to start the server which doesn't seem a
 good fit for newly added cases.

 OK, I have updated the attached patch to reflect this, and added a C
 comment.

This is fine. Do you think we should mention this in the docs?
I have added one line to the docs (updated patch attached); if you think
it makes sense then add it, otherwise ignore it.
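For init scripts, the distinction matters when dispatching on the exit status.
A sketch using the LSB codes discussed in the patch (the pg_ctl call is
commented out and the status simulated, since the point is the dispatch):

```shell
# In a real script: pg_ctl -D "$PGDATA" status; status=$?
# Here we simulate the new case added by the patch:
status=4

case "$status" in
    0) echo "server is running" ;;
    3) echo "server not running; safe to start" ;;
    4) echo "status unknown: data directory inaccessible or not a cluster" ;;
    *) echo "unexpected status $status" ;;
esac
```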


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 45b53ce..99cb176 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -206,9 +206,10 @@ PostgreSQL documentation
   para
optionstatus/option mode checks whether a server is running in
the specified data directory. If it is, the acronymPID/acronym
-   and the command line options that were used to invoke it are
-   displayed.  If the server is not running, the process returns an
-   exit status of 3.
+and the command line options that were used to invoke it are
+displayed.  If the server is not running, the process returns an
+exit status of 3.  If the specified data directory is not accessible,
+the process returns an exit status of 4.
   /para
 
   para
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 0dbaa6e..56d238f 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -97,6 +97,7 @@ static bool allow_core_files = false;
 static time_t start_time;
 
 static char postopts_file[MAXPGPATH];
+static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char backup_file[MAXPGPATH];
 static char recovery_file[MAXPGPATH];
@@ -152,7 +153,7 @@ static void pgwin32_doRunAsService(void);
 static int CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION 
*processInfo, bool as_service);
 #endif
 
-static pgpid_t get_pgpid(void);
+static pgpid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path);
 static void free_readfile(char **optlines);
 static int start_postmaster(void);
@@ -246,10 +247,34 @@ print_msg(const char *msg)
 }
 
 static pgpid_t
-get_pgpid(void)
+get_pgpid(bool is_status_request)
 {
FILE   *pidf;
longpid;
+   struct stat statbuf;
+
+   if (stat(pg_data, statbuf) != 0)
+   {
+   if (errno == ENOENT)
+   printf(_(%s: directory \%s\ does not exist\n), 
progname,
+pg_data);
+   else
+   printf(_(%s: cannot access directory \%s\\n), 
progname,
+pg_data);
+   /*
+* The Linux Standard Base Core Specification 3.1 says this 
should return
+* '4, program or service status is unknown'
+* 
https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+*/
+   exit(is_status_request ? 4 : 1);
+   }
+
+   if (stat(version_file, statbuf) != 0  errno == ENOENT)
+   {
+   printf(_(%s: directory \%s\ is not a database cluster 
directory\n),
+progname, pg_data);
+   exit(is_status_request ? 4 : 1);
+   }
 
pidf = fopen(pid_file, r);
if (pidf == NULL)
@@ -810,7 +835,7 @@ do_start(void)
 
if (ctl_command != RESTART_COMMAND)
{
-   old_pid = get_pgpid();
+   old_pid = get_pgpid(false);
if (old_pid != 0)
write_stderr(_(%s: another server might be running; 
   trying to start server 
anyway\n),
@@ -894,7 +919,7 @@ do_stop(void)
pgpid_t pid;
struct stat statbuf;
 
-   pid = get_pgpid();
+   pid = get_pgpid(false);
 
if (pid == 0)   /* no pid file */
{
@@ -943,7 +968,7 @@ do_stop(void)
 
for (cnt = 0; cnt  wait_seconds; cnt++)
{
-   if ((pid = get_pgpid()) != 0)
+   if ((pid = get_pgpid(false)) != 0)
{
print_msg(.);
pg_usleep(100); /* 1 sec */
@@ -980,7 +1005,7 @@ do_restart(void)
pgpid_t pid;
struct stat statbuf;
 
-   pid = get_pgpid();
+   pid = 

Re: [HACKERS] pg_upgrade: delete_old_cluster.sh issues

2014-03-07 Thread Bruce Momjian
On Mon, Nov 18, 2013 at 10:13:19PM -0500, Bruce Momjian wrote:
 On Tue, Nov 12, 2013 at 10:35:58AM +, Marc Mamin wrote:
  Hello,
   
  IMHO, there is a serious issue in the script to clean the old data directory
  when running pg_upgrade in link mode.
   
  in short: When working with symbolic links, the first step in
  delete_old_cluster.sh
  is to delete the old $PGDATA folder that may contain tablespaces used by the
  new instance.
   
  in long, our use case:
 
 Rather than removing files/directories individually, which would be
 difficult to maintain, we decided in pg_upgrade 9.3 to detect
 tablespaces in the old data directory and report that and not create a
 delete script.  Here is the commit:
 

 http://git.postgresql.org/gitweb/?p=postgresql.gita=commitdiffh=4765dd79219b9697d84f5c2c70f3fe00455609a1
 
 The problem with your setup is that while you didn't pass symbolic links
 to pg_upgrade, you did use symbolic links when defining the tablespaces,
 so pg_upgrade couldn't recognize that the symbolic links were inside the
 old data directory.
 
 We could use readlink() to go walk over all symbolic links, but that
 seems quite complex.  We could use stat() and make sure there are no
 matching inodes in the old data directory, or that they are in a
 different file system.  We could look for a directory named after the PG
 catversion in the old data directory.  We could update the docs.
 
 I am not sure what to do.  We never expected people would put
 tablespaces in the data directory.

I went with a doc patch, attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index 4d03b12..72e3cb6
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** psql --username postgres --file script.s
*** 460,466 
   cluster's data directories by running the script mentioned when
   commandpg_upgrade/command completes. You can also delete the
   old installation directories
!  (e.g. filenamebin/, filenameshare/).
  /para
 /step
  
--- 460,467 
   cluster's data directories by running the script mentioned when
   commandpg_upgrade/command completes. You can also delete the
   old installation directories
!  (e.g. filenamebin/, filenameshare/).  This will not work
!  if you have tablespaces inside the old data directory.
  /para
 /step
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shave a few instructions from child-process startup sequence

2014-03-07 Thread Bruce Momjian
On Sun, Dec  1, 2013 at 12:07:21PM -0500, Gurjeet Singh wrote:
 On Wed, Nov 27, 2013 at 9:12 AM, Robert Haas robertmh...@gmail.com wrote:
 
 
 This is a performance patch, so it should come with benchmark results
 demonstrating that it accomplishes its intended purpose.  I don't see
 any.
 
 
 Yes, this is a performance patch, but as the subject says, it saves a few
 instructions. I don't know how to write a test case that can measure savings
 of skipping a few instructions in a startup sequence that potentially takes
 thousands, or even millions, of instructions.

Are we saying we don't want this patch?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Store Extension Options

2014-03-07 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera alvhe...@2ndquadrant.comwrote:

 I am reworking this patch, both to accomodate earlier review comments
 and to fix the multiple verify step of namespaces, as noted by Tom in
 4530.1390023...@sss.pgh.pa.us


Alvaro,

Do you need some help?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello