Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2008-03-11 Thread Bruce Momjian

Added to TODO for COPY:

  o Allow COPY in CSV mode to control whether  is treated as NULL

http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php


---

Andrew Dunstan wrote:
 
 [redirecting to -hackers]
 
 Stephen Frost wrote:
  * Gregory Stark ([EMAIL PROTECTED]) wrote:

  Tom Lane [EMAIL PROTECTED] writes:
 
  
  Stephen Frost [EMAIL PROTECTED] writes:

Please find attached a minor patch to remove the constraints that a
user can't include the delimiter or quote characters in a 'NULL AS'
string when importing CSV files.
  
  This can't really be sane can it?


 
 Not very, no :-)

  The alternative would be interpreting NULL strings after dequoting but that
  would leave no way to include the NULL string literally. This solution 
  means
  there's no way to include it (if it needs quoting) but only when you 
  specify
  it this way.
  
 
  Yeah, interpreting NULLs after dequoting means you've lost the
  information about if it's quoted or not, or you have to add some funky
  syntax to say if it's quoted, do it differently..., which is no good,
  imv.
 
  What the patch does basically is say give us the exact string that
  shows up between the unquoted delimiters that you want to be treated
  as a NULL.  This removes the complexity of the question about quoting,
  unquoting, whatever, and makes it a very clear-cut, straight-forward
  solution with no impact on existing users, imv.
 
  

 
 This looks too clever by half, to me. Someone facing the problem you are 
 facing would have to dig quite deep to find the solution you're promoting.
 
 A much better way IMNSHO would be to add an extra FORCE switch. On 
 input, FORCE NOT NULL says to treat an unquoted null as the literal 
 value rather than as a null field for the columns named. The reverse 
 would be to tell it to treat a quoted null as null rather than as the 
 literal value, for the named columns. Perhaps that should just be FORCE 
 NULL columnlist. It would be more explicit and at the same time would 
 only apply to the named columns, rather than discarding totally the 
 ability to distinguish between null and not null values.
 
 This should probably be discussed on -hackers, anyway.
 
 
 
 cheers
 
 andrew
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-31 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 Other, unrelated, options being or not being there doesn't really have
 any bearing on this though.  I'm not inventing new syntax here.  I'm
 just removing a restriction on what the user can do that doesn't need
 to exist.

I don't think you're just removing a restriction.  What you're doing
is exposing a whole lot of strange and arguably broken corner cases.
If we accept this patch I think we'll be fielding bug reports as a
result for years to come.  I *especially* dislike the part about
allowing the delimiter character in the null string --- that will allow
people to complain about the order in which decisions are made.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-31 Thread Andrew Dunstan



Tom Lane wrote:

Stephen Frost [EMAIL PROTECTED] writes:
  

Other, unrelated, options being or not being there doesn't really have
any bearing on this though.  I'm not inventing new syntax here.  I'm
just removing a restriction on what the user can do that doesn't need
to exist.



I don't think you're just removing a restriction.  What you're doing
is exposing a whole lot of strange and arguably broken corner cases.
If we accept this patch I think we'll be fielding bug reports as a
result for years to come.  I *especially* dislike the part about
allowing the delimiter character in the null string --- that will allow
people to complain about the order in which decisions are made.


  


Yeah, if you allow the delimiter in the null string, what do you do if 
it's not quoted? I can't imagine what the real world case for that could 
possibly be.


Even if there's an arguable case for allowing the quote char in a null 
string (and as I indicated upthread I really think the problem being 
addressed here could be solved in a far better fashion) there is surely 
no good case for allowing the delimiter. Oh, and if we did allow the 
quote char we should surely only allow it on input - just because other 
programs produce absurd output there is not reason we should.


cheers

andrew



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-31 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 ... Oh, and if we did allow the 
 quote char we should surely only allow it on input - just because other 
 programs produce absurd output there is not reason we should.

Yeah.  The *real* problem with the patch as proposed is that it allows a
COPY OUT to emit a file that cannot be reloaded correctly, even given
the same options used to prepare it.  I think that the restrictions were
put there more to prevent that scenario than to restrict COPY IN.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-31 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  ... Oh, and if we did allow the 
  quote char we should surely only allow it on input - just because other 
  programs produce absurd output there is not reason we should.
 
 Yeah.  The *real* problem with the patch as proposed is that it allows a
 COPY OUT to emit a file that cannot be reloaded correctly, even given
 the same options used to prepare it.  I think that the restrictions were
 put there more to prevent that scenario than to restrict COPY IN.

erp.  My apologies, I hadn't ever intended for this to be used with COPY
OUT.  For some reason I had thought my changes were isolated to the COPY
CSV IN path.  I'd be happy to adjust the patch to only accept the
quote-in-null syntax when doing a COPY CSV IN.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Andrew Dunstan


[redirecting to -hackers]

Stephen Frost wrote:

* Gregory Stark ([EMAIL PROTECTED]) wrote:
  

Tom Lane [EMAIL PROTECTED] writes:



Stephen Frost [EMAIL PROTECTED] writes:
  

  Please find attached a minor patch to remove the constraints that a
  user can't include the delimiter or quote characters in a 'NULL AS'
  string when importing CSV files.


This can't really be sane can it?
  
  


Not very, no :-)
  

The alternative would be interpreting NULL strings after dequoting but that
would leave no way to include the NULL string literally. This solution means
there's no way to include it (if it needs quoting) but only when you specify
it this way.



Yeah, interpreting NULLs after dequoting means you've lost the
information about if it's quoted or not, or you have to add some funky
syntax to say if it's quoted, do it differently..., which is no good,
imv.

What the patch does basically is say give us the exact string that
shows up between the unquoted delimiters that you want to be treated
as a NULL.  This removes the complexity of the question about quoting,
unquoting, whatever, and makes it a very clear-cut, straight-forward
solution with no impact on existing users, imv.


  


This looks too clever by half, to me. Someone facing the problem you are 
facing would have to dig quite deep to find the solution you're promoting.


A much better way IMNSHO would be to add an extra FORCE switch. On 
input, FORCE NOT NULL says to treat an unquoted null as the literal 
value rather than as a null field for the columns named. The reverse 
would be to tell it to treat a quoted null as null rather than as the 
literal value, for the named columns. Perhaps that should just be FORCE 
NULL columnlist. It would be more explicit and at the same time would 
only apply to the named columns, rather than discarding totally the 
ability to distinguish between null and not null values.


This should probably be discussed on -hackers, anyway.



cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Andrew Dunstan



Stephen Frost wrote:


I'm honestly not a big fan of the columnlist approach that's been
taken with the options.  While I understand the desire to seperate the
parsing from the typing, making the users essentially do that association
for us by way of making them specify how to handle each column explicitly
is worse than just accepting that different types may need to be handled
in different ways.
  


Whether or not you like it, the fact is it's there. I think any solution 
should be consistent with what is done.


When CSV was first discussed we looked at doing type-specific behaviour. 
The end of the long debate was that we simply couldn't do that safely, 
and the only recourse was to require the user to specify the behaviour 
required if it differed from the default. You might be inclined to want 
to revisit that, but I am not.


cheers

andrew



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 This looks too clever by half, to me. Someone facing the problem you are 
 facing would have to dig quite deep to find the solution you're promoting.

Oddly enough, it was one of the first things I tried when I discovered
it wasn't just realizing that ,, for an integer column meant NULL (and
instead was complaining loudly that you can't convert an empty string
into an integer).  It's also pretty clear, to me at least, to say 
put the exact string that shows up between the delimiters here
that you want treated as a NULL rather than well, if it's a column
which is quoted then you have to jump through these hoops and tell PG
about each one, but if it's not quoted you have to do this, etc, etc.

 A much better way IMNSHO would be to add an extra FORCE switch. On input, 
 FORCE NOT NULL says to treat an unquoted null as the literal value rather 
 than as a null field for the columns named. The reverse would be to tell it 
 to treat a quoted null as null rather than as the literal value, for the 
 named columns. Perhaps that should just be FORCE NULL columnlist. It 
 would be more explicit and at the same time would only apply to the named 
 columns, rather than discarding totally the ability to distinguish between 
 null and not null values.

I don't see that it needs to be 'more explicit', that's just silly.
Either the user indicated they want it, or they didn't.  What you're
suggesting adds in a bunch of, imv, unnecessary complication and ends up
making the resulting code that much bigger and uglier for not much gain.

I'm honestly not a big fan of the columnlist approach that's been
taken with the options.  While I understand the desire to seperate the
parsing from the typing, making the users essentially do that association
for us by way of making them specify how to handle each column explicitly
is worse than just accepting that different types may need to be handled
in different ways.

We could instead flip it around and force the users to specify, for
each column, what, exactly, should be done for that column by having
them specify a regexp for that column.  The regexp would implicitly have
the delimiter on each side of it and we'd just step through the string
matching as far as we can for each column.  Then it's nice and explicit
for everyone but probably not much fun to use.

 This should probably be discussed on -hackers, anyway.

As a small, unobtrusive patch, I felt it didn't need a long discussion
about what everyone's CSV files look like and how that just shouldn't
be done or that's just not sane.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
 I'm honestly not a big fan of the columnlist approach that's been
 taken with the options.  While I understand the desire to seperate the
 parsing from the typing, making the users essentially do that association
 for us by way of making them specify how to handle each column explicitly
 is worse than just accepting that different types may need to be handled
 in different ways.
   

 Whether or not you like it, the fact is it's there. I think any solution 
 should be consistent with what is done.

Other, unrelated, options being or not being there doesn't really have
any bearing on this though.  I'm not inventing new syntax here.  I'm
just removing a restriction on what the user can do that doesn't need
to exist.  Indeed, other more convoluted and complex things could still
be added, if someone wants them, this doesn't prevent that.

Thanks,

Stephen


signature.asc
Description: Digital signature