Re: [PATCHES] [HACKERS] multiline CSV fields

2004-12-03 Thread Bruce Momjian

Patch applied.  Thanks.

---


Andrew Dunstan wrote:
> 
> 
> I wrote:
> 
> >
> > If it bothers you that much. I'd make a flag, cleared at the start of 
> > each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, 
> > if the flag is not set then set it and issue the warning.
> 
> 
> 
> I didn't realise until Bruce told me just now that I was on the hook for 
> this. I guess i should keep my big mouth shut. (Yeah, that's gonna 
> happen ...)
> 
> Anyway, here's a tiny patch that does what I had in mind.
> 
> cheers
> 
> andrew

[ text/x-patch is unsupported, treating like TEXT/PLAIN ]

> Index: copy.c
> ===
> RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
> retrieving revision 1.234
> diff -c -r1.234 copy.c
> *** copy.c6 Nov 2004 17:46:27 -   1.234
> --- copy.c2 Dec 2004 23:34:20 -
> ***
> *** 98,103 
> --- 98,104 
>   static EolType eol_type;/* EOL type of input */
>   static int  client_encoding;/* remote side's character encoding */
>   static int  server_encoding;/* local encoding */
> + static bool embedded_line_warning;
>   
>   /* these are just for error messages, see copy_in_error_callback */
>   static bool copy_binary;/* is it a binary copy? */
> ***
> *** 1190,1195 
> --- 1191,1197 
>   attr = tupDesc->attrs;
>   num_phys_attrs = tupDesc->natts;
>   attr_count = list_length(attnumlist);
> + embedded_line_warning = false;
>   
>   /*
>* Get info about the columns we need to process.
> ***
> *** 2627,2632 
> --- 2629,2653 
>!use_quote && (c = *test_string) != '\0';
>test_string += mblen)
>   {
> + /*
> +  * We don't know here what the surrounding line end characters
> +  * might be. It might not even be under postgres' control. So
> +  * we simple warn on ANY embedded line ending character.
> +  *
> +  * This warning will disappear when we make line parsing 
> field-aware,
> +  * so that we can reliably read in embedded line ending 
> characters
> +  * regardless of the file's line-end context.
> +  *
> +  */
> + 
> + if (!embedded_line_warning  && (c == '\n' || c == '\r') )
> + {
> + embedded_line_warning = true;
> + elog(WARNING,
> +  "CSV fields with embedded linefeed or carriage 
> return "
> +  "characters might not be able to be 
> reimported");
> + }
> + 
>   if (c == delimc || c == quotec || c == '\n' || c == '\r')
>   use_quote = true;
>   if (!same_encoding)

> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] multiline CSV fields

2004-12-02 Thread Andrew Dunstan

I wrote:
If it bothers you that much. I'd make a flag, cleared at the start of 
each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, 
if the flag is not set then set it and issue the warning.

I didn't realise until Bruce told me just now that I was on the hook for 
this. I guess i should keep my big mouth shut. (Yeah, that's gonna 
happen ...)

Anyway, here's a tiny patch that does what I had in mind.
cheers
andrew
Index: copy.c
===
RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.234
diff -c -r1.234 copy.c
*** copy.c	6 Nov 2004 17:46:27 -	1.234
--- copy.c	2 Dec 2004 23:34:20 -
***
*** 98,103 
--- 98,104 
  static EolType eol_type;		/* EOL type of input */
  static int	client_encoding;	/* remote side's character encoding */
  static int	server_encoding;	/* local encoding */
+ static bool embedded_line_warning;
  
  /* these are just for error messages, see copy_in_error_callback */
  static bool copy_binary;		/* is it a binary copy? */
***
*** 1190,1195 
--- 1191,1197 
  	attr = tupDesc->attrs;
  	num_phys_attrs = tupDesc->natts;
  	attr_count = list_length(attnumlist);
+ 	embedded_line_warning = false;
  
  	/*
  	 * Get info about the columns we need to process.
***
*** 2627,2632 
--- 2629,2653 
  		 !use_quote && (c = *test_string) != '\0';
  		 test_string += mblen)
  	{
+ 		/*
+ 		 * We don't know here what the surrounding line end characters
+ 		 * might be. It might not even be under postgres' control. So
+ 		 * we simple warn on ANY embedded line ending character.
+ 		 *
+ 		 * This warning will disappear when we make line parsing field-aware,
+ 		 * so that we can reliably read in embedded line ending characters
+ 		 * regardless of the file's line-end context.
+ 		 *
+ 		 */
+ 
+ 		if (!embedded_line_warning  && (c == '\n' || c == '\r') )
+ 		{
+ 			embedded_line_warning = true;
+ 			elog(WARNING,
+  "CSV fields with embedded linefeed or carriage return "
+  "characters might not be able to be reimported");
+ 		}
+ 
  		if (c == delimc || c == quotec || c == '\n' || c == '\r')
  			use_quote = true;
  		if (!same_encoding)

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]