Re: [U2] [UD] Extract a line with a CR and/or LF character in it.

Bill Haskett Thu, 26 May 2011 09:36:56 -0700

Ed:

Actually, it's pretty easy to parse a CSV file. One just has to do it one character at a time. The problem with the usual BASIC statements, READSEQ or REMOVE, is that they don't read the entire line, just the part up to the CR/LF. The second read reads the balance of the line (or until another CR and/or LF is encountered).

That's why I had to figure out a way to ensure the multiple lines were joined together if, and only if, there was a CR/LF embedded in a quoted field.
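
For illustration only, here is a minimal sketch of the character-at-a-time field parse, applied to a record whose physical lines have already been joined. It is written in Python rather than UniBasic, the name parse_fields is hypothetical, and the handling of a doubled quote ("") as an escaped quote is an assumption based on common CSV practice, not something stated in this thread.

```python
def parse_fields(record):
    """Split one logical CSV record into fields, one character at a time."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(record):
        ch = record[i]
        if in_quotes:
            if ch == '"':
                # Assumption: a doubled quote inside a quoted field is an escaped quote.
                if record[i + 1:i + 2] == '"':
                    current.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                # Commas and CR/LF inside quotes are ordinary data.
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ',':
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))  # last field has no trailing comma
    return fields


fields = parse_fields('0,4300,"1 Lakewood Dr\nSan Diego, CA 92122",,$150.00')
```

Here the embedded line break and comma both stay inside the third field, so the record yields five fields.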


By the way, the wiki has some code to parse CSV files.

Bill

------------------------------------------------------------------------
----- Original Message -----
*From:* [email protected]
*To:* U2 Users List <[email protected]>
*Date:* 5/26/2011 6:48 AM

*Subject:* Re: [U2] [UD] Extract a line with a CR and/or LF character in it.

just an idea I haven't thought about too deeply:
Use readseq to read a line, then use the COUNT() function to count the quotes. If there 
are an odd number of quotes (mod(2)=1) then add a value mark and read and append another 
line. Loop until you have an even number of quotes (because there might be more than one 
"multivalued" field in the record), at which point you have the entire line.

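The quote-counting idea can be sketched roughly as follows. This is Python standing in for READSEQ and COUNT(), not UniBasic; join_by_quote_count is a hypothetical name, and the joiner defaults to a newline here where the suggestion above uses a value mark.

```python
def join_by_quote_count(physical_lines, joiner="\n"):
    """Join physical lines into logical records by counting double quotes."""
    records = []
    pending = None  # partial record whose quotes are still unbalanced
    for line in physical_lines:
        pending = line if pending is None else pending + joiner + line
        # An even quote count means every quoted field has been closed.
        if pending.count('"') % 2 == 0:
            records.append(pending)
            pending = None
    if pending is not None:
        records.append(pending)  # unterminated quote at end of input
    return records


sample = [
    '0,4300,1BEU,Robert,Smith,"1 Lakewood Dr',
    'San Diego, CA 92122",,$150.00',
    '0,4300,1CYN,John,Bones,1 Round Ct,,$150.00',
]
records = join_by_quote_count(sample)
```

Because the count is taken over the whole accumulated record, a record with more than one split field keeps accumulating until the total is even.
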
On May 26, 2011, at 2:57 AM, Bill Haskett wrote:

I figured out how to do this.  I read each line and use a subroutine to go 
through each character.  It sets a variable 'QuoteOn' if we're in a quoted 
string.  Obviously if the line ends while in a quoted string, the next line 
belongs to the current line.  Man, what a pain this was!  :-)
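
A rough sketch of that 'QuoteOn' approach, in Python for illustration (the actual subroutine is not shown in this thread, so the function names and the ", " joiner are assumptions; the joiner is chosen to match what the expected output quoted further down appears to show):

```python
def ends_in_quote(line, quote_on=False):
    """Walk one physical line; return True if it ends inside a quoted field."""
    for ch in line:
        if ch == '"':
            quote_on = not quote_on  # each quote opens or closes a field
    return quote_on


def join_physical_lines(lines, joiner=", "):
    """Append the next physical line whenever the current one ends inside quotes."""
    records = []
    current = ""
    quote_on = False
    for line in lines:
        current += line
        quote_on = ends_in_quote(line, quote_on)
        if quote_on:
            current += joiner  # stands in for the embedded CR/LF
        else:
            records.append(current)
            current = ""
    if current:
        records.append(current)  # input ended inside a quoted field
    return records


lines = [
    '0,4300,1BEU,Robert,Smith,Julie,Smith,1 Lakewood Dr,,63031,"1 Lakewood Dr',
    'San Diego, CA 92122",,,$150.00,,,,,',
    '0,4300,1CYN,John Randolph,Bones,,,1 Round Ct,,63031,"1 Round Ct',
    'San Diego, CA 92122",,,$150.00,,,,,',
]
joined = join_physical_lines(lines)
```

The quote state is carried from one physical line to the next, so a field that spans three or more lines is still joined into one record.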

Thanks for your thoughts and help.

Bill

------------------------------------------------------------------------
----- Original Message -----
*From:* [email protected]
*To:* [email protected]
*Date:* 5/25/2011 10:13 PM
*Subject:* Re: [U2] [UD] Extract a line with a CR and/or LF character in it.

It's been a while - but I'm pretty sure that OSBREAD keeps the CR/LF as part of the block 
(you may need to put NO CONVERT ON in the code). READSEQ automatically ends at the CR/LF 
so you would have to "put the lines together" if you were short fields.

In both cases it would mean going through the block/line a character at a time 
to parse out each field. Of course, to work with embedded quotes and commas you 
pretty much have to any way. With READSEQ you know the line ended on a CRLF - 
you just need to figure out if it's the end of the record or not.
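
For the block-oriented route, a minimal sketch of the idea might look like this. It is Python, not UniBasic; it assumes the whole block is already in memory with its CR/LF characters intact (as OSBREAD is described above), and split_records is a hypothetical name.

```python
def split_records(block):
    """Split a raw block into records at CR/LF that fall outside quoted fields."""
    records = []
    current = []
    in_quotes = False
    i = 0
    while i < len(block):
        ch = block[i]
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch in "\r\n" and not in_quotes:
            # Treat a CR immediately followed by LF as a single terminator.
            if ch == "\r" and block[i + 1:i + 2] == "\n":
                i += 1
            records.append("".join(current))
            current = []
        else:
            # Includes CR/LF inside quotes, which belong to the field.
            current.append(ch)
        i += 1
    if current:
        records.append("".join(current))  # final record without a terminator
    return records


block = 'a,"x\r\ny",b\r\nc,d\r\n'
recs = split_records(block)
```

With this approach the position of the embedded CR/LF within the line does not matter, since only the quote state decides whether a break ends the record.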

Does that make more sense?

HTH
Colin Alfke
Calgary, Canada

From: wphaskett

I guess that's my problem. I can't use OSBREAD because the Cr/Lf
appears in different columns in the line. I can't guarantee where it
shows up (or what character position). Using READSEQ doesn't work
either because the line read by the statement is only a part of the
entire line in the file! e.g.

0,4300,1BEU,Robert,Smith,Julie,Smith,1 Lakewood Dr,,63031,"1 Lakewood Dr
San Diego, CA 92122",,,$150.00,,,,,
0,4300,1CYN,John Randolph,Bones,,,1 Round Ct,,63031,"1 Round Ct
San Diego, CA 92122",,,$150.00,,,,,

...when the lines should look like (only two lines):

0,4300,1BEU,Robert,Smith,Julie,Smith,1 Lakewood Dr,,63031,"1 Lakewood Dr, San Diego, CA 92122",,,$150.00,,,,,
0,4300,1CYN,John Randolph,Bones,,,1 Round Ct,,63031,"1 Round Ct, San Diego, CA 92122",,,$150.00,,,,,

There's no guarantee the field causing the problem will even have any
data in it, so I can't append every 2nd line to the end of every 1st
line. :-(

Once I get the line I can deal with each character at a time. Any other
ideas?

As always, thanks.

Bill

_______________________________________________
U2-Users mailing list
[email protected]
http://listserver.u2ug.org/mailman/listinfo/u2-users
