Stripping unwanted characters

<http://forums.realsoftware.com/viewtopic.php?t=46183>

Author: tseyfarth
Post subject: Stripping unwanted characters
Posted: Sun Dec 09, 2012 1:33 am
Joined: Sat Dec 04, 2010 9:14 pm
Posts: 773

Hello All,

I am reading in a CSV file and some fields contain unwanted quotation marks.  
How best to remove them?

I tried ReplaceAll, with different settings, but it never worked.

This is a sample row read in from a CSV file... 

"001",100,1,"A","001",1,1,3,1,10,10,10,

while Not t.EOF
  lstImportData.AddRow ReplaceAll(t.ReadLine, " ", "")
Wend


And breaking it down....
While Not t.EOF
  Dim s As String
  S=t.ReadLine
  S =  ReplaceAll(S, " "," " )
  lstImportData.AddRow S
  //lstImportData.AddRow t.ReadLine
  //lstImportData.AddRow ReplaceAll(t.ReadLine, " ", "")
  RecsImported = RecsImported + 1
Wend


Ideas would be appreciated!
Thank you,
Tim   

Author: HiddenPaw
Posted: Sun Dec 09, 2012 1:42 am
Joined: Sat Oct 01, 2005 12:01 pm
Posts: 63
Location: San Jose

Try changing that line to:
  S = ReplaceAll(S, Chr(34), "")

Author: ktekinay
Posted: Sun Dec 09, 2012 2:01 am
Joined: Mon Feb 05, 2007 5:21 pm
Posts: 300
Location: New York, NY

If I understand your question, what you probably want is:
ReplaceAll( t.ReadLine, """", "" )
That replaces all the quotes with nothing.

If you want to get a little fancier, you can use a regular expression, but I'd 
read in the whole file, then apply the replace once, assuming the file won't be 
too big.

Using your method of reading a line at a time:
dim rx as new RegEx
rx.SearchPattern = "(?Umi-s)""([^""]*)"" *(,|$)"
rx.ReplacementPattern = "\1\2"

dim rxOptions as RegExOptions = rx.Options
rxOptions.ReplaceAllMatches = true

While Not t.EOF
  Dim s As String = t.ReadLine
  s = rx.Replace( s )
  lstImportData.AddRow s
  RecsImported = RecsImported + 1
Wend

That pattern will look for a quote followed by zero or more characters that 
aren't a quote, followed by another quote, zero or more spaces, then either a 
comma or the end of the line. It will replace that with just what's between the 
quotes followed by the comma, if there had been one there in the first place.

If you want to do the whole file at once:
dim rx as new RegEx
rx.SearchPattern = "(?Umi-s)""([^""]*)"" *(,|$)"
rx.ReplacementPattern = "\1\2"

dim rxOptions as RegExOptions = rx.Options
rxOptions.ReplaceAllMatches = true

dim s as string = t.ReadAll
if s = "" then
  // There is nothing there, so act accordingly
  return
end if

s = rx.Replace( s ) // or just s = s.ReplaceAll( """", "" )
// Doesn't really matter which EOL you choose...
s = ReplaceLineEndings( s, EndOfLine.UNIX )
// ...just have to know how to split it
dim arr() as string = s.Split( EndOfLine.UNIX )
if arr( arr.Ubound ) = "" then
  redim arr( arr.Ubound - 1 ) // Remove the last row if blank
end if

for i as integer = 0 to arr.Ubound
  lstImportData.AddRow arr( i )
next i
RecsImported = arr.Ubound + 1
      
_________________
Kem Tekinay
MacTechnologies Consulting
http://www.mactechnologies.com/

Need to develop, test, and refine regular expressions? Try RegExRX.
  

Author: timhare
Posted: Sun Dec 09, 2012 2:06 am
Joined: Fri Jan 06, 2006 3:21 pm
Posts: 11869
Location: Portland, OR  USA

Reading CSV is not a simple task.  I would avoid removing all quote marks,
because some of them could be data.  To do it right, you need to use a complex
regex, or process each line character by character.

If the first character is a quote, then search for the next quote, skipping 
commas until we find one.  If it is an escaped quote keep searching.  Consume 
that quote and the comma that follows.
Otherwise, search for the next comma.
Repeat to end of line.   
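
The character-by-character approach described above could be sketched roughly
as follows. SplitCSVLine is a hypothetical name, not an existing framework or
M_String method. The sketch assumes that an escaped quote is written as a
doubled quote (""), which the thread does not confirm, and it accepts an
opening quote wherever one appears rather than only at the first character.
The enclosing quotes are dropped as the fields are built.

```realbasic
Function SplitCSVLine(line As String) As String()
  // Hypothetical helper: splits one CSV line into fields and removes the
  // enclosing quotes. Assumes "" inside a quoted field means a literal quote.
  Dim fields() As String
  Dim current As String
  Dim inQuotes As Boolean = False
  Dim q As String = Chr(34)
  Dim n As Integer = Len(line)
  Dim i As Integer = 1

  While i <= n
    Dim c As String = Mid(line, i, 1)
    If inQuotes Then
      If c = q Then
        If Mid(line, i + 1, 1) = q Then
          current = current + q // escaped quote: keep one, skip the other
          i = i + 1
        Else
          inQuotes = False // closing quote
        End If
      Else
        current = current + c // commas inside quotes are data
      End If
    ElseIf c = q Then
      inQuotes = True // opening quote, not stored
    ElseIf c = "," Then
      fields.Append current // field separator
      current = ""
    Else
      current = current + c
    End If
    i = i + 1
  Wend

  fields.Append current // last field; empty if the line ends with a comma
  Return fields
End Function
```

Called as SplitCSVLine(t.ReadLine), the sample row from the first post would
come back as 13 fields, the last one empty because the row ends with a comma.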

Author: timhare
Posted: Sun Dec 09, 2012 2:10 am
Joined: Fri Jan 06, 2006 3:21 pm
Posts: 11869
Location: Portland, OR  USA

npalardy has/had a plugin that parses CSV files properly.

Author: ktekinay
Posted: Sun Dec 09, 2012 2:13 am
Joined: Mon Feb 05, 2007 5:21 pm
Posts: 300
Location: New York, NY

I have a method in my M_String module on my web site that does it properly
too. But the poster's code did not parse the fields, so I didn't mention it.

Author: tseyfarth
Posted: Sun Dec 09, 2012 2:52 am
Joined: Sat Dec 04, 2010 9:14 pm
Posts: 773

Hi and thank you all for responding!

I took a short break, had dinner (yeah, it's only midnight something...), came
back and saw all these wonderful responses!  Again, thank you all!

I *do* have to parse them at some point, just kind of working through this 
right now, which is why I did not add that task to the post.

I'll be looking at this again when sharper. 
Again, all, thank you for your responses!

Tim   

Author: ktekinay
Posted: Sun Dec 09, 2012 3:08 am
Joined: Mon Feb 05, 2007 5:21 pm
Posts: 300
Location: New York, NY

In that case, this would be your code if you had M_String installed:
while not t.EOF
  dim s as string = t.ReadLine
  dim fields() as string = s.SplitQuoted_MTC( "," )
  lstImportData.AddRow fields // Or whatever other processing you need
  RecsImported = RecsImported + 1
wend
      

Author: timhare
Posted: Sun Dec 09, 2012 3:15 am
Joined: Fri Jan 06, 2006 3:21 pm
Posts: 11869
Location: Portland, OR  USA

Very nice.  M_String is golden.

Author: tseyfarth
Posted: Sun Dec 09, 2012 12:18 pm
Joined: Sat Dec 04, 2010 9:14 pm
Posts: 773

ktekinay,

Thanks for M_String.  I downloaded and tried your code.  It did not work. Not 
every column/field has quotation marks. 
**Edited**
It parses the string properly, but it does not remove the quotation marks.
Input = "001",100,1,"A","001",1,1,3,1,10,10,10,

Output = "001"  100  1  "A"  "001"  1 1 3 1 10 10 10
**

I must be doing something wrong.  Any ideas?  I will take a look at your 
comments in the source to see what is there that might help too...

**Edit**
How to remove the quotes?  That was the original problem....
**

Thanks all,
Tim   
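
One possible way to drop the quotes after the split, going only by the output
shown above, where each field comes back with its quotes still attached.
StripEnclosingQuotes is a hypothetical helper, not part of M_String, and the
handling of doubled quotes is an assumption about how the file escapes them.

```realbasic
Function StripEnclosingQuotes(fieldText As String) As String
  // Hypothetical helper: removes one pair of enclosing quotes, if present.
  Dim q As String = Chr(34)
  Dim result As String = fieldText
  If Len(result) >= 2 And Left(result, 1) = q And Right(result, 1) = q Then
    result = Mid(result, 2, Len(result) - 2) // drop the outer pair
    result = ReplaceAll(result, q + q, q)    // assumed escaping: "" means "
  End If
  Return result
End Function

// In the import loop, after the fields have been split:
//   For i As Integer = 0 To fields.Ubound
//     fields(i) = StripEnclosingQuotes(fields(i))
//   Next
//   lstImportData.AddRow fields
```

Fields without quotes, such as 100 or 1 in the sample row, pass through
unchanged, so it should not matter that only some columns are quoted.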

Author: ktekinay
Posted: Sun Dec 09, 2012 1:27 pm
Joined: Mon Feb 05, 2007 5:21 pm
Posts: 300
Location: New York, NY

Well shoot, that's one of the first methods I added to M_String and I haven't
looked at it in years. Let me check it now and get back to you.