New topic: Stripping unwanted characters
<http://forums.realsoftware.com/viewtopic.php?t=46183>

Author: tseyfarth
Posted: Sun Dec 09, 2012 1:33 am

Hello All,

I am reading in a CSV file and some fields contain unwanted quotation marks. How best to remove them? I tried ReplaceAll, with different settings, but it never worked.

This is a sample row read in from a CSV file...

    "001",100,1,"A","001",1,1,3,1,10,10,10,

    while Not t.EOF
      lstImportData.AddRow ReplaceAll(t.ReadLine, " ", "")
    Wend

And breaking it down....

    While Not t.EOF
      Dim s As String
      S=t.ReadLine
      S = ReplaceAll(S, " "," " )
      lstImportData.AddRow S
      //lstImportData.AddRow t.ReadLine
      //lstImportData.AddRow ReplaceAll(t.ReadLine, " ", "")
      RecsImported = RecsImported + 1
    Wend

Ideas would be appreciated!

Thank you,
Tim

Author: HiddenPaw
Posted: Sun Dec 09, 2012 1:42 am

try changing line to

    S = ReplaceAll(S, chr(34), "")

Author: ktekinay
Posted: Sun Dec 09, 2012 2:01 am

If I understand your question, what you probably want is:

    ReplaceAll( t.ReadLine, """", "" )

That replaces all the quotes with nothing. If you want to get a little fancier, you can use a regular expression, but I'd read in the whole file, then apply the replace once, assuming the file won't be too big.
Using your method of reading a line at a time:

    dim rx as new RegEx
    rx.SearchPattern = "(?Umi-s)""([^""]*)"" *(,|$)"
    rx.ReplacementPattern = "\1\2"
    dim rxOptions as RegExOptions = rx.Options
    rxOptions.ReplaceAllMatches = true

    While Not t.EOF
      Dim s As String = t.ReadLine
      s = rx.Replace( s )
      lstImportData.AddRow s
      RecsImported = RecsImported + 1
    Wend

That pattern will look for a quote followed by zero or more characters that aren't a quote, followed by another quote, zero or more spaces, then either a comma or the end of the line. It will replace that with just what's between the quotes followed by the comma, if there had been one there in the first place.

If you want to do the whole file at once:

    dim rx as new RegEx
    rx.SearchPattern = "(?Umi-s)""([^""]*)"" *(,|$)"
    rx.ReplacementPattern = "\1\2"
    dim rxOptions as RegExOptions = rx.Options
    rxOptions.ReplaceAllMatches = true

    dim s as string = t.ReadAll
    if s = "" then
      // There is nothing there, so act accordingly
      return
    end if

    s = rx.Replace( s ) // or just s = s.ReplaceAll( """", "" )
    s = ReplaceLineEndings( s, EndOfLine.UNIX ) // Doesn't really matter which EOL you choose...
    dim arr() as string = s.Split( EndOfLine.UNIX ) // ...just have to know how to split it
    if arr( arr.Ubound ) = "" then
      redim arr( arr.Ubound - 1 ) // Remove the last row if blank
    end if

    for i as integer = 0 to arr.Ubound
      lstImportData.AddRow arr( i )
    next i
    RecsImported = arr.Ubound + 1

_________________
Kem Tekinay
MacTechnologies Consulting
http://www.mactechnologies.com/
Need to develop, test, and refine regular expressions? Try RegExRX.

Author: timhare
Posted: Sun Dec 09, 2012 2:06 am

Reading CSV is not a simple task. I would avoid removing all quote marks, because some of them could be data. To do it right, you need to use a complex regex, or process each line character by character.
If the first character is a quote, then search for the next quote, skipping commas until we find one. If it is an escaped quote, keep searching. Consume that quote and the comma that follows. Otherwise, search for the next comma. Repeat to end of line.

Author: timhare
Posted: Sun Dec 09, 2012 2:10 am

npalardy has/had a plugin that parses CSV files properly.

Author: ktekinay
Posted: Sun Dec 09, 2012 2:13 am

I have a method in my M_String module on my web site that does it properly too. But the poster's code did not parse the fields, so I didn't mention it.

Author: tseyfarth
Posted: Sun Dec 09, 2012 2:52 am

Hi and thank you all for responding! I took a short break, had dinner (yeah, it's only midnight something...), came back and saw all these wonderful responses! Again, thank you all!

I *do* have to parse them at some point, just kind of working through this right now, which is why I did not add that task to the post. I'll be looking at this again when sharper.

Again, all, thank you for your responses!
Tim

Author: ktekinay
Posted: Sun Dec 09, 2012 3:08 am

In that case, this would be your code if you had M_String installed:

    while not t.EOF
      dim s as string = t.ReadLine
      dim fields() as string = s.SplitQuoted_MTC( "," )
      lstImportData.AddRow fields // Or whatever other processing you need
      RecsImported = RecsImported + 1
    wend

Author: timhare
Posted: Sun Dec 09, 2012 3:15 am

Very nice. M_String is golden.

Author: tseyfarth
Posted: Sun Dec 09, 2012 12:18 pm

ktekinay,

Thanks for M_String. I downloaded and tried your code. It did not work. Not every column/field has quotation marks.

**Edited** It parses the string properly, but it does not remove the quotation marks.

    Input  = "001",100,1,"A","001",1,1,3,1,10,10,10,
    Output = "001" 100 1 "A" "001" 1 1 3 1 10 10 10

**

I must be doing something wrong. Any ideas? I will take a look at your comments in the source to see what is there that might help too...

**Edit** How to remove the quotes? That was the original problem.... **

Thanks all,
Tim

Author: ktekinay
Posted: Sun Dec 09, 2012 1:27 pm

Well shoot, that's one of the first methods I added to M_String and I haven't looked at it in years. Let me check it now and get back to you.
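
Editor's sketch: the thread ends before the quote-removal question is settled, so here is one way the character-by-character approach timhare described could look. It is written in Python rather than REALbasic so it can be run standalone. The function name split_quoted is hypothetical; this is not M_String's SplitQuoted_MTC. It assumes an escaped quote is written as a doubled quote (""), a common CSV convention that the thread itself does not specify.

```python
def split_quoted(line, delimiter=","):
    """Split one CSV line into fields, dropping the enclosing quotes.

    Hypothetical helper for illustration only.
    """
    fields = []
    i = 0
    n = len(line)
    while i < n:
        if line[i] == '"':
            # Quoted field: scan to the closing quote; delimiters inside are data.
            i += 1
            chars = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        # Assumption: a doubled quote is an escaped quote.
                        chars.append('"')
                        i += 2
                        continue
                    i += 1  # consume the closing quote
                    break
                chars.append(line[i])
                i += 1
            fields.append("".join(chars))
            # Skip anything up to and including the delimiter that follows.
            while i < n and line[i] != delimiter:
                i += 1
            i += 1
        else:
            # Unquoted field: runs to the next delimiter or the end of the line.
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j + 1
    return fields


# The sample row from the first post, trailing comma included.
print(split_quoted('"001",100,1,"A","001",1,1,3,1,10,10,10,'))
```

With the sample row this yields twelve fields with the quotes gone, and a comma inside a quoted field stays part of that field. A trailing delimiter does not produce an extra empty field in this sketch; whether that is the right behavior depends on the file being imported.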
