Re: [R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )

2010-09-12 Thread jim holtman
You can use the 'gsub' command to remove the quote marks.  You could
readLines/writeLines the file to clean it up with gsub before using
read.table on it so it can all be done within R.
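
A minimal sketch of the approach described above, not necessarily the exact code Jim had in mind. It assumes the six sample lines from the question (with the quote marks as I understand them) stand in for the result of readLines() on a hypothetical file; the read.table arguments (text, quote, fill, strip.white) are my own choices for this sample.

```r
# Stand-in for: raw_lines <- readLines("yourfile.txt")  (hypothetical file name)
raw_lines <- c(
  '1|7|30| "dog"',
  '2|6|25| ""cat"',
  '3|4|20|""',
  '4|5| 56| "mouse"',
  '5|3|56| ""horse"',
  '6|56| ""'
)

# Drop every double quote, stray or not
clean_lines <- gsub('"', '', raw_lines, fixed = TRUE)

# Parse the cleaned lines: quote = "" switches off quote handling,
# fill = TRUE pads the short last row, strip.white = TRUE trims the fields
dat <- read.table(text = clean_lines, sep = "|", quote = "",
                  fill = TRUE, strip.white = TRUE,
                  stringsAsFactors = FALSE)
```

To keep a cleaned copy on disk instead, writeLines(clean_lines, "cleaned.txt") followed by read.table("cleaned.txt", ...) should behave the same way; "cleaned.txt" is again just a placeholder name.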

On Sun, Sep 12, 2010 at 1:58 PM, Eva Nordstrom eva.nordst...@yahoo.com wrote:
 I am using read.table to import a text file within R.

 There are several errors in my text file.  An extra quotation mark has
 inadvertently been included within a few text fields.


 e.g. for a pipe (|) delimited text file, I have something similar to this:

 1|7|30| "dog"
 2|6|25| ""cat"
 3|4|20|""
 4|5| 56| "mouse"
 5|3|56| ""horse"
 6|56| ""

 In the above example, there are extra quotation marks within the fields for
 cat and horse (rows 2 and 5),

 e.g. ""cat", ""horse"

 One solution is to simply edit the text file and remove the extra quotation
 mark.

 Is there a good solution I can implement from within R?

 I am OK with just importing the extra quotation marks and having them show up
 as part of the text field within R.

 e.g,
 cat
 horse

 Thanks.



        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )

2010-09-12 Thread Wil M Contreras Arbaje
While you are looking for a solution within R, it might be simpler to  
open your text file in almost any free text editor (Notepad++,  
Textwrangler, Smultron, vim come to mind), and do Replace all ' for .



Re: [R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )

2010-09-12 Thread Dennis Murphy
Hi:

On Sun, Sep 12, 2010 at 1:05 PM, Wil M Contreras Arbaje 
wil.contre...@gmail.com wrote:

 While you are looking for a solution within R, it might be simpler to open
 your text file in almost any free text editor (Notepad++, Textwrangler,
 Smultron, vim come to mind), and do Replace all ' for .


There's one problem with that solution: if the character string at the end
of the line is blank (i.e., ""), then your suggestion will leave one double
quote at the end of a line. Not good. What is needed is a gsub that takes
two double quotes plus a wild card character and replaces them with one double
quote and a wild card character. If you have an editor that can do that, let
me know...seriously. I suspect emacs can do this, but none of the basic
editors I know have that capability.
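
A small sketch of the difference described above, in R. The two sample lines are my reconstruction of rows 2 and 3 of the posted data, and the variable names are illustrative only.

```r
# Row with a stray quote, and row with a legitimately empty field
x <- c('2|6|25| ""cat"', '3|4|20|""')

# Blanket replacement of "" by " also damages the empty field
naive <- gsub('""', '"', x, fixed = TRUE)

# Only replace "" when another character follows it, and keep that character
careful <- sub('""(.)', '"\\1', x)

print(naive)
print(careful)
```

The capture group (.) is the "wild card character": because the empty field "" sits at the end of its line, nothing follows it and the pattern leaves it alone.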

Dennis






Re: [R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )

2010-09-12 Thread Wil M Contreras Arbaje
True, I'd actually misread the problem as being ' and not .

In the interest of expediency, here's one solution I can think of off the
top of my head: using MS Word (dunno if it's taboo on these lists, but
it's what I have at hand at the moment; I believe in using all the
tools available if it will save time, which is most valuable):

First, make a copy of your data and test on it, to make sure Word will
preserve proper formatting (when you do "Save as", it gives you
several options under formatting, from which to pick line breaks, etc.)

Next, do Replace All: "" --> " (Edit Menu --> Find --> Replace)
Then, depending on whether the empty last field is written as |" or | "
(that is, pipe, double quote; or pipe, space, double quote), you can do either:
Replace All: |"^p --> |""^p
Replace All:  "^p -->  ""^p (note that the pipe has been replaced by a
single whitespace)
^p stands for a paragraph break in Word, so it would essentially look
for a lone double quote followed by a Return, thereby ignoring
double quotes that are followed by text.

I just tested this, and it worked like a charm.
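
For comparison, the same two-step idea can be sketched in R. This is my own translation of the Word procedure, under the assumption that the data look like the six sample lines posted earlier; the end-of-line anchor $ plays the role of Word's ^p.

```r
# Sample lines as posted in the question (quote marks reconstructed)
x <- c(
  '1|7|30| "dog"',
  '2|6|25| ""cat"',
  '3|4|20|""',
  '4|5| 56| "mouse"',
  '5|3|56| ""horse"',
  '6|56| ""'
)

# Step 1: replace every pair of double quotes with a single one
step1 <- gsub('""', '"', x, fixed = TRUE)

# Step 2: a lone quote right after a pipe or a space at the end of a line
# was an empty field before step 1, so restore it to ""
step2 <- sub('([| ])"$', '\\1""', step1)

print(step2)
```

Handling both the pipe and the space case in one character class means the two Replace All variants collapse into a single call.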

Dennis: in theory, any text editor that supports regular expression  
should be able to do it. I'm fairly rusty on regex now (haven't used  
it in a while, I wish I could offer the exact command). Here are two  
free ones that do, if anyone wants to play around with regex:

www.barebones.com/products/textwrangler/textwranglerpower.html (Textwrangler, OS X)
sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions (Notepad++, Windows)

Cheers, hope it helps,


Wil



Re: [R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )

2010-09-12 Thread Dennis Murphy
Hi:

Thanks to Jakson Aquino, who showed me how to do a proper text substitution,
we have a way out. It also turns out that in the last line, the last numeric
field was missing, so I inserted an NA| in the last line of the data file
before calling readLines(). His (correct) code is at the bottom of the mail.

The first two lines of code below are courtesy of Jakson. Afterward, I tried
to shape the result into a data frame for export as a flat file. There's an
interesting lesson to be (re)learned in the process, so bear with me.

Input file file1.txt (revised):
1|7|30| "dog"
2|6|25| ""cat"
3|4|20|""
4|5| 56| "mouse"
5|3|56| ""horse"
6|56|NA| ""

x <- readLines("file1.txt")
y <- sub('""(.)', '"\\1', x)
d <- do.call(rbind, strsplit(y, split = '\\|'))
d <- as.data.frame(d)
d
  V1 V2  V3       V4
1  1  7  30    "dog"
2  2  6  25    "cat"
3  3  4  20       ""
4  4  5  56  "mouse"
5  5  3  56  "horse"
6  6 56  NA       ""
str(d)
'data.frame':   6 obs. of  4 variables:
 $ V1: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6
 $ V2: Factor w/ 6 levels "3","4","5","56",..: 6 5 2 3 1 4
 $ V3: Factor w/ 6 levels " 56","20","25",..: 4 3 2 1 5 6
 $ V4: Factor w/ 6 levels " \"\""," \"cat\"",..: 3 2 6 5 4 1

Everything is a factor, as it should be since we converted a character
matrix into a data frame. Now convert the factors to numeric and character
and write out to a file.

d$V1 <- as.numeric(d$V1)
d$V2 <- as.numeric(d$V2)
d$V3 <- as.numeric(d$V3)
d$V4 <- as.character(d$V4)
d
  V1 V2 V3       V4
1  1  6  4    "dog"
2  2  5  3    "cat"
3  3  2  2       ""
4  4  3  1  "mouse"
5  5  1  5  "horse"
6  6  4  6       ""

Oopsie. We got the numeric factor codes back in V2 and V3. The FAQ 7.10
trap...

# Back to the drawing board.
d <- do.call(rbind, strsplit(y, split = '\\|'))
d <- as.data.frame(d)
d1 <- d
d1$V1 <- as.numeric(as.character(d1$V1))
d1$V2 <- as.numeric(as.character(d1$V2))
d1$V3 <- as.numeric(as.character(d1$V3))
d1$V4 <- as.character(as.character(d1$V4))

d1
  V1 V2  V3       V4
1  1  7  30    "dog"
2  2  6  25    "cat"
3  3  4  20       ""
4  4  5  56  "mouse"
5  5  3  56  "horse"
6  6 56  NA       ""

Much better. Let's double check that we're OK.
str(d1)
'data.frame':   6 obs. of  4 variables:
 $ V1: num  1 2 3 4 5 6
 $ V2: num  7 6 4 5 3 56
 $ V3: num  30 25 20 56 56 NA
 $ V4: chr  " \"dog\"" " \"cat\"" "\"\"" " \"mouse\"" ...

# NOW write it out...
write.table(d1, file = 'file3.dat', quote = FALSE)   # looks good

And that's why FAQ 7.10 is written the way it is.
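
The trap can be reproduced in isolation with a short sketch. The values below are the second column of the sample data; the variable name f is just for illustration.

```r
# A factor built from number-like strings, as as.data.frame() produced above
f <- factor(c("7", "6", "4", "5", "3", "56"))

# Wrong: returns the internal integer codes of the sorted levels
codes <- as.numeric(f)

# Right: go through the labels first
values <- as.numeric(as.character(f))

# Also right, and the form recommended in FAQ 7.10
values_faq <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

Note that from R 4.0.0 onward as.data.frame() no longer converts strings to factors by default, so on a current R the columns would come out as character and a plain as.numeric() would be enough.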

If one is happy with y (just the paired double quotes removed), then
Jakson's final line is sufficient:
writeLines(y, "file2.txt")


Dennis

On Sun, Sep 12, 2010 at 5:05 PM, Jakson A. Aquino jaksonaqu...@gmail.com wrote:


 x <- readLines("file1.txt")
 y <- sub('""(.)', '"\\1', x)
 writeLines(y, "file2.txt")

