Thanks again Marc for your help.

At this point I already have the whole file as a data.frame in R (via an Splus dump and then R's source), so this specific problem is solved.

I had changed my file in Excel and thought everything was fine, but apparently it wasn't. What program can be used to display a tab-separated file in columns without corrupting the data?

I tried again from the initial file and a very simple:
x <- read.table('file.txt', header=T, sep='\t') works fine. The sep='\t' is very important; otherwise the columns are imported in the wrong places when there are empty cells next to them.
I would suggest again advising people to use sep='\t' for tab-delimited files in the help page for read.table.
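To illustrate with a small made-up example ('demo.txt' is just a throwaway file name, not my actual data):

writeLines(c('a\tb\tc', '1\t\t3', '4\t5\t6'), 'demo.txt')  # second data row has an empty cell
try(read.table('demo.txt', header=T))        # default whitespace sep: "line 1 did not have 3 elements"
read.table('demo.txt', header=T, sep='\t')   # empty cell is read as NA and the columns stay aligned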


##

If anyone is interested in a detailed history of the problem:

I had gotten my initial file by exporting from Splus 6.1 on Windows 2000 as a tab-delimited file.

I tried to open the file in R and it didn't work, so I opened the file in Excel and substituted the empty cells with NA. I saved the file as a txt file (tab delimited). This was the file from which I could read only 9543 lines instead of the 15797 lines it actually has. The file was probably corrupted through the use of Excel, so I guess the lesson is: don't do this in Excel.

I went back to Splus, exported a new tab delimited file and tried again:

x <- read.table('file.txt', header=T, sep='\t') #works fine

x <- read.table('file.txt', header=T) #gives an error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
        line 1 did not have 194 elements

x <- read.table('file.txt', header=T, fill=T) #wrong: values end up in the wrong columns, padded with NA

x <- read.table('file.txt', header=T, fill=T, sep='\t') #works fine
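For anyone hitting the same error, count.fields() should report how many fields scan() sees on each line for a given separator, which makes it easy to spot the lines that break under the default whitespace splitting (a rough sketch; 'file.txt' again stands for my exported file):

n_tab <- count.fields('file.txt', sep='\t')   # fields per line when splitting on tabs
n_ws  <- count.fields('file.txt')             # fields per line when splitting on whitespace
table(n_tab)                  # a single value here means the tab-delimited file is rectangular
head(which(n_ws != n_tab))    # lines where empty cells collapse under whitespace splitting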


On Wed, 2005-01-19 at 19:28 +0000, Tiago R Magalhaes wrote:
 Thanks very much Marc and Prof Ripley

 a) using sep='\t' when using read.table() helps somewhat

 there is still a problem: I cannot get all the lines:
 df <- read.table('file.txt', fill=T, header=T, sep='\t')
 dim(df)
   9543  195

 while with the shorter file (11 cols) I get all the rows
 dim(df)
   15797    11

 I have looked at row 9544 where the file seems to stop reading, but I
 cannot see in any of the cols an obvious reason for this to happen.
 Any ideas why? Maybe there is one col that is stopping the reading
 process and that column is not one of the 11 that are present in the
 smaller file.

 b) fill=T is necessary
 without fill=T, I get an error:
 "line 1892 did not have 195 elements"

Tiago,

How was this data file generated? Is it a raw file created by some other
application or was it an ASCII export, perhaps from a spreadsheet or
database program?

It seems that there is something inconsistent in the large data file,
which is either by design or perhaps the result of being corrupted by a
poor export.

It may be helpful to know how the file was generated in the effort to
assist you.

 c) help page for read.table
 I reread the help file for read.table and I would suggest changing
 it. From what I think I am reading, the '\t' should not be needed for
 my file, but it actually is. From the help page:

   If 'sep = ""' (the default for 'read.table') the separator is "white
 space", that is one or more spaces, tabs or newlines.

Under normal circumstances, this should not be a problem, but given the unknowns about your file, it leaves an open question as to the etiology of the incorrect import.

Marc

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
