Thanks again Marc for your help.
At this point I already have the whole file as a data.frame in R (via an S-Plus dump and then R's source), so this specific problem is solved.
I had changed my file in Excel and thought everything was fine, but apparently it wasn't. What program can be used to display a tab-separated file in columns without corrupting the data?
I tried again from the initial file, and a very simple
x <- read.table('file.txt', header=T, sep='\t')
works fine. The sep='\t' is very important; otherwise the columns are imported in the wrong places when there are empty cells next to them.
I would again suggest advising people to use sep='\t' for tab-delimited files in the help page for read.table.
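To illustrate why sep='\t' matters, here is a tiny made-up example (not the actual file; the data are hypothetical):

```r
# Hypothetical tab-delimited data: the middle field of the first data row is empty.
txt <- "a\tb\tc\n1\t\t3\n4\t5\t6\n"

# With the default sep="", consecutive tabs collapse into one separator,
# so the first data row appears to have only 2 fields and read.table fails
# (the same "did not have N elements" error as in the thread).
bad <- try(read.table(textConnection(txt), header = TRUE), silent = TRUE)
inherits(bad, "try-error")   # TRUE

# With sep='\t', the empty field is kept and read as NA, and the columns line up.
good <- read.table(textConnection(txt), header = TRUE, sep = '\t')
good$b   # NA 5
```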
##
If anyone is interested in a detailed history of the problem:
I had generated my initial file by exporting from S-Plus 6.1 (Windows 2000) as a tab-delimited file.
I tried to open the file in R; it didn't work, so I opened the file in Excel and substituted the empty cells with NA. I saved the file as a txt file - tab delimited. This was the file from which I could read only 9543 lines instead of the 15797 that the file has. The file was probably corrupted through the use of Excel, so I guess the lesson is: don't do this in Excel.
I went back to Splus, exported a new tab delimited file and tried again:
x <- read.table('file.txt', header=T, sep='\t')         # works fine
x <- read.table('file.txt', header=T)                   # gives an error:
#   Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
#     line 1 did not have 194 elements
x <- read.table('file.txt', header=T, fill=T)           # wrong columns take empty (NA) space
x <- read.table('file.txt', header=T, fill=T, sep='\t') # works fine

On Wed, 2005-01-19 at 19:28 +0000, Tiago R Magalhaes wrote:

Thanks very much Marc and Prof Ripley
a) using sep='\t' when using read.table() helps somewhat
There is still a problem: I cannot get all the lines:
df <- read.table('file.txt', fill=T, header=T, sep='\t')
dim(df)   # 9543 195
while with the shorter file (11 cols) I get all the rows:
dim(df)   # 15797 11
I have looked at row 9544, where the file seems to stop being read, but I cannot see in any of the columns an obvious reason for this to happen. Any ideas why? Maybe there is one column that is stopping the reading process, and that column is not one of the 11 present in the smaller file.
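One way to hunt for the culprit is count.fields(), which reports how many fields each line of the file has. A sketch, using a tiny stand-in for the real file (the data here are made up):

```r
# Tiny stand-in for the real file: the third line is missing a field.
txt <- c("a\tb\tc", "1\t2\t3", "4\t5")
con <- textConnection(paste(txt, collapse = "\n"))

# count.fields() counts the tab-separated fields on each line,
# which points straight at any malformed line(s).
n <- count.fields(con, sep = '\t')
close(con)
n              # 3 3 2
which(n != 3)  # 3
```

If every line has the right field count, another common reason for reading to stop partway through a file is an unmatched quote character in the data; passing quote="" to read.table turns quote processing off and would rule that out.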
b) fill=T is necessary; without fill=T, I get an error: "line 1892 did not have 195 elements"
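What fill=T does, on a made-up short-line example: lines with too few fields are padded with NA at the end (which is also why misaligned columns can still occur without sep='\t' - the padding always goes on the right).

```r
# Hypothetical file where the last line is missing its final field.
txt <- "a\tb\tc\n1\t2\t3\n4\t5\n"
x <- read.table(textConnection(txt), header = TRUE, sep = '\t', fill = TRUE)
x$c   # 3 NA -- the short line is padded with NA at the end
```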
Tiago,
How was this data file generated? Is it a raw file created by some other application or was it an ASCII export, perhaps from a spreadsheet or database program?
It seems that there is something inconsistent in the large data file, which is either by design or perhaps the result of being corrupted by a poor export.
It may be helpful to know how the file was generated in the effort to assist you.
c) help page for read.table
I reread the help page for read.table and I would suggest changing it. From what I think I am reading, the '\t' should not be needed for my file, but it actually is. From the help page:
If 'sep = ""' (the default for 'read.table') the separator is "white space", that is one or more spaces, tabs or newlines.
Under normal circumstances, this should not be a problem, but given the unknowns about your file, it leaves an open question as to the etiology of the incorrect import.
Marc
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
