On 6/4/2012 12:12 PM, Marc Schwartz wrote:
To jump into the fray, he really needs to read the Details section of
?read.table and arguably, the source code for read.table().
It is not that the resultant data frame has row names, but that an
additional first *column name* called 'row.names' is created, which
does not exist in the source data.
The Details section has:
If row.names is not specified and the header line has one less entry
than the number of columns, the first column is taken to be the row
names. This allows data frames to be read in from the format in which
they are printed. If row.names is specified and does not refer to the
first column, that column is discarded from such files.
The number of data columns is determined by looking at the first five
lines of input (or the whole file if it has less than five lines), or
from the length of col.names if it is specified and is longer. This
could conceivably be wrong if fill or blank.lines.skip are true, so
specify col.names if necessary (as in the ‘Examples’).
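As a minimal sketch of the first rule quoted above, using made-up whitespace-separated data (the names txt and df are only illustrative): when the header has one fewer entry than the data lines, read.table() takes the first column as the row names.

```r
# Made-up input: the header has 2 entries, each data line has 3 fields.
txt <- c("x y",
         "a 1 2",
         "b 3 4")

# 'text' is read through a text connection; header = TRUE is set
# explicitly here, although read.table() would also infer it.
df <- read.table(text = txt, header = TRUE)

names(df)     # "x" "y"
rownames(df)  # "a" "b"
```

The first field of each data line ends up as a row name rather than as a column, which is why a printed data frame can be read back in.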
In the source code for read.table(), which is called by read.delim()
with differing defaults, there is:
rlabp <- (cols - col1) == 1L
and a few lines further down:
if (rlabp) col.names <- c("row.names", col.names)
So the last code snippet is where a new first column name called
'row.names' is pre-pended to the column names found from reading the
header row. 'cols' and 'col1' are defined in prior code based upon
various conditions.
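One way to check for that condition outside of read.table() is to count the fields per line with count.fields(). This is only a sketch that mimics the comparison, not the actual internals of read.table(); the input is made up, and header_fields, data_fields and adds_row_names are illustrative names.

```r
# Made-up tab-separated input: 3 header names, but every data line
# ends in an extra tab and therefore has 4 fields.
lines <- c("a\tb\tc",
           "1\t2\t3\t",
           "4\t5\t6\t")

con <- textConnection(lines)
n_fields <- count.fields(con, sep = "\t", quote = "\"")
close(con)

header_fields <- n_fields[1]         # plays the role of 'col1'
data_fields   <- max(n_fields[-1])   # plays the role of 'cols'

# TRUE when the data lines have exactly one more field than the header,
# the situation in which a "row.names" column name gets prepended.
adds_row_names <- (data_fields - header_fields) == 1L
```

Running the same count on the real file would show whether the header and the data lines disagree by one field.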
Not having the full data set and possibly having line wrap and TAB
problems with the text that Ed pasted into his original post, I
cannot properly replicate the conditions that cause the above code to
be triggered.
If Ed can put the entire file someplace and provide a URL for
download, perhaps we can better trace the source of the problem, or
Ed might use ?debug to follow the code execution in read.table() and
see where the relevant flags get triggered. The latter option would
help Ed learn how to use the debugging tools that R provides to dig
more deeply into such issues.
I agree that the actual file would be helpful. But I can get it to
happen if there are extra delimiters at the end of the data lines (which
is easy to miss when the separator is a tab, since a trailing tab is
not visible). I can get it with:
BACS<-read.delim(textConnection(
"start\tstop\tSymbol\tInsert sequence\tClone End Pair\tFISH
203048\t67173930\t\tABC8-43024000D23\tTI:993812543\tTI:993834585\t
255176\t87869359\t\tABC8-43034700N15\tTI:995224581\tTI:995237913\t
1022033\t1060472\t\tABC27-1253C21\tTI:2094436044\tTI:2094696079\t
1022033\t1061172\t\tABC23-1388A1\tTI:2120730727\tTI:2121592459\t"),
row.names=NULL, fill=TRUE)
which gives
> BACS
  row.names    start stop           Symbol Insert.sequence
1    203048 67173930   NA ABC8-43024000D23    TI:993812543
2    255176 87869359   NA ABC8-43034700N15    TI:995224581
3   1022033  1060472   NA    ABC27-1253C21   TI:2094436044
4   1022033  1061172   NA     ABC23-1388A1   TI:2120730727
  Clone.End.Pair FISH
1   TI:993834585   NA
2   TI:995237913   NA
3  TI:2094696079   NA
4  TI:2121592459   NA
or
> str(BACS)
'data.frame':   4 obs. of  7 variables:
 $ row.names      : chr  "203048" "255176" "1022033" "1022033"
 $ start          : int  67173930 87869359 1060472 1061172
 $ stop           : logi  NA NA NA NA
 $ Symbol         : Factor w/ 4 levels "ABC23-1388A1",..: 3 4 2 1
 $ Insert.sequence: Factor w/ 4 levels "TI:2094436044",..: 3 4 1 2
 $ Clone.End.Pair : Factor w/ 4 levels "TI:2094696079",..: 3 4 1 2
 $ FISH           : logi  NA NA NA NA
The extra delimiter at the end of each data line means the data lines
have one more field than the header has column names. That triggers the
condition above, which then gives the row.names column.
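If the trailing delimiter really is spurious, one possible workaround is to strip it before parsing. The sketch below uses made-up data rather than Ed's file, and the names raw, shifted, clean and fixed are only illustrative; it assumes the last tab on each line does not stand for a genuinely empty final column.

```r
# Made-up tab-separated lines; each data line ends in a stray tab.
raw <- c("id\tname",
         "1\ta\t",
         "1\tb\t")

# Read as-is: the header is one field short, so a "row.names" column
# appears and the remaining names shift over by one.
shifted <- read.delim(text = raw, row.names = NULL)
names(shifted)  # "row.names" "id" "name"

# Remove a single trailing tab from every line, then read again.
clean <- sub("\t$", "", raw)
fixed <- read.delim(text = clean)
names(fixed)    # "id" "name"
```

If the extra field is real data instead, supplying a full set of col.names would be the safer route.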
Regards,
Marc Schwartz
On Jun 4, 2012, at 1:30 PM, Bert Gunter wrote:
Actually, I think it's ?data.frame that he should read.
The salient points are that:
1. All data frames must have unique row names. If not provided, they
are produced. Row numbers **are** row names.
2. The return values of the read methods are data frames.
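A small sketch of point 1, with made-up data (df and dup are illustrative names): a data frame built without row names still reports them, they are flagged as automatic, and duplicate row names are rejected.

```r
# No row names supplied: R generates them from the row numbers.
df <- data.frame(x = c(10, 20, 30))
rownames(df)             # "1" "2" "3"

# .row_names_info() returns a negative count for 'automatic' row names.
.row_names_info(df) < 0  # TRUE

# Duplicate row names are an error; capture the message instead of stopping.
dup <- tryCatch(
  data.frame(x = 1:2, row.names = c("a", "a")),
  error = function(e) conditionMessage(e)
)
```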
-- Bert
On Mon, Jun 4, 2012 at 11:05 AM, David L Carlson <dcarl...@tamu.edu> wrote:
Try help("read.delim"), which is always a good strategy before using a
function for the first time.
In it, you will find: "Using row.names = NULL forces row numbering. Missing
or NULL row.names generate row names that are considered to be 'automatic'
(and not preserved by as.matrix)."
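The "not preserved by as.matrix" part can be seen in a short sketch with made-up data (auto and named are illustrative names):

```r
# Automatic row names (simple row numbering).
auto <- data.frame(x = 1:2, y = 3:4)

# Explicit, non-automatic row names.
named <- data.frame(x = 1:2, y = 3:4, row.names = c("r1", "r2"))

rownames(as.matrix(auto))   # NULL: automatic row names are dropped
rownames(as.matrix(named))  # "r1" "r2"
```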
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Ed Siefker
Sent: Monday, June 04, 2012 12:47 PM
To: r-help@r-project.org
Subject: [R] Why do I have a column called row.names?
I'm trying to read in a tab separated table with read.delim().
I don't particularly care what the row names are.
My data file looks like this:
start stop Symbol Insert sequence Clone End Pair FISH
203048 67173930 ABC8-43024000D23 TI:993812543 TI:993834585
255176 87869359 ABC8-43034700N15 TI:995224581 TI:995237913
1022033 1060472 ABC27-1253C21 TI:2094436044 TI:2094696079
1022033 1061172 ABC23-1388A1 TI:2120730727 TI:2121592459
I have to do something with row.names because my first column has
duplicate entries. So I read in the file like this:
BACS<-read.delim("testdata.txt", row.names=NULL, fill=TRUE)
head(BACS)
  row.names    start             stop Symbol Insert.sequence Clone.End.Pair
1    203048 67173930 ABC8-43024000D23     NA    TI:993812543   TI:993834585
2    255176 87869359 ABC8-43034700N15     NA    TI:995224581   TI:995237913
3   1022033  1060472    ABC27-1253C21     NA   TI:2094436044  TI:2094696079
4   1022033  1061172     ABC23-1388A1     NA   TI:2120730727  TI:2121592459
  FISH
1   NA
2   NA
3   NA
4   NA
Why is there a column named "row.names"? I've tried a few different
ways of invoking this, but I always get the first column named
row.names, and the rest of the columns shifted by one.
Obviously I could fix this by using row.names<-, but I'd like to
understand why this happens. Any insight?
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help