Re: [R] Why do I have a column called row.names?
Try help(read.delim) - always a good strategy before using a function for the first time.

In it, you will find:

    Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be 'automatic' (and not preserved by as.matrix).

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Ed Siefker
Sent: Monday, June 04, 2012 12:47 PM
To: r-help@r-project.org
Subject: [R] Why do I have a column called row.names?

I'm trying to read in a tab separated table with read.delim(). I don't particularly care what the row names are. My data file looks like this:

    start stop Symbol Insert sequence Clone End Pair FISH
    203048 67173930 ABC8-43024000D23 TI:993812543 TI:993834585
    255176 87869359 ABC8-43034700N15 TI:995224581 TI:995237913
    1022033 1060472 ABC27-1253C21 TI:2094436044 TI:2094696079
    1022033 1061172 ABC23-1388A1 TI:2120730727 TI:2121592459

I have to do something with row.names because my first column has duplicate entries. So I read in the file like this:

    BACS <- read.delim("testdata.txt", row.names=NULL, fill=TRUE)
    head(BACS)
      row.names    start             stop Symbol Insert.sequence Clone.End.Pair
    1    203048 67173930 ABC8-43024000D23     NA    TI:993812543   TI:993834585
    2    255176 87869359 ABC8-43034700N15     NA    TI:995224581   TI:995237913
    3   1022033  1060472    ABC27-1253C21     NA   TI:2094436044  TI:2094696079
    4   1022033  1061172     ABC23-1388A1     NA   TI:2120730727  TI:2121592459
      FISH
    1   NA
    2   NA
    3   NA
    4   NA

Why is there a column named row.names? I've tried a few different ways of invoking this, but I always get the first column named row.names, and the rest of the columns shifted by one.

Obviously I could fix this by using row.names<-, but I'd like to understand why this happens. Any insight?
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why do I have a column called row.names?
I did read that, and I still don't understand why I have a column called row.names. I used row.names = NULL in order to get numbered row names, which was successful:

    row.names(BACS)
    [1] "1" "2" "3" "4"

I don't see what this has to do with an extraneous column name. Can you be more explicit as to what exactly I'm supposed to take away from this segment of the help file? Thanks.
Re: [R] Why do I have a column called row.names?
Actually, I think it's ?data.frame that he should read.

The salient points are that:

1. All data frames must have unique row names. If not provided, they are produced. Row numbers **are** row names.

2. The return values of the read methods are data frames.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
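The point about automatic row names can be checked directly. What follows is only a minimal sketch, assuming a small made-up data frame: the object names df and m are hypothetical, and the values are borrowed from the example data. It shows that row numbers are reported as row names, that R marks them as automatic, and that as.matrix() does not carry them over.

```r
# Hypothetical two-row data frame built without explicit row names.
df <- data.frame(start = c(1022033, 1022033),
                 stop  = c(1060472, 1061172))

# Row numbers are row names: they come back as character strings.
row.names(df)

# A negative value from .row_names_info() marks the row names as 'automatic'.
.row_names_info(df)

# Automatic row names are not preserved by as.matrix().
m <- as.matrix(df)
rownames(m)
```

None of this involves a column: row names live in an attribute of the data frame, which is why they cannot by themselves explain an extra column header.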
Re: [R] Why do I have a column called row.names?
Hello,

It must be something else. Mandatory row names have nothing to do with a column header. I've put the data example in a tab separated file and the strange behavior was not reproduced.

    read.delim("test.txt", row.names=NULL, fill=TRUE)
        start     stop           Symbol Insert.sequence Clone.End.Pair FISH
    1  203048 67173930 ABC8-43024000D23    TI:993812543   TI:993834585   NA
    2  255176 87869359 ABC8-43034700N15    TI:995224581   TI:995237913   NA
    3 1022033  1060472    ABC27-1253C21   TI:2094436044  TI:2094696079   NA
    4 1022033  1061172     ABC23-1388A1   TI:2120730727  TI:2121592459   NA

With read.table, I tried with and without header=TRUE. No success.

Rui Barradas
Re: [R] Why do I have a column called row.names?
You will probably need to show us the first few lines of the .csv file. Assuming that the lines look like this

    start,stop,Symbol,Insert sequence,Clone End Pair,FISH
    203048,67173930,ABC8-43024000D23,TI:993812543,TI:993834585
    255176,87869359,ABC8-43034700N15,TI:995224581,TI:995237913
    1022033,1060472,ABC27-1253C21,TI:2094436044,TI:2094696079
    1022033,1061172,ABC23-1388A1,TI:2120730727,TI:2121592459

If I copy those lines to the clipboard and then use the command

    read.csv("clipboard")
        start     stop           Symbol Insert.sequence Clone.End.Pair FISH
    1  203048 67173930 ABC8-43024000D23    TI:993812543   TI:993834585   NA
    2  255176 87869359 ABC8-43034700N15    TI:995224581   TI:995237913   NA
    3 1022033  1060472    ABC27-1253C21   TI:2094436044  TI:2094696079   NA
    4 1022033  1061172     ABC23-1388A1   TI:2120730727  TI:2121592459   NA

I get numbered rows but no row.names (and I get the same when row.names=NULL and fill=TRUE are included).

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
Re: [R] Why do I have a column called row.names?
To jump into the fray, he really needs to read the Details section of ?read.table and arguably, the source code for read.table().

It is not that the resultant data frame has row names, but that an additional first *column name* called 'row.names' is created, which does not exist in the source data.

The Details section has:

    If row.names is not specified and the header line has one less entry than the number of columns, the first column is taken to be the row names. This allows data frames to be read in from the format in which they are printed. If row.names is specified and does not refer to the first column, that column is discarded from such files.

    The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the 'Examples').

In the source code for read.table(), which is called by read.delim() with differing defaults, there is:

    rlabp <- (cols - col1) == 1L

and a few lines further down:

    if (rlabp) col.names <- c("row.names", col.names)

So the last code snippet is where a new first column name called 'row.names' is pre-pended to the column names found from reading the header row. 'cols' and 'col1' are defined in prior code based upon various conditions.

Not having the full data set and possibly having line wrap and TAB problems with the text that Ed pasted into his original post, I cannot properly replicate the conditions that cause the above code to be triggered.

If Ed can put the entire file someplace and provide a URL for download, perhaps we can better trace the source of the problem, or Ed might use ?debug to follow the code execution in read.table() and see where the relevant flags get triggered. The latter option would help Ed learn how to use the debugging tools that R provides to dig more deeply into such issues.
Regards,

Marc Schwartz
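The rule quoted from the Details section of ?read.table can be seen on a tiny input. This is a minimal sketch under stated assumptions: the text, the column names a, b, c and the labels r1, r2 are made up, and d1 and d2 are hypothetical object names. The header has three entries while each data line has four fields, which is the "one less entry" case.

```r
# Hypothetical tab-separated input: 3 header names, 4 fields per data line.
txt <- "a\tb\tc\nr1\t1\t2\t3\nr2\t4\t5\t6\n"

# row.names not specified: the first column is taken to be the row names.
d1 <- read.delim(text = txt)
rownames(d1)
names(d1)

# row.names = NULL: rows are numbered instead, and the extra first column
# is kept as data under the prepended name "row.names".
d2 <- read.delim(text = txt, row.names = NULL)
rownames(d2)
names(d2)
```

In the second call the header names are unchanged, but they now sit one position to the right of where a reader who did not expect a label column would look for them.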
Re: [R] Why do I have a column called row.names?
On 6/4/2012 12:12 PM, Marc Schwartz wrote:
> If Ed can put the entire file someplace and provide a URL for download, perhaps we can better trace the source of the problem, or Ed might use ?debug to follow the code execution in read.table() and see where the relevant flags get triggered. The latter option would help Ed learn how to use the debugging tools that R provides to dig more deeply into such issues.

I agree that the actual file would be helpful. But I can get it to happen if there are extra delimiters at the end of the data lines (which there can be with a separator of tab, which is not obviously visible). I can get it with:

    BACS <- read.delim(textConnection(
    "start\tstop\tSymbol\tInsert sequence\tClone End Pair\tFISH
    203048\t67173930\t\tABC8-43024000D23\tTI:993812543\tTI:993834585\t
    255176\t87869359\t\tABC8-43034700N15\tTI:995224581\tTI:995237913\t
    1022033\t1060472\t\tABC27-1253C21\tTI:2094436044\tTI:2094696079\t
    1022033\t1061172\t\tABC23-1388A1\tTI:2120730727\tTI:2121592459\t"),
    row.names=NULL, fill=TRUE)

which gives

    BACS
      row.names    start stop           Symbol Insert.sequence
    1    203048 67173930   NA ABC8-43024000D23    TI:993812543
    2    255176 87869359   NA ABC8-43034700N15    TI:995224581
    3   1022033  1060472   NA    ABC27-1253C21   TI:2094436044
    4   1022033  1061172   NA     ABC23-1388A1   TI:2120730727
      Clone.End.Pair FISH
    1   TI:993834585   NA
    2   TI:995237913   NA
    3  TI:2094696079   NA
    4  TI:2121592459   NA

or

    str(BACS)
    'data.frame':   4 obs. of  7 variables:
     $ row.names      : chr  "203048" "255176" "1022033" "1022033"
     $ start          : int  67173930 87869359 1060472 1061172
     $ stop           : logi  NA NA NA NA
     $ Symbol         : Factor w/ 4 levels "ABC23-1388A1",..: 3 4 2 1
     $ Insert.sequence: Factor w/ 4 levels "TI:2094436044",..: 3 4 1 2
     $ Clone.End.Pair : Factor w/ 4 levels "TI:2094696079",..: 3 4 1 2
     $ FISH           : logi  NA NA NA NA

The extra delimiter at the end of the line triggers the one-more-data-than-column-name condition, which then gives the row.names column.
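Whether a given file has this problem can be checked by counting fields per line before reading it. The following is only a sketch, assuming a cut-down, made-up version of the data with three columns and a tab at the end of each data line; the names lines, con, n, shifted and fixed are hypothetical. Stripping the trailing tab with sub() is one possible workaround and is an assumption here, not something proposed in the thread.

```r
# Hypothetical file contents: 3 header names, but every data line ends in a
# tab, so each data line has 4 fields (the last one empty).
lines <- c("start\tstop\tSymbol",
           "1022033\t1060472\tABC27-1253C21\t",
           "1022033\t1061172\tABC23-1388A1\t")

# Diagnose: compare the header's field count with that of the data lines.
con <- textConnection(lines)
n <- count.fields(con, sep = "\t")
close(con)
n

# Reading as-is with row.names = NULL prepends "row.names" and shifts the
# header names one column to the right; the last column is all NA.
shifted <- read.delim(text = lines, row.names = NULL)
names(shifted)

# One possible workaround: drop the trailing tab from each line, then parse.
fixed <- read.delim(text = sub("\t$", "", lines))
names(fixed)
```

With a real file, the same check would use the file name in place of the text connection, for example count.fields("testdata.txt", sep = "\t"), where the file name is the one from the original post.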