Re: [R] Why only a "" string for heading for row.names with write.csv with a matrix?

2005-08-10 Thread Tony Plate
Here's a relatively easy way to get what I think you want.  Note that 
converting x to a data frame before cbind'ing allows the type of the 
elements of x to be preserved:

 > x <- matrix(1:6, 2,3)
 > rownames(x) <- c("ID1", "ID2")
 > colnames(x) <- c("Attr1", "Attr2", "Attr3")
 > x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
 > write.table(cbind(id=row.names(x), as.data.frame(x)), 
row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6
 >
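
To make the type-preservation point concrete, here is a minimal sketch contrasting cbind() on the matrix itself with cbind() on its data-frame version.  The variable names y_matrix, y_df and csv_lines are illustrative only and do not come from the original posts:

```r
x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))

# cbind() of a character vector with an integer matrix yields a character
# matrix, so every value would be written out as a quoted string.
y_matrix <- cbind(id = rownames(x), x)

# Converting to a data frame first lets each column keep its own type.
y_df <- cbind(id = rownames(x), as.data.frame(x))

typeof(y_matrix)      # "character"
sapply(y_df, class)   # the Attr columns stay "integer"

# Capture the CSV text instead of printing it, so it can be inspected.
csv_lines <- capture.output(write.table(y_df, row.names = FALSE, sep = ","))
csv_lines
```

Only the id column is quoted in csv_lines; the numeric columns are written as bare numbers.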

As to why you can't get this via an argument to write.table (or 
write.csv), I suspect that part of the answer is a wish to avoid 
"creeping featuritis".  Transferring data between programs is 
notoriously infuriating.  There are more data formats than there are 
programs, but few programs use the same format as their default & 
preferred format.  So to accommodate everyone's preferred format would 
require an extremely large number of features in the data import/export 
functions.  Maintaining software that contains a large number of 
features is difficult -- it's easy for errors to creep in because there 
are so many combinations of how different features can be used on 
different functions.

The alternative to having lots of features on each function is to have a 
relatively small set of powerful functions that can be used to construct 
the behavior you want.  This type of software is thought by many to be 
easier to maintain and extend.  I think this is pretty much the preferred 
approach in R.  The above one-liner for writing the data in the form you 
want is really not much more complex than using an additional argument 
to write.table().  (And if you need to do this kind of thing frequently, 
then it's easy in R to create your own wrapper function for 'write.table'.)
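
A minimal sketch of what such a wrapper might look like.  The function name write_keyed_table and its key_name argument are hypothetical (they are not part of R); the sketch simply packages the one-liner above:

```r
# Hypothetical wrapper around write.table(): writes the row names as an
# ordinary first column with a caller-chosen heading.
write_keyed_table <- function(x, file = "", key_name = "id", sep = ",", ...) {
  df <- as.data.frame(x)
  key <- data.frame(row.names(df), stringsAsFactors = FALSE)
  names(key) <- key_name
  # row.names = FALSE because the row names now live in the key column.
  write.table(cbind(key, df), file = file, row.names = FALSE, sep = sep, ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))
write_keyed_table(x, key_name = "ID")   # file = "" prints to the console
```

Any further arguments (for example quote = FALSE) are passed through to write.table() via "...".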

One might object to this line of explanation by noting that many 
functions already have many arguments and lots of features.  I think the 
situation is that the original author of any particular function gets to 
decide what features the function will have, and after that there is 
considerable reluctance (justifiably) to add new features, especially in 
cases where the desired functionality can be easily achieved in other 
ways with existing functions.

-- Tony Plate

Earl F. Glynn wrote:
> Consider:
> 
> > x <- matrix(1:6, 2,3)
> > rownames(x) <- c("ID1", "ID2")
> > colnames(x) <- c("Attr1", "Attr2", "Attr3")
> 
> > x
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
> 
> > write.csv(x,file="x.csv")
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
> 
> Have I missed an easy way to get the "" string to be something meaningful?
> 
> There is no information in the "" string.  This column heading for the row
> names often could be used as a database key, but the "" entry would need to be
> manually edited first.  Why not provide a way to specify the string instead
> of putting "" as the heading for the rownames?
> 
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
> 
>   Header line
>   R prefers the header line to have no entry for the row names,
>   . . .
>   Some other systems require a (possibly empty) entry for the row names,
> which is what write.table will provide if argument col.names = NA  is
> specified. Excel is one such system.
> 
> Why is an "empty" entry the only option here?
> 
> A quick solution that comes to mind seems a bit kludgy:
> 
> > y <- cbind(rownames(x), x)
> > colnames(y)[1] <- "ID"
> > y
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
> 
> > write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
> 
> Now the rownames have an "ID" header, which could be used as a key in a
> database if desired without editing (but all the "numbers" are now
> character strings, too).
> 
> It's also not clear why I had to use write.table above, instead of
> write.csv:
> 
> > write.csv(y, row.names=F, col.names=T, file="y.csv")
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
> col.names = NA makes no sense when row.names = FALSE
> 
> Thanks for any insight about this.
> 
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



[R] Why only a "" string for heading for row.names with write.csv with a matrix?

2005-08-10 Thread Earl F. Glynn
Consider:
> x <- matrix(1:6, 2,3)
> rownames(x) <- c("ID1", "ID2")
> colnames(x) <- c("Attr1", "Attr2", "Attr3")

> x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6

> write.csv(x,file="x.csv")
"","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6

Have I missed an easy way to get the "" string to be something meaningful?

There is no information in the "" string.  This column heading for the row
names often could be used as a database key, but the "" entry would need to be
manually edited first.  Why not provide a way to specify the string instead
of putting "" as the heading for the rownames?

From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html

  Header line
  R prefers the header line to have no entry for the row names,
  . . .
  Some other systems require a (possibly empty) entry for the row names,
which is what write.table will provide if argument col.names = NA  is
specified. Excel is one such system.

Why is an "empty" entry the only option here?

A quick solution that comes to mind seems a bit kludgy:

> y <- cbind(rownames(x), x)
> colnames(y)[1] <- "ID"
> y
    ID    Attr1 Attr2 Attr3
ID1 "ID1" "1"   "3"   "5"
ID2 "ID2" "2"   "4"   "6"

> write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
"ID","Attr1","Attr2","Attr3"
"ID1","1","3","5"
"ID2","2","4","6"

Now the rownames have an "ID" header, which could be used as a key in a
database if desired without editing (but all the "numbers" are now
character strings, too).

It's also not clear why I had to use write.table above, instead of
write.csv:
> write.csv(y, row.names=F, col.names=T, file="y.csv")
Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
col.names = NA makes no sense when row.names = FALSE

Thanks for any insight about this.

efg
--
Earl F. Glynn
Bioinformatics
Stowers Institute
