Re: [R] Reformatting text inside a data frame
Hi John,

Thanks for the reply; I'm pasting here the output from dput, with a 'df <-' added in front:

df <- structure(list(rowNum = c(1, 2, 3), first = structure(c(NA, 1L, 2L), .Label = c("AD=2;BA=8", "AD=9;BA=1"), class = "factor"), second = structure(c(2L, 1L, NA), .Label = c("AD=1;BA=2", "AD=13;BA=49"), class = "factor")), .Names = c("rowNum", "first", "second"), row.names = c(NA, -3L), class = "data.frame")

To add more specifics about what I would like: each value to be adjusted has the following general format:

"AD=X;BA=Y"

I would like to extract the values of X and Y and format them as a string like this:

"X_X-Y"

Here's how I would handle a specific instance using awk in a shell script:

echo "AD=X;BA=Y" | awk '{split($1,a,"AD="); split(a[2],b,";"); split(b[2],c,"BA="); print b[1]"_"b[1]"-"c[2]}'
X_X-Y

I'd like this to apply to all the entries that aren't NA to the right of column 1. Hoping this adds clarity for any others who also didn't follow my example.

Thanks in advance for any tips-

Best,
Jonathan

On Mon, Sep 7, 2015 at 3:48 PM, John Kane wrote:
> I'm not making a lot of sense of the data; it looks like you want more
> recodes than you have mentioned, but in any case you might want to look at
> the recode function in the car package. It "should" do what you want,
> though there may be faster ways to do it.
>
> BTW, for supplying sample data have a look at ?dput. Using dput() means
> that we see exactly the same data as you do.
>
> Sorry not to be of more help
> John Kane
> Kingston ON Canada
>
> > -Original Message-
> > From: jonsle...@gmail.com
> > Sent: Mon, 7 Sep 2015 15:27:05 -0400
> > To: r-help@r-project.org
> > Subject: [R] Reformatting text inside a data frame
> >
> > Hi all,
> > I've read in a large data frame that has formatting similar to the one
> > in the small example below:
> >
> > df <- data.frame(c(1,2,3),c(NA,"AD=2;BA=8","AD=9;BA=1"),c("AD=13;BA=49","AD=1;BA=2",NA));
> > names(df) <- c("rowNum","first","second")
> >
> > > df
> >   rowNum     first      second
> > 1      1      <NA> AD=13;BA=49
> > 2      2 AD=2;BA=8   AD=1;BA=2
> > 3      3 AD=9;BA=1        <NA>
> >
> > I'd like to reformat all of the non-NA entries in df from "first" and
> > "second" and so on such that "AD=13;BA=49" will be replaced by the
> > following string: "13_13-49".
> >
> > So applied to df, the output would be the following:
> >
> >   rowNum first   second
> > 1      1  <NA> 13_13-49
> > 2      2 2_2-8    1_1-2
> > 3      3 9_9-1     <NA>
> >
> > I'm generally a big proponent of shell scripting with awk, but I'd prefer
> > an all-R solution if one exists (and also to learn how to do this more
> > generally).
> >
> > Could someone point out an appropriate paradigm or otherwise point me in
> > the right direction?
> >
> > Best,
> > Jonathan

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Reformatting text inside a data frame
Hi all,

I've read in a large data frame that has formatting similar to the one in the small example below:

df <- data.frame(c(1,2,3),c(NA,"AD=2;BA=8","AD=9;BA=1"),c("AD=13;BA=49","AD=1;BA=2",NA));
names(df) <- c("rowNum","first","second")

> df
  rowNum     first      second
1      1      <NA> AD=13;BA=49
2      2 AD=2;BA=8   AD=1;BA=2
3      3 AD=9;BA=1        <NA>

I'd like to reformat all of the non-NA entries in df from "first" and "second" and so on such that "AD=13;BA=49" will be replaced by the following string: "13_13-49".

So applied to df, the output would be the following:

  rowNum first   second
1      1  <NA> 13_13-49
2      2 2_2-8    1_1-2
3      3 9_9-1     <NA>

I'm generally a big proponent of shell scripting with awk, but I'd prefer an all-R solution if one exists (and also to learn how to do this more generally).

Could someone point out an appropriate paradigm or otherwise point me in the right direction?

Best,
Jonathan
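For readers of the archive, one possible all-R approach is a regular-expression substitution applied column by column. This is only a sketch under the assumption that every non-NA entry has exactly the form "AD=X;BA=Y"; the helper name `reformat_entry` is made up for illustration and is not from the thread.

```r
# Rebuild the example data as character columns.
df <- data.frame(
  rowNum = c(1, 2, 3),
  first  = c(NA, "AD=2;BA=8", "AD=9;BA=1"),
  second = c("AD=13;BA=49", "AD=1;BA=2", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: capture X and Y, then emit "X_X-Y".
# as.character() guards against factor columns; NA entries stay NA.
reformat_entry <- function(x) {
  sub("^AD=([^;]*);BA=(.*)$", "\\1_\\1-\\2", as.character(x))
}

# Apply the helper to every column except the first.
df[-1] <- lapply(df[-1], reformat_entry)
df
```

Entries that do not match the assumed pattern are left unchanged by `sub()`, so malformed values would pass through silently rather than raise an error.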
[R] data frame formatting
Hello all,

I would like to take a data frame such as the following one:

df <- data.frame(id=c("A","A","B","B"), first=c("BX",NA,NA,"LF"), second=c(NA,"TD","BZ",NA), third=c(NA,NA,"RB","BT"), fourth=c("LG","QR",NA,NA))

df
  id first second third fourth
1  A    BX     NA    NA     LG
2  A    NA     TD    NA     QR
3  B    NA     BZ    RB     NA
4  B    LF     NA    BT     NA

and merge rows based on the id, such that the value in each column will be one of three possibilities: if both values in the original df are NA, the new value should also be NA. If there are two non-NA values, then the new value should read "clash". Otherwise, the new value should be whichever value was not NA.

An example output from the command would read in df and read out:

  id first second third fourth
1  A    BX     TD    NA  clash
2  B    LF     BZ clash     NA

I'd be grateful if someone could point me in the right direction.

Thanks,
Jonathan
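A possible base-R sketch of the merge described above uses the data-frame method of `aggregate()` with a small collapsing function. The name `merge_values` is hypothetical, and the sketch assumes that any two non-NA values in a group count as a clash, even if they happen to be identical.

```r
df <- data.frame(
  id     = c("A", "A", "B", "B"),
  first  = c("BX", NA, NA, "LF"),
  second = c(NA, "TD", "BZ", NA),
  third  = c(NA, NA, "RB", "BT"),
  fourth = c("LG", "QR", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one column within one id group.
# No non-NA values -> NA, exactly one -> that value, more than one -> "clash".
merge_values <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) == 0) {
    NA_character_
  } else if (length(vals) == 1) {
    vals
  } else {
    "clash"
  }
}

# The data-frame method of aggregate() does not drop rows that contain NA
# in the value columns, so every row reaches merge_values().
merged <- aggregate(df[-1], by = list(id = df$id), FUN = merge_values)
merged
```

The formula interface of `aggregate()` is avoided here on purpose, since its default `na.action` would discard the rows with missing values before they could be merged.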
[R] dplyr help
Hello,

I've recently discovered the helpful dplyr package. I'm using the 'aggregate' function as such:

bevs <- data.frame(cbind(name = c("Bill", "Mary"), drink = c("coffee", "tea", "cocoa", "water"), cost = seq(1:8), sex = c("male","female")));
bevs$cost <- seq(1:8)

bevs
  name  drink cost    sex
1 Bill coffee    1   male
2 Mary    tea    2 female
3 Bill  cocoa    3   male
4 Mary  water    4 female
5 Bill coffee    5   male
6 Mary    tea    6 female
7 Bill  cocoa    7   male
8 Mary  water    8 female

aggregate(cost ~ name + drink, data = bevs, sum)
  name  drink cost
1 Bill  cocoa   10
2 Bill coffee    6
3 Mary    tea    8
4 Mary  water   12

My issue is that I would like to keep a column for 'sex', for which there is a 1:1 mapping with 'name', such that every time 'Bill' appears, it is always 'male'. Does anyone know of a way to accomplish this, with or without dplyr? The ideal command(s) would produce this:

  name  drink cost    sex
1 Bill  cocoa   10   male
2 Bill coffee    6   male
3 Mary    tea    8 female
4 Mary  water   12 female

I would be thankful for any suggestion!

Thanks,
Jonathan
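One base-R way to keep the 'sex' column, sketched here only as an illustration, is to aggregate as before and then join a name-to-sex lookup table back on. It relies on the stated 1:1 mapping between 'name' and 'sex'; the object names `totals`, `lookup` and `result` are invented for this example.

```r
bevs <- data.frame(
  name  = rep(c("Bill", "Mary"), 4),
  drink = rep(c("coffee", "tea", "cocoa", "water"), 2),
  cost  = 1:8,
  sex   = rep(c("male", "female"), 4),
  stringsAsFactors = FALSE
)

# Sum the cost per name/drink pair, exactly as in the question.
totals <- aggregate(cost ~ name + drink, data = bevs, FUN = sum)

# One row per name; valid only because each name maps to a single sex.
lookup <- unique(bevs[c("name", "sex")])

# Attach sex by name, then order the rows as in the desired output.
result <- merge(totals, lookup, by = "name")
result <- result[order(result$name, result$drink), ]
rownames(result) <- NULL
result
```

If the mapping were not 1:1, `merge()` would duplicate rows, so a check such as `stopifnot(!anyDuplicated(lookup$name))` may be worth adding on real data.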
Re: [R] dplyr help
Hi Brian,

Thanks for the suggestion, although the command is throwing an error as such:

bevs %>% group_by(name, sex, drink) %>% summarise( cost = sum(cost)) %>% select(name, drink, cost, sex)
Error: unexpected input in "bevs %>% group_by(name, sex, drink) %>% summarise("

Your syntax is new to me so I'm not immediately clear on how to fix it; any idea how?

Thanks again,
Jonathan

On Wed, Jul 29, 2015 at 11:07 PM, Brian Kreeger brian.kree...@gmail.com wrote:
> dplyr solution:
>
> bevs %>% group_by(name, sex, drink) %>% summarise(cost = sum(cost)) %>% select(name, drink, cost, sex)
>
> The last select statement puts the output in the column order you wanted in your result.
>
> I hope this helps.
>
> Brian
>
> On Wed, Jul 29, 2015 at 9:37 PM, Jon BR jonsle...@gmail.com wrote:
>> Hello,
>>
>> I've recently discovered the helpful dplyr package. I'm using the 'aggregate' function as such:
>>
>> bevs <- data.frame(cbind(name = c("Bill", "Mary"), drink = c("coffee", "tea", "cocoa", "water"), cost = seq(1:8), sex = c("male","female")));
>> bevs$cost <- seq(1:8)
>>
>> bevs
>>   name  drink cost    sex
>> 1 Bill coffee    1   male
>> 2 Mary    tea    2 female
>> 3 Bill  cocoa    3   male
>> 4 Mary  water    4 female
>> 5 Bill coffee    5   male
>> 6 Mary    tea    6 female
>> 7 Bill  cocoa    7   male
>> 8 Mary  water    8 female
>>
>> aggregate(cost ~ name + drink, data = bevs, sum)
>>   name  drink cost
>> 1 Bill  cocoa   10
>> 2 Bill coffee    6
>> 3 Mary    tea    8
>> 4 Mary  water   12
>>
>> My issue is that I would like to keep a column for 'sex', for which there is a 1:1 mapping with 'name', such that every time 'Bill' appears, it is always 'male'. Does anyone know of a way to accomplish this, with or without dplyr? The ideal command(s) would produce this:
>>
>>   name  drink cost    sex
>> 1 Bill  cocoa   10   male
>> 2 Bill coffee    6   male
>> 3 Mary    tea    8 female
>> 4 Mary  water   12 female
>>
>> I would be thankful for any suggestion!
>>
>> Thanks,
>> Jonathan
Re: [R] dplyr help
David,

I do appreciate your help, if not the dose of contempt. I hope you feel OK.

Thanks for the tips,
-Jonathan

On Wed, Jul 29, 2015 at 11:14 PM, David Winsemius dwinsem...@comcast.net wrote:
> On Jul 29, 2015, at 7:37 PM, Jon BR wrote:
>> Hello,
>>
>> I've recently discovered the helpful dplyr package. I'm using the 'aggregate' function as such:
>
> The `aggregate` function is part of base-R:
>
>> bevs <- data.frame(cbind(name = c("Bill", "Mary"), drink = c("coffee", "tea", "cocoa", "water"), cost = seq(1:8), sex = c("male","female")));
>> bevs$cost <- seq(1:8)
>>
>> bevs
>>   name  drink cost    sex
>> 1 Bill coffee    1   male
>> 2 Mary    tea    2 female
>> 3 Bill  cocoa    3   male
>> 4 Mary  water    4 female
>> 5 Bill coffee    5   male
>> 6 Mary    tea    6 female
>> 7 Bill  cocoa    7   male
>> 8 Mary  water    8 female
>>
>> aggregate(cost ~ name + drink, data = bevs, sum)
>>   name  drink cost
>> 1 Bill  cocoa   10
>> 2 Bill coffee    6
>> 3 Mary    tea    8
>> 4 Mary  water   12
>>
>> My issue is that I would like to keep a column for 'sex', for which there is a 1:1 mapping with 'name', such that every time 'Bill' appears, it is always 'male'. Does anyone know of a way to accomplish this, with or without dplyr?
>
> As pointed out you have not yet demonstrated any dplyr functions.
>
>> The ideal command(s) would produce this:
>>
>>   name  drink cost    sex
>> 1 Bill  cocoa   10   male
>> 2 Bill coffee    6   male
>> 3 Mary    tea    8 female
>> 4 Mary  water   12 female
>
> Doesn't this (glaringly obvious?) approach succeed?
>
> aggregate(cost ~ name + drink+sex, data = bevs, sum)
>   name  drink    sex cost
> 1 Mary    tea female    8
> 2 Mary  water female   12
> 3 Bill  cocoa   male   10
> 4 Bill coffee   male    6
>
>> I would be thankful for any suggestion!
>>
>> Thanks,
>> Jonathan
>>
>> [[alternative HTML version deleted]]
>
> Please learn to post in plain text.
>
> --
> David Winsemius
> Alameda, CA, USA
Re: [R] reshape a data frame
I found the gather function from the tidyr package, which worked nicely:

gather(ex, bcX, value, bc1:bc2)
   gIN group bcX   value
1  A_1     A bc1 1219.79
2  A_2     A bc1 1486.84
3  A_3     A bc1 1255.80
4  A_4     A bc1  941.87
5  B_1     B bc1  588.19
6  B_2     B bc1  304.02
7  A_1     A bc2  319.79
8  A_2     A bc2  186.84
9  A_3     A bc2  125.80
10 A_4     A bc2   94.87
11 B_1     B bc2 1008.19
12 B_2     B bc2  314.02

Thanks.

On Wed, Jun 3, 2015 at 5:44 PM, Jon BR jonsle...@gmail.com wrote:
> Hello,
>
> I would like to ask for some advice in reformatting a data frame such as the following one:
>
> gIN <- c("A_1","A_2","A_3","A_4","B_1","B_2")
> bc1 <- c(1219.79, 1486.84, 1255.80, 941.87, 588.19, 304.02)
> bc2 <- c(319.79, 186.84, 125.80, 94.87, 1008.19, 314.02)
> group <- c("A","A","A","A","B","B")
> ex <- data.frame(gIN = gIN, bc1 = bc1, bc2=bc2, group = group)
>
> ex
>   gIN     bc1     bc2 group
> 1 A_1 1219.79  319.79     A
> 2 A_2 1486.84  186.84     A
> 3 A_3 1255.80  125.80     A
> 4 A_4  941.87   94.87     A
> 5 B_1  588.19 1008.19     B
> 6 B_2  304.02  314.02     B
>
> I would like to reshape this data frame where all the columns that have bc1, bc2, etc. are merged into a single column (call it bcX or something) and the other variables are kept apart; the example solution follows:
>
> ex_reshaped
>    gIN     bcX group
> 1  A_1 1219.79     A
> 2  A_2 1486.84     A
> 3  A_3 1255.80     A
> 4  A_4  941.87     A
> 5  B_1  588.19     B
> 6  B_2  304.02     B
> 7  A_1  319.79     A
> 8  A_2  186.84     A
> 9  A_3  125.80     A
> 10 A_4   94.87     A
> 11 B_1 1008.19     B
> 12 B_2  314.02     B
>
> Does anyone know of a package, and/or command to accomplish this?
>
> Thank you
[R] reshape a data frame
Hello,

I would like to ask for some advice in reformatting a data frame such as the following one:

gIN <- c("A_1","A_2","A_3","A_4","B_1","B_2")
bc1 <- c(1219.79, 1486.84, 1255.80, 941.87, 588.19, 304.02)
bc2 <- c(319.79, 186.84, 125.80, 94.87, 1008.19, 314.02)
group <- c("A","A","A","A","B","B")
ex <- data.frame(gIN = gIN, bc1 = bc1, bc2=bc2, group = group)

ex
  gIN     bc1     bc2 group
1 A_1 1219.79  319.79     A
2 A_2 1486.84  186.84     A
3 A_3 1255.80  125.80     A
4 A_4  941.87   94.87     A
5 B_1  588.19 1008.19     B
6 B_2  304.02  314.02     B

I would like to reshape this data frame where all the columns that have bc1, bc2, etc. are merged into a single column (call it bcX or something) and the other variables are kept apart; the example solution follows:

ex_reshaped
   gIN     bcX group
1  A_1 1219.79     A
2  A_2 1486.84     A
3  A_3 1255.80     A
4  A_4  941.87     A
5  B_1  588.19     B
6  B_2  304.02     B
7  A_1  319.79     A
8  A_2  186.84     A
9  A_3  125.80     A
10 A_4   94.87     A
11 B_1 1008.19     B
12 B_2  314.02     B

Does anyone know of a package, and/or command to accomplish this?

Thank you
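Without any extra packages, the same long layout can probably be reached with `stack()` from base R. The sketch below assumes the value columns are exactly those named "bc" followed by digits; the extra `source` column (recording which original column each value came from) is an addition for illustration and was not part of the requested output.

```r
ex <- data.frame(
  gIN   = c("A_1", "A_2", "A_3", "A_4", "B_1", "B_2"),
  bc1   = c(1219.79, 1486.84, 1255.80, 941.87, 588.19, 304.02),
  bc2   = c(319.79, 186.84, 125.80, 94.87, 1008.19, 314.02),
  group = c("A", "A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Split the column names into value columns (bc1, bc2, ...) and id columns.
value_cols <- grep("^bc[0-9]+$", names(ex), value = TRUE)
id_cols    <- setdiff(names(ex), value_cols)

# stack() concatenates the value columns into 'values' and records
# the originating column name in 'ind'.
stacked <- stack(ex[value_cols])

# Repeat the id columns once per stacked value column, then attach the values.
ex_reshaped <- ex[rep(seq_len(nrow(ex)), times = length(value_cols)), id_cols]
ex_reshaped$source <- as.character(stacked$ind)
ex_reshaped$bcX    <- stacked$values
rownames(ex_reshaped) <- NULL
ex_reshaped
```

Dropping the `source` line gives exactly the three columns shown in the question, at the cost of losing track of whether a value came from bc1 or bc2.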
[R] Basic data frame manipulation
Hi R-help,

Although I know that variations of this question are frequently asked, I searched and haven't found an answer for this specific variant, and wonder if any of you know this off the top of your head:

df1 <- data.frame(a = 1:5, row.names = letters[1:5]) # letters a to e
df2 <- data.frame(a = 1:5, row.names = letters[3:7]) # letters c to g
df3 <- data.frame(a = 1:5, row.names = letters[c(1,2,3,5,7)]) # letters a, b, c, e, and g

I would like a command to produce a data frame which contains the same rows (with rownames) as in df1, with elements in the columns corresponding to the values present in each of the data frames (if there exists a matching row; else NA if not present). This should ideally work even if the rows are in random order and not sorted.

The result would look something like this:

  df1.a df2.a df3.a
a     1    NA     1
b     2    NA     2
c     3     1     3
d     4     2    NA
e     5     3     4

Thank you in advance for any tips.

Jonathan
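A minimal sketch of one way to do this in base R is to look up df1's row names in each data frame with `match()`. Row order does not matter for `match()`, and indexing with the NA it returns for a missing row yields NA, which is the requested behaviour. The names `frames`, `keys` and `result` are made up for the example.

```r
df1 <- data.frame(a = 1:5, row.names = letters[1:5])               # a to e
df2 <- data.frame(a = 1:5, row.names = letters[3:7])               # c to g
df3 <- data.frame(a = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])  # a, b, c, e, g

frames <- list(df1 = df1, df2 = df2, df3 = df3)
keys   <- rownames(df1)  # rows to keep, in df1's order

# For each data frame, pull column 'a' at the positions of the matching
# row names; keys with no match give NA.
cols <- lapply(frames, function(d) d$a[match(keys, rownames(d))])

result <- data.frame(cols, row.names = keys)
names(result) <- paste0(names(frames), ".a")
result
```

Because the lookup is by name rather than position, shuffling the rows of df2 or df3 beforehand should not change the result.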
[R] ggplot2 beginner question
Hello,

I'm having fun exploring the pretty graphing options in R, although I'm struggling to figure out how to do some simple things; I would be thankful if someone could point me toward relevant sections of the manual or provide some starter code to get me going.

I'd like to extend what is offered in the manual here for stacked bar plots:
http://docs.ggplot2.org/current/geom_bar.html

For starters:

library(ggplot2)
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()

This makes a nice stacked barplot featuring counts on the y-axis. I'd like to transform this to fraction or percentage, and (with some googling) came up with this:

ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position = 'fill')

However, I prefer using a line via frequency polygons. Using counts, this is:

ggplot(diamonds, aes(clarity, colour=cut)) + geom_freqpoly(aes(group = cut))

I'd like to adjust this to show fraction instead of counts on the y-axis (as in the previous example), but this command is obviously incorrectly constructed:

ggplot(diamonds, aes(clarity, colour=cut)) + geom_freqpoly(aes(group = cut), position = 'fill')
Error: position_fill requires the following missing aesthetics: ymax

Any pointers would be appreciated.

Best,
Jonathan
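One workaround, offered only as a sketch, is to compute the fractions before plotting and then draw them with an ordinary line layer. The example uses a tiny made-up data frame (`toy`, hypothetical values) in place of ggplot2's `diamonds` so that the calculation runs in base R alone; the plotting call is left as a comment because it needs ggplot2 installed.

```r
# Toy data standing in for the diamonds data set (values are invented).
toy <- data.frame(
  clarity = c("SI1", "SI1", "SI1", "VS1", "VS1", "VS1", "VS1"),
  cut     = c("Good", "Ideal", "Ideal", "Good", "Good", "Good", "Ideal")
)

# Count each clarity/cut combination, then divide by the clarity totals
# (margin = 1) so the fractions within each clarity level sum to 1.
counts <- table(clarity = toy$clarity, cut = toy$cut)
frac   <- as.data.frame(prop.table(counts, margin = 1),
                        responseName = "fraction")
frac

# With ggplot2 loaded, the fractions could then be drawn as lines, e.g.:
# ggplot(frac, aes(clarity, fraction, colour = cut, group = cut)) + geom_line()
```

On the real data, `toy` would be replaced by `diamonds`; the rest of the calculation should carry over unchanged.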
Re: [R] data frame pointers?
Hi Arun,

That seemed to do the trick - thanks!!

Jonathan

On Wed, Oct 23, 2013 at 11:12 PM, arun smartpink...@yahoo.com wrote:
> HI,
> Better would be:
>
> res1 <- dcast(df, gene~case, value.var="issue", paste, collapse=",", fill=0)
> str(res1)
> #'data.frame': 2 obs. of 4 variables:
> # $ gene  : chr "gene1" "gene2"
> # $ case_1: chr "nsyn,amp" "0"
> # $ case_2: chr "del" "0"
> # $ case_3: chr "0" "UTR"
>
> write.table(res1, "test.txt", sep="\t", quote=FALSE, row.names=FALSE)
>
> A.K.
>
> On , arun smartpink...@yahoo.com wrote:
>> Hi Jonathan,
>> If you look at the str()
>>
>> str(res)
>> 'data.frame': 2 obs. of 4 variables:
>>  $ gene  : chr "gene1" "gene2"
>>  $ case_1:List of 2
>>   ..$ : chr "nsyn" "amp"
>>   ..$ : chr ""
>>  $ case_2:List of 2
>>   ..$ : chr "del"
>>   ..$ : chr ""
>>  $ case_3:List of 2
>>   ..$ : chr ""
>>   ..$ : chr "UTR"
>>
>> In this case,
>> capture.output(res, file="test.txt") # should work
>>
>> But, if you wanted to use ?write.table() and also to substitute zeros, perhaps:
>>
>> res[,2:4] <- lapply(res[,2:4], function(x) {x1 <- unlist(lapply(x, paste, collapse=",")); x1[x1==""] <- 0; x1})
>> str(res)
>> #'data.frame': 2 obs. of 4 variables:
>> # $ gene  : chr "gene1" "gene2"
>> # $ case_1: chr "nsyn,amp" "0"
>> # $ case_2: chr "del" "0"
>> # $ case_3: chr "0" "UTR"
>>
>> write.table(res, "test.txt", sep="\t", quote=FALSE, row.names=FALSE)
>>
>> A.K.
>>
>> On Wednesday, October 23, 2013 10:44 PM, Jon BR jonsle...@gmail.com wrote:
>>> Hi Arun,
>>>
>>> Your suggestion using dcast is simple and worked splendidly! Unfortunately, the resulting data frame does not play nicely with write.table. Any idea how I could print this out to a tab-delimited text file, perhaps substituting zeros in for the empty cells? See the error below:
>>>
>>> write.table(res, "test.txt")
>>> Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
>>>   unimplemented type 'list' in 'EncodeElement'
>>>
>>> Best,
>>> Jonathan
>>>
>>> On Wed, Oct 23, 2013 at 9:50 PM, arun smartpink...@yahoo.com wrote:
>>>> HI,
>>>> You may try:
>>>>
>>>> library(reshape2)
>>>> df <- data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"), stringsAsFactors=FALSE)
>>>> res <- dcast(df, gene~case, value.var="issue", list)
>>>> res
>>>> #   gene    case_1 case_2 case_3
>>>> #1 gene1 nsyn, amp    del
>>>> #2 gene2                     UTR
>>>>
>>>> A.K.
>>>>
>>>> On Wednesday, October 23, 2013 7:38 PM, Jon BR jonsle...@gmail.com wrote:
>>>>> Hello,
>>>>>
>>>>> I've been running several programs in the unix shell, and it's time to combine results from several different pipelines. I've been writing shell scripts with heavy use of awk and grep to make big text files, but I'm thinking it would be better to have all my data in one big structure in R so that I can query whatever attributes I like, and print several corresponding tables to separate files.
>>>>>
>>>>> I haven't used R in years, so I was hoping somebody might be able to suggest a solution or combination of functions that could help me get oriented.
>>>>>
>>>>> Right now, I can import my data into a data frame that looks like this:
>>>>>
>>>>> df <- data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"))
>>>>>
>>>>> df
>>>>>     case  gene issue
>>>>> 1 case_1 gene1  nsyn
>>>>> 2 case_1 gene1   amp
>>>>> 3 case_2 gene1   del
>>>>> 4 case_3 gene2   UTR
>>>>>
>>>>> I'd like to cook up some combination of functions/scripting that can convert a table like df to produce a list or a data frame/matrix that looks like df2:
>>>>>
>>>>> df2
>>>>>         case_1 case_2 case_3
>>>>> gene1 nsyn,amp    del      0
>>>>> gene2        0      0    UTR
>>>>>
>>>>> I can build df2 manually, like this:
>>>>>
>>>>> df2 <- data.frame(case_1=c("nsyn,amp",0), case_2=c("del",0), case_3=c(0,"UTR"))
>>>>> rownames(df2) <- c("gene1","gene2")
>>>>>
>>>>> but obviously do not want to do this by hand; I want R to generate df2 from df.
>>>>>
>>>>> Any pointers/ideas would be most welcome!
>>>>>
>>>>> Thanks,
>>>>> Jonathan
[R] data frame pointers?
Hello,

I've been running several programs in the unix shell, and it's time to combine results from several different pipelines. I've been writing shell scripts with heavy use of awk and grep to make big text files, but I'm thinking it would be better to have all my data in one big structure in R so that I can query whatever attributes I like, and print several corresponding tables to separate files.

I haven't used R in years, so I was hoping somebody might be able to suggest a solution or combination of functions that could help me get oriented.

Right now, I can import my data into a data frame that looks like this:

df <- data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"))

df
    case  gene issue
1 case_1 gene1  nsyn
2 case_1 gene1   amp
3 case_2 gene1   del
4 case_3 gene2   UTR

I'd like to cook up some combination of functions/scripting that can convert a table like df to produce a list or a data frame/matrix that looks like df2:

df2
        case_1 case_2 case_3
gene1 nsyn,amp    del      0
gene2        0      0    UTR

I can build df2 manually, like this:

df2 <- data.frame(case_1=c("nsyn,amp",0), case_2=c("del",0), case_3=c(0,"UTR"))
rownames(df2) <- c("gene1","gene2")

but obviously do not want to do this by hand; I want R to generate df2 from df.

Any pointers/ideas would be most welcome!

Thanks,
Jonathan
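As a package-free alternative, the gene-by-case table can probably be built with `tapply()`, pasting together the issues that fall into each cell. This is a sketch that assumes a comma is an acceptable separator and that "0" should mark empty cells, as in the hand-built df2; `wide` is a name invented for the example.

```r
df <- data.frame(
  case  = c("case_1", "case_1", "case_2", "case_3"),
  gene  = c("gene1", "gene1", "gene1", "gene2"),
  issue = c("nsyn", "amp", "del", "UTR"),
  stringsAsFactors = FALSE
)

# One cell per gene/case pair; multiple issues in a cell are joined by ",".
# Pairs that never occur come back as NA.
wide <- tapply(df$issue, list(df$gene, df$case), paste, collapse = ",")

# Mark the empty cells with "0", as in the hand-built example.
wide[is.na(wide)] <- "0"

df2 <- as.data.frame(wide, stringsAsFactors = FALSE)
df2
```

Since every column of `df2` is a plain character vector rather than a list, a call such as `write.table(df2, "test.txt", sep = "\t", quote = FALSE)` should write it out without the list-related error.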
Re: [R] data frame pointers?
Hi Arun,

Your suggestion using dcast is simple and worked splendidly! Unfortunately, the resulting data frame does not play nicely with write.table. Any idea how I could print this out to a tab-delimited text file, perhaps substituting zeros in for the empty cells? See the error below:

write.table(res, "test.txt")
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
  unimplemented type 'list' in 'EncodeElement'

Best,
Jonathan

On Wed, Oct 23, 2013 at 9:50 PM, arun smartpink...@yahoo.com wrote:
> HI,
> You may try:
>
> library(reshape2)
> df <- data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"), stringsAsFactors=FALSE)
> res <- dcast(df, gene~case, value.var="issue", list)
> res
> #   gene    case_1 case_2 case_3
> #1 gene1 nsyn, amp    del
> #2 gene2                     UTR
>
> A.K.
>
> On Wednesday, October 23, 2013 7:38 PM, Jon BR jonsle...@gmail.com wrote:
>> Hello,
>>
>> I've been running several programs in the unix shell, and it's time to combine results from several different pipelines. I've been writing shell scripts with heavy use of awk and grep to make big text files, but I'm thinking it would be better to have all my data in one big structure in R so that I can query whatever attributes I like, and print several corresponding tables to separate files.
>>
>> I haven't used R in years, so I was hoping somebody might be able to suggest a solution or combination of functions that could help me get oriented.
>>
>> Right now, I can import my data into a data frame that looks like this:
>>
>> df <- data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"))
>>
>> df
>>     case  gene issue
>> 1 case_1 gene1  nsyn
>> 2 case_1 gene1   amp
>> 3 case_2 gene1   del
>> 4 case_3 gene2   UTR
>>
>> I'd like to cook up some combination of functions/scripting that can convert a table like df to produce a list or a data frame/matrix that looks like df2:
>>
>> df2
>>         case_1 case_2 case_3
>> gene1 nsyn,amp    del      0
>> gene2        0      0    UTR
>>
>> I can build df2 manually, like this:
>>
>> df2 <- data.frame(case_1=c("nsyn,amp",0), case_2=c("del",0), case_3=c(0,"UTR"))
>> rownames(df2) <- c("gene1","gene2")
>>
>> but obviously do not want to do this by hand; I want R to generate df2 from df.
>>
>> Any pointers/ideas would be most welcome!
>>
>> Thanks,
>> Jonathan
[R] using sample() for a vector of length 1
Hi All,

I'm trying to use the sample function within a loop where the vector being sampled from (the first argument in the function) will vary in length and composition. When the vector is down in size to containing only one element, I run into the undesired behaviour acknowledged in the ?sample help file. I don't want sample(10,1) to return a number from within 1:10, but rather I'd just want it to return 10 every time.

Example:

Actual:
sample(10,1)
[1] 2
sample(10,1)
[1] 9
sample(10,1)
[1] 4

Desired:
sample(10,1)
[1] 10
sample(10,1)
[1] 10
sample(10,1)
[1] 10

Perhaps sample is not the appropriate function. I dunno. Any thoughts?

Regards,
Jonathan
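The examples section of the ?sample help page suggests a small wrapper that samples positions rather than values, which sidesteps the special case for a single number. A sketch along those lines follows; `resample` is the name used in that help page, and the loop values are purely illustrative.

```r
# Sample index positions with sample.int() and use them to subset x,
# so a length-one x is never reinterpreted as the range 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# A single-element vector always returns that element.
resample(10, 1)

# Longer vectors behave like sample().
resample(c(3, 7, 12), 2)
```

Inside a loop, replacing `sample(v, 1)` with `resample(v, 1)` should then give the desired result however short `v` becomes, provided `v` is not empty.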