Re: [R] Reformatting text inside a data frame

2015-09-07 Thread Jon BR
Hi John,
 Thanks for the reply; I'm pasting here the output from dput, with a
'df <-' added in front:

df <- structure(list(rowNum = c(1, 2, 3), first = structure(c(NA, 1L,
2L), .Label = c("AD=2;BA=8", "AD=9;BA=1"), class = "factor"),
second = structure(c(2L, 1L, NA), .Label = c("AD=1;BA=2",
"AD=13;BA=49"), class = "factor")), .Names = c("rowNum",
"first", "second"), row.names = c(NA, -3L), class = "data.frame")




To be more specific about what I would like: each value to be adjusted
has the following general format:

"AD=X;BA=Y"

I would like to extract the values of X and Y and format them as a string
as such:

"X_X-Y"


Here's how I would handle a specific instance using awk in a shell script:

echo  "AD=X;BA=Y" | awk '{split($1,a,"AD="); split(a[2],b,";");
split(b[2],c,"BA="); print b[1]"_"b[1]"-"c[2]}'
X_X-Y

I'd like this applied to all of the non-NA entries in the columns to the
right of column 1.
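
A minimal all-R sketch of the conversion described above, assuming every non-NA entry matches the "AD=X;BA=Y" pattern exactly. The helper name reformat_entry is made up for illustration, and the data frame is rebuilt by hand rather than pasted from dput; sub() leaves NA entries as NA.

```r
# Same shape as the example data; the text columns are factors, as in the dput output.
df <- data.frame(
  rowNum = c(1, 2, 3),
  first  = c(NA, "AD=2;BA=8", "AD=9;BA=1"),
  second = c("AD=13;BA=49", "AD=1;BA=2", NA),
  stringsAsFactors = TRUE
)

# Hypothetical helper: turn "AD=X;BA=Y" into "X_X-Y"; NA stays NA.
reformat_entry <- function(x) {
  sub("^AD=([^;]*);BA=(.*)$", "\\1_\\1-\\2", as.character(x))
}

# Apply to every column to the right of column 1.
df[-1] <- lapply(df[-1], reformat_entry)
print(df)
```

The two capture groups play the role of b[1] and c[2] in the awk version, and lapply() over df[-1] covers however many columns follow the first.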

Hoping this adds clarity for any others who also didn't follow my example.

Thanks in advance for any tips-

Best,
Jonathan

On Mon, Sep 7, 2015 at 3:48 PM, John Kane  wrote:

> I'm not making a lot of sense of the data. It looks like you want more
> recodes than you have mentioned, but in any case you might want to look at
> the recode function in the car package.  It "should" do what you want,
> though there may be faster ways to do it.
>
> BTW, for supplying sample data have a look at ?dput . Using dput() means
> that we see exactly the same data as you do.
>
> Sorry not to be of more help
> John Kane
> Kingston ON Canada
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reformatting text inside a data frame

2015-09-07 Thread Jon BR
Hi all,
I've read in a large data frame that has formatting similar to the one
in the small example below:

df <-
data.frame(c(1,2,3),c(NA,"AD=2;BA=8","AD=9;BA=1"),c("AD=13;BA=49","AD=1;BA=2",NA));
names(df) <- c("rowNum","first","second")

> df
  rowNum     first      second
1      1      <NA> AD=13;BA=49
2      2 AD=2;BA=8   AD=1;BA=2
3      3 AD=9;BA=1        <NA>


I'd like to reformat all of the non-NA entries in df from "first" and
"second" and so-on such that "AD=13;BA=49" will be replaced by the
following string: "13_13-49".

So applied to df, the output would be the following:

  rowNum first   second
1      1  <NA> 13_13-49
2      2 2_2-8    1_1-2
3      3 9_9-1     <NA>


I'm generally a big proponent of shell scripting with awk, but I'd prefer
an all-R solution if one exists (and also to learn how to do this more
generally).

Could someone point out an appropriate paradigm or otherwise point me in
the right direction?

Best,
Jonathan



[R] data frame formatting

2015-08-18 Thread Jon BR
Hello all,
I would like to take a data frame such as the following one:

> df <-
data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA))
> df
  id first second third fourth
1  A    BX   <NA>  <NA>     LG
2  A  <NA>     TD  <NA>     QR
3  B  <NA>     BZ    RB   <NA>
4  B    LF   <NA>    BT   <NA>

and merge rows based on the id, such that each value in the merged row is
determined as follows: if both values in the original df are NA, the
new value should also be NA.  If there are two non-NA values, then the
new value should read "clash".  Otherwise, the new value should be
whichever value was not NA.

For example, the command would read in df and produce:


  id first second third fourth
1  A    BX     TD  <NA>  clash
2  B    LF     BZ clash   <NA>


I'd be grateful if someone could point me in the right direction.
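
One possible base-R sketch of the merge rule described above, not necessarily the only or best way. It assumes the columns are plain character vectors, and the helper name collapse_values is made up for illustration.

```r
df <- data.frame(
  id     = c("A", "A", "B", "B"),
  first  = c("BX", NA, NA, "LF"),
  second = c(NA, "TD", "BZ", NA),
  third  = c(NA, NA, "RB", "BT"),
  fourth = c("LG", "QR", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper implementing the rule for one column within one id:
# no non-NA value -> NA, exactly one -> that value, more than one -> "clash".
collapse_values <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    NA_character_
  } else if (length(x) == 1) {
    x
  } else {
    "clash"
  }
}

# The data-frame method of aggregate() passes NA values through to FUN.
merged <- aggregate(df[-1], by = df["id"], FUN = collapse_values)
print(merged)
```

The data-frame method is used here instead of the formula interface because the formula interface drops rows containing NA by default.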

Thanks,
Jonathan



[R] dplyr help

2015-07-29 Thread Jon BR
Hello,
I've recently discovered the helpful dplyr package.  I'm using the
'aggregate' function as such:


bevs <- data.frame(cbind(name = c("Bill", "Mary"), drink = c("coffee",
"tea", "cocoa", "water"), cost = seq(1:8), sex = c("male","female")));
bevs$cost <- seq(1:8)

> bevs
  name  drink cost    sex
1 Bill coffee    1   male
2 Mary    tea    2 female
3 Bill  cocoa    3   male
4 Mary  water    4 female
5 Bill coffee    5   male
6 Mary    tea    6 female
7 Bill  cocoa    7   male
8 Mary  water    8 female


> aggregate(cost ~ name + drink, data = bevs, sum)
  name  drink cost
1 Bill  cocoa   10
2 Bill coffee    6
3 Mary    tea    8
4 Mary  water   12

My issue is that I would like to keep a column for 'sex', for which there
is a 1:1 mapping with 'name', such that every time 'Bill' appears, it is
always 'male'.

Does anyone know of a way to accomplish this, with or without dplyr?  The
ideal command(s) would produce this:

  name  drink cost    sex
1 Bill  cocoa   10   male
2 Bill coffee    6   male
3 Mary    tea    8 female
4 Mary  water   12 female

I would be thankful for any suggestion!
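
A base-R sketch of one way to carry 'sex' along, assuming (as stated above) that each name maps to exactly one sex. It aggregates first and then joins a name-to-sex lookup back on; the object names totals, lookup and result are made up for illustration.

```r
bevs <- data.frame(
  name  = rep(c("Bill", "Mary"), times = 4),
  drink = rep(c("coffee", "tea", "cocoa", "water"), times = 2),
  cost  = 1:8,
  sex   = rep(c("male", "female"), times = 4)
)

# Sum cost per name/drink, exactly as in the aggregate() call above.
totals <- aggregate(cost ~ name + drink, data = bevs, FUN = sum)

# One row per name with its sex; relies on the 1:1 name-to-sex mapping.
lookup <- unique(bevs[c("name", "sex")])

# Join the lookup back onto the aggregated totals.
result <- merge(totals, lookup, by = "name")
print(result)
```

If the mapping were not 1:1, the merge would duplicate rows, so this only makes sense under that assumption.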

Thanks,
Jonathan



Re: [R] dplyr help

2015-07-29 Thread Jon BR
Hi Brian,
Thanks for the suggestion, although the command is throwing an error as
such:

> bevs %>% group_by(name, sex, drink) %>% summarise( cost = sum(cost)) %>%
select(name, drink, cost, sex)
Error: unexpected input in "bevs %>% group_by(name, sex, drink) %>%
summarise("

Your syntax is new to me so I'm not immediately clear on how to fix it; any
idea how?

Thanks again,
Jonathan


On Wed, Jul 29, 2015 at 11:07 PM, Brian Kreeger brian.kree...@gmail.com
wrote:

 ​dplyr solution:

 bevs %>% group_by(name, sex, drink) %>% summarise(cost = sum(cost)) %>%
 select(name, drink, cost, sex)

 The last select statement puts the output in the column order you wanted
 in your result.

 I hope this helps.

 Brian








Re: [R] dplyr help

2015-07-29 Thread Jon BR
David,
I do appreciate your help, if not the dose of contempt.  I hope you
feel OK.

Thanks for the tips,
-Jonathan

On Wed, Jul 29, 2015 at 11:14 PM, David Winsemius dwinsem...@comcast.net
wrote:


 On Jul 29, 2015, at 7:37 PM, Jon BR wrote:

  Hello,
 I've recently discovered the helpful dplyr package.  I'm using the
  'aggregate' function as such:

 The `aggregate` function is part of base-R:

  bevs <- data.frame(cbind(name = c("Bill", "Mary"), drink = c("coffee",
  "tea", "cocoa", "water"), cost = seq(1:8), sex = c("male","female")));
  bevs$cost <- seq(1:8)

  > bevs
    name  drink cost    sex
  1 Bill coffee    1   male
  2 Mary    tea    2 female
  3 Bill  cocoa    3   male
  4 Mary  water    4 female
  5 Bill coffee    5   male
  6 Mary    tea    6 female
  7 Bill  cocoa    7   male
  8 Mary  water    8 female


  > aggregate(cost ~ name + drink, data = bevs, sum)
    name  drink cost
  1 Bill  cocoa   10
  2 Bill coffee    6
  3 Mary    tea    8
  4 Mary  water   12
 
  My issue is that I would like to keep a column for 'sex', for which there
  is a 1:1 mapping with 'name', such that every time 'Bill' appears, it is
  always 'male'.
 
  Does anyone know of a way to accomplish this, with or without dplyr?

 As pointed out you have not yet demonstrated any dplyr functions.

  The
  ideal command(s) would produce this:
 
    name  drink cost    sex
  1 Bill  cocoa   10   male
  2 Bill coffee    6   male
  3 Mary    tea    8 female
  4 Mary  water   12 female

 Doesn't this (glaringly obvious?) approach succeed?

 > aggregate(cost ~ name + drink+sex, data = bevs, sum)
   name  drink    sex cost
 1 Mary    tea female    8
 2 Mary  water female   12
 3 Bill  cocoa   male   10
 4 Bill coffee   male    6
 


 
  I would be thankful for any suggestion!
 
  Thanks,
  Jonathan
 
 
 

 Please learn to post in plain text.

 --

 David Winsemius
 Alameda, CA, USA





Re: [R] reshape a data frame

2015-06-03 Thread Jon BR
I found the gather function from the tidyr package, which worked nicely:

> gather(ex,bcX,value, bc1:bc2)
   gIN group bcX   value
1  A_1     A bc1 1219.79
2  A_2     A bc1 1486.84
3  A_3     A bc1 1255.80
4  A_4     A bc1  941.87
5  B_1     B bc1  588.19
6  B_2     B bc1  304.02
7  A_1     A bc2  319.79
8  A_2     A bc2  186.84
9  A_3     A bc2  125.80
10 A_4     A bc2   94.87
11 B_1     B bc2 1008.19
12 B_2     B bc2  314.02

Thanks.









[R] reshape a data frame

2015-06-03 Thread Jon BR
Hello,

I would like to ask for some advice in reformatting a data frame such as
the following one:


gIN <- c("A_1","A_2","A_3","A_4","B_1","B_2")
bc1 <- c(1219.79, 1486.84, 1255.80, 941.87, 588.19, 304.02)
bc2 <- c(319.79, 186.84, 125.80, 94.87, 1008.19, 314.02)
group <- c("A","A","A","A","B","B")

ex <- data.frame(gIN = gIN, bc1 = bc1, bc2=bc2, group = group)

> ex
  gIN     bc1     bc2 group
1 A_1 1219.79  319.79     A
2 A_2 1486.84  186.84     A
3 A_3 1255.80  125.80     A
4 A_4  941.87   94.87     A
5 B_1  588.19 1008.19     B
6 B_2  304.02  314.02     B

I would like to reshape this data frame so that all the bc columns (bc1,
bc2, etc.) are merged into a single column (call it bcX or something)
while the other variables are kept as separate columns.  An example of the
desired result follows:


> ex_reshaped
   gIN     bcX group
1  A_1 1219.79     A
2  A_2 1486.84     A
3  A_3 1255.80     A
4  A_4  941.87     A
5  B_1  588.19     B
6  B_2  304.02     B
7  A_1  319.79     A
8  A_2  186.84     A
9  A_3  125.80     A
10 A_4   94.87     A
11 B_1 1008.19     B
12 B_2  314.02     B

Does anyone know of a package, and/or command to accomplish this?
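
A base-R sketch of one way to do this reshaping with stack(), offered as an illustration rather than the definitive answer. It assumes every column to be merged has a name starting with "bc"; the names bc_cols and stacked are made up, and the extra bcX column recording the source column goes slightly beyond the example output above.

```r
ex <- data.frame(
  gIN   = c("A_1", "A_2", "A_3", "A_4", "B_1", "B_2"),
  bc1   = c(1219.79, 1486.84, 1255.80, 941.87, 588.19, 304.02),
  bc2   = c(319.79, 186.84, 125.80, 94.87, 1008.19, 314.02),
  group = c("A", "A", "A", "A", "B", "B")
)

# Columns to merge: assumed to be those whose names start with "bc".
bc_cols <- grep("^bc", names(ex), value = TRUE)

# stack() returns a data frame with 'values' and 'ind' (the source column).
stacked <- stack(ex[bc_cols])

# Repeat the other variables once per stacked column.
ex_reshaped <- data.frame(
  gIN   = rep(ex$gIN, times = length(bc_cols)),
  group = rep(ex$group, times = length(bc_cols)),
  bcX   = stacked$ind,
  value = stacked$values
)
print(ex_reshaped)
```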

Thank you



[R] Basic data frame manipulation

2015-02-23 Thread Jon BR
Hi R-help,
Although I know that variations of this question are frequently asked,
I searched and haven't found an answer for this specific variant, and
wonder if any of you know this off the top of your head:

df1 <- data.frame(a = 1:5,
  row.names = letters[1:5]) # letters a to e
df2 <- data.frame(a = 1:5,
  row.names = letters[3:7]) # letters c to g
df3 <- data.frame(a = 1:5,
  row.names = letters[c(1,2,3,5,7)]) # letters a, b, c, e, and g


I would like a command that produces a data frame containing the same rows
(with rownames) as df1, with one column per input data frame holding the
value from the matching row of that data frame, or NA if it has no matching
row.  Ideally this should work even if the rows are unsorted or in random
order.

The result would look something like this:

  df1.a df2.a df3.a
a     1    NA     1
b     2    NA     2
c     3     1     3
d     4     2    NA
e     5     3     4
e 5 3 4

Thank you in advance for any tips.
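
A minimal base-R sketch of the lookup described above, using match() on the row names so that row order does not matter. The names keys and result are made up for illustration.

```r
df1 <- data.frame(a = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])

# Row names of df1 define which rows appear in the result.
keys <- rownames(df1)

# match() gives the position of each key in the other data frame, or NA
# when there is no such row; indexing with NA yields NA.
result <- data.frame(
  df1.a = df1$a,
  df2.a = df2$a[match(keys, rownames(df2))],
  df3.a = df3$a[match(keys, rownames(df3))],
  row.names = keys
)
print(result)
```

match() is used rather than indexing by row name directly because character row indexing on data frames allows partial matching.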

Jonathan



[R] ggplot2 beginner question

2013-11-06 Thread Jon BR
Hello,
   I'm having fun exploring the pretty graphing options in R, although
I'm struggling to figure out how to do some simple things; would be
thankful if someone could point me toward relevant sections of the manual
or provide some starter code to get me going.

I'd like to extend what is offered in the manual here for stacked bar plots:

http://docs.ggplot2.org/current/geom_bar.html

For starters

library(ggplot2)
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()

Which makes a nice stacked barplot featuring counts on the y-axis. I'd like
to transform this to fraction or percentage, and (with some googling) came
up with this:

ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position = 'fill')

However, I prefer using  a line via frequency polygons.  Using counts, this
is:

ggplot(diamonds, aes(clarity, colour=cut)) + geom_freqpoly(aes(group = cut))

I'd like to adjust this to show fraction instead of counts on the y-axis
(as in the previous example), but this command is obviously incorrectly
constructed:

ggplot(diamonds, aes(clarity, colour=cut)) + geom_freqpoly(aes(group =
cut), position = 'fill')
Error: position_fill requires the following missing aesthetics: ymax

Any pointers would be appreciated.
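
One possible workaround, sketched under stated assumptions: compute the within-clarity fractions first and draw them with geom_line() instead of relying on position = 'fill' in geom_freqpoly(). The toy data frame below is a made-up stand-in for diamonds so that the fraction step runs without ggplot2; with the real data the table would be built from diamonds$clarity and diamonds$cut.

```r
# Toy stand-in for the diamonds data (hypothetical values, illustration only).
toy <- data.frame(
  clarity = c("SI1", "SI1", "SI1", "VS1", "VS1", "VS1", "VS1"),
  cut     = c("Ideal", "Ideal", "Good", "Ideal", "Good", "Good", "Good")
)

# Counts per clarity/cut, converted to fractions within each clarity level.
counts <- table(clarity = toy$clarity, cut = toy$cut)
fractions <- as.data.frame(prop.table(counts, margin = 1),
                           responseName = "fraction")

# Plot only if ggplot2 is installed; print(p) draws the figure.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fractions,
              aes(x = clarity, y = fraction, colour = cut, group = cut)) +
    geom_line()
}
```

Because the fractions are computed before plotting, no position adjustment is needed, which sidesteps the missing ymax aesthetic.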

Best,
Jonathan



Re: [R] data frame pointers?

2013-10-24 Thread Jon BR
Hi Arun,
That seemed to do the trick - thanks!!

Jonathan


On Wed, Oct 23, 2013 at 11:12 PM, arun smartpink...@yahoo.com wrote:

 HI,

 Better would be:
 res1 <- dcast(df,gene~case,value.var="issue",paste,collapse=",",fill="0")

 str(res1)
 #'data.frame':    2 obs. of  4 variables:
 # $ gene  : chr  "gene1" "gene2"
 # $ case_1: chr  "nsyn,amp" "0"
 # $ case_2: chr  "del" "0"
 # $ case_3: chr  "0" "UTR"

  write.table(res1,"test.txt",sep="\t",quote=FALSE,row.names=FALSE)


 A.K.


 On , arun smartpink...@yahoo.com wrote:
 Hi Jonathan, if you look at the str():
  str(res)
 'data.frame':    2 obs. of  4 variables:
  $ gene  : chr  "gene1" "gene2"
  $ case_1:List of 2
   ..$ : chr  "nsyn" "amp"
   ..$ : chr
  $ case_2:List of 2
   ..$ : chr "del"
   ..$ : chr
  $ case_3:List of 2
   ..$ : chr
   ..$ : chr "UTR"

 In this case,

 capture.output(res,file="test.txt") #should work

 But, if you wanted to use ?write.table() and also to substitute zeros,
 perhaps:


 res[,2:4] <- lapply(res[,2:4],function(x) {x1 <-
 unlist(lapply(x,paste,collapse=","));x1[x1==""] <- 0; x1})


  str(res)
 #'data.frame':    2 obs. of  4 variables:
 # $ gene  : chr  "gene1" "gene2"
 # $ case_1: chr  "nsyn,amp" "0"
 # $ case_2: chr  "del" "0"
 # $ case_3: chr  "0" "UTR"

  write.table(res,"test.txt",sep="\t",quote=FALSE,row.names=FALSE)


 A.K.





 
 




[R] data frame pointers?

2013-10-23 Thread Jon BR
Hello,
I've been running several programs in the unix shell, and it's time to
combine results from several different pipelines.  I've been writing shell
scripts with heavy use of awk and grep to make big text files, but I'm
thinking it would be better to have all my data in one big structure in R
so that I can query whatever attributes I like, and print several
corresponding tables to separate files.

I haven't used R in years, so I was hoping somebody might be able to
suggest a solution or combination of functions that could help me get
oriented.

Right now, I can import my data into a data frame that looks like this:

df <-
data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
> df
    case  gene issue
1 case_1 gene1  nsyn
2 case_1 gene1   amp
3 case_2 gene1   del
4 case_3 gene2   UTR


I'd like to cook up some combination of functions/scripting that can
convert a table like df to produce a list or a data frame/ matrix that
looks like df2:

> df2
        case_1 case_2 case_3
gene1 nsyn,amp    del      0
gene2        0      0    UTR

I can build df2 manually, like this:
df2 <- data.frame(case_1=c("nsyn,amp",0),case_2=c("del",0),case_3=c(0,"UTR"))
rownames(df2) <- c("gene1","gene2")

but obviously do not want to do this by hand; I want R to generate df2 from
df.

Any pointers/ideas would be most welcome!
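
A base-R sketch of one way to generate df2 from df, using tapply() to paste together the issues for each gene/case pair. This is an illustration under the assumption that empty cells should become the string "0"; the intermediate name wide is made up.

```r
df <- data.frame(
  case  = c("case_1", "case_1", "case_2", "case_3"),
  gene  = c("gene1", "gene1", "gene1", "gene2"),
  issue = c("nsyn", "amp", "del", "UTR"),
  stringsAsFactors = FALSE
)

# One cell per gene/case pair; issues in the same cell are comma-joined.
# Pairs with no rows come back as NA.
wide <- tapply(df$issue, list(df$gene, df$case), paste, collapse = ",")

# Replace the empty cells with "0", as in the hand-built df2.
wide[is.na(wide)] <- "0"

df2 <- as.data.frame(wide, stringsAsFactors = FALSE)
print(df2)
```

Since every column of df2 is a plain character vector, it can be passed straight to write.table().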

Thanks,
Jonathan



Re: [R] data frame pointers?

2013-10-23 Thread Jon BR
Hi Arun,
   Your suggestion using dcast is simple and worked splendidly!
 Unfortunately, the resulting data frame does not play nicely with
write.table.

Any idea how I could print this out to a tab-delimited text file, perhaps
substituting zeros for the empty cells?

See the error below:
> write.table(res,"test.txt")
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  :
  unimplemented type 'list' in 'EncodeElement'
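
The error arises because write.table() cannot encode list columns. A minimal sketch of flattening them first, assuming empty cells should be written as "0"; the data frame is rebuilt by hand to mimic the dcast result, and the helper name flatten and the temporary output file are made up for illustration.

```r
# Hand-built stand-in for the dcast result: three list columns.
res <- data.frame(gene = c("gene1", "gene2"), stringsAsFactors = FALSE)
res$case_1 <- list(c("nsyn", "amp"), character(0))
res$case_2 <- list("del", character(0))
res$case_3 <- list(character(0), "UTR")

# Hypothetical helper: comma-join each list element, then turn the
# empty strings produced by empty elements into "0".
flatten <- function(col) {
  out <- vapply(col, paste, character(1), collapse = ",")
  out[out == ""] <- "0"
  out
}

# Flatten only the list columns, leaving ordinary columns untouched.
is_list <- vapply(res, is.list, logical(1))
res[is_list] <- lapply(res[is_list], flatten)

out_file <- tempfile(fileext = ".txt")
write.table(res, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```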


Best,
Jonathan





On Wed, Oct 23, 2013 at 9:50 PM, arun smartpink...@yahoo.com wrote:

 HI,

 You may try:
 library(reshape2)
 df <-
 data.frame(case=c("case_1","case_1","case_2","case_3"),
 gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"),
 stringsAsFactors=FALSE)
 res <- dcast(df,gene~case,value.var="issue",list)
  res
 #   gene    case_1 case_2 case_3
 #1 gene1 nsyn, amp    del
 #2 gene2                     UTR


 A.K.







[R] using sample() for a vector of length 1

2010-07-22 Thread Jon BR
Hi All,
I'm trying to use the sample function within a loop where the
vector being sampled from (the first argument in the function) will
vary in length and composition.  When the vector is down in size to
containing only one element, I run into the undesired behaviour
acknowledged in the ?sample help file.  I don't want sample(10,1) to
return a number from within 1:10, but rather I'd just want it to
return 10 every time.

Example:


Actual:
> sample(10,1)
[1] 2
> sample(10,1)
[1] 9
> sample(10,1)
[1] 4


Desired:
> sample(10,1)
[1] 10
> sample(10,1)
[1] 10
> sample(10,1)
[1] 10


Perhaps sample is not the appropriate function.  I dunno.  Any thoughts?
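
A minimal sketch of a wrapper that samples positions rather than values, so a length-one vector is always returned as-is. This follows the resample() idea given in the examples of the ?sample help page; treat it as an illustration rather than a drop-in for every use.

```r
# Sample indices of x instead of x itself, so a single number is never
# expanded to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(10, 1)        # always 10
resample(c(3, 7), 1)   # either 3 or 7
```

Inside the loop, calling resample(v, 1) in place of sample(v, 1) should give the desired behaviour for any length of v.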

Regards,
Jonathan
