Re: [R] data frame question

2017-08-06 Thread Andras Farkas via R-help
Thank you both... the assumption is in fact that a and b are always the same
length... these work well for me...

much appreciate it... 
Andras 


On Sunday, August 6, 2017 12:14 PM, Ulrik Stervbo  
wrote:



Hi Andras,

assuming that the increment is always indicated by the same value (in your 
example 0), this could work:

df$a <- cumsum(seq_along(df$b) %in% which(df$b == 0))

df

HTH,
Ulrik


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data frame question

2017-08-06 Thread Ulrik Stervbo
Hi Andras,

assuming that the increment is always indicated by the same value (in your
example 0), this could work:

df$a <- cumsum(seq_along(df$b) %in% which(df$b == 0))
df

HTH,
Ulrik
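A minimal sketch that generalizes Ulrik's cumsum() idea to the configurable switch value Andras asked for. switch_groups() is a hypothetical helper name, not part of base R. It assumes that every row whose b equals switch_val starts a new group, that groups are numbered from 1, that any rows before the first switch value form group 1, and that NA values in b never start a group.

```r
# Hypothetical helper (not from the thread): number the groups in b,
# starting a new group at every row where b equals switch_val.
switch_groups <- function(b, switch_val = 0) {
  hit <- !is.na(b) & b == switch_val   # TRUE where a new group starts
  # cumsum() counts the group starts seen so far; if row 1 is not a
  # switch row, shift everything by 1 so the leading rows form group 1
  cumsum(hit) + as.integer(!hit[1])
}

df <- data.frame(a = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 8),
                 b = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7))
df$a <- switch_groups(df$b)            # switch on 0
df$a                                   # 1 1 1 1 1 2 2 2 2 2 2 2 2
switch_groups(df$b, switch_val = 3)    # 1 1 1 2 2 2 2 2 3 3 3 3 3
```

Note that this returns group numbers rather than values taken from the old column a; for Andras's data the two coincide because a starts 1, 2, ...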



Re: [R] data frame question

2017-08-06 Thread Bert Gunter
Your specification is a bit unclear to me, so I'm not sure the below
is really what you want. For example, your example seems to imply that
a and b must be of the same length, but I do not see that your
description requires this. So the following may not be what you want
exactly, but one way to do this (there may be cleverer ones!) is to
make use of ?rep. Everything else is just fussy detail. (Your example
suggests that you should also learn about ?seq. Both of these should
be covered in any good R tutorial, which you should probably spend
time with if you haven't already).

Anyway...

## WARNING: Not thoroughly tested! May (probably :-( ) contain bugs.

f <- function(x, y, switch_val = 0)
{
   wh <- which(y == switch_val)
   len <- length(wh)
   len_x <- length(x)
   if(!len) return(x)  # no switch value present: nothing to regroup
   else if(wh[1] == 1){
      if(len == 1) return(rep(x[1], len_x))
      else {
         wh <- wh[-1]
         len <- len - 1
      }
   }
   count <- c(wh[1] - 1, diff(wh))
   if(wh[len] == len_x) count <- c(count, 1)
   else count <- c(count, len_x - wh[len] + 1)
   rep(x[seq_along(count)], times = count)
}

> a <- c(1:5,1:8)
> b <- c(0:4,0:7)
> f(a,b)
 [1] 1 1 1 1 1 2 2 2 2 2 2 2 2
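The same rep() idea can be written more compactly by computing where each run starts and expanding the run lengths. This is a hypothetical rewrite, not Bert's code: rep_groups() is an illustrative name, it assumes the first row always opens group 1, and it returns group numbers rather than values taken from a.

```r
# Hypothetical rep()-based variant: find where each run starts,
# turn the start positions into run lengths, and expand them with rep().
rep_groups <- function(b, switch_val = 0) {
  n <- length(b)
  starts <- which(b == switch_val)        # which() drops NA comparisons
  starts <- unique(c(1L, starts))         # row 1 always opens a run
  run_lengths <- diff(c(starts, n + 1L))  # distance to the next run start
  rep(seq_along(run_lengths), times = run_lengths)
}

b <- c(0:4, 0:7)
rep_groups(b)                  # 1 1 1 1 1 2 2 2 2 2 2 2 2
rep_groups(b, switch_val = 3)  # 1 1 1 2 2 2 2 2 3 3 3 3 3
```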



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Aug 6, 2017 at 4:10 AM, Andras Farkas via R-help
 wrote:
> Dear All,
>
> wonder if you have thoughts on the following:
>
> let us say we have:
>
> df<-data.frame(a=c(1,2,3,4,5,1,2,3,4,5,6,7,8),b=c(0,1,2,3,4,0,1,2,3,4,5,6,7))
>
>
>  I would like to rewrite the values in column "a" based on the values in
> column "b": at a certain value of column "b", column "a" should move on to
> its next value. In other words, I would like to have this as a result:
>
> df<-data.frame(a=c(1,1,1,1,1,2,2,2,2,2,2,2,2),b=c(0,1,2,3,4,0,1,2,3,4,5,6,7))
>
>
> where at the value of 0 in column 'b' the number in column 'a' changes from 1
> to 2. From the first zero value of column 'b' until the next zero in
> column 'b' the numbers would not change in 'a', i.e. they are all 1 in my
> example... then from 2 it would change to 3 when 'b' has a zero again, and
> so on. I would be grateful for a solution that allows me to set the value
> (from 'b') that determines how the values get established in 'a' (i.e. let's
> say instead of 0 I would want 3 to be the value where 1 changes to 2 in 'a')
> and that is flexible enough to allow the number of rows, and the number of
> times 0 shows up in column 'b', to vary...
>
> much appreciate your thoughts..
>
> Andras


Re: [R] data frame question

2013-12-09 Thread Sarah Goslee
Thank you for providing a reproducible example. I tweaked it a little
bit to make it actually a data frame problem.

There are lots of ways to do this; here's one approach.

On second thought, this looks a lot like homework, so perhaps instead
I'll just suggest using subset() with more than one condition.

Sarah

On Mon, Dec 9, 2013 at 3:27 PM, Andras Farkas motyoc...@yahoo.com wrote:
 Dear All

 please help with the following:

 I have:

 a <- seq(0,10,by=1)
 b <- c(10:20)
 d <- cbind(a,b)
 f <- 16

 I would like to select the value in column a based on a value in column b, 
 where the value in column b is the 1st value that is smaller than f. Thus I 
 should end up with the number 5 because the 1st value that is below 16 would 
 be 15, and in the same row column a has the number 5

 appreciate your insights,

 andras

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] data frame question

2013-12-09 Thread Sarah Goslee
If it's not homework, then I'm happy to provide more help:


a <- seq(0,10,by=1)
b <- c(10:20)
d <- data.frame(a=a,b=b)
f <- 16

subset(d, b < f & b == max(b[b < f]))$a

# I'd turn it into a function
getVal <- function(d, f) {
    subset(d, b < f & b == max(b[b < f]))$a
}


Sarah


On Mon, Dec 9, 2013 at 3:50 PM, Andras Farkas motyoc...@yahoo.com wrote:
 Sarah,

 thank you, not homework though; I guess it just looks like it. I will
 look into subset()

 Andras





-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] data frame question

2013-12-09 Thread Toth, Denes

Hi Andras,

here is another solution, which also works if b contains missing values:

a <- seq(0,10,by=1)
b <- c(NA, 11:20)
f <- 16
#
a[which.max(b[b < f])]
#

However, your question seems a bit artificial. Maybe you converted your
original question to a suboptimal problem.
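Denes's one-liner indexes a with a position counted inside the subset b[b < f], so it picks the right row only when the values below f form a leading block of b (as they do here, because b is sorted in increasing order). A minimal sketch that keeps the original row positions instead, assuming the goal is the a value paired with the largest b strictly below f; get_val() is a hypothetical name:

```r
# Hypothetical helper: return a[i] for the largest b[i] strictly below f,
# skipping NA values in b. Positions stay in the original indexing.
get_val <- function(a, b, f) {
  ok <- which(!is.na(b) & b < f)       # candidate rows, original positions
  if (length(ok) == 0L) return(a[0])   # nothing below f: empty result
  a[ok[which.max(b[ok])]]
}

a <- seq(0, 10, by = 1)
get_val(a, c(NA, 11:20), 16)              # 5, as in the thread
get_val(1:5, c(20, 12, 15, 30, 11), 16)   # 3, because b[3] = 15
```

On the second, unsorted call, a[which.max(b[b < f])] would return 2 instead of 3.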

HTH,
  Denes




Re: [R] Data frame question

2013-04-01 Thread Sarah Goslee
That sounds like a job for merge().

If you provide an actual reproducible example using dput(), then you
will likely get some actual runnable code.

Sarah

On Mon, Apr 1, 2013 at 11:54 AM, ramoss ramine.mossad...@finra.org wrote:
 Hello,

 I have 2 data frames:  activity and dates.  Activity contains a variable
 listing all activities:  activityA, activityB etc.
 The dates contain all the valid business dates.  I need to combine the 2 so
 that I get a single data frame activitydat that contains the activity name
 along w/ every valid business date, such as

 Name        dat
 activity A  2013-02-01
 activity A  2013-02-04
 activity A  2013-02-05
 etc


 Any thought?  Thanks ahead for your help.




--
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Data frame question

2013-04-01 Thread arun
Hi,

Not sure if this is what you wanted:
activity <- data.frame(Name=paste0("activity",LETTERS[1:5]),stringsAsFactors=FALSE)
dates1 <- data.frame(dat=as.Date(c("2013-02-01","2013-02-04","2013-02-05"),format="%Y-%m-%d"))
merge(dates1,activity)
#          dat      Name
#1  2013-02-01 activityA
#2  2013-02-04 activityA
#3  2013-02-05 activityA
#4  2013-02-01 activityB
#5  2013-02-04 activityB
#6  2013-02-05 activityB
#7  2013-02-01 activityC
#8  2013-02-04 activityC
#9  2013-02-05 activityC
#10 2013-02-01 activityD
#11 2013-02-04 activityD
#12 2013-02-05 activityD
#13 2013-02-01 activityE
#14 2013-02-04 activityE
#15 2013-02-05 activityE


#or
expand.grid(dat=dates1[,1],Name=activity[,1])
#          dat      Name
#1  2013-02-01 activityA
#2  2013-02-04 activityA
#3  2013-02-05 activityA
#4  2013-02-01 activityB
#5  2013-02-04 activityB
#6  2013-02-05 activityB
#7  2013-02-01 activityC
#8  2013-02-04 activityC
#9  2013-02-05 activityC
#10 2013-02-01 activityD
#11 2013-02-04 activityD
#12 2013-02-05 activityD
#13 2013-02-01 activityE
#14 2013-02-04 activityE
#15 2013-02-05 activityE
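When the two data frames share no column names, merge() falls back to a Cartesian product, which is why the call above works. A sketch that makes the cross join explicit with by = NULL (documented in ?merge) and then reorders rows and columns to the layout ramoss asked for (Name first, dates within each activity). The activity names and dates are the illustrative ones from above, not real business dates.

```r
activity <- data.frame(Name = paste0("activity", LETTERS[1:5]),
                       stringsAsFactors = FALSE)
dates1 <- data.frame(dat = as.Date(c("2013-02-01", "2013-02-04", "2013-02-05")))

# by = NULL asks merge() for the Cartesian product of the two data frames
activitydat <- merge(activity, dates1, by = NULL)
# Sort by activity, then by date, and renumber the rows
activitydat <- activitydat[order(activitydat$Name, activitydat$dat), ]
rownames(activitydat) <- NULL
head(activitydat, 4)
```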
A.K.





Re: [R] Data frame question

2010-03-12 Thread Claudia Beleites

Andy,

Did you run into any kind of trouble?
I'm asking because I'm maintaining a package for spectroscopic data that heavily 
uses I (spectra.matrix) ...


However, once you have the matrix safe inside the data.frame, you can delete the AsIs:


> a <- matrix (1:9, 3)
> str (a)
 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df <- data.frame (a = I (a))
> str (df)
'data.frame':   3 obs. of  1 variable:
 $ a: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df$a <- unclass (df$a)
> str (df)
'data.frame':   3 obs. of  1 variable:
 $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df$a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> dim (df)
[1] 3 1

However, I don't know whether something can now trigger a conversion to 
data.frame that the AsIs would have stopped.
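Applied to Andy's goal of making dd look like yarn, the same idea gives two ways to end up with a plain (non-AsIs) matrix column. This is a sketch: whether later steps such as rbind() or as.data.frame() split the column again is exactly the open question raised above, so check it for your own workflow. Option 2 relies on $<- storing a matrix value as a single column.

```r
set.seed(1)                       # only to make the random data reproducible
y <- rnorm(10)
x <- matrix(rnorm(150), nrow = 10)

# Option 1: protect the matrix with I(), then strip the AsIs class
dd1 <- data.frame(x = I(x), y = y)
dd1$x <- unclass(dd1$x)

# Option 2: build the data frame first, then assign the matrix with $<-,
# which stores it as one matrix column instead of splitting it
dd2 <- data.frame(y = y)
dd2$x <- x

str(dd1)
str(dd2)
```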


Cheers,

Claudia

apjawor...@mmm.com wrote:

Hi,

I have the following question about creating data frames.  I want to 
create a data frame with 2 components: a vector and a matrix.


Let me use a simple example:

y <- rnorm(10)
x <- matrix(rnorm(150), nrow=10)

Now if I do

dd <- data.frame(x=x, y=y)

I get a data frame with 16 columns, but if, according to the documentation, I do

dd <- data.frame(x=I(x), y=y)

then str(dd) gives:

'data.frame':   10 obs. of  2 variables:
 $ x: AsIs [1:10, 1:15] 0.700073 -0.44371 -0.46625 0.977337 0.509786 ...
 $ y: num  0.4676 -1.4343 -0.3671 0.0637 -0.231 ...

This looks and works OK.

Now, there exists a CRAN package called pls.  It has a yarn data set in it.



library(pls)
data(yarn)
str(yarn)

'data.frame':   28 obs. of  3 variables:
 $ NIR    : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ density: num  100 80.2 79.5 60.8 60 ...
 $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

This looks almost the same, except the matrix component in my example has 
the AsIs instead of num.


Is this just some older behavior of the data.frame function producing this 
difference?  If not, how can I get my data frame (dd) to look like yarn?


I read the help pages for data.frame and as.data.frame and found this 
paragraph


If a list is supplied, each element is converted to a column in the data 
frame. Similarly, each column of a matrix is converted separately. This 
can be overridden if the object has a class which has a method for 
as.data.frame: two examples are matrices of class model.matrix (which 
are included as a single column) and list objects of class POSIXlt which 
are coerced to class POSIXct. 

If I do 


methods(as.data.frame)
 [1] as.data.frame.aovproj*        as.data.frame.array
 [3] as.data.frame.AsIs            as.data.frame.character
 [5] as.data.frame.complex         as.data.frame.data.frame
 [7] as.data.frame.Date            as.data.frame.default
 [9] as.data.frame.difftime        as.data.frame.factor
[11] as.data.frame.ftable*         as.data.frame.integer
[13] as.data.frame.list            as.data.frame.logical
[15] as.data.frame.logLik*         as.data.frame.matrix
[17] as.data.frame.model.matrix    as.data.frame.numeric
[19] as.data.frame.numeric_version as.data.frame.ordered
[21] as.data.frame.POSIXct         as.data.frame.POSIXlt
[23] as.data.frame.raw             as.data.frame.table
[25] as.data.frame.ts              as.data.frame.vector

so it looks like there is a matrix method for as.data.frame.  The question 
then is how can I override the default behavior for the matrix object 
(converting columns separately).



Any hint will be appreciated,

Andy


__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: apjawor...@mmm.com
Tel:  (651) 733-6092
Fax:  (651) 736-3122



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it



Re: [R] Data frame question

2010-03-12 Thread Claudia Beleites

apjawor...@mmm.com wrote:


Thanks for the quick reply.

No, I did not run into any problems so far.  I have been using the PLS 
package and the modelling functions seem to work just fine.


In fact, even if I let the data.frame convert the x matrix to separate
columns, the y ~ x modeling syntax still seems to work fine.



I don't see that behaviour:

library (pls)
rm (x)  # make sure there is no leftover x in the workspace
mat <- matrix (1 : 9, 3)
df <- data.frame (y = 1 : 3, x = mat)
str (df)
df
coef (plsr (y ~ x, data = df, ncomp = 1)) # error
coef (plsr (y ~ x.1 + x.2 + x.3, data = df, ncomp = 1)) # works

df$x <- I (-mat)
str (df)
df
coef (plsr (y ~ x, data = df, ncomp = 1)) # works

Claudia

PS: May I be curious: what kind of data do you analyze with PLS?



Thanks again,

Andy

__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: apjawor...@mmm.com
Tel:  (651) 733-6092
Fax:  (651) 736-3122



Re: [R] data frame question

2008-02-14 Thread John Kane
Create the new data.frame and do the multiplying on it?

df2 <- df1
df2[,1] <- df2[,1]*2
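This works because of R's copy-on-modify semantics: after df2 <- df1, changing df2 makes a copy, so df1 is left untouched. A short check on joseph's data, plus a transform() variant (an alternative, not from the thread) that builds the new data frame in one call:

```r
df1 <- data.frame(col1 = c(3, 5, NA, 1), col2 = c(4, NA, 6, 2))

df2 <- df1                  # no copy is made yet
df2[, 1] <- df2[, 1] * 2    # modifying df2 copies it; df1 is unchanged

# Alternative: transform() returns a modified copy and leaves df1 as is
df3 <- transform(df1, col1 = col1 * 2)

df1$col1   # 3 5 NA 1
df2$col1   # 6 10 NA 2
```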

--- joseph [EMAIL PROTECTED] wrote:

 
 
 Hi

 I have a data frame df1 in which I would like to multiply col1 by 2.

 The way I did it does not allow me to keep the old data frame.

 How can I do this and be able to create a new data frame df2?

  df1= data.frame(col1= c(3, 5, NA, 1), col2= c(4, NA, 6, 2))
  df1
   col1 col2
 1    3    4
 2    5   NA
 3   NA    6
 4    1    2
  df1$col1=df1$col1*2
  df1
   col1 col2
 1    6    4
 2   10   NA
 3   NA    6
 4    2    2



Re: [R] data frame question

2008-02-14 Thread joseph
Thanks. I have another question:
In the following data frame df, I want to replace all values in col1 that are 
higher than 3 with NA.
df= data.frame(col1=c(1:5, NA),col2= c(2,NA,4:7))



Re: [R] data frame question

2008-02-14 Thread K. Elo
Hi,

joseph wrote (15.2.2008):
 Thanks. I have another question:
 In the following data frame df, I want to replace all values in col1
 that are higher than 3 with NA. df= data.frame(col1=c(1:5, NA),col2=
 c(2,NA,4:7))

My suggestion:

x <- df$col1; x[ x > 3 ] <- NA; df$col1 <- x; rm(x)

-Kimmo



Re: [R] data frame question

2008-02-14 Thread Bill.Venables
 
... or in one step

df <- transform(df,
    col1 = ifelse(col1 > 3, NA, col1))
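For completeness, base R's replace() gives the same result without a temporary variable; a sketch on joseph's data. The comparison col1 > 3 is NA for the missing value, and R accepts an NA logical index here because a single replacement value is assigned, so that row simply stays NA.

```r
df <- data.frame(col1 = c(1:5, NA), col2 = c(2, NA, 4:7))

# replace(x, list, values) returns x with x[list] set to values
df$col1 <- replace(df$col1, df$col1 > 3, NA)
df$col1   # 1 2 3 NA NA NA
```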



Re: [R] data frame question

2008-02-10 Thread Mark Wardle
On 10/02/2008, joseph [EMAIL PROTECTED] wrote:
 Hello
 I have 2 data frames df1 and df2. I would like to create a
 new data frame new_df which will contain only the common rows based on the
 first 2 columns (chrN and start). The column score in the new data frame
 should be replaced with a column containing the average score
 (average_score) from df1 and df2.


Try this:   (avoiding underscores)

new.df <- merge(df1, df2, by=c('chrN','start'))
new.df$average.score <- apply(new.df[,c('score.x','score.y')], 1, mean, na.rm=TRUE)

As always, interested to see whether it can be done in one line...
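One way to get close to a single call, sketched with the df1/df2 data from joseph's post: merge() does the matching, transform() adds the new column, and rowMeans() with na.rm = TRUE mirrors the apply(..., mean, na.rm = TRUE) step above. score.x and score.y are the suffixes merge() adds by default to the non-key score columns.

```r
df1 <- data.frame(chrN  = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
                  start = c(23, 82, 95, 108, 95, 108, 121),
                  end   = c(33, 92, 105, 118, 105, 118, 131),
                  score = c(3, 6, 2, 4, 9, 2, 7))
df2 <- data.frame(chrN  = c("chr1", "chr2", "chr2", "chr2", "chr2"),
                  start = c(23, 50, 95, 20, 121),
                  end   = c(33, 60, 105, 30, 131),
                  score = c(9, 3, 7, 7, 3))

# Keep rows whose (chrN, start) pair occurs in both data frames and
# average the two score columns row by row
new.df <- transform(merge(df1, df2, by = c("chrN", "start")),
                    average.score = rowMeans(cbind(score.x, score.y), na.rm = TRUE))
new.df[, c("chrN", "start", "average.score")]
```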

-- 
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK



Re: [R] data frame question

2008-02-10 Thread David Winsemius
joseph [EMAIL PROTECTED] wrote in
news:[EMAIL PROTECTED]: 

 I have 2 data frames df1 and df2. I would like to create a
 new data frame new_df which will contain only the common rows based
 on the first 2 columns (chrN and start). The column score in the new
 data frame should
 be replaced with a column containing the average score
 (average_score) from df1 and df2. 
 

 df1= data.frame(chrN= c("chr1", "chr1", "chr1", "chr1", "chr2",
 "chr2", "chr2"), 
 start= c(23, 82, 95, 108, 95, 108, 121),
 end= c(33, 92, 105, 118, 105, 118, 131),
 score= c(3, 6, 2, 4, 9, 2, 7))
 
 df2= data.frame(chrN= c("chr1", "chr2", "chr2", "chr2" , "chr2"),
 start= c(23, 50, 95, 20, 121),
 end= c(33, 60, 105, 30, 131),
 score= c(9, 3, 7, 7, 3))

Clunky to be sure, but this worked for me:

df3 <- merge(df1,df2,by=c("chrN","start"))
#non-match variables get auto-relabeled

df3$avg.scr <- with(df3, (score.x+score.y)/2) # or mean( )
df3 <- df3[,c("chrN","start","avg.scr")]
#drops the variables not of interest

df3
  chrN start avg.scr
1 chr1    23       6
2 chr2   121       5
3 chr2    95       8

-- 
David Winsemius
