subject:"[R] subset"

Re: [R] $ subset operator behavior in lapply

2022-10-28 Thread Hilmar Berger

Hi Andrew,

thanks a lot, that fully explains it.

Sorry for the HTML formatting. For the record, I have re-posted the original
code below.

Best regards

Hilmar


On 27.10.22 18:34, Andrew Simmons wrote:
> $ does not evaluate its second argument; it does something like
> as.character(substitute(name)).
>
> You should be using
>
> lapply(list, function(x) x$a)
>
> or
>
> lapply(list, `[[`, "a")
>
>
> On Thu, Oct 27, 2022, 12:29 Hilmar Berger  wrote:
>
> Dear all,
>
> I'm a little bit surprised by the behavior of the $ operator when used
> in lapply - any indication what might be wrong is appreciated.
>

 > xx = list(A=list(a=1:3, b=LETTERS[1:3]),"B"=list(a=7:9, b=LETTERS[7:9]))
 > lapply(xx,`$`,"a")
$A
NULL

$B
NULL

 > `$`(xx[[1]],"a")
[1] 1 2 3
 > lapply(xx,`[`,"a")
$A
$A$a
[1] 1 2 3


$B
$B$a
[1] 7 8 9


> Any idea why `$`(object, name) works when applied to a single list
> element but not within lapply (in contrast to `[`)?
> I checked the help page of the extraction operators but could not find
> anything that explains this.
>
> Thanks and best regards
> Hilmar
>
>         [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> 
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] $ subset operator behavior in lapply

2022-10-27 Thread Jeff Newmiller

Your message is garbled. Please send plain text to the mailing list.

On October 27, 2022 2:31:47 AM PDT, Hilmar Berger  wrote:
>Dear all,
>
>I'm a little bit surprised by the behavior of the $ operator when used
>in lapply - any indication what might be wrong is appreciated.
>
>> xx = list(A=list(a=1:3, b=LETTERS[1:3]),"B"=list(a=7:9, b=LETTERS[7:9]))  > 
>> lapply(xx,`$`,"a") $A NULL $B NULL > `$`(xx[[1]],"a") [1] 1 2 3 >
>lapply(xx,`[`,"a") $A $A$a [1] 1 2 3 $B $B$a [1] 7 8 9 Any idea why I
>`$`(object, name) works when applied to the single list element but not
>within lapply (in contrast to `[`)?
>I checked the help page of the extraction operators but could not find
>anything that explains this. Thanks and best regards Hilmar >
>sessionInfo() R version 4.2.1 (2022-06-23) Platform: x86_64-pc-linux-gnu
>(64-bit) Running under: Ubuntu 20.04.5 LTS Matrix products: default
>BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK:
>/usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 locale: [1]
>LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=de_DE.UTF-8
>LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=de_DE.UTF-8
>LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C [9]
>LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.UTF-8
>LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices
>utils datasets methods base loaded via a namespace (and not attached):
>[1] compiler_4.2.1

-- 
Sent from my phone. Please excuse my brevity.


Re: [R] $ subset operator behavior in lapply

2022-10-27 Thread Andrew Simmons

$ does not evaluate its second argument; it does something like
as.character(substitute(name)).

You should be using

lapply(list, function(x) x$a)

or

lapply(list, `[[`, "a")
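A minimal sketch of the difference, reusing the xx list from the question (nm is a hypothetical variable added for illustration). `$` treats whatever follows it as a literal name, while `[[` evaluates its index. ?lapply documents that the calls it builds have the form FUN(X[[i]], ...), so `$` presumably ends up looking for an element literally named `...` rather than "a".

```r
# The list from the original question.
xx <- list(A = list(a = 1:3, b = LETTERS[1:3]),
           B = list(a = 7:9, b = LETTERS[7:9]))

# `$` does not evaluate its second argument: it uses the literal name.
nm <- "a"                  # hypothetical variable holding the wanted name
print(xx[[1]]$nm)          # NULL: looks for an element called "nm"
print(xx[[1]][[nm]])       # 1 2 3: `[[` evaluates nm to "a"

# Passing `$` to lapply() gives NULL for every element, as reported above.
bad <- lapply(xx, `$`, "a")

# The two working forms suggested in the reply.
by_fun <- lapply(xx, function(x) x$a)
by_idx <- lapply(xx, `[[`, "a")
str(by_idx)
```

The anonymous-function form is the one to use when the name is fixed in the code; `[[` is the one to use when the name is held in a variable.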



[R] $ subset operator behavior in lapply

2022-10-27 Thread Hilmar Berger

Dear all,

I'm a little bit surprised by the behavior of the $ operator when used
in lapply - any indication what might be wrong is appreciated.

> xx = list(A=list(a=1:3, b=LETTERS[1:3]),"B"=list(a=7:9, b=LETTERS[7:9]))
> lapply(xx,`$`,"a")
$A
NULL

$B
NULL

> `$`(xx[[1]],"a")
[1] 1 2 3
> lapply(xx,`[`,"a")
$A
$A$a
[1] 1 2 3


$B
$B$a
[1] 7 8 9


Any idea why `$`(object, name) works when applied to a single list
element but not within lapply (in contrast to `[`)?
I checked the help page of the extraction operators but could not find
anything that explains this.

Thanks and best regards
Hilmar

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.1


Re: [R] subset data frame problem

2021-12-13 Thread Kai Yang via R-help

Thanks Richard, it works well. --- Kai

Re: [R] subset data frame problem

2021-12-13 Thread Richard O'Keefe

You want to DELETE rows satisfying the condition P & Q.
The subset() function requires an expression saying what
you want to RETAIN, so you need subset(PD, !(P & Q)).

test <- subset(PD, !(Class == "1st" & Survived == "No"))

By De Morgan's laws, !(P & Q) is the same as (!P) | (!Q),
so you could also write

test <- subset(PD, Class != "1st" | Survived != "No")

I'd actually be tempted to do this in two steps:

unwanted <- PD$Class == "1st" & PD$Survived == "No"
test <- PD[!unwanted,]
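A small runnable sketch of the three options above, on a hypothetical five-row PD (the real data frame was not posted). It also shows why the single ampersand matters: `&` compares element by element, while `&&` is meant for one TRUE/FALSE value.

```r
# Hypothetical stand-in for PD; the real data frame was not posted.
PD <- data.frame(
  Class    = c("1st", "1st", "2nd", "3rd", "1st"),
  Survived = c("No",  "Yes", "No",  "No",  "No")
)

# Rows to DELETE: Class is "1st" AND Survived is "No".
# `&` works element-wise; `&&` only looks at a single value (older R versions
# silently used the first element, newer ones signal an error), which would
# explain the wrong result from the original call.
unwanted <- PD$Class == "1st" & PD$Survived == "No"

keep1 <- subset(PD, !(Class == "1st" & Survived == "No"))  # negate the rule
keep2 <- subset(PD, Class != "1st" | Survived != "No")     # De Morgan form
keep3 <- PD[!unwanted, ]                                   # two-step version

print(keep1)   # rows 2, 3 and 4 remain
```

One difference worth knowing: subset() drops rows where the condition evaluates to NA, whereas PD[!unwanted, ] returns an all-NA row for each of them.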


Re: [R] subset data frame problem

2021-12-12 Thread Jeff Newmiller

Use one ampersand, not two.

And post plain text.


[R] subset data frame problem

2021-12-12 Thread Kai Yang via R-help

Hi R team,
I want to delete records from a data frame if Class = '1st' and
Survived = 'No'. I wrote the code below,
test <- subset(PD, Class != '1st' && Survived != 'No')
but the code returns a wrong result. Can someone help me with this?
Thanks,
Kai

Re: [R] Subset command

2021-10-15 Thread Bert Gunter

I assume that prim, etc. are columns of your data frame, mydata. Ergo the
error message "object 'prim' not found": 'prim' etc. do not exist in the
global environment.

exclude <- with(mydata, prim == -9, etc. ) should get what you want to
evaluate your second subset statement if I have understood correctly, as it
will look for those names within mydata.
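A minimal sketch of that suggestion, using a hypothetical three-column mydata (the real data frame has more columns, with -9 marking missing values). One further assumption: the second call is written subset(mydata, !exclude), because subset() requires a logical condition, and the unary minus in -exclude would turn the logical vector into numbers, which subset() rejects.

```r
# Hypothetical mydata with three of the columns; -9 marks a missing value.
mydata <- data.frame(
  prim    = c(1, -9,  0, 1),
  highsch = c(0,  1, -9, 1),
  tert    = c(1,  0,  0, 0)
)

# with() evaluates the expression inside mydata, so prim etc. are found.
exclude <- with(mydata, prim == -9 | highsch == -9 | tert == -9)

# subset() needs a logical condition: negate with `!`, not unary minus.
kept <- subset(mydata, !exclude)
print(kept)    # rows 1 and 4

# With many columns, a compact alternative is to count -9 values per row.
cols  <- c("prim", "highsch", "tert")
kept2 <- mydata[rowSums(mydata[cols] == -9) == 0, ]
```

The rowSums() variant avoids typing one comparison per column, at the cost of needing the column names in a vector.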

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: [R] Subset command

2021-10-15 Thread Steven Yen

Thanks. Yes, the second call to subset is there; it tries to use my failed
definition of "exclude". Read on.


On 2021/10/16 09:35 AM, Jeff Newmiller wrote:

I don't see a "second one". Looks like you forgot the subset function call?

On October 15, 2021 6:23:56 PM PDT, Steven Yen  wrote:

The following "subset" command works. I was hoping the second would work as
well, but it does not.

My definition of exclude is rejected.

Help please? Thanks.


> mydata<-subset(mydata,
+    prim>-9 & highsch>-9  & tert>-9 &
+    govt>-9 & nongovt>-9  &
+    married>-9  & urban>-9    &
+    smhmyes>-9  & smhmno>-9   & smhmnoru>-9 &
+    workouts>-9 & seconhan>-9 & reliyes>-9)

> exclude<-  prim==-9 | highsch==-9  | tert==-9 |
+    govt==-9 | nongovt==-9  |
+    married==-9  | urban==-9    |
+    smhmyes==-9  | smhmno==-9   | smhmnoru==-9 |
+    workouts==-9 | seconhan==-9 | reliyes==-9
Error: object 'prim' not found
> mydata<-subset(mydata,-exclude)
Error in eval(e, x, parent.frame()) : object 'exclude' not found

Re: [R] Subset command

2021-10-15 Thread Jeff Newmiller

I don't see a "second one". Looks like you forgot the subset function call?

On October 15, 2021 6:23:56 PM PDT, Steven Yen  wrote:
>The following "subset" command works. I was hoping the second would work as
>well, but it does not.
>
>My definition of exclude is rejected.
>
>Help please? Thanks.
>
> > mydata<-subset(mydata,
>+    prim>-9 & highsch>-9  & tert>-9 &
>+    govt>-9 & nongovt>-9  &
>+    married>-9  & urban>-9    &
>+    smhmyes>-9  & smhmno>-9   & smhmnoru>-9 &
>+    workouts>-9 & seconhan>-9 & reliyes>-9)
>
> > exclude<-  prim==-9 | highsch==-9  | tert==-9 |
>+    govt==-9 | nongovt==-9  |
>+    married==-9  | urban==-9    |
>+    smhmyes==-9  | smhmno==-9   | smhmnoru==-9 |
>+    workouts==-9 | seconhan==-9 | reliyes==-9
>Error: object 'prim' not found
> > mydata<-subset(mydata,-exclude)
>Error in eval(e, x, parent.frame()) : object 'exclude' not found
> >
>

[R] Subset command

2021-10-15 Thread Steven Yen

The following "subset" command works. I was hoping the second would work as
well, but it does not.


My definition of exclude is rejected.

Help please? Thanks.

> mydata<-subset(mydata,
+    prim>-9 & highsch>-9  & tert>-9 &
+    govt>-9 & nongovt>-9  &
+    married>-9  & urban>-9    &
+    smhmyes>-9  & smhmno>-9   & smhmnoru>-9 &
+    workouts>-9 & seconhan>-9 & reliyes>-9)

> exclude<-  prim==-9 | highsch==-9  | tert==-9 |
+    govt==-9 | nongovt==-9  |
+    married==-9  | urban==-9    |
+    smhmyes==-9  | smhmno==-9   | smhmnoru==-9 |
+    workouts==-9 | seconhan==-9 | reliyes==-9
Error: object 'prim' not found
> mydata<-subset(mydata,-exclude)
Error in eval(e, x, parent.frame()) : object 'exclude' not found
>


Re: [R] R Subset by Factor levels

2020-07-30 Thread Engin Yılmaz

I solved this as follows:

m2 <- subset(m1, `Classification Description` == "Borrowing from the Public" |
                 `Classification Description` == "By Other Means" |
                 `Classification Description` == "Total Surplus (+) or Deficit (-)")

Sincerely,
Engin YILMAZ



-- 
*Best regards*
Engin YILMAZ


Re: [R] R Subset by Factor levels

2020-07-29 Thread Rui Barradas


Hello,

Try %in% instead of == in:

m2<-m1[m1$`Classification Description` == levels(m1$`Classification Description`)[c(1,15,2,4],]


Hope this helps,

Rui Barradas




Re: [R] R Subset by Factor levels

2020-07-29 Thread Rasmus Liland

Dear Engin,

On 2020-07-29 16:57 +0300, Engin Yılmaz wrote:
> Dear
>
> I am trying to create a new subset of my data frame.
> My data frame's name is m1.
> The "Classification Description" column has 15 different factor levels.
> The following code creates a subset for one level:
> m2<-m1[m1$`Classification Description` == levels(m1$`Classification Description`)[1],]
>
> My aim is to create a subset with 4 different levels, for example:
> levels(m1$`Classification Description`)[1]
> levels(m1$`Classification Description`)[15]
> levels(m1$`Classification Description`)[2]
> levels(m1$`Classification Description`)[4]
>
> I tried the following code, but it didn't work:
>
> m2<-m1[m1$`Classification Description` == levels(m1$`Classification Description`)[c(1,15,2,4],]

You're almost there; you just need to
use match (via %in%) instead of ==:

m1[m1$`Classification Description` %in%
   levels(m1$`Classification Description`)[c(1, 15, 2, 4)],]

Read more about it at ?match (?`%in%`).
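A small sketch of why == misbehaves here, on a hypothetical six-row m1 with only three levels (the real data has 15). With a vector on the right, == compares element by element and recycles the shorter vector, so each row is tested against just one of the wanted levels; %in% tests each row against all of them. This assumes the column is a factor; if it was read in as character, levels() returns NULL and the wanted values would need to be written out directly.

```r
# Hypothetical m1: a factor column whose name contains a space (hence the
# backticks) and six rows using three levels.
m1 <- data.frame(
  `Classification Description` = factor(c("Total Outlays", "By Other Means",
                                          "Total Receipts", "By Other Means",
                                          "Total Outlays", "Total Receipts")),
  amount = 1:6,
  check.names = FALSE
)

lv     <- levels(m1$`Classification Description`)  # sorted alphabetically
wanted <- lv[c(1, 3)]                               # "By Other Means", "Total Receipts"

# `==` recycles `wanted`: row 1 vs wanted[1], row 2 vs wanted[2], row 3 vs
# wanted[1], ... so only rows that happen to line up are kept.
wrong <- m1[m1$`Classification Description` == wanted, ]

# `%in%` checks every row against the whole set of wanted levels.
right <- m1[m1$`Classification Description` %in% wanted, ]

nrow(wrong)   # 1
nrow(right)   # 4
```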

Best,
Rasmus


[R] R Subset by Factor levels

2020-07-29 Thread Engin Yılmaz

Dear

I am trying to create a new subset of my data frame.
My data frame's name is m1.
The "Classification Description" column has 15 different factor levels.
The following code creates a subset for one level:
m2<-m1[m1$`Classification Description` == levels(m1$`Classification Description`)[1],]

My aim is to create a subset with 4 different levels, for example:
levels(m1$`Classification Description`)[1]
levels(m1$`Classification Description`)[15]
levels(m1$`Classification Description`)[2]
levels(m1$`Classification Description`)[4]

I tried the following code, but it didn't work:

m2<-m1[m1$`Classification Description` == levels(m1$`Classification Description`)[c(1,15,2,4],]

How can I solve this problem?

Example from my dataframe

   `Record Date` `Classification Description`              `Current Month Budget Amount`
 1 2019-06-30    Total On-Budget and Off-Budget Results:                              NA
 2 2019-06-30    Off-Budget Surplus (+) or Deficit (-)                       41998597035.
 3 2019-06-30    Total Outlays                                              342428650968.
 4 2019-06-30    By Other Means                                              51648504883.
 5 2019-06-30    On-Budget Outlays                                          292169836521.
 6 2019-06-30    Off-Budget Outlays                                          50258814447.
 7 2019-06-30    Total Receipts                                             333952332514.
 8 2019-06-30    On-Budget Surplus (+) or Deficit (-)                       -50474915489.
 9 2019-06-30    Off-Budget Receipts                                         92257411482
10 2019-06-30    Total On-Budget and Off-Budget Financing                     8476318454



-- 
*Best regards*
Engin YILMAZ


Re: [R] Subset a data frame with specific date

2020-01-14 Thread PIKAL Petr

Hi Bert

I sometimes use indexing with which() too; it depends on the desired result,
especially with data frames.

> x <- 1:10
> x[5:6] <- NA
> xd <- data.frame(x, y=rnorm(10))

> xd[xd$x>3,]
      x          y
4     4 -1.5086790
NA   NA         NA
NA.1 NA         NA
7     7 -0.2302614
8     8 -0.1660547
9     9  1.3197811
10   10 -0.3234029
> xd[which(xd$x>3),]
    x          y
4   4 -1.5086790
7   7 -0.2302614
8   8 -0.1660547
9   9  1.3197811
10 10 -0.3234029

The variant without which() returns a row of NAs wherever the condition is
NA, which is sometimes undesirable.
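A reproducible version of the comparison, with fixed y values instead of rnorm() so the output does not change between runs. It also includes subset(), which, like which(), drops rows where the condition is NA.

```r
# Same idea as the example above, but with fixed y values instead of
# rnorm() so the result is reproducible.
x <- 1:10
x[5:6] <- NA
xd <- data.frame(x, y = seq(0.1, 1, by = 0.1))

cond <- xd$x > 3           # NA where x is NA

a <- xd[cond, ]            # logical index: each NA gives a row of NAs
b <- xd[which(cond), ]     # which() keeps only the TRUE positions
d <- subset(xd, x > 3)     # subset() also drops rows where the test is NA

nrow(a)   # 7: five matching rows plus two all-NA rows
nrow(b)   # 5
nrow(d)   # 5
```

So which() is redundant only when the condition cannot contain NA; otherwise it changes the result.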

Cheers
Petr

> -----Original Message-----
> From: R-help  On Behalf Of Bert Gunter
> Sent: Tuesday, January 14, 2020 8:10 AM
> To: ani jaya 
> Cc: r-help 
> Subject: Re: [R] Subset a data frame with specific date
> 
> That's fine, but do note that the which() function is wholly unnecessary in
> your last line as R allows logical indexing. Perhaps another topic you need to
> study.
> 
> -- Bert

Re: [R] Subset a data frame with specific date

2020-01-13 Thread ani jaya

Thank you Bert.
And yes another topic to study.
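A minimal sketch of the approach discussed in this thread: build a real Date from the year/month/day columns, as Jeff suggests below, and then use plain logical indexing, as Bert suggests. The four-row mjo30 and the two dates in wanted are hypothetical stand-ins for ani's rmm.txt data and date vector.

```r
# Hypothetical miniature of ani's mjo30 (rmm.txt itself is not available),
# with separate year / month / day columns as in ani's final code.
mjo30 <- data.frame(
  year  = c(1986, 1986, 1986, 1987),
  month = c(4,    4,    4,    6),
  day   = c(24,   25,   26,   10),
  amp   = c(0.41, 0.48, 0.52, 1.69)
)

# Hypothetical subset of the wanted dates, stored as Date (not character).
wanted <- as.Date(c("1986-04-25", "1987-06-10"))

# Combine the three columns into one Date column, as Jeff suggested.
mjo30$Dt <- as.Date(ISOdate(mjo30$year, mjo30$month, mjo30$day))

# Plain logical indexing is enough: %in% never returns NA, so which() adds
# nothing here.
mjo <- mjo30[mjo30$Dt %in% wanted, ]
print(mjo)
```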

On Tue, Jan 14, 2020 at 4:10 PM Bert Gunter  wrote:
>
> That's fine, but do note that the which() function is wholly unnecessary in 
> your last line as R allows logical indexing. Perhaps another topic you need 
> to study.
>
> -- Bert
>
>
>
> On Mon, Jan 13, 2020 at 10:56 PM ani jaya  wrote:
>>
>> Dear Jeff and Bert,
>>
>> Thank you very much for your correction and explanation.
>> And yes, I need to study about date format more.
>> Sorry for the HTML mail; I didn't realize.
>>
>> I was able to subset the data that I want.
>>
>> mjo30<-read.table("rmm.txt", header=FALSE, skip=4234, nrows=10957)
>> mjo30$V8<-NULL
>> names(mjo30)<-c("year","month","day", "rmm1","rmm2","phase","amp")
>> mjo3<-as.Date(with(mjo30,paste(year,month, day, sep="-")),"%Y-%m-%d")
>> mjo<-mjo30[which(mjo3%in%date),]
>>
>> head(mjo)
>>      year month day      rmm1      rmm2 phase      amp
>> 115  1986     4  25 -0.319090 -0.363030     2 0.483332
>> 526  1987     6  10  1.662870  0.291632     5 1.688250
>> 977  1988     9   3 -0.604950 -0.299850     1 0.675181
>> 1374 1989    10   5  0.972298 -0.461030     4 1.076060
>> 1760 1990    10  26 -1.183110 -1.589810     2 1.981730
>> 1953 1991     5   7 -0.317180  0.953061     7 1.004450
>>
>>
>> Best,
>> Ani
>>
>>
>> On Tue, Jan 14, 2020 at 3:20 PM Jeff Newmiller  
>> wrote:
>> >
>> > The dput function is for re-creating an R object in another R workspace, 
>> > so it uses fundamental base types to define objects. A Date is really the 
>> > number of days since a specific date (typically 1970-01-01) that get 
>> > converted to look like dates whenever you display or print them, so what 
>> > you are seiing are those numbers. If we enter the R code returned by dput 
>> > into our R session we will be able to see the dates.
>> >
>> > Your mjo30 table seems to call the day of the month the "date"... which is 
>> > confusing. I would combine those three columns into one like
>> >
>> > mjo30$Dt <- as.Date( ISOdate( mjo30$year, mjo30$month, mjo30$date ) )
>> >
>> > You could then use indexing
>> >
>> > mjo30[ date[1] == mjo30$Dt, ]
>> >
>> > or
>> >
>> > mjo30[ mjo30$Dt %in% date, ]
>> >
>> > but the subset function would not work in this case because you have two 
>> > different objects (a column in mjo30 and a vector in your global 
>> > environment) both referred to as 'date'.
>> >
>> > On January 13, 2020 8:53:38 PM PST, ani jaya  wrote:
>> > >Good morning R-Help,
>> > >
>> > >I have a dataframe with 7 columns and 1+ rows. I want to
>> > >subset/extract
>> > >those data frame with specific date (not in order). Here the head of my
>> > >data frame:
>> > >
>> > >head(mjo30)  year month date  rmm1 rmm2 phase amp
>> > >1 1986 11 -0.326480 -1.55895 2 1.59277
>> > >2 1986 12 -0.417700 -1.82689 2 1.87403
>> > >3 1986 13  0.032915 -2.40150 3 2.40172
>> > >4 1986 14  0.492743 -2.49216 3 2.54041
>> > >5 1986 15  0.585106 -2.76866 3 2.82981
>> > >6 1986 16  0.665013 -3.13883 3 3.20851
>> > >
>> > >and here my specific date:
>> > >> date [1] "1986-04-25" "1987-06-10" "1988-09-03" "1989-10-05"
>> > >"1990-10-26" "1991-05-07" "1992-11-19" "1993-01-23" "1994-12-04"
>> > >[10] "1995-05-11" "1996-10-04" "1997-04-29" "1998-04-08" "1999-01-16"
>> > >"2000-08-01" "2001-10-02" "2002-05-08" "2003-04-01"
>> > >[19] "2004-05-07" "2005-09-02" "2006-12-30" "2007-09-03" "2008-10-24"
>> > >"2009-11-14" "2010-07-05" "2011-04-30" "2012-05-21"
>> > >[28] "2013-04-07" "2014-05-07" "2015-07-26"
>> > >
>> > >And also I was confused when I dput my date, it show like this:
>> > >> dput(date)structure(c(5958, 6369, 6820, 7217, 7603, 7796, 8358, 8423,
>> > >9103,
>> > >9261, 9773, 9980, 10324, 10607, 11170, 11597, 11815, 12143, 12545,
>> > >13028, 13512, 13759, 14176, 14562, 14795, 15094, 15481, 15802,
>> > >16197, 16642), class = "Date")
>> > >
>> > >what is that mean? I mean why it is not recall the dates but some
>> > >values (5958,6369,7217,..)?
>> > >
>> > >Any comment and recommendation is appreciate.  Thank you.
>> > >
>> > >Best,
>> > >
>> > >Ani
>> > >
>> > >   [[alternative HTML version deleted]]
>> > >
>> > >__
>> > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >https://stat.ethz.ch/mailman/listinfo/r-help
>> > >PLEASE do read the posting guide
>> > >http://www.R-project.org/posting-guide.html
>> > >and provide commented, minimal, self-contained, reproducible code.
>> >
>> > --
>> > Sent from my phone. Please excuse my brevity.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing

Re: [R] Subset a data frame with specific date

2020-01-13 Thread Bert Gunter

That's fine, but do note that the which() function is wholly unnecessary in
your last line as R allows logical indexing. Perhaps another topic you need
to study.

-- Bert
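A minimal sketch of Bert's point about which(), using a made-up toy data frame (`df`, `keep` and `df_na` are illustrative names, not from the thread). A logical vector from `%in%` can index rows directly, and wrapping it in which() selects the same rows. The two forms only differ when the logical vector contains NA, which `%in%` never produces but a comparison such as `>` can.

```r
# Toy data frame, purely illustrative
df <- data.frame(v = c(1, 2, 3, 4, 5))

# %in% returns TRUE/FALSE (never NA), so which() adds nothing here
keep <- df$v %in% c(2, 4)
by_logical <- df[keep, , drop = FALSE]
by_which   <- df[which(keep), , drop = FALSE]

# With a comparison that can yield NA, the two forms differ:
# logical indexing returns an all-NA row for the NA, which() drops it
df_na <- data.frame(v = c(1, NA, 3))
cond <- df_na$v > 2                                   # FALSE NA TRUE
n_logical <- nrow(df_na[cond, , drop = FALSE])        # 2 rows
n_which   <- nrow(df_na[which(cond), , drop = FALSE]) # 1 row
```

So in ani's line, `mjo30[mjo3 %in% date, ]` should give the same rows as the which() version.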



On Mon, Jan 13, 2020 at 10:56 PM ani jaya  wrote:

> Dear Jeff and Bert,
>
> Thank you very much for your correction and explanation.
> And yes, I need to study date formats more.
> Sorry for the HTML mail; I didn't realize.
>
> I was able to subset the data I wanted.
>
> mjo30<-read.table("rmm.txt", header=FALSE, skip=4234, nrows=10957)
> mjo30$V8<-NULL
> names(mjo30)<-c("year","month","day", "rmm1","rmm2","phase","amp")
> mjo3<-as.Date(with(mjo30,paste(year,month, day, sep="-")),"%Y-%m-%d")
> mjo<-mjo30[which(mjo3%in%date),]
>
> head(mjo)
>      year month day      rmm1      rmm2 phase      amp
> 115  1986     4  25 -0.319090 -0.363030     2 0.483332
> 526  1987     6  10  1.662870  0.291632     5 1.688250
> 977  1988     9   3 -0.604950 -0.299850     1 0.675181
> 1374 1989    10   5  0.972298 -0.461030     4 1.076060
> 1760 1990    10  26 -1.183110 -1.589810     2 1.981730
> 1953 1991     5   7 -0.317180  0.953061     7 1.004450
>
>
> Best,
> Ani

Re: [R] Subset a data frame with specific date

2020-01-13 Thread ani jaya

Dear Jeff and Bert,

Thank you for your correction and explanation.
Yes, I need to study date formats more, and
sorry for the HTML mail.

I was able to subset the data I wanted.

mjo30<-read.table("rmm.txt", header=FALSE, skip=4234, nrows=10957)
mjo30$V8<-NULL
names(mjo30)<-c("year","month","day", "rmm1","rmm2","phase","amp")
mjo3<-as.Date(with(mjo30,paste(year,month, day, sep="-")),"%Y-%m-%d")
mjo<-mjo30[which(mjo3%in%date),]

head(mjo)
     year month day      rmm1      rmm2 phase      amp
115  1986     4  25 -0.319090 -0.363030     2 0.483332
526  1987     6  10  1.662870  0.291632     5 1.688250
977  1988     9   3 -0.604950 -0.299850     1 0.675181
1374 1989    10   5  0.972298 -0.461030     4 1.076060
1760 1990    10  26 -1.183110 -1.589810     2 1.981730
1953 1991     5   7 -0.317180  0.953061     7 1.004450

Best,
Ani

On Tue, Jan 14, 2020 at 3:56 PM ani jaya  wrote:
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subset a data frame with specific date

2020-01-13 Thread ani jaya

Dear Jeff and Bert,

Thank you very much for your correction and explanation.
And yes, I need to study date formats more.
Sorry for the HTML mail; I didn't realize.

I was able to subset the data I wanted.

mjo30<-read.table("rmm.txt", header=FALSE, skip=4234, nrows=10957)
mjo30$V8<-NULL
names(mjo30)<-c("year","month","day", "rmm1","rmm2","phase","amp")
mjo3<-as.Date(with(mjo30,paste(year,month, day, sep="-")),"%Y-%m-%d")
mjo<-mjo30[which(mjo3%in%date),]

head(mjo)
     year month day      rmm1      rmm2 phase      amp
115  1986     4  25 -0.319090 -0.363030     2 0.483332
526  1987     6  10  1.662870  0.291632     5 1.688250
977  1988     9   3 -0.604950 -0.299850     1 0.675181
1374 1989    10   5  0.972298 -0.461030     4 1.076060
1760 1990    10  26 -1.183110 -1.589810     2 1.981730
1953 1991     5   7 -0.317180  0.953061     7 1.004450


Best,
Ani


On Tue, Jan 14, 2020 at 3:20 PM Jeff Newmiller  wrote:
>
> The dput function is for re-creating an R object in another R workspace, so
> it uses fundamental base types to define objects. A Date is really the number
> of days since a specific date (typically 1970-01-01) that gets converted to
> look like a date whenever you display or print it, so what you are seeing
> are those numbers. If we enter the R code returned by dput into our R session
> we will be able to see the dates.
>
> Your mjo30 table seems to call the day of the month the "date"... which is 
> confusing. I would combine those three columns into one like
>
> mjo30$Dt <- as.Date( ISOdate( mjo30$year, mjo30$month, mjo30$date ) )
>
> You could then use indexing
>
> mjo30[ date[1] == mjo30$Dt, ]
>
> or
>
> mjo30[ mjo30$Dt %in% date, ]
>
> but the subset function would not work in this case because you have two 
> different objects (a column in mjo30 and a vector in your global environment) 
> both referred to as 'date'.
>
> --
> Sent from my phone. Please excuse my brevity.


Re: [R] Subset a data frame with specific date

2020-01-13 Thread Bert Gunter

Inline.

Bert Gunter




On Mon, Jan 13, 2020 at 8:54 PM ani jaya  wrote:

> Good morning R-Help,
>
> I have a data frame with 7 columns and 1+ rows. I want to subset/extract
> the rows of this data frame that match specific dates (not in order).
> Here is the head of my data frame:
>
> > head(mjo30)
>   year month date      rmm1     rmm2 phase     amp
> 1 1986     1    1 -0.326480 -1.55895     2 1.59277
> 2 1986     1    2 -0.417700 -1.82689     2 1.87403
> 3 1986     1    3  0.032915 -2.40150     3 2.40172
> 4 1986     1    4  0.492743 -2.49216     3 2.54041
> 5 1986     1    5  0.585106 -2.76866     3 2.82981
> 6 1986     1    6  0.665013 -3.13883     3 3.20851
>

These are columns of numeric values. That you label them as year, month,
date is irrelevant.

>
> and here are my specific dates:
> > date
>  [1] "1986-04-25" "1987-06-10" "1988-09-03" "1989-10-05" "1990-10-26" "1991-05-07" "1992-11-19" "1993-01-23" "1994-12-04"
> [10] "1995-05-11" "1996-10-04" "1997-04-29" "1998-04-08" "1999-01-16" "2000-08-01" "2001-10-02" "2002-05-08" "2003-04-01"
> [19] "2004-05-07" "2005-09-02" "2006-12-30" "2007-09-03" "2008-10-24" "2009-11-14" "2010-07-05" "2011-04-30" "2012-05-21"
> [28] "2013-04-07" "2014-05-07" "2015-07-26"
>
This is how the print method for Date objects prints the dates. See ?Dates

> I was also confused when I ran dput() on my dates; it shows this:
> > dput(date)
> structure(c(5958, 6369, 6820, 7217, 7603, 7796, 8358, 8423, 9103,
> 9261, 9773, 9980, 10324, 10607, 11170, 11597, 11815, 12143, 12545,
> 13028, 13512, 13759, 14176, 14562, 14795, 15094, 15481, 15802,
> 16197, 16642), class = "Date")
>

This is how objects of class "Date" are represented internally: as
(integer-valued) numbers of days since 1970-01-01. See ?Dates.
Use str() (see ?str) to see the structure of an object, not dput().
I think you need to go through a tutorial or two on dates in R, and
probably also on S3 methods in R.


> What does that mean? Why does it not show the dates but some
> numbers (5958, 6369, 7217, ...)?
>
> Any comments and recommendations are appreciated.  Thank you.
>
Extended tutorials on these topics are inappropriate here. There are many
places they can be found on the web.
But here's an example of one simple way to do it:

> d <- as.Date("2004-10-5") ## create object of class "Date"
## This is what you want to subset with
> d  ## how they are printed
[1] "2004-10-05"
> str(d)
 Date[1:1], format: "2004-10-05"
> class(d)
[1] "Date"
> dput(d) ## the internal representation of Date objects
structure(12696, class = "Date")
>
>
> ## Now create a data frame that you want to subset with d
> df <- data.frame (year = c(2004,2005),
+   month = c(10,2),
+   date = c(5,15))
> df
  year month date
1 2004    10    5
2 2005     2   15
> ## convert to a formatted character column of dates
> alldates <- with(df,paste(year,month,date, sep ="-"))
> alldates ## vector of formatted character strings.
[1] "2004-10-5" "2005-2-15"
> class(alldates)
[1] "character"
> ## convert it to "Date" class
> alldates <- as.Date(alldates)
> class(alldates)
[1] "Date"
> ## Now use this to subset the data frame
> df[alldates %in% d, ]
  year month date
1 2004    10    5


## And please post in **plain text** not HTML in future.

Cheers,
Bert




> Best,
>
> Ani
>

Re: [R] Subset a data frame with specific date

2020-01-13 Thread Jeff Newmiller

The dput function is for re-creating an R object in another R workspace, so it
uses fundamental base types to define objects. A Date is really the number of
days since a specific date (typically 1970-01-01) that gets converted to look
like a date whenever you display or print it, so what you are seeing are those
numbers. If we enter the R code returned by dput into our R session we will be
able to see the dates.

Your mjo30 table seems to call the day of the month the "date"... which is 
confusing. I would combine those three columns into one like

mjo30$Dt <- as.Date( ISOdate( mjo30$year, mjo30$month, mjo30$date ) )

You could then use indexing

mjo30[ date[1] == mjo30$Dt, ]

or

mjo30[ mjo30$Dt %in% date, ]

but the subset function would not work in this case because you have two 
different objects (a column in mjo30 and a vector in your global environment) 
both referred to as 'date'.

On January 13, 2020 8:53:38 PM PST, ani jaya  wrote:
>Good morning R-Help,
>
>I have a data frame with 7 columns and 1+ rows. I want to subset/extract
>the rows of this data frame that match specific dates (not in order).
>Here is the head of my data frame:
>
>> head(mjo30)
>  year month date      rmm1     rmm2 phase     amp
>1 1986     1    1 -0.326480 -1.55895     2 1.59277
>2 1986     1    2 -0.417700 -1.82689     2 1.87403
>3 1986     1    3  0.032915 -2.40150     3 2.40172
>4 1986     1    4  0.492743 -2.49216     3 2.54041
>5 1986     1    5  0.585106 -2.76866     3 2.82981
>6 1986     1    6  0.665013 -3.13883     3 3.20851
>
>and here are my specific dates:
>> date
> [1] "1986-04-25" "1987-06-10" "1988-09-03" "1989-10-05" "1990-10-26" "1991-05-07" "1992-11-19" "1993-01-23" "1994-12-04"
>[10] "1995-05-11" "1996-10-04" "1997-04-29" "1998-04-08" "1999-01-16" "2000-08-01" "2001-10-02" "2002-05-08" "2003-04-01"
>[19] "2004-05-07" "2005-09-02" "2006-12-30" "2007-09-03" "2008-10-24" "2009-11-14" "2010-07-05" "2011-04-30" "2012-05-21"
>[28] "2013-04-07" "2014-05-07" "2015-07-26"
>
>I was also confused when I ran dput() on my dates; it shows this:
>> dput(date)
>structure(c(5958, 6369, 6820, 7217, 7603, 7796, 8358, 8423, 9103,
>9261, 9773, 9980, 10324, 10607, 11170, 11597, 11815, 12143, 12545,
>13028, 13512, 13759, 14176, 14562, 14795, 15094, 15481, 15802,
>16197, 16642), class = "Date")
>
>What does that mean? Why does it not show the dates but some
>numbers (5958, 6369, 7217, ...)?
>
>Any comments and recommendations are appreciated.  Thank you.
>
>Best,
>
>Ani

-- 
Sent from my phone. Please excuse my brevity.
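A small sketch of both points in Jeff's reply. The three-row `mjo30` below is made up (same column names as in the thread, invented values), and `hits`/`wrong` are illustrative names. It shows that a Date is stored as a day count since 1970-01-01, and that inside subset() the name `date` resolves to the column rather than to the global vector of target dates.

```r
# Made-up stand-in for mjo30: the day of month is in a column called 'date'
mjo30 <- data.frame(year  = c(1986, 1986, 1987),
                    month = c(4, 4, 6),
                    date  = c(24, 25, 10))

# Global vector of target dates, confusingly also called 'date'
date <- as.Date(c("1986-04-25", "1987-06-10"))

# What dput() showed: the underlying day counts since 1970-01-01
unclass(date)                          # 5958 6369
as.Date(5958, origin = "1970-01-01")   # back to "1986-04-25"

# Combine year/month/day into one Date column
mjo30$Dt <- as.Date(ISOdate(mjo30$year, mjo30$month, mjo30$date))

# Ordinary indexing sees the global 'date' vector: rows 2 and 3 match
hits <- mjo30[mjo30$Dt %in% date, ]

# Inside subset(), 'date' is the day-of-month column (24, 25, 10),
# so Dt is compared with day numbers and nothing matches
wrong <- subset(mjo30, Dt %in% date)
```

Here `wrong` has zero rows while `hits` has two, which is the name clash Jeff describes; plain indexing (or renaming one of the two `date` objects) avoids it.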


[R] Subset a data frame with specific date

2020-01-13 Thread ani jaya

Good morning R-Help,

I have a data frame with 7 columns and 1+ rows. I want to subset/extract
the rows of this data frame that match specific dates (not in order).
Here is the head of my data frame:

> head(mjo30)
  year month date      rmm1     rmm2 phase     amp
1 1986     1    1 -0.326480 -1.55895     2 1.59277
2 1986     1    2 -0.417700 -1.82689     2 1.87403
3 1986     1    3  0.032915 -2.40150     3 2.40172
4 1986     1    4  0.492743 -2.49216     3 2.54041
5 1986     1    5  0.585106 -2.76866     3 2.82981
6 1986     1    6  0.665013 -3.13883     3 3.20851

and here are my specific dates:
> date
 [1] "1986-04-25" "1987-06-10" "1988-09-03" "1989-10-05" "1990-10-26" "1991-05-07" "1992-11-19" "1993-01-23" "1994-12-04"
[10] "1995-05-11" "1996-10-04" "1997-04-29" "1998-04-08" "1999-01-16" "2000-08-01" "2001-10-02" "2002-05-08" "2003-04-01"
[19] "2004-05-07" "2005-09-02" "2006-12-30" "2007-09-03" "2008-10-24" "2009-11-14" "2010-07-05" "2011-04-30" "2012-05-21"
[28] "2013-04-07" "2014-05-07" "2015-07-26"

I was also confused when I ran dput() on my dates; it shows this:
> dput(date)
structure(c(5958, 6369, 6820, 7217, 7603, 7796, 8358, 8423, 9103,
9261, 9773, 9980, 10324, 10607, 11170, 11597, 11815, 12143, 12545,
13028, 13512, 13759, 14176, 14562, 14795, 15094, 15481, 15802,
16197, 16642), class = "Date")

What does that mean? Why does it not show the dates but some
numbers (5958, 6369, 7217, ...)?

Any comments and recommendations are appreciated.  Thank you.

Best,

Ani


Re: [R] subset English language using textcat package

2018-11-19 Thread Robert David Burbidge via R-help

Look at the help docs and examples for textcat and sapply:

print(as.character(data$x[sapply(data$x, textcat)=="english"]))

Note, though, that with its default settings textcat classifies "This book is
amazing" as Dutch, so you may want to read the help for textcat and change the
profile db ("p") or the "method".
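A runnable sketch of the same subsetting pattern in base R only. `toy_lang()` is a hypothetical stand-in for textcat() (a crude keyword check, not how textcat works) so that the example runs without the package; with textcat installed, its per-string labels would play the same role. Converting the factor to character first makes the result a plain character vector rather than factor values.

```r
# The poster's data, rebuilt from the dput() output
# ("\u00fc" is the u-umlaut in "bücher")
data <- structure(list(x = structure(c(2L, 6L, 5L, 3L, 1L, 4L),
  .Label = c("Dieses Buch ist erstaunlich", "I love this book",
             "ich liebe dieses Buch", "mehrere b\u00fccher in prozess",
             "several books in proccess", "This book is amazing"),
  class = "factor")), row.names = c(NA, -6L), class = "data.frame")

# Hypothetical stand-in for textcat(): labels a string "german" if it
# contains one of a few German keywords, otherwise "english"
toy_lang <- function(s) {
  ifelse(grepl("\\b(ich|dieses|mehrere|buch)\\b", s,
               ignore.case = TRUE, perl = TRUE),
         "german", "english")
}

texts   <- as.character(data$x)   # factor -> character
lang    <- toy_lang(texts)        # one label per string
english <- texts[lang == "english"]
print(english)
```

The keep-where-label-equals step is the same as in the one-liner above; only the classifier is swapped out.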

On 19/11/2018 09:48, Elahe chalabi via R-help wrote:

Hi all,

How is it possible to subset English text from a df containing German and
English texts using the textcat package?

> library(textcat)
> dput(data)
structure(list(x = structure(c(2L, 6L, 5L, 3L, 1L, 4L), .Label = c("Dieses Buch ist erstaunlich",
"I love this book", "ich liebe dieses Buch", "mehrere bücher in prozess",
"several books in proccess", "This book is amazing"), class = "factor")),
row.names = c(NA, -6L), class = "data.frame")

I want the output to be like the following:

 "I love this book"  "This book is amazing"  "several books in proccess"

Thanks for any help!
Elahe


[R] subset English language using textcat package

2018-11-19 Thread Elahe chalabi via R-help

Hi all, 

How is it possible to subset English text from a df containing German and
English texts using the textcat package?

> library(textcat)
> dput(data)
structure(list(x = structure(c(2L, 6L, 5L, 3L, 1L, 4L), .Label = c("Dieses Buch ist erstaunlich",
"I love this book", "ich liebe dieses Buch", "mehrere bücher in prozess",
"several books in proccess", "This book is amazing"), class = "factor")),
row.names = c(NA, -6L), class = "data.frame")

I want the output to be like the following:


"I love this book"  "This book is amazing"  "several books in proccess"


Thanks for any help!
Elahe


Re: [R] subset only if f.e a column is successive for more than 3 values

2018-09-28 Thread William Dunlap via R-help

Do you also want lines 38 and 39 (in addition to 40:44), or do I
misunderstand your problem?

When you deal with runs of data, think of the rle (run-length encoding)
function.  E.g., here is a barely tested function to find runs of a given
minimum length and a given difference between successive values.  It also
returns a 'runNumber' so you can split the result into runs.

findRuns <- function(x, minRunLength=3, difference=1) {
    # for integral x, find runs of length at least 'minRunLength'
    # with 'difference' between successive values
    d <- diff(x)
    dRle <- rle(d)
    w <- rep(dRle$lengths>=minRunLength-1 & dRle$values==difference,
             dRle$lengths)
    values <- x[c(FALSE,w) | c(w,FALSE)]
    runNumber <- cumsum(c(TRUE, diff(values)!=difference))
    data.frame(values=values, runNumber=runNumber)
}

> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20))
  values runNumber
1  1 1
2  2 1
3  3 1
4 17 2
5 18 2
6 19 2
7 20 2
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), minRunLength=4)
  values runNumber
1 17 1
2 18 1
3 19 1
4 20 1
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), difference=-2)
  values runNumber
1 10 1
2  8 1
3  6 1
4  4 1


Bill Dunlap
TIBCO Software
wdunlap tibco.com
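To see what findRuns() builds on, here is a small illustration of its intermediate steps for the first example vector, written out at top level (the names `keep_run`, `w` and `vals` are illustrative). diff() turns runs of +1 steps into runs of equal values, and rle() reports how long each such run is.

```r
x <- c(10, 8, 6, 4, 1, 2, 3, 20, 17, 18, 19, 20)

d <- diff(x)   # -2 -2 -2 -3  1  1 17 -3  1  1  1
r <- rle(d)    # run lengths and the repeated value of each run

# Runs of step 1 spanning at least 3 values need at least 2 diffs of 1
keep_run <- r$lengths >= 2 & r$values == 1

# Expand the per-run flag back to one flag per diff
w <- rep(keep_run, r$lengths)

# A flagged diff covers both of its endpoints in x
vals <- x[c(FALSE, w) | c(w, FALSE)]
print(vals)    # 1 2 3 17 18 19 20
```

This reproduces the `values` column of the first findRuns() call above.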

On Thu, Sep 27, 2018 at 7:48 AM, Knut Krueger 
wrote:

> Hi to all
>
> I need a subset of the values where there are, e.g., 3 successive values in a
> column of a data frame:
> Example from the subset help page:
>
> subset(airquality, Temp > 80, select = c(Ozone, Temp))
> 29 45   81
> 35 NA   84
> 36 NA   85
> 38 29   82
> 39 NA   87
> 40 71   90
> 41 39   87
> 42 NA   93
> 43 NA   92
> 44 23   82
> .
>
> I would like to get only
>
> ...
> 40 71   90
> 41 39   87
> 42 NA   93
> 43 NA   92
> 44 23   82
> 
>
> because the left column ascends more than, e.g., three times without a gap
>
> Any hints for a package, or do I need to build my own function?
>
> Kind Regards Knut

Re: [R] subset only if f.e a column is successive for more than 3 values

2018-09-28 Thread Knut Krueger


Hi Jim,
thanks, it works with the given example,
but what's the difference when using

testdata=data.frame(TIME=c("17:11:20", "17:11:21", "17:11:22", "17:11:23",
  "17:11:24", "17:11:25", "17:11:26", "17:11:27", "17:11:28", "17:21:43",
  "17:22:16", "17:22:19", "18:04:48", "18:04:49", "18:04:50", "18:04:51",
  "18:04:52", "19:50:09", "00:59:27", "00:59:28", "00:59:29", "04:13:40",
  "04:13:43", "04:13:44"),
  index=c(8960,8961,8962,8963,8964,8965,8966,8967,8968,9583,9616,9619,
  12168,12169,12170,12171,12172,18489,37047,37048,37049,48700,48701,48702))

seqindx<-rle(diff(testdata$index)==1)
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices for the starts of the runs
starts<-cumsum(seqindx$lengths)[runsel[-1]]+1
# and the ends
ends<-cumsum(seqindx$lengths)[runsel]+1

eval(parse(text=paste0("testdata[c(",paste(starts,ends,sep=":",collapse=","),"),]")))

The result (index) is
12168,9619,9616,9583,8968,12168,12169,12170,12171,12172

Maybe it's the gaps between .. 8967,8968,9583,9616,9619,12168,12169 ..?

Regards Knut
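A hedged reading of what goes wrong, with a corrected sketch. In `cumsum(seqindx$lengths)[runsel[-1]]` the index vector `runsel[-1]` is one element shorter than the vector it indexes, so R recycles it, and the shifted flags can never select the first run. For this test data that yields a single start (13), which paste() then pairs with both ends (9 and 17), giving the backwards range `13:9`. Computing each start from its own end and run length avoids the shift, and mapply()/seq() build the row numbers without eval(parse()). Only the `index` column is kept here for brevity.

```r
# Knut's index values (TIME column omitted for brevity)
testdata <- data.frame(index = c(8960, 8961, 8962, 8963, 8964, 8965, 8966,
  8967, 8968, 9583, 9616, 9619, 12168, 12169, 12170, 12171, 12172, 18489,
  37047, 37048, 37049, 48700, 48701, 48702))

seqindx <- rle(diff(testdata$index) == 1)
# TRUE for runs of at least 3 consecutive +1 steps (i.e. 4+ rows)
runsel <- seqindx$lengths >= 3 & seqindx$values

# Each qualifying run ends at the row after its last +1 step ...
ends <- cumsum(seqindx$lengths)[runsel] + 1
# ... and starts 'length' rows before that end
starts <- ends - seqindx$lengths[runsel]

# Build the row numbers directly instead of eval(parse(...))
rows <- unlist(mapply(seq, starts, ends, SIMPLIFY = FALSE))
result <- testdata[rows, , drop = FALSE]
print(result$index)
```

If no run qualifies, `rows` is NULL and `result` simply has zero rows.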


Re: [R] subset only if f.e a column is successive for more than 3 values

2018-09-27 Thread Jim Lemon

Bugger! It's

eval(parse(text=paste0("kkdf[c(",paste(starts,ends,sep=":",collapse=","),"),]")))

What a mess!

Jim
On Fri, Sep 28, 2018 at 8:35 AM Jim Lemon  wrote:
>
> Hi Knut,
> As Bert said, you can start with diff and work from there. I can
> easily get the text for the subset, but despite fooling around with
> "parse", "eval" and "expression", I couldn't get it to work:
>
> # use a bigger subset to test whether multiple runs can be extracted
> kkdf<-subset(airquality,Temp > 77,select=c("Ozone","Temp"))
> kkdf$index<-as.numeric(rownames(kkdf))
> # get the run length encoding
> seqindx<-rle(diff(kkdf$index)==1)
> # get a logical vector of the starts of the runs
> runsel<-seqindx$lengths >= 3 & seqindx$values
> # get the indices for the starts of the runs
> starts<-cumsum(seqindx$lengths)[runsel[-1]]+1
> # and the ends
> ends<-cumsum(seqindx$lengths)[runsel]+1
> # the character representation of the subset as indices is
> paste0("c(",paste(starts,ends,sep=":",collapse=","),")")
>
> I expect there will be a lightning response from someone who knows
> about converting the resulting string into whatever is needed.
>
> Jim

Re: [R] subset only if f.e a column is successive for more than 3 values

2018-09-27 Thread Jim Lemon

Hi Knut,
As Bert said, you can start with diff and work from there. I can
easily get the text for the subset, but despite fooling around with
"parse", "eval" and "expression", I couldn't get it to work:

# use a bigger subset to test whether multiple runs can be extracted
kkdf<-subset(airquality,Temp > 77,select=c("Ozone","Temp"))
kkdf$index<-as.numeric(rownames(kkdf))
# get the run length encoding of the steps between row numbers
seqindx<-rle(diff(kkdf$index)==1)
# get a logical vector of the runs of at least 3 steps of 1 (i.e. 4+ rows)
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices for the starts of the runs
# (end of each run minus its length, plus 1; also works if the first run qualifies)
starts<-(cumsum(seqindx$lengths)-seqindx$lengths)[runsel]+1
# and the ends (+1 because a diff at position i covers rows i and i+1)
ends<-cumsum(seqindx$lengths)[runsel]+1
# the character representation of the subset as indices is
paste0("c(",paste(starts,ends,sep=":",collapse=","),")")

I expect there will be a lightning response from someone who knows
about converting the resulting string into whatever is needed.
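
A minimal sketch of one way to skip the string step: the start/end pairs can be expanded into row positions directly with `mapply(seq, ...)`. It recomputes the run starts as `cumsum(lengths) - lengths + 1`, so a qualifying first run is found too. The names `run_end`, `rows`, `result`, `idx_string` and `rows_from_string` are illustrative. If the string already exists, `eval(parse(text = ...))` should turn it back into indices.

```r
# Rows of airquality with Temp > 77, as in the code above
kkdf <- subset(airquality, Temp > 77, select = c("Ozone", "Temp"))
kkdf$index <- as.numeric(rownames(kkdf))

# Runs of TRUE mark stretches where successive row numbers differ by exactly 1
seqindx <- rle(diff(kkdf$index) == 1)
runsel <- seqindx$lengths >= 3 & seqindx$values  # 3+ steps of 1, i.e. 4+ rows

# A run of diffs at positions s..e spans rows s..(e + 1)
run_end <- cumsum(seqindx$lengths)
starts <- (run_end - seqindx$lengths)[runsel] + 1
ends <- run_end[runsel] + 1

# Expand each start:end pair into row positions and subset directly
rows <- unlist(mapply(seq, starts, ends, SIMPLIFY = FALSE))
result <- kkdf[rows, c("Ozone", "Temp")]

# If the string form already exists, eval(parse()) turns it back into indices
idx_string <- paste0("c(", paste(starts, ends, sep = ":", collapse = ","), ")")
rows_from_string <- eval(parse(text = idx_string))
```

The rows in `result` keep their original order, so separate runs can still be told apart by the gaps in their row names.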

Jim


Re: [R] subset only if f.e a column is successive for more than 3 values

2018-09-27 Thread Bert Gunter

1. I assume the values are integers, not floats/numerics (which would make
it more complicated).

2. Strategy: Take differences (e.g. see ?diff) and look for more than three 1's in a
row.

I don't have time to work out details, but perhaps that helps.
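
A minimal sketch of that strategy in base R. It reads "successive" as consecutive row names and uses a hypothetical threshold `min_run` (4 rows here, i.e. three differences of 1; 5 would match "more than three 1's"). Instead of `rle()`, it labels each run with `cumsum()` over the breaks in `diff()` and counts run sizes with `ave()`.

```r
# Rows of airquality with Temp > 80, as in the question
hot <- subset(airquality, Temp > 80, select = c(Ozone, Temp))
idx <- as.integer(rownames(hot))

# Start a new group wherever the step between row numbers is not 1
grp <- cumsum(c(1, diff(idx) != 1))

# Hypothetical threshold: keep only runs of at least min_run consecutive rows
min_run <- 4
run_len <- ave(idx, grp, FUN = length)
hot[run_len >= min_run, ]
```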

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



[R] subset only if f.e a column is successive for more than 3 values

2018-09-27 Thread Knut Krueger


Hi to all

I need a subset of the rows where there are, e.g., 3 or more successive values in a
column of a data frame:

Example from the subset help page:

subset(airquality, Temp > 80, select = c(Ozone, Temp))
29 45   81
35 NA   84
36 NA   85
38 29   82
39 NA   87
40 71   90
41 39   87
42 NA   93
43 NA   92
44 23   82
.

I would like to get only

...
40 71   90
41 39   87
42 NA   93
43 NA   92
44 23   82


because the row numbers in the left column increase without a gap more than, e.g., three times in a row.

Any hints on a package, or do I need to build my own function?

Kind Regards Knut


Re: [R] Subset Rasterbrick by time

2018-06-19 Thread Miluji Sb

Dear David,

Subsetting works, but the date information is lost in the new file.

Thanks, Mike. I was not aware of the bug, but I will work on learning
about getZ() and setZ(). Thanks again!

Sincerely,

Milu



Re: [R] Subset Rasterbrick by time

2018-06-18 Thread Michael Sumner

On Mon, 18 Jun 2018, 22:09 David Winsemius,  wrote:

>
>
> > On Jun 18, 2018, at 7:21 AM, Miluji Sb  wrote:
> >
> > Dear all,
> >
> > I have a rasterbrick with the date/time information provided which I would
> > like to subset by year.
> >
> > However, when I use the following code for sub-setting;
> >
> > new_brick <- subset(original, which(getZ( original ) >=
> > as.Date("2000-01-01 10:30:00") & getZ(original ) <= as.Date("2014-12-31 10:30:00")))
> >
> > The date/time information seems to be lost.
> >
>

This is a bug. I tend to extract the dates (getZ), do the subset logic on
both the layers and the dates, and then restore the dates (setZ).
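
The extract / subset / restore workflow might look like the sketch below. It assumes the raster package (`brick()`, `subset()`, `getZ()`, `setZ()`) and uses a tiny made-up brick instead of the GLDAS file. The names `b`, `dates`, `keep` and `new_b` and the date range are illustrative only. Because the Z values are set again after `subset()`, the result does not depend on whether a given raster version keeps them.

```r
library(raster)

# Small in-memory brick standing in for the real data (hypothetical values)
b <- brick(array(1:20, dim = c(2, 2, 5)))
b <- setZ(b, as.Date("1999-12-30") + 0:4, name = "time")

# 1. extract the dates
dates <- getZ(b)
# 2. select layers by date, using the same index for the layers and the dates
keep <- which(dates >= as.Date("2000-01-01") & dates <= as.Date("2000-01-02"))
# 3. restore the matching dates on the subset
new_b <- setZ(subset(b, keep), dates[keep], name = "time")
```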

It takes a bit of learning and practice; good luck. I can't expand more at
the moment. See R-Sig-Geo for a more specific discussion forum; #rstats
on Twitter is also really good.

Cheers, Mike

-- 
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia


Re: [R] Subset Rasterbrick by time

2018-06-18 Thread David Winsemius




> On Jun 18, 2018, at 7:21 AM, Miluji Sb  wrote:
> 
> Dear all,
> 
> I have a rasterbrick with the date/time information provided which I would
> like to subset by year.
> 
> However, when I use the following code for sub-setting;
> 
> new_brick <- subset(original, which(getZ( original ) >=
> as.Date("2000-01-01 10:30:00") & getZ(original ) <= as.Date("2014-12-31 10:30:00")))
> 
> The date/time information seems to be lost.
> 
> Furthermore, the class of the date/time seems to be character;
> 
> ##
> class(getZ( original ))
> [1] "character"
> 
> Is it possible to convert this string to date before sub-setting or retain
> the date/time information after sub-setting?

Yes, it is certainly possible, but why bother? R's comparison operators work on
character values, so you should be able to do this (if the subsetting is
syntactically correct):

new_brick <- subset(original, which(getZ(original) >= "2000-01-01 10:30:00" &
                                    getZ(original) <= "2014-12-31 10:30:00"))
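
A small base-R check of this idea, using made-up timestamps in the same layout as the printed Date/time values (the vector `z` and UTC time zone are assumptions). String comparison selects the same layers as an explicit `as.POSIXct()` conversion. This only holds because every value shares the same fixed-width "YYYY-MM-DD hh:mm:ss" format.

```r
# Hypothetical Z values in the same format getZ() returned
z <- c("1999-12-31 10:30:00", "2000-01-01 10:30:00",
       "2014-12-31 10:30:00", "2015-01-01 10:30:00")

# Character comparison: fixed-width ISO strings sort chronologically
by_string <- which(z >= "2000-01-01 10:30:00" & z <= "2014-12-31 10:30:00")

# The same selection after converting explicitly to date-times (UTC assumed)
zt <- as.POSIXct(z, tz = "UTC")
by_time <- which(zt >= as.POSIXct("2000-01-01 10:30:00", tz = "UTC") &
                 zt <= as.POSIXct("2014-12-31 10:30:00", tz = "UTC"))
```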


As always, if you had presented the output of dput(head(original)), assuming that
head is a meaningful operation on such an object, a demonstration would have
been possible. An alternative would be to offer a library call to a package and
then load a relevant example.


Best;
David


[R] Subset Rasterbrick by time

2018-06-18 Thread Miluji Sb

 Dear all,

I have a rasterbrick with the date/time information provided which I would
like to subset by year.

However, when I use the following code for sub-setting:

new_brick <- subset(original, which(getZ(original) >= as.Date("2000-01-01 10:30:00") &
                                    getZ(original) <= as.Date("2014-12-31 10:30:00")))

The date/time information seems to be lost.

Furthermore, the class of the date/time seems to be character:

##
class(getZ( original ))
[1] "character"

Is it possible to convert this string to date before sub-setting or retain
the date/time information after sub-setting?

### original RasterBrick ###
class   : RasterBrick
dimensions  : 600, 1440, 864000, 11320  (nrow, ncol, ncell, nlayers)
resolution  : 0.25, 0.25  (x, y)
extent  : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source :
/work/mm01117/GLDAS_025_deg/daily/gldas_tavg_tmin_tmax_precip_windspd_sphum_daily_1986_2016.nc4
names   : X1986.01.01.10.30.00, X1986.01.02.10.30.00,
X1986.01.03.10.30.00, X1986.01.04.10.30.00, X1986.01.05.10.30.00,
X1986.01.06.10.30.00, X1986.01.07.10.30.00, X1986.01.08.10.30.00,
X1986.01.09.10.30.00, X1986.01.10.10.30.00, X1986.01.11.10.30.00,
X1986.01.12.10.30.00, X1986.01.13.10.30.00, X1986.01.14.10.30.00,
X1986.01.15.10.30.00, ...
Date/time   : 1986-01-01 10:30:00, 2016-12-31 10:30:00 (min, max)
varname : v1

### new RasterBrick ###
class   : RasterStack
dimensions  : 600, 1440, 864000, 5477  (nrow, ncol, ncell, nlayers)
resolution  : 0.25, 0.25  (x, y)
extent  : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
names   : X2000.01.01.10.30.00, X2000.01.02.10.30.00,
X2000.01.03.10.30.00, X2000.01.04.10.30.00, X2000.01.05.10.30.00,
X2000.01.06.10.30.00, X2000.01.07.10.30.00, X2000.01.08.10.30.00,
X2000.01.09.10.30.00, X2000.01.10.10.30.00, X2000.01.11.10.30.00,
X2000.01.12.10.30.00, X2000.01.13.10.30.00, X2000.01.14.10.30.00,
X2000.01.15.10.30.00, ...

Any help will be greatly appreciated.

Sincerely,

Milu


Re: [R] Subset

2017-09-25 Thread Bert Gunter

You realize, do you not, that in fact there are no numbers in your "list"
(actually a character vector).
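
A short illustration of that point: `c()` coerces all elements to one common type, so once a string is present, 0.3 and 5 are stored as the strings "0.3" and "5" and the NA becomes a character NA.

```r
a <- c("<0.1", NA, 0.3, 5, "Nil")

# Every element now has the same type
typeof(a)

# The "numbers" are strings
a[3:4]
```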

It looks like you would do well to spend some time with an R tutorial or
two before posting further to this list. We can help, but cannot substitute
for the basic knowledge that you would gain from doing this.

Cheers,

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Sep 25, 2017 at 4:30 AM, Shane Carey  wrote:

> Hi,
>
> Lets say this was a dataframe where I had two columns
>
> a <- c("<0.1", NA, 0.3, 5, "Nil")
> b <- c("<0.1", 1, 0.3, 5, "Nil")
>
> And I just want to remove the rows from the dataframe where there were NAs
> in the b column, what is the syntax for doing that?
>
> Thanks in advance
>
> On Fri, Sep 22, 2017 at 5:04 PM, Shane Carey  wrote:
>
> > Super,
> >
> > Thanks
> >
> > On Fri, Sep 22, 2017 at 4:57 PM, Boris Steipe 
> > wrote:
> >
> >> > a <- c("<0.1", NA, 0.3, 5, "Nil")
> >> > a
> >> [1] "<0.1" NA     "0.3"  "5"    "Nil"
> >>
> >> > b <- as.numeric(a)
> >> Warning message:
> >> NAs introduced by coercion
> >> > b
> >> [1]  NA  NA 0.3 5.0  NA
> >>
> >> > b[! is.na(b)]
> >> [1] 0.3 5.0
> >>
> >>
> >> B.
> >>
> >>
> >> > On Sep 22, 2017, at 11:48 AM, Shane Carey 
> wrote:
> >> >
> >> > Hi,
> >> >
> >> > How do I extract just numbers from the following list:
> >> >
> >> > a=c("<0.1",NA,0.3,5,Nil)
> >> >
> >> > so I want to obtain: 0.3 and 5 from the above list
> >> >
> >> > Thanks
> >> >
> >> >
> >> > --
> >> > Le gach dea ghui,
> >> > *Shane Carey*
> >> > *GIS and Data Solutions Consultant*
> >> >
> >>
> >>
> >
> >
> > --
> > Le gach dea ghui,
> > *Shane Carey*
> > *GIS and Data Solutions Consultant*
> >
>
>
>
> --
> Le gach dea ghui,
> *Shane Carey*
> *GIS and Data Solutions Consultant*
>

Re: [R] Subset

2017-09-25 Thread Shane Carey

Super, thanks Boris. Top notch :-)

-- 
Le gach dea ghui,
*Shane Carey*
*GIS and Data Solutions Consultant*


Re: [R] Subset

2017-09-25 Thread Boris Steipe

Always via logical expressions. In this case you can use the logical expression

myDF$b  != "0"

to give you a vector of TRUE/FALSE 
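
Comparing against the string "0" only catches that exact spelling. A missing value in `b` also makes the comparison NA, and indexing with NA returns an all-NA row; wrapping the condition in `which()` avoids that. The sketch below compares numerically instead. It uses a hypothetical `b` column containing "0", "0.0" and NA, and keeps non-numeric entries such as "<0.1".

```r
# Hypothetical column with two spellings of zero and a missing value
myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
                   b = c("<0.1", "0", "0.0", 5, NA),
                   stringsAsFactors = FALSE)

# String comparison: "0.0" survives, and the NA row comes back as all-NA
by_string <- myDF[myDF$b != "0", ]

# Numeric comparison: unparseable entries become NA (warning suppressed) and are kept
b_num <- suppressWarnings(as.numeric(myDF$b))
keep <- is.na(b_num) | b_num != 0
by_value <- myDF[keep, ]
```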



B.




Re: [R] Subset

2017-09-25 Thread Shane Carey

This is super, really helpful. Sorry, one final question: let's say I
wanted to remove 0's rather than NAs, what would it be?

Thanks

-- 
Le gach dea ghui,
*Shane Carey*
*GIS and Data Solutions Consultant*


Re: [R] Subset

2017-09-25 Thread Boris Steipe

myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
   b = c("<0.1", 1, 0.3, 5, "Nil"),
   stringsAsFactors = FALSE)

# you can subset the b-column in several ways

myDF[ , 2]
myDF[ , "b"]
myDF$b

# using the column, you make a logical vector
! is.na(as.numeric(myDF$b))


# This can be used to select the rows you want

myDF[! is.na(as.numeric(myDF$b)), ]
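A minimal sketch of the same row filter, assuming "NA" here means "does not parse as a number": the column is converted once, suppressWarnings() hides only the expected coercion warning, and the names b_num and myDF_num are illustrative. Note that b itself contains no literal NA, which is why is.na(myDF$b) on its own would keep every row.

```r
myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
                   b = c("<0.1", 1, 0.3, 5, "Nil"),
                   stringsAsFactors = FALSE)

# b is character and holds no literal NA, so this is FALSE
any(is.na(myDF$b))

# Convert once; "<0.1" and "Nil" become NA (the coercion warning is expected)
b_num <- suppressWarnings(as.numeric(myDF$b))

# Keep only the rows where b parsed as a number: "1", "0.3" and "5"
myDF_num <- myDF[!is.na(b_num), ]
myDF_num
```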



B.


> On Sep 25, 2017, at 7:30 AM, Shane Carey  wrote:
> 
> Hi,
> 
> Lets say this was a dataframe where I had two columns
> 
> a <- c("<0.1", NA, 0.3, 5, "Nil")
> b <- c("<0.1", 1, 0.3, 5, "Nil")
> 
> And I just want to remove the rows from the dataframe where there were NAs in 
> the b column, what is the syntax for doing that?
> 
> Thanks in advance

Re: [R] Subset

2017-09-25 Thread Shane Carey

Hi,

Let's say this was a data frame with two columns:

a <- c("<0.1", NA, 0.3, 5, "Nil")
b <- c("<0.1", 1, 0.3, 5, "Nil")

I just want to remove the rows of the data frame where there are NAs
in the b column. What is the syntax for doing that?

Thanks in advance

-- 
With every good wish,
*Shane Carey*
*GIS and Data Solutions Consultant*

Re: [R] Subset

2017-09-22 Thread Shane Carey

Super,

Thanks

-- 
With every good wish,
*Shane Carey*
*GIS and Data Solutions Consultant*

Re: [R] Subset

2017-09-22 Thread Boris Steipe

> a <- c("<0.1", NA, 0.3, 5, "Nil")
> a
[1] "<0.1" NA     "0.3"  "5"    "Nil"

> b <- as.numeric(a)
Warning message:
NAs introduced by coercion 
> b
[1]  NA  NA 0.3 5.0  NA

> b[! is.na(b)]
[1] 0.3 5.0
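An alternative sketch that avoids the coercion warning altogether by keeping only strings that look like plain decimal numbers before converting. The regular expression is an assumption: it accepts forms such as "5", "0.3" or "-2.5", but not scientific notation such as "1e-3".

```r
a <- c("<0.1", NA, 0.3, 5, "Nil")

# TRUE only for an optional sign followed by digits with at most one decimal point;
# grepl() returns FALSE for the NA element
looks_numeric <- grepl("^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)$", a)

nums <- as.numeric(a[looks_numeric])
nums
```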


B.


> On Sep 22, 2017, at 11:48 AM, Shane Carey  wrote:
> 
> Hi,
> 
> How do I extract just numbers from the following list:
> 
> a <- c("<0.1", NA, 0.3, 5, "Nil")
> 
> so I want to obtain: 0.3 and 5 from the above list
> 
> Thanks
> 
> 
> -- 
> With every good wish,
> *Shane Carey*
> *GIS and Data Solutions Consultant*

[R] Subset

2017-09-22 Thread Shane Carey

Hi,

How do I extract just the numbers from the following vector:

a <- c("<0.1", NA, 0.3, 5, "Nil")

i.e. I want to obtain 0.3 and 5 from the vector above.

Thanks


-- 
With every good wish,
*Shane Carey*
*GIS and Data Solutions Consultant*

Re: [R] Subset()

2017-01-21 Thread Jim Lemon

Hi Elise,
One of the quirks of POSIXlt time values (the class strptime() returns) is
that they are lists. This should give you the plot:

plot(Soil_Temp~as.numeric(DateTime),eldf,xaxt="n",xlab="DateTime")

and this the x axis:

axis.POSIXct(1,eldf$DateTime)

If you want a different format for the date values on the axis, look
at the "format" argument.
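A minimal sketch of another common fix: store the column as POSIXct (a numeric vector with a time class) rather than POSIXlt, after which the formula interface accepts it directly. The data frame, its values and the UTC time zone below are hypothetical stand-ins for the real CSV.

```r
# Hypothetical readings standing in for the real data
eldf <- data.frame(
  DateTime  = c("2017-01-09 18:00:00", "2017-01-09 18:15:00", "2017-01-09 18:30:00"),
  Soil_Temp = c(12.1, 11.8, 11.6),
  stringsAsFactors = FALSE
)

# as.POSIXct() parses straight to POSIXct, so no POSIXlt list column is created
eldf$DateTime <- as.POSIXct(eldf$DateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(eldf$DateTime)

# The formula interface now works without as.numeric() and axis.POSIXct()
mf <- model.frame(Soil_Temp ~ DateTime, data = eldf)
if (interactive()) plot(Soil_Temp ~ DateTime, data = eldf)
```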

Jim


On Sun, Jan 22, 2017 at 2:32 PM, Elise LIKILIKI
 wrote:
> Hi Jim,
>
> I'm really sorry to bother you, but I have another problem when I try to
> make a plot:
>>plot(Soil_Temp_Avg~DateTime,eldf2)
>
> I have an error message saying :
> Error in (function (formula, data = NULL, subset = NULL, na.action =
> na.fail, : invalid type (list) for variable 'DateTime'
>
>
> 2017-01-22 3:42 GMT+01:00 Elise LIKILIKI :
>>
>> Hi Jim,
>>
>> Thank you so much, it works with your method! I'm going to be able to
>> process my data, thanks again!
>>
>> Regards,
>>
>> Elise
>>
>

Re: [R] Subset()

2017-01-21 Thread Jim Lemon

Hi Elise,
If I create a CSV file like your example and read it into a data frame:

eldf<-read.csv("el.csv")

Then convert the first field to POSIXt dates:

eldf$DateTime<-strptime(eldf$DateTime,"%Y-%m-%d %H:%M:%S")
class(eldf$DateTime)
[1] "POSIXlt" "POSIXt"

I can subset the file like this:

time_after<-strptime("2017-01-09 18:00:00","%Y-%m-%d %H:%M:%S")
> time_after
[1] "2017-01-09 18:00:00 AEDT"
> eldf[eldf$DateTime >= time_after,]
              DateTime RECORD PTemp PPFD_Avg Air_Temp_Avg RH_avg Soil_Temp
7  2017-01-09 18:00:00      6 21.26   -48.83       -38.49 -0.415        79
8  2017-01-09 18:15:00      7 21.21   -52.23       -39.00 -0.642        79
9  2017-01-09 18:30:00      8 21.12   -54.68       -39.41 -0.805        79
10 2017-01-09 18:45:00      9 21.04   -56.44       .39.74 -0.939        79
11 2017-01-09 19:00:00     10 20.99   -57.71       -40.01 -1.046        79
12 2017-01-09 19:15:00     11 20.91   -58.66       -40.25 -1.137        79
13 2017-01-09 19:30:00     12 21.83   -59.39       -40.46 -1.208        79

Perhaps this will do what you want.
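The stray rows in the original question span exactly one hour (10:00 to 10:45 in 15-minute steps). One possible cause, which the thread never confirms, is a time-zone mismatch between the stored DateTime values and a cutoff that as.POSIXct() builds in the session's local zone. A minimal sketch with hypothetical data that fixes the zone explicitly on both sides and also selects only the wanted columns:

```r
# Hypothetical 15-minute readings
eldf <- data.frame(
  DateTime = c("2017-01-10 10:30:00", "2017-01-10 10:45:00",
               "2017-01-10 11:00:00", "2017-01-10 11:15:00"),
  RECORD = 1:4,
  Soil_Temp_Avg = c(5.1, 5.0, 4.9, 4.8),
  stringsAsFactors = FALSE
)
eldf$DateTime <- as.POSIXct(eldf$DateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Cutoff in the same zone as the data: only 11:00 and later are kept
cutoff <- as.POSIXct("2017-01-10 11:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
data1 <- subset(eldf, DateTime >= cutoff, select = c(DateTime, Soil_Temp_Avg))

# The same clock time parsed in Paris (UTC+1 in January) is 10:00 UTC,
# so the 10:30 and 10:45 rows leak through
# (R may also warn that the 'tzone' attributes are inconsistent)
cutoff_paris <- as.POSIXct("2017-01-10 11:00:00", format = "%Y-%m-%d %H:%M:%S",
                           tz = "Europe/Paris")
leaky <- suppressWarnings(subset(eldf, DateTime >= cutoff_paris))
```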

No need to apologize for your English, I could not make myself
understood in French.

Jim

On Sun, Jan 22, 2017 at 4:15 AM, Elise LIKILIKI
 wrote:
> Hi Jim,
>
> Yes, exactly: it returns "POSIXct" "POSIXt".
> Find attached a screenshot showing my data in the "data" object.
> I don't need the data before 2017-01-10 11:00:00, nor the columns Records and
> Ptemp.
> I've tried with subset() and with [ ] but I still have some rows containing
> data before 2017-01-10 11:00:00.
>
> I'm French, so I am really sorry about my English.

Re: [R] Subset()

2017-01-21 Thread Jim Lemon

Hi Elise,
I would ask:

class(data$DateTime)

and see if it returns:

"POSIXct" "POSIXt"

Jim


On Sat, Jan 21, 2017 at 3:02 AM, Elise LIKILIKI
 wrote:
> Hello,
>
> I have a dataset containing Date Time, Air Temperature, PPFD, Sol
> Temperature...
> The first data are false so I would like to extract the other ones.
> I've tried :
>>data1<-subset(data,DateTime>=as.POSIXct("2017-01-10 11:00:00",format="%Y-%m-%d %H:%M:%S"),
>   select=c(DateTime,PPFD_Avg,Air_Temp_Avg,RH_Avg,Soil_Temp_Avg))
> But I still have 4 rows with data from 2017-01-10 10:00:00 to 2017-01-10
> 10:45:00 and I don't understand why.
>
> Does anyone could help me please.
>
> Thanks,
>
> Elise LIKILIKI

Re: [R] Subset()

2017-01-20 Thread David Winsemius

How are we supposed to help you if you don’t read the Posting Guide and don’t
provide any information about the classes of columns in `data`?

— David.



> On Jan 20, 2017, at 10:02 AM, Elise LIKILIKI  wrote:
> 
> Hello,
> 
> I have a dataset containing Date Time, Air Temperature, PPFD, Sol
> Temperature...
> The first data are false so I would like to extract the other ones.
> I've tried :
>> data1<-subset(data,DateTime>=as.POSIXct("2017-01-10 11:00:00",format="%Y-%m-%d %H:%M:%S"),
>   select=c(DateTime,PPFD_Avg,Air_Temp_Avg,RH_Avg,Soil_Temp_Avg))
> But I still have 4 rows with data from 2017-01-10 10:00:00 to 2017-01-10
> 10:45:00 and I don't understand why.
> 
> Does anyone could help me please.
> 
> Thanks,
> 
> Elise LIKILIKI
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

^^


> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA

[R] Subset()

2017-01-20 Thread Elise LIKILIKI

Hello,

I have a dataset containing Date Time, Air Temperature, PPFD, Soil
Temperature...
The first records are wrong, so I would like to extract the other ones.
I've tried:
>data1<-subset(data,DateTime>=as.POSIXct("2017-01-10 11:00:00",format="%Y-%m-%d %H:%M:%S"),
  select=c(DateTime,PPFD_Avg,Air_Temp_Avg,RH_Avg,Soil_Temp_Avg))
But I still have 4 rows with data from 2017-01-10 10:00:00 to 2017-01-10
10:45:00 and I don't understand why.

Could anyone help me, please?

Thanks,

Elise LIKILIKI

Re: [R] Subset and sumerize

2016-10-14 Thread Mark Sharp

Ashta,

## I may have misunderstood your question and if so I apologize.

## I had to remove the extra line after "45" before
## the ",sep=" to use your code.
## You could have used dput(dat) to send a more reliable (robust) version.
dat <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
x1 = structure(c(1L, 5L, 5L, 5L, 2L, 5L, 3L, 4L, 5L, 5L), .Label = c("a",
"d", "g", "h", "x"), class = "factor"), x2 = structure(c(1L,
6L, 1L, 4L, 6L, 6L, 5L, 2L, 6L, 3L), .Label = c("b", "e",
"g", "k", "t", "z"), class = "factor"), y = c(15L, 21L, 16L,
25L, 31L, 28L, 41L, 32L, 38L, 45L)), .Names = c("ID", "x1",
"x2", "y"), class = "data.frame", row.names = c(NA, -10L))

# In your proposed solution "newdat" is never defined yet you are using it as
# if it were.

## It is my understanding that your goal is to define newdat as a
## subset of dat where x1 == "x" and x2 == "z".
## This can be done with one line.

newdat <- dat[dat$x1 == "x" & dat$x2 == "z", ]
newdat
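For the per-ID table the original question asks for, a minimal sketch using tapply() on the same subset: each row of the result holds summary(y) for one ID. With this sample there is exactly one matching row per ID, so all six statistics in a row are equal; the name per_id is illustrative.

```r
dat <- read.table(text = "ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45", sep = ",", header = TRUE)

# Rows where x1 is "x" and x2 is "z"
newdat <- dat[dat$x1 == "x" & dat$x2 == "z", ]

# One summary() per ID, bound into a matrix with one row per ID
per_id <- do.call(rbind, tapply(newdat$y, newdat$ID, summary))
per_id
```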

> On Oct 14, 2016, at 1:26 PM, Ashta  wrote:
>
> Hi all,
>
> I am trying to summarize  big data set  by   selecting a row
> conditionally. and tried  to do it in a loop
>
> Here is  the sample of my data and my attempt
>
> dat<-read.table(text=" ID,x1,x2,y
> 1,a,b,15
> 1,x,z,21
> 1,x,b,16
> 1,x,k,25
> 2,d,z,31
> 2,x,z,28
> 2,g,t,41
> 3,h,e,32
> 3,x,z,38
> 3,x,g,45
> ",sep=",",header=TRUE)
>
> For  each unique ID,  I want to select  a data when x1= "x" and x2="z"
> Here is the selected data (newdat)
> ID,x1,x2,y
> 1,x,z,21
> 2,x,z,28
> 3,x,z,38
>
> Then I want summarize  Y values and out put as follows
> Summerize
> summary(newdat[i])
> ##
> ID   Min. 1st Qu.  MedianMean 3rd Qu.Max.
> 1
> 2
> 3
> .
> .
> .
> 28
> 
>
> Here is my attempt but did not work,
>
> trt=c(1:28)
> for(i  in 1:length (trt))
> {
>  day[i]= newdat[which(newdat$ID== trt[i] &  newdat$x1 =="x" &
> newdat$x2 =="z"),]
> NR[i]=dim(day[i])[1]
> print(paste("Number of Records  :", NR[i]))
> sm[i]=summary(day[i])
> }
>
> Thank you in advance

Re: [R] Subset and sumerize

2016-10-14 Thread Sarah Goslee

For the data you provide, it's simply:

summary(subset(dat, x1 == "x" & x2 == "z")$y)

Note that x1 and x2 are factors in your example.

We also don't know what you want to do if there is more than one
such combination per ID, or if there are ID values with no matching
rows.
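Both caveats can be checked by counting matching rows per ID. A minimal sketch: the data are the question's plus one hypothetical ID (4) with no x/z row, and turning the matching IDs into a factor whose levels are all IDs makes such an ID show up with a count of 0 instead of disappearing.

```r
# The question's data plus a hypothetical ID 4 that has no x/z row
dat <- read.table(text = "ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45
4,a,z,50", sep = ",", header = TRUE)

keep <- dat$x1 == "x" & dat$x2 == "z"

# Matching rows per ID; IDs without a match keep a count of 0
n_records <- table(factor(dat$ID[keep], levels = sort(unique(dat$ID))))
n_records

# Overall summary of y for the matching rows
summary(dat$y[keep])
```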

Sarah

On Fri, Oct 14, 2016 at 2:26 PM, Ashta  wrote:
> Hi all,
>
> I am trying to summarize  big data set  by   selecting a row
> conditionally. and tried  to do it in a loop
>
> Here is  the sample of my data and my attempt
>
> dat<-read.table(text=" ID,x1,x2,y
> 1,a,b,15
> 1,x,z,21
> 1,x,b,16
> 1,x,k,25
> 2,d,z,31
> 2,x,z,28
> 2,g,t,41
> 3,h,e,32
> 3,x,z,38
> 3,x,g,45
> ",sep=",",header=TRUE)
>
> For  each unique ID,  I want to select  a data when x1= "x" and x2="z"
> Here is the selected data (newdat)
> ID,x1,x2,y
> 1,x,z,21
> 2,x,z,28
> 3,x,z,38
>
> Then I want summarize  Y values and out put as follows
> Summerize
> summary(newdat[i])
> ##
> ID   Min. 1st Qu.  MedianMean 3rd Qu.Max.
> 1
> 2
> 3
> .
> .
> .
> 28
> 
>
> Here is my attempt but did not work,
>
> trt=c(1:28)
> for(i  in 1:length (trt))
> {
>   day[i]= newdat[which(newdat$ID== trt[i] &  newdat$x1 =="x" &
> newdat$x2 =="z"),]
> NR[i]=dim(day[i])[1]
> print(paste("Number of Records  :", NR[i]))
> sm[i]=summary(day[i])
> }
>
> Thank you in advance
>

[R] Subset and sumerize

2016-10-14 Thread Ashta

Hi all,

I am trying to summarize a big data set by selecting rows
conditionally, and tried to do it in a loop.

Here is a sample of my data and my attempt:

dat<-read.table(text=" ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45
",sep=",",header=TRUE)

For each unique ID, I want to select the rows where x1 == "x" and x2 == "z".
Here is the selected data (newdat)
ID,x1,x2,y
1,x,z,21
2,x,z,28
3,x,z,38

Then I want to summarize the y values and output them as follows:
summary(newdat[i])
##
ID   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1
2
3
.
.
.
28


Here is my attempt, which did not work:

trt=c(1:28)
for(i  in 1:length (trt))
{
  day[i]= newdat[which(newdat$ID== trt[i] &  newdat$x1 =="x" &
newdat$x2 =="z"),]
NR[i]=dim(day[i])[1]
print(paste("Number of Records  :", NR[i]))
sm[i]=summary(day[i])
}

Thank you in advance

Re: [R] subset data right

2016-05-27 Thread William Dunlap via R-help

> If you want to drop levels, use droplevels() either on the factor or on
> the subset of your data frame. Example:
> droplevels(f[1]) #One element, only one level

Calling factor() on a factor, as the OP did, also drops any unused levels,
as the examples showed.

> str(factor(factor(letters)[11:13]))
 Factor w/ 3 levels "k","l","m": 1 2 3
> str(droplevels(factor(letters)[11:13]))
 Factor w/ 3 levels "k","l","m": 1 2 3

Using droplevels instead of factor does make the intent clearer and
droplevels works on data.frames.
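A minimal sketch of droplevels() applied to a whole subset, on a small hypothetical stand-in for the original data (the protocol names are made up). Because droplevels() on a data frame drops unused levels in every factor column, Protocol shrinks as well as quant; re-factoring only quant, as in the original attempt, leaves Protocol with all its levels.

```r
scans <- data.frame(
  Protocol = factor(c("P1", "P2", "P3", "P4", "P5")),
  quant    = factor(c("SLOW", "FAST", "VeryFast", "VerySLOW", "VeryFast"))
)

very_fast <- droplevels(subset(scans, quant == "VeryFast"))

nlevels(scans$quant)         # 4: the original data frame is unchanged
nlevels(very_fast$quant)     # 1
nlevels(very_fast$Protocol)  # 2: only P3 and P5 remain
```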





Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, May 27, 2016 at 3:37 AM, S Ellison  wrote:

> > You did not change df$quant - you made a new object called 'subdf'
> > containing a column called 'quant' that had only one level.  Changing
> > subdf has no effect on df.
>
> Also, subsetting a factor _intentionally_ does not change the number of
> levels. Example:
> f <- factor(sample(letters[1:3], 30, replace=TRUE))
> f[1]  #One element, still three levels
>
> If you want to drop levels, use droplevels() either on the factor or on
> the subset of your data frame. Example:
> droplevels(f[1]) #One element, only one level
>
>
> Also worth noting that df is a function.
>  > df <- data.frame(quant=factor(letters))
> looks very like you're assigning a data frame to the function 'df'
> (density for the F distribution)
> It doesn't, because R is clever. But it's really not good practice to use
> common function names as variable names. Too much potential for confusion.
>
> S Ellison

Re: [R] subset data right

2016-05-27 Thread S Ellison

> You did not change df$quant - you made a new object called 'subdf'
> containing a column called 'quant' that had only one level.  Changing
> subdf has no effect on df.

Also, subsetting a factor _intentionally_ does not change the number of levels. 
Example:
f <- factor(sample(letters[1:3], 30, replace=TRUE))
f[1]  #One element, still three levels

If you want to drop levels, use droplevels() either on the factor or on the 
subset of your data frame. Example: 
droplevels(f[1]) #One element, only one level


Also worth noting that df is a function.
 > df <- data.frame(quant=factor(letters))
looks very like you're assigning a data frame to the function 'df' (density for 
the F distribution)
It doesn't, because R is clever. But it's really not good practice to use 
common function names as variable names. Too much potential for confusion.

S Ellison


Re: [R] subset data right

2016-05-26 Thread William Dunlap via R-help

You did not change df$quant - you made a new object called 'subdf'
containing a column called 'quant' that had only one level.  Changing
subdf has no effect on df.

> df <- data.frame(quant=factor(letters))
> str(df)
'data.frame':   26 obs. of  1 variable:
 $ quant: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
> subdf <- subset(df, quant %in% "a")
> subdf$quant <- factor(subdf$quant)
> str(df)
'data.frame':   26 obs. of  1 variable:
 $ quant: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(subdf)
'data.frame':   1 obs. of  1 variable:
 $ quant: Factor w/ 1 level "a": 1



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 26, 2016 at 12:35 PM, ch.elahe via R-help 
wrote:

> Hi all,
> I have the following df and I want to know which Protocols are VeryFast,
> which are FAST, which are SLOW and also which ones are VerySLOW :
>
>
>   $ Protocol   : Factor w/ 48 levels "DP FS QTSE SAG",..: 5 5 28 5 5 5
> 7 7 47 5 ...
>
>   $ quant  : Factor w/ 4 levels "FAST","SLOW",..: 2 2 2 4 2 1 1 2 4
>
> I do the following subset but nothing is changed in my df:
>
>
>   subdf=subset(df,quant%in%c("VeryFast"))
>   subdf$quant=factor(subdf$quant)
> and when I get the str(df) again Protocol has 48 levels. Does anyone know
> how can I get these subsets right?
> Thanks for any help!
> Elahe

Re: [R] subset data right

2016-05-26 Thread ruipbarradas

Hello,

Don't use subset, use indexing.

subdf <- df[df$quant %in% "VeryFast", ]

By the way, instead of %in% you can use ==, since you're interested in  
just one value of quant.
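If the goal is the list of protocols in each speed class rather than one subset at a time, split() can produce all four groups at once. A minimal sketch on hypothetical data; the column names follow the question, the values do not.

```r
scans <- data.frame(
  Protocol = c("P1", "P2", "P3", "P4", "P5", "P2"),
  quant    = c("SLOW", "FAST", "VeryFast", "VerySLOW", "VeryFast", "FAST"),
  stringsAsFactors = TRUE
)

# One vector of distinct protocol names per level of quant
by_speed <- lapply(split(as.character(scans$Protocol), scans$quant), unique)
by_speed$VeryFast
```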

Hope this helps,

Rui Barradas

Citando ch.elahe via R-help :

> Hi all,
> I have the following df and I want to know which Protocols are  
> VeryFast, which are FAST, which are SLOW and also which ones are  
> VerySLOW :
>
> $ Protocol       : Factor w/ 48 levels "DP FS QTSE SAG",..: 5 5 28 5  
> 5 5 7 7 47 5 ...
>
> $ quant          : Factor w/ 4 levels "FAST","SLOW",..: 2 2 2 4 2 1 1 2 4
>
> I do the following subset but nothing is changed in my df:
>
> subdf=subset(df,quant%in%c("VeryFast"))
> subdf$quant=factor(subdf$quant)
> and when I get the str(df) again Protocol has 48 levels. Does anyone  
> know how can I get these subsets right?
> Thanks for any help!
> Elahe

[R] subset data right

2016-05-26 Thread ch.elahe via R-help

Hi all,
I have the following df and I want to know which Protocols are VeryFast, which 
are FAST, which are SLOW and also which ones are VerySLOW :


  $ Protocol   : Factor w/ 48 levels "DP FS QTSE SAG",..: 5 5 28 5 5 5 7 7 47 5 ...

  $ quant  : Factor w/ 4 levels "FAST","SLOW",..: 2 2 2 4 2 1 1 2 4

I do the following subset but nothing is changed in my df:


  subdf=subset(df,quant%in%c("VeryFast"))
  subdf$quant=factor(subdf$quant)
and when I get the str(df) again Protocol has 48 levels. Does anyone know how
I can get these subsets right?
Thanks for any help!
Elahe

Re: [R] subset by multiple letters condition

2016-04-23 Thread Adams, Jean

This is quite a different question.  I suggest you start a new post with a
new subject line for this.  And I suggest you include code for an example
plot that you want to use.

Otherwise, you might look here for some ideas on how to control colors in a
scatter plot using base R: http://stackoverflow.com/a/8475315
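A minimal sketch of both steps discussed in this thread, assuming the example Command vector from the earlier grepl() reply and hypothetical TE values: combine the grepl() flags into one category per name (checking the two-pattern cases first, which covers the six conditions in the original question), then index a named colour vector by that category. The colour choices and the names category, pal and point_col are arbitrary.

```r
Command <- paste0("_localize_", c("PD", "t2", "t1_seq", "abc", "xyz", "PD_t1"))
TE <- c(38, 41, 11, 52, 48, 75)  # hypothetical values for plotting

hasPD <- grepl("PD", Command, fixed = TRUE)
hast1 <- grepl("t1", Command, fixed = TRUE)
hast2 <- grepl("t2", Command, fixed = TRUE)

# Most specific combinations first, "unknown" when nothing matches
category <- ifelse(hast1 & hasPD, "t1+PD",
            ifelse(hast2 & hasPD, "t2+PD",
            ifelse(hasPD, "PD",
            ifelse(hast1, "t1",
            ifelse(hast2, "t2", "unknown")))))

# Named vector: look up one colour per point by its category
pal <- c("t1+PD" = "purple", "t2+PD" = "orange", PD = "red",
         t1 = "blue", t2 = "darkgreen", unknown = "grey50")
point_col <- unname(pal[category])

if (interactive()) {
  plot(TE, col = point_col, pch = 19)
  legend("topleft", legend = names(pal), col = pal, pch = 19)
}
```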

Jean


Re: [R] subset by multiple letters condition

2016-04-23 Thread ch.elahe via R-help

Thanks, Jean. Does anyone know how to use these [hast1] and [hast2] to set the
colors of a plot?


Re: [R] subset by multiple letters condition

2016-04-22 Thread Adams, Jean

You can use the grepl() function to give you logicals for each criterion,
then combine them as needed.  For example:

# example version of Command
Command <- paste0("_localize_", c("PD","t2","t1_seq", "abc", "xyz", "PD_t1"))

hasPD <- grepl("PD", Command, fixed=TRUE)
hast1 <- grepl("t1", Command, fixed=TRUE)
hast2 <- grepl("t2", Command, fixed=TRUE)

> Command[hast1]
[1] "_localize_t1_seq" "_localize_PD_t1"

> Command[hasPD]
[1] "_localize_PD"    "_localize_PD_t1"

> Command[hast1 & hasPD]
[1] "_localize_PD_t1"

Jean
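To go from these logicals to the six labels asked for in the original post (PD, t1, t2, t1 and PD, t2 and PD, otherwise unknown), one option is to fill a label vector from the least to the most specific rule, so that later assignments override earlier ones. This is a sketch on the same toy Command vector with one extra made-up name containing both PD and t2; the label strings are hypothetical.

```r
# Toy Command vector, plus a made-up "PD_t2" name.
Command <- paste0("_localize_", c("PD", "t2", "t1_seq", "abc", "xyz", "PD_t1", "PD_t2"))

hasPD <- grepl("PD", Command, fixed = TRUE)
hast1 <- grepl("t1", Command, fixed = TRUE)
hast2 <- grepl("t2", Command, fixed = TRUE)

# Default label for names that match none of the rules.
group <- rep("unknown", length(Command))

# Least specific first; each later line overwrites the earlier labels.
group[hasPD]         <- "PD"
group[hast1]         <- "t1"
group[hast2]         <- "t2"
group[hast1 & hasPD] <- "t1_PD"
group[hast2 & hasPD] <- "t2_PD"

data.frame(Command, group)
```

On the real data the same steps would start from as.character(df$Command); after storing the result in a new column (df$group, a hypothetical name), subset(df, group == "t1_PD") would select one category. A name containing both t1 and t2 would get whichever of those labels is assigned last.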


Re: [R] subset by multiple letters condition

2016-04-22 Thread Giorgio Garziano

You may investigate a solution based on regular expressions.

Some tutorials to help:

http://www.regular-expressions.info/rlanguage.html

http://www.endmemo.com/program/R/grep.php

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf

https://rstudio-pubs-static.s3.amazonaws.com/74603_76cd14d5983f47408fdf0b323550b846.html

http://stat545.com/block022_regular-expression.html

https://www.youtube.com/watch?v=q8SzNKib5-4


--

Best,

GG
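One thing regular expressions add over fixed = TRUE is control over where a match may occur. As a sketch, assuming the tokens in Command are separated by underscores (the "_localize_t10_abc" name is made up to show the difference), requiring t1 to sit between underscores or the string ends avoids counting t10 as t1.

```r
Command <- c("_localize_PD", "_localize_tre_t2", "_localize_t1_seq",
             "_localize_t10_abc", "_localize_PD_t1")

# Substring match: "t10" is counted as containing "t1".
grepl("t1", Command, fixed = TRUE)      # FALSE FALSE TRUE TRUE TRUE

# Regex match: "t1" must follow "_" (or the start) and precede "_" (or the end).
hast1 <- grepl("(^|_)t1(_|$)", Command)
hast1                                   # FALSE FALSE TRUE FALSE TRUE

# A character class covers t1 and t2 with one pattern.
grepl("(^|_)t[12](_|$)", Command)       # FALSE TRUE TRUE FALSE TRUE
```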






[R] subset by multiple letters condition

2016-04-22 Thread ch.elahe via R-help


Hi all, 

I have a data frame df and I want to subset it based on several conditions on
the letters of the names in Command:
1) the name contains PD
2) the name contains t1
3) the name contains t2
4) the name contains t1 and PD
5) the name contains t2 and PD
6) otherwise, the name is unknown.
I don't know how to use grep for all these conditions.
 
   'data.frame': 36919 obs. of 162 variables:
    $ TE     : int  38 41 11 52 48 75 ...
    $ Command: Factor w/ 2229 levels "_localize_PD","_localize_tre_t2","_localize_t1_seq",...
 
 
Thanks for any help


Re: [R] Subset with missing argument within a function

2016-02-05 Thread William Dunlap via R-help

R's subscripting operators do not "guess" the value of a missing
argument: a missing k'th subscript means seq_len(dim(x)[k]).
I bet that you use syntax like x[,1] (the entire first column of x)
all the time and that you don't want this syntax to go away.

Some languages use a placeholder like '.' or '*' to do this.  Perhaps
S should have, but it is now too late to make such a change.



Bill Dunlap
TIBCO Software
wdunlap tibco.com
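The equivalence Bill describes can be checked directly on a small matrix; this is only an illustration on made-up data.

```r
m <- matrix(1:6, nrow = 2)

# A missing first subscript selects the same rows as seq_len(dim(m)[1]).
identical(m[, 1], m[seq_len(dim(m)[1]), 1])   # TRUE

# With every subscript missing, all elements are selected and attributes kept.
identical(m[], m)                             # TRUE

# The same rule applies to a plain vector.
identical(letters[], letters)                 # TRUE
```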


Re: [R] Subset with missing argument within a function

2016-02-04 Thread Stefano de Pretis

Thanks Bill,

This is more clear.

In any case, I find it very inappropriate that a programming language tries to
guess the value of a missing argument. It is unfair to code developers, and it
promotes the production of buggy software.

I hope R will revise its policies sooner or later.

Thanks for the discussion,

Stefano








[R] Subset with missing argument within a function

2016-02-04 Thread Stefano de Pretis

Hi all,

I'm wondering what the rationale behind this is:

> subsettingFun <- function(vec, ix) vec[ix]
> subsettingFun(letters, c(1,2,5))
[1] "a" "b" "e"
> subsettingFun(letters)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

If the argument "ix" is missing, I would expect an error, not the variable
"vec" returned as it is.

I think this is VERY dangerous and does not help in developing reliable code
or in debugging.

Cheers,

Stefano

*Center for Genomic Science of IIT@SEMM*

Stefano de Pretis, PhD

*Postdoctoral fellow *


Re: [R] Subset with missing argument within a function

2016-02-04 Thread PIKAL Petr

Hi

Help page for ?"[" says

An empty index selects all values: this is most often used to replace all the 
entries but keep the attributes.

and your function in fact behaves like an empty index:

> x<-c(1,2,5)
> letters[x]
[1] "a" "b" "e"
> letters[]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

It is sometimes useful not to "expect" a program's behavior but to "inspect" why
it behaves differently.

If you want your function to throw an error when some arguments are missing, you
need to do the check yourself rather than rely on the programming language.

And BTW, I did not know the answer before I inspected the docs.

Cheers
Petr
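A minimal sketch of the explicit check Petr suggests; subsettingFunStrict is a hypothetical name, and missing() is the base R function that reports whether a formal argument was supplied in the call.

```r
# Hypothetical strict variant of subsettingFun(): refuse to treat a missing
# ix as an empty subscript and signal an error instead.
subsettingFunStrict <- function(vec, ix) {
  if (missing(ix)) {
    stop("argument \"ix\" must be supplied")
  }
  vec[ix]
}

subsettingFunStrict(letters, c(1, 2, 5))   # "a" "b" "e"
res <- try(subsettingFunStrict(letters), silent = TRUE)
inherits(res, "try-error")                 # TRUE
```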


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately

Re: [R] Subset with missing argument within a function

2016-02-04 Thread William Dunlap via R-help

The "missingness" of an argument gets passed down through nested function
calls.  E.g.,
  fOuter <- function(x) c(outerMissing=missing(x), innerMissing=fInner(x))
  fInner <- function(x) missing(x)
  fInner()
  #[1] TRUE
  fOuter()
  #outerMissing innerMissing
  #  TRUE TRUE
It is only when a function evaluates an argument that you get a message
like 'argument is missing, with no default'.  ('[' checks for missingness
before evaluating a subscript argument, so it will not give that error.)


Bill Dunlap
TIBCO Software
wdunlap tibco.com
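Bill's point that the error only appears once the argument is evaluated can be seen by forcing ix before the subscript is taken. A sketch; subsettingForced is a hypothetical name, while force() is base R.

```r
# The function from the original post: '[' sees a missing ix and selects everything.
subsettingFun <- function(vec, ix) vec[ix]

# Hypothetical variant that evaluates ix first, so a missing ix is an error.
subsettingForced <- function(vec, ix) {
  force(ix)   # evaluating a missing argument signals the usual error
  vec[ix]
}

length(subsettingFun(letters))                    # 26
res <- try(subsettingForced(letters), silent = TRUE)
inherits(res, "try-error")                        # TRUE
```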


Re: [R] Subset with missing argument within a function

2016-02-04 Thread Stefano de Pretis

Hi Petr,

Thank you for your answer.

I'm not sure how the empty index relates to what I'm showing in my example.
If my function was

emptySubset <- function(vec) vec[]

I would then agree that this was the case. But I think it's different: I'm
specifically telling my function that it should have two arguments ("vec"
and "ix")

subsettingFun <- function(vec, ix) vec[ix]

and I wonder why, within the function, the same thing does not happen as on
the command line:

> ix
Error: object 'ix' not found
> letters[ix]
Error: object 'ix' not found

My "expectation" came from a matter of coherence, but probably I'm still
missing something.

Regards,

Stefano




Re: [R] subset data using a vector

2015-11-24 Thread DIGHE, NILESH [AG/2362]

Jim & Michael:  I really appreciate your guidance in creating the function I 
wanted.  I took suggestions from both of you and was able to complete this 
function.  I had to split the process into two functions as listed below.
I thought I would send the results to the list in case someone wants to do a
similar task in the future.
Thanks.
Nilesh

getcheckmeans <- function(dataset)
{
    x <- length(dataset$plotid)
    row_check_mean <- numeric(x)
    for (i in seq_len(x)) {
        r1 <- dataset$rows[i]        # field row of plot i
        r4 <- c(r1, r1 - 1, r1 + 1)  # that row and its two neighbors
        # logical indexing also works when every row is in the window,
        # where split(...)[[2]] would fail for lack of a FALSE group
        dat1 <- dataset[dataset$rows %in% r4, ]
        # mean yield per linecode level; [1] is the first level, "check"
        row_check_mean[i] <- tapply(dat1$yield, dat1$linecode,
                                    mean, na.rm = TRUE)[1]
    }
    round(row_check_mean, digits = 2)
}


adjustdata <- function(dataset, trait, control)
{
    check_mean <- getcheckmeans(dataset)
    dataset <- cbind(dataset, check_mean = check_mean)
    # varieties are divided by the local check mean; checks get trait/trait
    adj_yield <- ifelse(control == "variety",
                        round(trait / dataset$check_mean, digits = 3),
                        round(trait / trait, digits = 3))
    data.frame(dataset, adj_yield)
}


dat<- structure(list(rows = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L), cols = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L, 14L, 13L,
12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L,
15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L), plotid = c(289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L,
297L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 369L, 370L, 371L,
372L, 373L, 374L, 375L, 376L, 377L, 378L, 379L, 380L, 381L, 382L,
383L, 384L, 385L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
394L, 395L, 396L, 397L, 398L, 399L, 400L, 465L, 466L, 467L, 468L,
469L, 470L, 471L, 472L, 473L, 474L, 475L, 476L, 477L, 478L, 479L,
480L), yield = c(5.1, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, 5,
3.9, 4.6, 5, 4.4, 5.1, 4.3, 4.4, 4.2, 3.9, 4.6, 4.8, 5.4, 4.7,
5.5, 5.3, 4.8, 5.8, 4.6, 5.8, 5.5, 5.3, 5.6, 5.6, 5, 4.8, 4.9,
5.2, 5.3, 4.6, 4.8, 5.3, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, 5.4,
4.8, 4.6, 5.2, 4.9, 5.1, 4.5, 5.8, 5.2, 4.7, 4.8, 5.3, 5.8, 4.9,
5.9, 4.5), line = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
1L, 21L, 22L, 1L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
32L, 33L, 1L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 1L,
43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 1L, 51L, 52L, 53L, 54L,
1L, 55L, 56L, 57L), .Label = c("CHK", "V002", "V003", "V004",
"V005", "V006", "V007", "V008", "V009", "V010", "V011", "V012",
"V013", "V014", "V015", "V016", "V017", "V018", "V019", "V020",
"V021", "V022", "V023", "V024", "V025", "V026", "V027", "V028",
"V029", "V030", "V031", "V032", "V033", "V034", "V035", "V036",
"V037", "V038", "V039", "V040", "V041", "V042", "V043", "V044",
"V045", "V046", "V047", "V048", "V049", "V050", "V051", "V052",
"V053", "V054", "V055", "V056", "V057"), class = "factor"),
linecode = structure(c(1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("check",
"variety"), class = "factor")), .Names = c("rows", "cols", "plotid",
"yield", "line", "linecode"), class = "data.frame", row.names = c(NA,
-64L))
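Because the window of rows depends only on a plot's row, the check mean could also be computed once per row and then looked up for each plot. The sketch below assumes, as getcheckmeans() does, that checks are the plots with linecode "check" and that the window is a plot's row plus the rows directly above and below; the small data frame is made up.

```r
# Made-up layout: 3 field rows with 3 plots each, one check per row.
toy <- data.frame(
  rows     = rep(1:3, each = 3),
  yield    = c(5.0, 5.5, 6.0, 4.0, 4.4, 4.8, 5.2, 5.6, 6.2),
  linecode = factor(rep(c("check", "variety", "variety"), times = 3),
                    levels = c("check", "variety"))
)

# Mean check yield over each row and its two neighboring rows.
row_ids   <- sort(unique(toy$rows))
row_means <- sapply(row_ids, function(r) {
  in_window <- toy$rows %in% c(r - 1, r, r + 1) & toy$linecode == "check"
  mean(toy$yield[in_window], na.rm = TRUE)
})
names(row_means) <- row_ids

# Look up each plot's row once instead of re-subsetting for every plot.
toy$check_mean <- round(unname(row_means[as.character(toy$rows)]), 2)

# Varieties are scaled by the local check mean; checks get 1 (trait/trait).
toy$adj_yield <- ifelse(toy$linecode == "variety",
                        round(toy$yield / toy$check_mean, 3), 1)
```

Under the same assumptions, replacing toy with dat should give the same check means as getcheckmeans(dat).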


Re: [R] subset data using a vector

2015-11-24 Thread Jim Lemon

Hi Nilesh,
I simplified your code a bit:

fun1<-function (dataset, plot.id, ranges2use, control) {
 m1 <- strsplit(as.character(ranges2use), ",")
 dat1 <- data.frame()
 row_check_mean <- NA
 row_check_adj_yield <- NA
 x <- length(plot.id)
 for (i in 1:x) {
  cat(i,"\n")
  dat1 <- dataset[dataset$ranges %in% m1[[i]], ]
  row_check_mean[i] <- tapply(unlist(dat1$trait),unlist(dat1$control),
   mean, na.rm = TRUE)[1]
  row_check_adj_yield[i] <- ifelse(control[i] == "variety",
  trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])
 }
 data.frame(dataset, row_check_adj_yield)
}

 and got it to run down to this line:

row_check_mean[i]<-tapply(dat1$trait,dat1$control,mean,na.rm=TRUE)[1]

which generates the error:

Error in split.default(X, group) : first argument must be a vector

As far as I can see, there is no element in "mydata" named "trait" and
"control" is not an element of the local variable "dat1". I can't get past
this, but perhaps it will help you to sort it out.

Jim
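Jim's diagnosis can be checked on a small scale. The sketch below uses a hypothetical four-row data frame `toy` that borrows two column names from mydata. It shows that `$` silently returns NULL for a column that does not exist, that tapply() then fails (the exact error message differs between R versions), and that the same call works once it names columns that are really there.

```r
# Hypothetical toy data using two of the column names from mydata
toy <- data.frame(
  yield    = c(5.1, 5.0, 3.9, 4.6),
  linecode = factor(c("check", "check", "variety", "variety"))
)

# `$` looks for a column literally named "trait"; there is none, so this is NULL
print(is.null(toy$trait))

# tapply() cannot work on NULL, so this call fails
# (the wording of the error depends on the R version)
failed <- tryCatch({
  tapply(toy$trait, toy$linecode, mean)
  FALSE
}, error = function(e) TRUE)
print(failed)

# Naming columns that actually exist gives one mean per group
check_means <- tapply(toy$yield, toy$linecode, mean, na.rm = TRUE)
print(check_means)
```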



Re: [R] subset data using a vector

2015-11-23 Thread Michael Dewey

Try looking at your function and work through what happens if the length 
is what I suggested.


>>   x <- length(plot.id)
>>
>>   for (i in (1:x)) {
>>
>>   m2[i] <- m1[[i]]

So unless m1 has length at least x you are doomed.
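Michael's point can be reproduced with hypothetical stand-ins: a list `m1` with a single element and a loop bound `x` of 3. Indexing past the end of a list raises the same "subscript out of bounds" error seen in this thread. The sketch compares the lengths before looping, and it uses seq_along(), which only visits elements that exist. This is an illustration, not code from the thread.

```r
# Hypothetical stand-ins: m1 has one element, but the loop expects x = 3
m1 <- list(c("1", "2"))
x  <- 3

# Indexing past the end of a list produces the error reported in the thread
err <- tryCatch(m1[[2]], error = conditionMessage)
print(err)

# Comparing the lengths up front exposes the mismatch before the loop runs
lengths_ok <- length(m1) >= x
if (!lengths_ok) message("m1 has ", length(m1), " element(s) but x is ", x)

# seq_along() only visits elements that exist, so it cannot overrun m1
visited <- integer(0)
for (i in seq_along(m1)) visited <- c(visited, i)
print(visited)
```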


Re: [R] subset data using a vector

2015-11-23 Thread DIGHE, NILESH [AG/2362]

Michael:  I tried using your suggestion of using length and still get the same 
error:
Error in m1[[i]] : subscript out of bounds

I also checked the lengths of m1 and x, and both are 64.

After trying several things, I was able to extract the list but this was done 
outside the function I am trying to create.
Code that worked is listed below:

for(i in (1:length(mydata$plotid))){
v1<-as.numeric(strsplit(as.character(mydata$rangestouse), ",")[[i]])
print(head(v1))}

However, when I try to get this code in a function (fun3) listed below, I get 
the following error:
Error in strsplit(as.character(dataset$ranges2use), ",")[[i]] : 
  subscript out of bounds

fun3<- function (dataset, plot.id, ranges2use, control) 
{
m1 <- c()
x <- length(plot.id)
for (i in (1:x)) {
m1 <- as.numeric(strsplit(as.character(dataset$ranges2use), 
",")[[i]])
}
m2
}

I am not sure where I am making a mistake.
Thanks.
Nilesh
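One likely reason fun3 fails only inside the function is that `dataset$ranges2use` does not use the argument `ranges2use`. `$` takes its right-hand side literally, so it looks for a column named "ranges2use", and mydata has no such column. The sketch below splits the argument itself. `toy` and `fun3_fixed` are hypothetical names, and the unused `dataset` argument is kept only so the signature matches fun3.

```r
# Hypothetical three-plot stand-in for mydata
toy <- data.frame(plotid = c(289L, 298L, 384L),
                  rangestouse = factor(c("1,2", "1,2", "1,2,3")))

# Hypothetical helper: split the *argument* ranges2use rather than
# dataset$ranges2use, which would look for a column literally named "ranges2use".
# `dataset` is unused and kept only to mirror fun3's signature.
fun3_fixed <- function(dataset, plot.id, ranges2use) {
  m1 <- strsplit(as.character(ranges2use), ",")
  stopifnot(length(m1) == length(plot.id))
  # One numeric vector of range ids per plot
  lapply(m1, as.numeric)
}

res <- fun3_fixed(toy, plot.id = toy$plotid, ranges2use = toy$rangestouse)
str(res)
```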
 

Re: [R] subset data using a vector

2015-11-23 Thread Michael Dewey


length(strsplit(as.character(mydata$ranges2use), ","))

was that what you expected? I think not.
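The answer to Michael's question can be seen on a hypothetical four-row stand-in for the two range columns. Because the data frame has no column called "ranges2use", `$` returns NULL, and splitting it yields an empty list. The real column name yields one element per row.

```r
# Hypothetical four-row stand-in for mydata's range columns
toy <- data.frame(ranges = 1:4,
                  rangestouse = factor(c("1,2", "1,2,3", "2,3,4", "3,4")))

# No column "ranges2use" exists, so toy$ranges2use is NULL and the split is empty
n_wrong <- length(strsplit(as.character(toy$ranges2use), ","))
# The real column gives one list element per row
n_right <- length(strsplit(as.character(toy$rangestouse), ","))
print(c(wrong = n_wrong, right = n_right))
```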


[R] subset data using a vector

2015-11-23 Thread DIGHE, NILESH [AG/2362]

Dear R users,
I would like to split my data by a vector created from the variable
"ranges".  For a given plotid, this vector holds the current range (ranges),
the preceding range (ranges - 1), and the following range (ranges + 1).  If the
preceding or following range falls outside the levels of ranges in the data
set, I would like to drop it and include only the ranges that are available.
The variable "rangestouse" lists all the desired ranges for subsetting a given
plotid.  After subsetting the data to these ranges, I would like to extract the
yield of the checks in those ranges and adjust the yield of each plotid by
dividing it by the check average for those ranges.

I have created this function (fun1) but when I run it, I get the following 
error:

Error in m1[[i]] : subscript out of bounds

Any help will be highly appreciated!
Thanks, Nilesh

Dataset:
dput(mydata)
structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3",
"4"), class = "factor"), cols = structure(c(1L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"),
plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9,
4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5,
5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6,
5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5,
5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9,
5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8
), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check",
"variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2",
"1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows",
"cols", "plotid", "yield", "linecode", "ranges", "rangestouse"
), class = "data.frame", row.names = c(NA, -64L))

Function:

fun1<- function (dataset, plot.id, ranges2use, control)
{
  m1 <- strsplit(as.character(dataset$ranges2use), ",")
  dat1 <- data.frame()
  m2 <- c()
  row_check_mean <- c()
  row_check_adj_yield <- c()
  x <- length(plot.id)
  for (i in (1:x)) {
    m2[i] <- m1[[i]]
    dat1 <- dataset[dataset$ranges %in% m2[i], ]
    row_check_mean[i] <- tapply(dat1$trait, dat1$control,
                                mean, na.rm = TRUE)[1]
    row_check_adj_yield[i] <- ifelse(control[i] == "variety",
      trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])
  }
  data.frame(dataset, row_check_adj_yield)
}

Apply function:
fun1(mydata, plot.id = mydata$plotid, ranges2use = mydata$rangestouse,
     control = mydata$linecode)

Error:

Error in m1[[i]] : subscript out of bounds

Session info:

R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] magrittr_1.5   plyr_1.8.3     tools_3.2.1    reshape2_1.4.1 Rcpp_0.12.1
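For reference, here is one possible way to compute what the question describes without indexing into m1 at all. It is a sketch under stated assumptions, not the poster's method: it takes the column names from the dput above (yield, linecode, ranges, rangestouse), assumes "rangestouse" already holds only the valid neighbouring ranges, and mirrors fun1 by giving check plots an adjusted value of 1. The name check_adjust is hypothetical.

```r
# Hypothetical helper (not from the thread): check-adjusted yield per plot.
# Assumes the columns yield, linecode, ranges and rangestouse from mydata.
check_adjust <- function(dataset) {
  # "1,2,3" -> c(1L, 2L, 3L): the ranges each plot draws its checks from
  use <- lapply(strsplit(as.character(dataset$rangestouse), ","), as.integer)
  is_check <- dataset$linecode == "check"

  # Mean yield of the check plots lying in each plot's set of ranges
  check_mean <- vapply(use, function(r) {
    mean(dataset$yield[is_check & dataset$ranges %in% r], na.rm = TRUE)
  }, numeric(1))

  # As in fun1: varieties are divided by the check mean, checks get 1
  adj <- ifelse(is_check, 1, dataset$yield / check_mean)
  data.frame(dataset, row_check_mean = check_mean, row_check_adj_yield = adj)
}

# Usage: adjusted <- check_adjust(mydata)
```

Building one vector of ranges per row with lapply() and then one check mean per row with vapply() keeps every result the same length as the data, so there is no loop counter that can drift out of step with m1.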

Re: [R] subset data using a vector

2015-11-23 Thread DIGHE, NILESH [AG/2362]

Michael:  I would like to use the actual range IDs listed in column "rangestouse"
to subset my data, not the length of that vector.

Thanks.
Nilesh


Re: [R] Subset() within function: logical error

2015-06-29 Thread Rich Shepard


On Mon, 29 Jun 2015, David Winsemius wrote:


No. A pointer to the correct use of [ is needed.


  Thanks, David. This puts me on the right path.

Much appreciated,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subset() within function: logical error

2015-06-29 Thread Steve Taylor

Using return() within a for loop makes no sense: only the result of the first
pass will be returned.

How about:
alldf.B = subset(alldf, stream=='B')  # etc...

Also, have a look at unique(alldf$stream) or levels(alldf$stream) if you want 
to use a for loop on each unique value.

cheers,
Steve
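Steve's two points can be illustrated side by side on a hypothetical miniature of testset. `first_only` shows that return() exits on the first pass through the loop. `all_streams` instead collects one subset per level of stream in a named list, and it passes subset() a logical comparison (`stream == s`) rather than the factor itself. Both function names are hypothetical.

```r
# Hypothetical miniature of testset
toy <- data.frame(stream = factor(c("B", "B", "J", "S")),
                  quant  = c(4, 33, 59.9, 3))

# return() leaves the function on the first pass through the loop
first_only <- function(df) {
  for (s in levels(df$stream)) {
    return(subset(df, stream == s))
  }
}

# Collect one subset per stream level in a named list instead
all_streams <- function(df) {
  out <- list()
  for (s in levels(df$stream)) {
    out[[s]] <- subset(df, stream == s)  # stream == s is a logical vector
  }
  out
}

nrow(first_only(toy))    # only the "B" rows come back
names(all_streams(toy))  # one element per stream level
```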

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rich Shepard
Sent: Tuesday, 30 June 2015 12:04p
To: r-help@r-project.org
Subject: [R] Subset() within function: logical error

   Moving from interactive use of R to scripts and functions and have bumped
into what I believe is a problem with variable names. Did not see a solution
in the two R programming books I have or from my Web searches. Inexperience
with ess-tracebug keeps me from refining my bug tracking.

   Here's a test data set (cleverly called 'testset.dput'):

structure(list(stream = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("B", "J", "S"), class = "factor"),
 sampdate = structure(c(8121, 8121, 8121, 8155, 8155, 8155,
 8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
 8257, 8257, 8308, 8785, 8785, 8785, 8785, 8785, 8785, 8785,
 8847, 8847, 8847, 8847, 8847, 8847, 8847, 8875, 8875, 8875,
 8875, 8875, 8875, 8875, 8121, 8121, 8121, 8155, 8155, 8155,
 8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
 8257, 8257, 8301, 8301, 8301), class = "Date"), param = structure(c(2L,
 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L,
 6L, 7L, 2L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L,
 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L
 ), .Label = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH"), class = "factor"),
 quant = c(4, 33, 8.43, 4, 32, 8.46, 4, 31, 8.43, 6, 33, 8.32,
 5, 33, 8.5, 5, 32, 8.5, 5, 59.9, 3.46, 1.48, 29, 7.54, 64.6,
 7.36, 46, 2.95, 1.34, 21.8, 5.76, 48.8, 7.72, 74.2, 5.36,
 2.33, 38.4, 8.27, 141, 7.8, 3, 76, 6.64, 4, 74, 7.46, 2,
 82, 7.58, 5, 106, 7.91, 3, 56, 7.83, 3, 51, 7.6, 6, 149,
 7.73)), .Names = c("stream", "sampdate", "param", "quant"
), row.names = c(NA, -61L), class = "data.frame")

   I want to subset that data.frame on each of the stream names: B, J, and S.
This is the function that has the naming error (eda.R):

extstream = function(alldf) {
 sname = alldf$stream
 sdate = alldf$sampdate
 comp = alldf$param
 value = alldf$quant
 for (i in sname) {
 sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))
 return(sname)
 }
}

   This is the result of running source('eda.R') followed by

 extstream(testset)
Error in subset.data.frame(alldf, alldf$stream, select = c(sdate, comp,  :
   'subset' must be logical

   I've tried using sname for the rows to select, but that produces a
different error of trying to select undefined columns.

   A pointer to the correct syntax for subset() is needed.

Rich


Re: [R] Subset() within function: logical error

2015-06-29 Thread Rolf Turner



If you want a pointer to the correct syntax for subset(), try 
help(subset)!!!


The syntax of your extstream function is totally screwed up, 
convoluted and over-complicated. Note that even if you had your subset 
argument specified correctly, the return() call will give you only the 
result from the *first* pass through the for loop.


That aside, the error message is perfectly clear: 'subset' must be 
logical.  Your subset argument is stream which is a factor.
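The difference Rolf describes can be reproduced on a hypothetical three-row miniature of testset. Passing the factor column itself as the `subset` argument triggers the error from the question, while a comparison such as `stream == "B"` produces the logical vector that subset() expects.

```r
# Hypothetical miniature of testset
toy <- data.frame(stream = factor(c("B", "J", "B")), quant = c(4, 59.9, 33))

# Passing the factor itself reproduces the error from the question
msg <- tryCatch(subset(toy, toy$stream), error = conditionMessage)
print(msg)

# A comparison produces the logical vector subset() expects
print(toy$stream == "B")
b_rows <- subset(toy, stream == "B")
print(b_rows)
```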


You *could* redefine your extstream function as follows:

function(alldf) {
  sname <- levels(alldf$stream)
  rslt <- vector("list", length(sname))
  names(rslt) <- sname
  for (i in sname) {
    rslt[[i]] <- subset(alldf, alldf$stream == i, sampdate:quant)
  }
  rslt
}

However you don't need to go through such contortions:

split(testset,testset$stream)

will give essentially what you want.  If you wish to strip out the
redundant stream column from the data frames in the resulting list,
you could do that using lapply().
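The split() plus lapply() approach might look like this on a hypothetical miniature of testset. The anonymous function drops the stream column from each piece by name. This is one possible idiom, not necessarily the one Rolf had in mind.

```r
# Hypothetical miniature of testset (same kind of columns, fewer rows)
toy <- data.frame(stream = factor(c("B", "J", "B", "S")),
                  param  = c("Cl", "Ca", "pH", "Cl"),
                  quant  = c(33, 59.9, 8.43, 76))

# One data frame per stream, named by the factor level
by_stream <- split(toy, toy$stream)

# Drop the now-redundant stream column from every piece
by_stream <- lapply(by_stream, function(d) d[names(d) != "stream"])
str(by_stream)
```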


cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


[R] Subset() within function: logical error

2015-06-29 Thread Rich Shepard


  Moving from interactive use of R to scripts and functions and have bumped
into what I believe is a problem with variable names. Did not see a solution
in the two R programming books I have or from my Web searches. Inexperience
with ess-tracebug keeps me from refining my bug tracking.

  Here's a test data set (cleverly called 'testset.dput'):

structure(list(stream = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("B", "J", "S"), class = "factor"),
sampdate = structure(c(8121, 8121, 8121, 8155, 8155, 8155,
8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
8257, 8257, 8308, 8785, 8785, 8785, 8785, 8785, 8785, 8785,
8847, 8847, 8847, 8847, 8847, 8847, 8847, 8875, 8875, 8875,
8875, 8875, 8875, 8875, 8121, 8121, 8121, 8155, 8155, 8155,
8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
8257, 8257, 8301, 8301, 8301), class = "Date"), param = structure(c(2L,
6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L,
6L, 7L, 2L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L,
2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L
), .Label = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH"), class = "factor"),
quant = c(4, 33, 8.43, 4, 32, 8.46, 4, 31, 8.43, 6, 33, 8.32,
5, 33, 8.5, 5, 32, 8.5, 5, 59.9, 3.46, 1.48, 29, 7.54, 64.6,
7.36, 46, 2.95, 1.34, 21.8, 5.76, 48.8, 7.72, 74.2, 5.36,
2.33, 38.4, 8.27, 141, 7.8, 3, 76, 6.64, 4, 74, 7.46, 2,
82, 7.58, 5, 106, 7.91, 3, 56, 7.83, 3, 51, 7.6, 6, 149,
7.73)), .Names = c("stream", "sampdate", "param", "quant"
), row.names = c(NA, -61L), class = "data.frame")

  I want to subset that data.frame on each of the stream names: B, J, and S.
This is the function that has the naming error (eda.R):

extstream = function(alldf) {
sname = alldf$stream
sdate = alldf$sampdate
comp = alldf$param
value = alldf$quant
for (i in sname) {
sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))
return(sname)
}
}

  This is the result of running source('eda.R') followed by


extstream(testset)

Error in subset.data.frame(alldf, alldf$stream, select = c(sdate, comp,  :
  'subset' must be logical

  I've tried using sname for the rows to select, but that produces a
different error of trying to select undefined columns.

  A pointer to the correct syntax for subset() is needed.

Rich


Re: [R] Subset() within function: logical error

2015-06-29 Thread Rich Shepard


On Tue, 30 Jun 2015, Steve Taylor wrote:


Using return() within a for loop makes no sense: only the first one will be 
returned.


Steve,

  Mea culpa. Didn't catch that.


How about:
alldf.B = subset(alldf, stream=='B')  # etc...


  I used to do each stream manually, like the above, and want to learn how
to loop through all of them ...


Also, have a look at unique(alldf$stream) or levels(alldf$stream) if you
want to use a for loop on each unique value.
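Here is a sketch of the loop Steve suggests. It iterates over levels() and indexes with `[` rather than calling subset(). The data frame is a hypothetical miniature of testset, and `pieces` is an illustrative name. Because the results are collected in a list, nothing is lost the way the early return() lost everything after the first pass.

```r
# Hypothetical miniature of the stream data (made-up values)
alldf <- data.frame(stream = factor(c("B", "J", "S", "B", "S")),
                    param  = c("Cl", "Ca", "pH", "pH", "Cl"),
                    quant  = c(4, 59.9, 6.64, 8.43, 3))

# Pre-allocate a named list with one slot per stream level
pieces <- vector("list", nlevels(alldf$stream))
names(pieces) <- levels(alldf$stream)

for (s in levels(alldf$stream)) {
  # Logical row index for this stream; keep only the other columns
  pieces[[s]] <- alldf[alldf$stream == s, c("param", "quant")]
}

pieces$B
```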


  ... which unique() and levels() will probably do. Will test these tomorrow
after reading the man pages.

Many thanks,

Rich


Re: [R] Subset() within function: logical error

2015-06-29 Thread David Winsemius


On Jun 29, 2015, at 5:03 PM, Rich Shepard wrote:

  Moving from interactive use of R to scripts and functions and have bumped
 into what I believe is a problem with variable names. Did not see a solution
 in the two R programming books I have or from my Web searches. Inexperience
 with ess-tracebug keeps me from refining my bug tracking.
 
  I want to subset that data.frame on each of the stream names: B, J, and S.
 This is the function that has the naming error (eda.R):
 
 extstream = function(alldf) {
sname = alldf$stream
sdate = alldf$sampdate
comp = alldf$param
value = alldf$quant
for (i in sname) {
sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))


Never use the form dfrm$colname as the argument to the subset argument of 
subset. You can see that 'stream' is a factor, right? Perhaps 

Furthermore, by inspection you can see that there is no colname =='sdate', so I 
would guess that would be your next error. Or 'comp' or 'value' for that 
matter. Oh now I see, you made them outside of `alldf`. Then how is that 
supposed to work? The subset function is supposed to be looking inside `alldf` 
to find those column names.


Perhaps:

subset(alldf, stream %in% c('B', 'J', 'S'),   # but have not figured out why you used 'subset' if you wanted:
       select = c(sdate, comp, value))


Furthermore, it is generally error prone to use `subset` inside functions. The 
help page warns against the practice. Better to use [.

return(sname)
}
 }
 
  This is the result of running source('eda.R') followed by
 
 extstream(testset)
 Error in subset.data.frame(alldf, alldf$stream, select = c(sdate, comp,  :
  'subset' must be logical
 
  I've tried using sname for the rows to select, but that produces a
 different error of trying to select undefined columns.

Right. Those are not column names in any dataframe.
 
  A pointer to the correct syntax for subset() is needed.

No. A pointer to the correct use of [ is needed.
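Here is one concrete way subset() goes wrong inside a function, sketched with hypothetical helper names. subset() looks up names in the data frame first. A function argument that shares a column's name is therefore silently shadowed. Indexing with `[` has no such ambiguity.

```r
df <- data.frame(stream = c("B", "J", "S", "B"), quant = c(4, 33, 8.43, 5))

# Hypothetical helpers: both intend to return the rows for one stream
pick_subset <- function(data, stream) {
  # Inside subset(), 'stream' resolves to the column, not the argument,
  # so this compares the column with itself and keeps every row
  subset(data, stream == stream)
}

pick_bracket <- function(data, stream) {
  # data$stream is the column; 'stream' is the function argument
  data[data$stream == stream, , drop = FALSE]
}

nrow(pick_subset(df, "B"))   # all 4 rows
nrow(pick_bracket(df, "B"))  # the 2 "B" rows
```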

-- 

David Winsemius
Alameda, CA, USA


Re: [R] Subset() within function: logical error

2015-06-29 Thread Rich Shepard


Rolf,

  I did re-read the subset man page, but did not associate the error message
with the problem.

  Thanks very much for the lesson. I will read the split() man page; simple
is always better.

Regards,

Rich


Re: [R] Subset() within function: logical error

2015-06-29 Thread Jeff Newmiller

Well, your code is, ah, too incorrect to convey what you want out of this 
effort. If I were to guess based on your description, you want all of the data, 
not a subset. An example data frame containing what you hope to extract might 
be helpful.

However, extracting subsets is rarely done for just one subset... usually you 
want to process the data in groups. Base functions such as ave, aggregate, or 
split work at a higher level than you seem to be thinking. Packages such as 
plyr and dplyr handle this breaking and recombining more succinctly, leaving 
you to think more about what you want to do with the pieces and less about 
making pieces.
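Here is a brief sketch of two of the base functions Jeff mentions, applied to a hypothetical miniature of the stream data with made-up values. aggregate() returns one summary row per group. ave() returns a per-row value aligned with the original rows. plyr and dplyr are not shown.

```r
alldf <- data.frame(stream = c("B", "B", "J", "J", "S"),
                    param  = c("pH", "pH", "pH", "Cl", "pH"),
                    quant  = c(8.43, 8.46, 7.54, 59.9, 6.64))

# One row per stream/param combination present, with the mean measurement
means <- aggregate(quant ~ stream + param, data = alldf, FUN = mean)
means

# Same-length vector: each row gets the mean of its stream/param group
alldf$group_mean <- ave(alldf$quant, alldf$stream, alldf$param, FUN = mean)
alldf
```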
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.


[R] subset svydesign problem

2015-06-03 Thread Daniela Droguett

Hi,

it seems not possible to subset a svydesign object (DBI svydesign) using
variables in the subset expression:


uff <- c(14, 15)

for (i in 1:length(uff))
{

subpnad <- subset(pnad, uf==uff[i] & v0302=='4')

}

The error is the following:
Error in sqliteSendQuery(con, statement, bind.data) :
  error in statement: no such column: uff

I have tried to use eval to build the expression, without success:

for (i in 1:length(uff))
{

expr <- eval(paste0("uf==", uff[i], " & v0302=='4'"))

subpnad-subset(pnad, expr)

}

This complains that no column named expr exists.

How can I solve that?

Thanks!

[[alternative HTML version deleted]]
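Judging from the error message, the DBI-backed design appears to pass the subset expression on to SQLite, where `uff` (or `expr`) is not a column. The paste0() attempt fails for a related reason: eval() of a character string just returns the string.

Here is a sketch of one common workaround. It is shown on a plain data frame standing in for the design. `pnad_df` is hypothetical, and it is an untested assumption that the survey package's DBI subset() accepts the resulting call. bquote() splices the current value of `uff[i]` into the expression as a literal, so the call contains only column names and constants. For an ordinary data frame this is not needed, since subset() would find `uff` anyway.

```r
# Hypothetical plain data frame standing in for the 'pnad' design
pnad_df <- data.frame(uf = c(14, 15, 14, 16),
                      v0302 = c("4", "4", "2", "4"))
uff <- c(14, 15)

results <- vector("list", length(uff))
for (i in seq_along(uff)) {
  # bquote() replaces .(uff[i]) with its current value, so the call reads
  # e.g. subset(pnad_df, uf == 14 & v0302 == "4")
  call_i <- bquote(subset(pnad_df, uf == .(uff[i]) & v0302 == "4"))
  print(call_i)
  results[[i]] <- eval(call_i)  # keep each subset instead of overwriting
}
```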


Re: [R] Subset and 0 replace?

2015-05-21 Thread William Dunlap

(WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2,
 2, 2, 2), Wgt = c(0.0043574552083, 0.0043574552083,
 0.0043574552083, 0.0043574552083, 0.0043574552083,
 0.0043574552083, 0.0043574552083, 0.0043574552083,
 0.0043574552083, 0.0043574552083, 0.0043574552083
 ), SPCLORatingValue = c(15L, 13L, 14L, 14L, 13L, 13L, 13L,
 15L, 15L, 13L, 14L)), .Names = c("WgtBand", "Wgt",
 "SPCLORatingValue"

Re: [R] Subset and 0 replace?

2015-05-21 Thread Vin Cheng

), SPCLORatingValue = c(14L, 15L, 15L, 
 12L, 15L, 12L, 13L, 15L, 14L, 15L, 14L)), .Names = c("WgtBand", 
 "Wgt", "SPCLORatingValue"), row.names = 12:22, class = "data.frame"), 
 V10 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 
 2, 2, 2), Wgt = c(0.0043574552083, 0.0043574552083, 
 0.0043574552083, 0.0043574552083, 0.0043574552083, 
 0.0043574552083, 0.0043574552083, 0.0043574552083, 
 0.0043574552083, 0.0043574552083, 0.0043574552083
 ), SPCLORatingValue = c(15L, 13L, 14L, 14L, 13L, 13L, 13L, 
 15L, 15L, 13L, 14L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"

Re: [R] Subset and 0 replace?

2015-05-21 Thread Vin Cheng

Thanks William/Duncan!
 
Duncan - Yes - I am using the doBy package.
 
Running this line on the sample data below gives weights for V5, V44, and V2.
Ideally I would like 0's for V8 and V10 in the output.
 
So it would look like:
e <- structure(matrix(c("V5", 0.008714910, "V8", 0, "V10", 0, "V44",
0.004357455, "V2", 0.008714910), nrow = 2))
 
 
This is as far as I've gotten by subsetting and summing:
a <- t(summaryBy(Wgt.sum~as.numeric(.id),data=subset(ldply(c,function(x)
summaryBy(Wgt ~ SPCLORatingValue, data=x,
FUN=c(sum))),SPCLORatingValue > 16),FUN=c(sum),order=FALSE))
 
All help/guidance is much appreciated!  Thanks Vince!
 
Sample data example:
c <- structure(list(V5 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2), Wgt = c(0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083
), SPCLORatingValue = c(11L, 15L, 14L, 15L, 14L, 14L, 16L, 19L, 
13L, 17L, 11L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"
), row.names = 12:22, class = "data.frame"), V8 = structure(list(
WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Wgt = c(0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083), SPCLORatingValue = c(14L, 15L, 15L, 
12L, 15L, 12L, 13L, 15L, 14L, 15L, 14L)), .Names = c("WgtBand", 
"Wgt", "SPCLORatingValue"), row.names = 12:22, class = "data.frame"), 
V10 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2), Wgt = c(0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083
), SPCLORatingValue = c(15L, 13L, 14L, 14L, 13L, 13L, 13L, 
15L, 15L, 13L, 14L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"
), row.names = 12:22, class = "data.frame"), V44 = structure(list(
WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Wgt = 
c(0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083), SPCLORatingValue = c(13L, 14L, 
16L, 15L, 14L, 14L, 18L, 13L, 16L, 15L, 11L)), .Names = c("WgtBand", 
"Wgt", "SPCLORatingValue"), row.names = 12:22, class = "data.frame"), 
V2 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2), Wgt = c(0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083, 
0.0043574552083, 0.0043574552083, 0.0043574552083
), SPCLORatingValue = c(13L, 14L, 15L, 15L, 15L, 14L, 12L, 
16L, 17L, 15L, 19L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"
), row.names = 12:22, class = "data.frame")), .Names = c("V5", 
"V8", "V10", "V44", "V2"))

Re: [R] Subset and 0 replace?

2015-05-20 Thread Duncan Murdoch

The summaryBy function is not in base R.  There's a function with that
name in the doBy package; is that the one you're using?

You don't say how to do the grouping, and I can't read your code to
figure it out, but this code will do what you want with suitable inputs:

by(df, group, function(subset) with(subset, sum(Wgt[SPCLORatingValue > 16])))

where df is your dataframe, and group is a variable that defines the groups.
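Duncan's by() call can be run as-is once the data frames are stacked. This sketch assumes a hypothetical stacked data frame `df` whose `group` column records which original data frame each row came from. The names and values are made up. Because sum() over an empty selection is 0, a group with no value above 16 still gets an entry.

```r
df <- data.frame(group = c("V5", "V5", "V8", "V8", "V44"),
                 Wgt   = c(0.5, 0.25, 0.5, 0.25, 1),
                 SPCLORatingValue = c(19, 11, 15, 12, 18))

# One result per group; V8 has nothing above 16, so its sum is 0
res <- by(df, df$group,
          function(subset) with(subset, sum(Wgt[SPCLORatingValue > 16])))
res
```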

Duncan Murdoch


[R] Subset and 0 replace?

2015-05-20 Thread Vin Cheng

Hi,
 
I'm trying to group rows in a dataframe with SPCLORatingValue factor > 16 and 
summing the Wgt's that correspond to this condition.  There are 100 dataframes 
in a list.
 
Some of the dataframes won't have any rows that have this condition 
SPCLORatingValue > 16 and therefore no corresponding weight.
 
My problem is that I need to have a corresponding value for each dataframe in 
the list - so 100 values.
 
If dataframe 44 doesn't have any SPCLORatingValue > 16, then I end up getting a 
vector that's 99 long vs. 100, putting value 45 into 44's slot and so on.
 
Is there either an if/else statement or argument I can place into subset to put 
a 0 for the data frames that don't have SPCLORatingValue > 16?
 
GenEval[18,1:100] <- 
t(summaryBy(Wgt.sum~as.numeric(.id),data=subset(ldply(Generation,function(x) 
summaryBy(Wgt ~ SPCLORatingValue, data=x, 
FUN=c(sum))),SPCLORatingValue > 16),FUN=c(sum),order=FALSE))
 
Any help or guidance would be greatly appreciated!
Many Thanks,
Vince
 
 
  
[[alternative HTML version deleted]]


Re: [R] Subset and 0 replace?

2015-05-20 Thread William Dunlap

Can you show a small self-contained example of your data and expected
results?
I tried to make one and your expression returned a single number in a 1 by
1 matrix.

library(doBy)
library(plyr)  # ldply() comes from plyr
Generation <- list(
   data.frame(Wgt=c(1,2,4), SPCLORatingValue=c(10,11,12)),
   data.frame(Wgt=c(8,16), SPCLORatingValue=c(15,17)),
   data.frame(Wgt=c(32,64), SPCLORatingValue=c(19,20)))
 t(summaryBy(Wgt.sum~as.numeric(.id),data=subset(ldply(Generation,function(x)
summaryBy(Wgt ~ SPCLORatingValue, data=x,
FUN=c(sum))),SPCLORatingValue > 16),FUN=c(sum),order=FALSE))
#  1
#Wgt.sum.sum 112
str(.Last.value)
# num [1, 1] 112
# - attr(*, "dimnames")=List of 2
#  ..$ : chr "Wgt.sum.sum"
#  ..$ : chr "1"

Two ways of dealing with the problem you verbally described are
(a) determine which elements of the input you can process (e.g., which
have some values > 16) and use subscripting on both the left and right
side of the assignment operator to put the results in the right place.
E.g.,
   x <- c(-1, 1, 2)
   ok <- x > 0
   x[ok] <- log(x[ok])
(b) make your function handle any case so you don't have to do any
subsetting on either side.  In your case it may be easy, since
sum(zeroLongNumericVector) is 0. In other cases you may want to use ifelse,
as in
   x <- c(-1, 1, 2)
   x <- ifelse(x > 0, log(x), x)  # log(x) is still evaluated for x <= 0, so R warns "NaNs produced"
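Applied to the original question, the two approaches might look like the sketch below. It is only a sketch under assumptions: it reuses the three toy data frames from the example above rather than the real data, it assumes SPCLORatingValue is numeric (a factor would need as.numeric(as.character(...)) first), and the helper name wgtAbove is made up for illustration.

```r
# Toy data copied from the example above; the first data frame has
# no rows with SPCLORatingValue > 16.
Generation <- list(
  data.frame(Wgt = c(1, 2, 4), SPCLORatingValue = c(10, 11, 12)),
  data.frame(Wgt = c(8, 16),   SPCLORatingValue = c(15, 17)),
  data.frame(Wgt = c(32, 64),  SPCLORatingValue = c(19, 20))
)

# Hypothetical helper: total Wgt over rows with SPCLORatingValue > threshold.
# sum() of a zero-length vector is 0, so the empty case needs no special code.
wgtAbove <- function(d, threshold = 16) {
  sum(d$Wgt[d$SPCLORatingValue > threshold])
}

# (a) Preallocate one slot per data frame, then fill only the slots whose
#     data frame has at least one qualifying row.
res_a <- numeric(length(Generation))
ok <- vapply(Generation, function(d) any(d$SPCLORatingValue > 16), logical(1))
res_a[ok] <- vapply(Generation[ok], wgtAbove, numeric(1))

# (b) Let the function handle every case, so no subscripting is needed.
res_b <- vapply(Generation, wgtAbove, numeric(1))

res_b
# [1]  0 16 96
```

Either version returns exactly one value per data frame, so a result of length 100 could be assigned to a row such as GenEval[18, ] without later values shifting into earlier slots.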



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, May 20, 2015 at 4:13 PM, Vin Cheng newrnew...@hotmail.com wrote:

> Hi,
>
> I'm trying to group rows in a dataframe with SPCLORatingValue factor > 16
> and summing the Wgt's that correspond to this condition.  There are 100
> dataframes in a list.
>
> Some of the dataframes won't have any rows that have this condition
> SPCLORatingValue > 16 and therefore no corresponding weight.
>
> My problem is that I need to have a corresponding value for each dataframe
> in the list - so 100 values.
>
> If dataframe 44 doesn't have any SPCLORatingValue > 16, then I end up
> getting a vector that's 99 long vs. 100.  putting value 45 into 44's slot
> and so on.
>
> Is there either an if/else statement or argument I can place into subset
> to put a 0 for the data frames that don't have SPCLORatingValue > 16?
>
> GenEval[18,1:100] <-
> t(summaryBy(Wgt.sum~as.numeric(.id),data=subset(ldply(Generation,function(x)
> summaryBy(Wgt ~ SPCLORatingValue, data=x,
> FUN=c(sum))),SPCLORatingValue>16),FUN=c(sum),order=FALSE))
>
> Any help or guidance would be greatly appreciated!
> Many Thanks,
> Vince

Re: [R] subset a data frame by largest frequencies of factors

2015-03-06 Thread S Ellison



> -Original Message-
> A consulting client has a large data set with a binary response
> (negative) and two factors (ctry and member) which have many levels, but
> many occur with very small frequencies.  It is far too sparse with a model
> like glm(negative ~ ctry+member, family=binomial).
>
> For analysis, we'd like to subset the data to include only those that occur
> with frequency greater than a given value

ave() helps with this kind of thing.

Something like

freq <- ave(1:length(ctry), factor(ctry:member), FUN=length)

gives the count for each ctry:member cell. Then you can subset a data frame
using, for example

dfr.subset <- dfr[freq > 10, ]

The 1:length(ctry) in the ave call is simply because ave wants a numeric there.
If all we're doing with it is counting the number, it just has to be a numeric
of the same length as your data. In a data frame it can be 1:nrow(dfr) etc.
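A small sketch of the ave() approach on made-up data (the data frame dfr and its values are hypothetical, not the client's data). Here both factors are passed as separate grouping arguments, which ave() combines with interaction(), so it groups by the same ctry:member cells; seq_len(nrow(dfr)) plays the role of the 1:length(ctry) dummy vector.

```r
# Hypothetical data: cell A:x occurs 3 times, B:y twice, C:z once
dfr <- data.frame(
  ctry     = factor(c("A", "A", "A", "B", "B", "C")),
  member   = factor(c("x", "x", "x", "y", "y", "z")),
  negative = c(0, 1, 0, 1, 1, 0)
)

# Every row gets the number of rows in its ctry:member cell
freq <- ave(seq_len(nrow(dfr)), dfr$ctry, dfr$member, FUN = length)
freq
# [1] 3 3 3 2 2 1

# Keep rows from cells with more than 2 observations, then drop
# factor levels that no longer occur
dfr.subset <- droplevels(dfr[freq > 2, ])
```

With the real data the threshold would be whatever minimum frequency is wanted (10 in the example above); droplevels() keeps the now-empty levels from reappearing as zero-count categories in a later glm() or table().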

S Ellison

Re: [R] subset a data frame by largest frequencies of factors

2015-03-05 Thread David L Carlson

These two commands will compute the cell frequencies and then sort them:

e <- as.data.frame(xtabs(~ctry+member, Dataset))
f <- e[order(e$Freq, decreasing=TRUE),]

Then draw your subset:

g <- head(f, 10)

or

g <- f[cumsum(f$Freq)/sum(f$Freq) < .8,]

Finally, merge the sample with the original data and delete the unused factor
levels:

sample <- merge(Dataset, g[,-3])
sample$ctry <- factor(sample$ctry)
sample$member <- factor(sample$member)
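The xtabs()/cumsum() recipe can be tried end to end on a toy data set; Dataset below is invented, not the client's data, and the result is called top instead of sample only to avoid masking base::sample(). Note that xtabs() also lists ctry:member combinations that never occur, with Freq 0; they sort to the bottom and are excluded by either rule.

```r
# Invented data: cells A:p (4 rows), B:q (3), C:r (2), D:s (1)
Dataset <- data.frame(
  ctry     = factor(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D")),
  member   = factor(c("p", "p", "p", "p", "q", "q", "q", "r", "r", "s")),
  negative = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0)
)

# Cell frequencies, largest first (zero-count combinations included)
e <- as.data.frame(xtabs(~ ctry + member, Dataset))
f <- e[order(e$Freq, decreasing = TRUE), ]

# Rule 1: the two most frequent cells
g1 <- head(f, 2)

# Rule 2: the most frequent cells whose running share stays below 80%
g2 <- f[cumsum(f$Freq) / sum(f$Freq) < 0.8, ]

# Keep the matching rows of the original data (merge joins on the shared
# columns ctry and member), then drop unused factor levels
top <- droplevels(merge(Dataset, g2[, c("ctry", "member")]))
table(top$ctry)
```

With the `< 0.8` rule, the cell that carries the running share past 80% (C:r here, which moves it from 70% to 90%) is left out; if that cell should be kept, the condition would need adjusting.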

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly
Sent: Thursday, March 5, 2015 12:45 PM
To: R-help
Subject: [R] subset a data frame by largest frequencies of factors

A consulting client has a large data set with a binary response 
(negative) and two factors (ctry and member) which have many levels,
but many occur with very small frequencies.  It is far too sparse with a 
model like glm(negative ~ ctry+member, family=binomial).

> str(Dataset)
'data.frame':   10672 obs. of  5 variables:
 $ ctry    : Factor w/ 31 levels "Barbados","Belize",..: 21 21 5 22 18 18 18 18 26 18 ...
 $ member  : Factor w/ 163 levels "","ADHOPIA, PREETI ",..: 150 19 19 111 120 1 1 4 55 18 ...
 $ negative: int  0 1 0 1 1 1 1 0 0 0 ...

For analysis, we'd like to subset the data to include only those that
occur with frequency greater than a given value, or the top 10 (say) in
frequency, or the highest frequency categories accounting for 80% (say)
of the total.  I'm not sure how to do any of these in R.  Can anyone help?

-- 
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.  Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA


[R] subset a data frame by largest frequencies of factors

2015-03-05 Thread Michael Friendly

A consulting client has a large data set with a binary response 
(negative) and two factors (ctry and member) which have many levels,
but many occur with very small frequencies.  It is far too sparse with a 
model like glm(negative ~ ctry+member, family=binomial).


> str(Dataset)
'data.frame':   10672 obs. of  5 variables:
 $ ctry    : Factor w/ 31 levels "Barbados","Belize",..: 21 21 5 22 18 18 18 18 26 18 ...
 $ member  : Factor w/ 163 levels "","ADHOPIA, PREETI ",..: 150 19 19 111 120 1 1 4 55 18 ...
 $ negative: int  0 1 0 1 1 1 1 0 0 0 ...


For analysis, we'd like to subset the data to include only those that
occur with frequency greater than a given value, or the top 10 (say) in
frequency, or the highest frequency categories accounting for 80% (say)
of the total.  I'm not sure how to do any of these in R.  Can anyone help?

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.  Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA


[R] subset drops S3 classes?

2014-11-12 Thread Murat Tasan

Hi all --- I've stumbled upon some pretty annoying behavior, and I'm
curious how others may have gotten around it.
When using subset(...) on a data frame that contains a custom S3
field, the class is dropped in the result:

 MyClass <- function(x) structure(x, class = "MyClass")

 df <- data.frame(x = 1:10, y = 10:1)
 df$x <- MyClass(df$x)
 str(df)
 'data.frame':   10 obs. of  2 variables:
  $ x:Class 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
  $ y: int  10 9 8 7 6 5 4 3 2 1
 str(subset(df, x %% 2 == 0))
 'data.frame':   5 obs. of  2 variables:
  $ x: int  2 4 6 8 10
  $ y: int  9 7 5 3 1

And so, any generic functions hooked to MyClass suddenly don't work on
the subset results, but do work on the original data frame.
I think I could write a custom as.data.frame.MyClass for all such
classes, but this is annoying, indeed (and I don't know for sure if
that's a robust solution).
Wrapping in I(...) doesn't work, either:

 df <- data.frame(x = 1:10, y = 10:1)
 df$x <- I(MyClass(df$x))
 str(df)
 'data.frame':   10 obs. of  2 variables:
  $ x:Classes 'AsIs', 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
  $ y: int  10 9 8 7 6 5 4 3 2 1
 str(subset(df, x %% 2 == 0))
  'data.frame':   5 obs. of  2 variables:
  $ x:Class 'AsIs'  int [1:5] 2 4 6 8 10
  $ y: int  9 7 5 3 1

(note that while 'AsIs' is kept, 'MyClass' has been removed in $x)

Cheers!

-Murat


Re: [R] subset drops S3 classes?

2014-11-12 Thread Murat Tasan

And as a follow-up, I implemented a barebones as.data.frame.MyClass(...).
It works when dealing with non-subsetted data frames, but fails upon a
subset(...) call:

 as.data.frame.MyClass <- function(x, ...) as.data.frame.vector(x, ...)

This works for a single column, e.g.:

 str(data.frame(MyClass(1:10)))
 'data.frame':   10 obs. of  1 variable:
  $ MyClass.1.10.:Class 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10

But not during a subset:

 str(subset(data.frame(x = MyClass(1:10)), x %% 2 == 0))
'data.frame':   5 obs. of  1 variable:
 $ x: int  2 4 6 8 10

-Murat


On Wed, Nov 12, 2014 at 10:02 PM, Murat Tasan mmu...@gmail.com wrote:
> Hi all --- I've stumbled upon some pretty annoying behavior, and I'm
> curious how others may have gotten around it.
> When using subset(...) on a data frame that contains a custom S3
> field, the class is dropped in the result:
>
>  MyClass <- function(x) structure(x, class = "MyClass")
>
>  df <- data.frame(x = 1:10, y = 10:1)
>  df$x <- MyClass(df$x)
>  str(df)
>  'data.frame':   10 obs. of  2 variables:
>   $ x:Class 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
>   $ y: int  10 9 8 7 6 5 4 3 2 1
>  str(subset(df, x %% 2 == 0))
>  'data.frame':   5 obs. of  2 variables:
>   $ x: int  2 4 6 8 10
>   $ y: int  9 7 5 3 1
>
> And so, any generic functions hooked to MyClass suddenly don't work on
> the subset results, but do work on the original data frame.
> I think I could write a custom as.data.frame.MyClass for all such
> classes, but this is annoying, indeed (and I don't know for sure if
> that's a robust solution)
> Wrapping in I(...) doesn't work, either:
>
>  df <- data.frame(x = 1:10, y = 10:1)
>  df$x <- I(MyClass(df$x))
>  str(df)
>  'data.frame':   10 obs. of  2 variables:
>   $ x:Classes 'AsIs', 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
>   $ y: int  10 9 8 7 6 5 4 3 2 1
>  str(subset(df, x %% 2 == 0))
>  'data.frame':   5 obs. of  2 variables:
>   $ x:Class 'AsIs'  int [1:5] 2 4 6 8 10
>   $ y: int  9 7 5 3 1
>
> (note that while 'AsIs' is kept, 'MyClass' has been removed in $x)
>
> Cheers!
>
> -Murat


Re: [R] subset drops S3 classes?

2014-11-12 Thread Murat Tasan

... and never mind, figured it out (from the final example on the
?Extract.data.frame help page):

`[.MyClass` <- function(x, i, ...) {
  RV <- NextMethod("[")
  mostattributes(RV) <- attributes(x)
  RV
}

cheers,

-m
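For reference, a self-contained sketch of that fix applied to the MyClass example from this thread (the final check is only an illustration). The `[` method lets the default method do the subsetting via NextMethod() and then copies the original attributes, including the class, back onto the result with mostattributes(); subset() on a data frame ends up calling `[` on each column, so the class survives.

```r
# Minimal S3 constructor from the thread
MyClass <- function(x) structure(x, class = "MyClass")

# `[` method: default subsetting, then restore the attributes of x.
# mostattributes() only sets names/dim/dimnames when they fit the result.
`[.MyClass` <- function(x, i, ...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

df <- data.frame(x = 1:10, y = 10:1)
df$x <- MyClass(df$x)

# Rows with even x; column x should still carry class "MyClass"
s <- subset(df, x %% 2 == 0)
str(s)
```

As the earlier follow-up in this thread shows, the as.data.frame.MyClass method addresses a different step (building a data frame from a MyClass object with data.frame()); it is the `[` method that preserves the class under subset().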

On Wed, Nov 12, 2014 at 11:02 PM, Murat Tasan mmu...@gmail.com wrote:
> And as a follow-up, I implemented a barebones as.data.frame.MyClass(...).
> It works when dealing with non-subsetted data frames, but fails upon a
> subset(...) call:
>
>  as.data.frame.MyClass <- function(x, ...) as.data.frame.vector(x, ...)
>
> This works for a single column, e.g.:
>
>  str(data.frame(MyClass(1:10)))
>  'data.frame':   10 obs. of  1 variable:
>   $ MyClass.1.10.:Class 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
>
> But not during a subset:
>
>  str(subset(data.frame(x = MyClass(1:10)), x %% 2 == 0))
>  'data.frame':   5 obs. of  1 variable:
>   $ x: int  2 4 6 8 10
>
> -Murat
>
>
> On Wed, Nov 12, 2014 at 10:02 PM, Murat Tasan mmu...@gmail.com wrote:
> > Hi all --- I've stumbled upon some pretty annoying behavior, and I'm
> > curious how others may have gotten around it.
> > When using subset(...) on a data frame that contains a custom S3
> > field, the class is dropped in the result:
> >
> >  MyClass <- function(x) structure(x, class = "MyClass")
> >
> >  df <- data.frame(x = 1:10, y = 10:1)
> >  df$x <- MyClass(df$x)
> >  str(df)
> >  'data.frame':   10 obs. of  2 variables:
> >   $ x:Class 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
> >   $ y: int  10 9 8 7 6 5 4 3 2 1
> >  str(subset(df, x %% 2 == 0))
> >  'data.frame':   5 obs. of  2 variables:
> >   $ x: int  2 4 6 8 10
> >   $ y: int  9 7 5 3 1
> >
> > And so, any generic functions hooked to MyClass suddenly don't work on
> > the subset results, but do work on the original data frame.
> > I think I could write a custom as.data.frame.MyClass for all such
> > classes, but this is annoying, indeed (and I don't know for sure if
> > that's a robust solution)
> > Wrapping in I(...) doesn't work, either:
> >
> >  df <- data.frame(x = 1:10, y = 10:1)
> >  df$x <- I(MyClass(df$x))
> >  str(df)
> >  'data.frame':   10 obs. of  2 variables:
> >   $ x:Classes 'AsIs', 'MyClass'  int [1:10] 1 2 3 4 5 6 7 8 9 10
> >   $ y: int  10 9 8 7 6 5 4 3 2 1
> >  str(subset(df, x %% 2 == 0))
> >  'data.frame':   5 obs. of  2 variables:
> >   $ x:Class 'AsIs'  int [1:5] 2 4 6 8 10
> >   $ y: int  9 7 5 3 1
> >
> > (note that while 'AsIs' is kept, 'MyClass' has been removed in $x)
> >
> > Cheers!
> >
> > -Murat


[R] subset ffdf does not accept bit vector anymore (package ffbase)

2014-09-25 Thread christian.kamenik

Hi everyone

Since I updated package 'ffbase', subset.ffdf does not work with bit vectors 
anymore. Here is a short example:

data(iris)

library(ffbase)
iris.ffdf <- as.ffdf(iris)
index <- sample(c(FALSE,TRUE), nrow(iris), TRUE)
index.bit <- as.bit(index)

subset(iris.ffdf, subset=index.bit)

results in the error message:
Error in which(eval(e, nl, envir)) : argument to 'which' is not logical


My code was working prior to the update... and the help page for
subset.ffdf says:

subset: an expression, ri, bit or logical ff vector that can be
used to index x

Any help would be highly appreciated.

Many thanks
Christian



> R.Version()
$platform
[1] "i386-w64-mingw32"

$arch
[1] "i386"

$os
[1] "mingw32"

$system
[1] "i386, mingw32"

$status
[1] ""

$major
[1] "3"

$minor
[1] "1.1"

$year
[1] "2014"

$month
[1] "07"

$day
[1] "10"

$`svn rev`
[1] "66115"

$language
[1] "R"

$version.string
[1] "R version 3.1.1 (2014-07-10)"

$nickname
[1] "Sock it to Me"



> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252
[4] LC_NUMERIC=C                        LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stringr_0.6.2 ffbase_0.11.3 ff_2.2-13     bit_1.1-12    track_1.0-15

loaded via a namespace (and not attached):
[1] fastmatch_1.0-4 tools_3.1.1




[R] Subset a column with specific characters

2014-09-04 Thread Kuma Raj

This post has NOT been accepted by the mailing list yet.
I would like to subset the rows of a data frame based on whether a column
contains specific characters. In the sample data I wish to do the following:

First, keep the rows where column prog contains "ca", and
second, drop the rows where race contains "ic".

Thanks

library(foreign)
hsb2 <- read.dta('http://www.ats.ucla.edu/stat/stata/notes/hsb2.dta')


Re: [R] Subset a column with specific characters

2014-09-04 Thread David Winsemius


On Sep 4, 2014, at 2:58 PM, Kuma Raj wrote:

> This post has NOT been accepted by the mailing list yet.

Well, it has now. Were you earlier posting from Nabble? (Not an efficient 
strategy.)

> I would like to subset a column based on the contents of a column with
> specific character. In the sample data I wish to do the following:
>
> First keep the data based on column prog if prog contains "ca", and
> secondly to drop if race contains "ic"
>
> Thanks
>
> library(foreign)
> hsb2 <- read.dta('http://www.ats.ucla.edu/stat/stata/notes/hsb2.dta')

NROW( hsb2[ grepl("ca", hsb2$prog) & !grepl("ic", hsb2$race) , ] )
[1] 120
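The same logical-index idea can be checked on a small invented data frame standing in for hsb2, so it runs without downloading anything. The values below are made up; prog and race are assumed to be factors, which is what read.dta() returns by default for value-labelled Stata variables, and grepl() coerces them to character. The name kept is hypothetical.

```r
# Invented stand-in for hsb2
hsb2 <- data.frame(
  id   = 1:5,
  prog = factor(c("academic", "vocation", "general", "academic", "vocation")),
  race = factor(c("white", "hispanic", "white", "african american", "asian"))
)

# Keep rows whose prog contains "ca" AND whose race does not contain "ic"
keep <- grepl("ca", hsb2$prog, fixed = TRUE) &
        !grepl("ic", hsb2$race, fixed = TRUE)
kept <- hsb2[keep, ]
kept$id
# [1] 1 5
```

fixed = TRUE treats "ca" and "ic" as literal substrings rather than regular expressions; for these particular patterns the result is the same either way.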

-- 

David Winsemius
Alameda, CA, USA

