Re: [R] Subsetting a vector using an index with all missing values

2022-07-02 Thread Peter Langfelder
Ah, thanks, that makes sense.

Peter

On Fri, Jul 1, 2022 at 10:01 PM Bill Dunlap  wrote:
>
> This has to do with the mode of the subscript - logical subscripts are 
> repeated to the length of x and integer/numeric ones are not.  NA is logical, 
> NA_integer_ is integer, so we get
>
> > x <- 1:10
> > x[ rep(NA_integer_, 3) ]
> [1] NA NA NA
> > x[ rep(NA, 3) ]
>  [1] NA NA NA NA NA NA NA NA NA NA
>
> -Bill
>
>
> On Fri, Jul 1, 2022 at 8:31 PM Peter Langfelder  
> wrote:
>>
>> Hi all,
>>
>> I stumbled on subsetting behavior that seems counterintuitive and
>> perhaps is a bug. Here's a simple example:
>>
>> > x = 1:10
>> > x[ rep(NA, 3)]
>>  [1] NA NA NA NA NA NA NA NA NA NA
>>
>> I would have expected 3 NAs (the length of the index), not 10 (all
>> values in x). Looked at the documentation for the subsetting operator
>> `[` but found nothing indicating that if the index contains all
>> missing data, the result is the entire vector.
>>
>> I can work around the issue for a general 'index' using a somewhat
>> clunky but straightforward construct along the lines of
>>
>> > index = rep(NA, 3)
>> > x[c(1, index)][-1]
>> [1] NA NA NA
>>
>> but I'm wondering if the behaviour above is intended.
>>
>> Thanks,
>>
>> Peter
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
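
Bill's explanation above can be checked with a short sketch. The names idx, res_logical, res_integer and every_other are illustrative only, and converting the index with as.integer() is just one possible workaround that was not proposed in the thread.

```r
x <- 1:10
idx <- rep(NA, 3)                  # plain NA is of type logical

# Logical subscript: recycled to length(x), so all 10 positions come back NA
res_logical <- x[idx]

# Integer subscript: used as given, one result per index element
res_integer <- x[as.integer(idx)]

length(res_logical)                # 10
length(res_integer)                # 3

# The same recycling rule is what lets a short logical vector
# pick every other element
every_other <- x[c(TRUE, FALSE)]
```

If the index may be all NA, coercing it to integer (or building it with NA_integer_ in the first place) avoids the c(1, index) construct from the original question.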



[R] subsetting/slicing xml2 nodesets

2019-08-21 Thread Tobias Fellinger

Dear R-help members,

I'm working with the xml2 package to parse an xml document, and I don't 
understand how subsetting / slicing of xml_nodesets works. I'd expect 
xml_find_all to only return children of the nodes I selected with [ or 
[[ but it returns all nodes found in the whole document. I did not find 
any documentation on the [ and [[ operators for xml_nodeset. Below is a 
small example and the sessionInfo.


thanks in advance, Tobias Fellinger



# load package
require(xml2)

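# test document as text
# (the markup was stripped by the list archive; the <p> elements follow from
# the "//p" query below, while the <doc> and <div> wrapper names are assumed)
test_chr <- "
<doc>
<div>
<p>paragraph 1</p>
<p>paragraph 2</p>
</div>
</doc>
"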

# parse test document
test_doc <- read_xml(test_chr)

# extract nodeset
test_nodeset <- xml_find_all(test_doc, "//p")

# subset nodeset (working as expected)
test_nodeset[1]
# {xml_nodeset (1)}
# [1] <p>paragraph 1</p>
test_nodeset[[1]]
# {xml_node}
# <p>

# extract from subset (not working as expected)
xml_find_all(test_nodeset[1], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
xml_find_all(test_nodeset[[1]], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
#   [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
#       LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
#       LC_TIME=German_Austria.1252
#
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
#   [1] xml2_1.2.2
#
# loaded via a namespace (and not attached):
#   [1] compiler_3.6.0 tools_3.6.0    Rcpp_1.0.2     packrat_0.5.0
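
A likely explanation for the behaviour described in this post, offered as a sketch and not as a statement about xml2 internals: in XPath, an expression that starts with // is evaluated from the document root regardless of the context node, while .// starts at the context node. The document and the names doc, divs, abs_hits and rel_hits below are made up for illustration.

```r
library(xml2)

# Hypothetical document with <p> elements under two different <div> parents
doc <- read_xml(
  "<doc><div><p>paragraph 1</p><p>paragraph 2</p></div><div><p>paragraph 3</p></div></doc>"
)
divs <- xml_find_all(doc, "//div")

# Absolute path: searches the whole document, even though the
# context node is only the first <div>
abs_hits <- xml_find_all(divs[[1]], "//p")

# Relative path: searches only below the context node
rel_hits <- xml_find_all(divs[[1]], ".//p")

length(abs_hits)   # 3
length(rel_hits)   # 2
```

Note that .//p looks at descendants of the context node only, so applying it to a node that is itself a <p> without nested <p> elements returns an empty nodeset.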



Re: [R] Subsetting Data from a Dataframe

2019-05-24 Thread Rui Barradas

Hello,

Maybe something like the following is what you want.
The code first creates a logical index of columns with at least one NA 
or "NULL" (character string, not NULL) values. Then extracts only those 
columns from the dataframe.


inx <- sapply(datos, function(x) any(x == "NULL" | is.na(x)))
datos2 <- datos[inx]


Hope this helps,

Rui Barradas
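
To make the effect of the logical index concrete, here is a small sketch on a made-up data frame (datos_demo and its columns are hypothetical, not the poster's data). It shows the column selection from the code above and, as an assumption about what the poster may also want, the corresponding row-wise filter.

```r
# Hypothetical stand-in for the poster's data
datos_demo <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "NULL", "z"),
  c = c(NA, 5, 6),
  stringsAsFactors = FALSE
)

# Column index: TRUE for columns containing at least one "NULL" string or NA
inx <- sapply(datos_demo, function(x) any(x == "NULL" | is.na(x)))
cols_with_missing <- datos_demo[inx]

# Row-wise counterpart: a logical matrix of "bad" cells, then keep only
# the rows in which no cell is "NULL" or NA
bad_cell <- sapply(datos_demo, function(x) x == "NULL" | is.na(x))
clean_rows <- datos_demo[rowSums(bad_cell) == 0, ]

names(cols_with_missing)   # "b" "c"
nrow(clean_rows)           # 1
```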


At 14:10 on 24/05/19, Paul Bernal wrote:

Dear friends,

Hope you are all doing well. I would like to know how to retrieve a
complete dataframe (all the columns), except for the cases where one of the
columns has either nulls or NAs.

In this case, I'd like to retrieve all the columns but only the cases
(rows) where Var5 has values other than 0, null or NA.


Re: [R] Subsetting Data from a Dataframe

2019-05-24 Thread Sarah Goslee
Hi Paul,

Thanks for the reproducible data. You really only need to provide
enough to illustrate your question, but this works.

I suspect you have a data import problem - I doubt you really want so
many columns to be factors! Probably you need to specify that NULL
means something specific, rather than being an ordinary string.

> str(datos)
'data.frame': 402 obs. of  9 variables:
 $ VarDate: Factor w/ 402 levels "1-Apr-00","1-Apr-01",..: 355 321 86
155 121 255 20 288 221 188 ...
 $ Var1   : int  150 140 148 157 105 132 123 139 128 174 ...
 $ Var2   : int  821273 625955 809990 729112 532151 725098 619868
704282 580952 975656 ...
 $ Var3   : int  1023833 924771 1021634 1043681 752874 947427 859879
999681 835358 1252862 ...
 $ Var4   : int  1842916 1650947 1826792 1868854 1349682 1700705
1553206 1789866 1481587 2260812 ...
 $ Var5   : Factor w/ 167 levels "0","1004","1017",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Var6   : Factor w/ 2 levels "0","NULL": 2 2 2 2 2 2 2 2 2 2 ...
 $ Var7   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Var8   : Factor w/ 169 levels "0","139884","159100",..: 1 1 1 1 1 1
1 1 1 1 ...

Regardless, you want the subset() command.

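Here's one option, to give you the idea. Note that the values are in
quotes because your Var5 is a factor. If it isn't supposed to be, fix
the import first (for example with na.strings = "NULL" in read.csv(), so
that those entries become NA) and compare against the number 0 instead;
is.null() tests a whole object, not individual values, so it will not
help here. I included NA because you asked, but there aren't any NA
values in Var5.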

datos.new <- subset(datos, !(Var5 %in% c("0", "NULL")) & !is.na(Var5))

You can read the help for subset for more options, but basically you
need to construct something that returns a logical value.

Sarah
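
A minimal sketch of the import issue suspected above, on made-up data. The CSV text and the names csv_text, datos_demo, keep and datos_new are hypothetical, and read.csv() is an assumption, since the thread does not say how the data were read.

```r
# Hypothetical miniature of the data: Var5 contains the literal string "NULL"
csv_text <- "VarDate,Var5
1-Jan-00,0
1-Feb-00,1004
1-Mar-00,NULL
1-Apr-00,1017
"

# Declaring "NULL" as a missing-value marker lets Var5 be read as a number
datos_demo <- read.csv(text = csv_text, na.strings = "NULL")
class(datos_demo$Var5)   # "integer"

# Keep only the rows where Var5 is neither missing nor zero
keep <- !is.na(datos_demo$Var5) & datos_demo$Var5 != 0
datos_new <- datos_demo[keep, ]
datos_new$Var5
```

With the column numeric, the quoted "0" and "NULL" comparisons are no longer needed.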

On Fri, May 24, 2019 at 9:11 AM Paul Bernal  wrote:
>
> Dear friends,
>
> Hope you are all doing well. I would like to know how to retrieve a
> complete dataframe (all the columns), except for the cases where one of the
> columns has either nulls or NAs.
>
> In this case, I'd like to retrieve all the columns but only the cases
> (rows) where Var5 has values other than 0, null or NA.

[R] Subsetting Data from a Dataframe

2019-05-24 Thread Paul Bernal
Dear friends,

Hope you are all doing well. I would like to know how to retrieve a
complete dataframe (all the columns), except for the cases where one of the
columns has either nulls or NAs.

In this case, I'd like to retrieve all the columns but only the cases
(rows) where Var5 has values other than 0, null or NA.

I am attaching the dput() of my dataset as a reference:

> dput(datos)
structure(list(VarDate = structure(c(355L, 321L, 86L, 155L, 121L,
255L, 20L, 288L, 221L, 188L, 53L, 389L, 356L, 322L, 87L, 156L,
122L, 256L, 21L, 289L, 222L, 189L, 54L, 390L, 357L, 323L, 88L,
157L, 123L, 257L, 22L, 290L, 223L, 190L, 55L, 391L, 358L, 324L,
89L, 158L, 124L, 258L, 23L, 291L, 224L, 191L, 56L, 392L, 359L,
325L, 90L, 159L, 125L, 259L, 24L, 292L, 225L, 192L, 57L, 393L,
360L, 326L, 91L, 160L, 126L, 260L, 25L, 293L, 226L, 193L, 58L,
394L, 361L, 327L, 92L, 161L, 127L, 261L, 26L, 294L, 227L, 194L,
59L, 395L, 362L, 328L, 93L, 162L, 128L, 262L, 27L, 295L, 228L,
195L, 60L, 396L, 363L, 329L, 94L, 163L, 129L, 263L, 28L, 296L,
229L, 196L, 61L, 397L, 364L, 330L, 95L, 164L, 130L, 264L, 29L,
297L, 230L, 197L, 62L, 398L, 365L, 331L, 96L, 165L, 131L, 265L,
30L, 298L, 231L, 198L, 63L, 399L, 366L, 332L, 97L, 166L, 132L,
266L, 31L, 299L, 232L, 199L, 64L, 400L, 367L, 333L, 98L, 167L,
133L, 267L, 32L, 300L, 233L, 200L, 65L, 401L, 368L, 334L, 99L,
168L, 134L, 268L, 33L, 301L, 234L, 201L, 66L, 402L, 369L, 335L,
100L, 135L, 101L, 235L, 1L, 269L, 202L, 169L, 34L, 370L, 336L,
302L, 67L, 136L, 102L, 236L, 2L, 270L, 203L, 170L, 35L, 371L,
337L, 303L, 68L, 137L, 103L, 237L, 3L, 271L, 204L, 171L, 36L,
372L, 338L, 304L, 69L, 138L, 104L, 238L, 4L, 272L, 205L, 172L,
37L, 373L, 339L, 305L, 70L, 139L, 105L, 239L, 5L, 273L, 206L,
173L, 38L, 374L, 340L, 306L, 71L, 140L, 106L, 240L, 6L, 274L,
207L, 174L, 39L, 375L, 341L, 307L, 72L, 141L, 107L, 241L, 7L,
275L, 208L, 175L, 40L, 376L, 342L, 308L, 73L, 142L, 108L, 242L,
8L, 276L, 209L, 176L, 41L, 377L, 343L, 309L, 74L, 143L, 109L,
243L, 9L, 277L, 210L, 177L, 42L, 378L, 344L, 310L, 75L, 144L,
110L, 244L, 10L, 278L, 211L, 178L, 43L, 379L, 345L, 311L, 76L,
145L, 111L, 245L, 11L, 279L, 212L, 179L, 44L, 380L, 346L, 312L,
77L, 146L, 112L, 246L, 12L, 280L, 213L, 180L, 45L, 381L, 347L,
313L, 78L, 147L, 113L, 247L, 13L, 281L, 214L, 181L, 46L, 382L,
348L, 314L, 79L, 148L, 114L, 248L, 14L, 282L, 215L, 182L, 47L,
383L, 349L, 315L, 80L, 149L, 115L, 249L, 15L, 283L, 216L, 183L,
48L, 384L, 350L, 316L, 81L, 150L, 116L, 250L, 16L, 284L, 217L,
184L, 49L, 385L, 351L, 317L, 82L, 151L, 117L, 251L, 17L, 285L,
218L, 185L, 50L, 386L, 352L, 318L, 83L, 152L, 118L, 252L, 18L,
286L, 219L, 186L, 51L, 387L, 353L, 319L, 84L, 153L, 119L, 253L,
19L, 287L, 220L, 187L, 52L, 388L, 354L, 320L, 85L, 154L, 120L,
254L), .Label = c("1-Apr-00", "1-Apr-01", "1-Apr-02", "1-Apr-03",
"1-Apr-04", "1-Apr-05", "1-Apr-06", "1-Apr-07", "1-Apr-08", "1-Apr-09",
"1-Apr-10", "1-Apr-11", "1-Apr-12", "1-Apr-13", "1-Apr-14", "1-Apr-15",
"1-Apr-16", "1-Apr-17", "1-Apr-18", "1-Apr-86", "1-Apr-87", "1-Apr-88",
"1-Apr-89", "1-Apr-90", "1-Apr-91", "1-Apr-92", "1-Apr-93", "1-Apr-94",
"1-Apr-95", "1-Apr-96", "1-Apr-97", "1-Apr-98", "1-Apr-99", "1-Aug-00",
"1-Aug-01", "1-Aug-02", "1-Aug-03", "1-Aug-04", "1-Aug-05", "1-Aug-06",
"1-Aug-07", "1-Aug-08", "1-Aug-09", "1-Aug-10", "1-Aug-11", "1-Aug-12",
"1-Aug-13", "1-Aug-14", "1-Aug-15", "1-Aug-16", "1-Aug-17", "1-Aug-18",
"1-Aug-86", "1-Aug-87", "1-Aug-88", "1-Aug-89", "1-Aug-90", "1-Aug-91",
"1-Aug-92", "1-Aug-93", "1-Aug-94", "1-Aug-95", "1-Aug-96", "1-Aug-97",
"1-Aug-98", "1-Aug-99", "1-Dec-00", "1-Dec-01", "1-Dec-02", "1-Dec-03",
"1-Dec-04", "1-Dec-05", "1-Dec-06", "1-Dec-07", "1-Dec-08", "1-Dec-09",
"1-Dec-10", "1-Dec-11", "1-Dec-12", "1-Dec-13", "1-Dec-14", "1-Dec-15",
"1-Dec-16", "1-Dec-17", "1-Dec-18", "1-Dec-85", "1-Dec-86", "1-Dec-87",
"1-Dec-88", "1-Dec-89", "1-Dec-90", "1-Dec-91", "1-Dec-92", "1-Dec-93",
"1-Dec-94", "1-Dec-95", "1-Dec-96", "1-Dec-97", "1-Dec-98", "1-Dec-99",
"1-Feb-00", "1-Feb-01", "1-Feb-02", "1-Feb-03", "1-Feb-04", "1-Feb-05",
"1-Feb-06", "1-Feb-07", "1-Feb-08", "1-Feb-09", "1-Feb-10", "1-Feb-11",
"1-Feb-12", "1-Feb-13", "1-Feb-14", "1-Feb-15", "1-Feb-16", "1-Feb-17",
"1-Feb-18", "1-Feb-19", "1-Feb-86", "1-Feb-87", "1-Feb-88", "1-Feb-89",
"1-Feb-90", "1-Feb-91", "1-Feb-92", "1-Feb-93", "1-Feb-94", "1-Feb-95",
"1-Feb-96", "1-Feb-97", "1-Feb-98", "1-Feb-99", "1-Jan-00", "1-Jan-01",
"1-Jan-02", "1-Jan-03", "1-Jan-04", "1-Jan-05", "1-Jan-06", "1-Jan-07",
"1-Jan-08", "1-Jan-09", "1-Jan-10", "1-Jan-11", "1-Jan-12", "1-Jan-13",
"1-Jan-14", "1-Jan-15", "1-Jan-16", "1-Jan-17", "1-Jan-18", "1-Jan-19",
"1-Jan-86", "1-Jan-87", "1-Jan-88", "1-Jan-89", "1-Jan-90", "1-Jan-91",
"1-Jan-92", "1-Jan-93", "1-Jan-94", "1-Jan-95", "1-Jan-96", "1-Jan-97",
"1-Jan-98", "1-Jan-99", "1-Jul-00", "1-Jul-01", "1-Jul-02", "1-Jul-03",
"1-Jul-04", "1-Jul-05", "1-Jul-06", "1-Jul-07", "1-Jul-08", "1-Jul-09",
"1-Jul-10", "1-Jul-11", "1-Jul-12", "1-Jul-13", "1-Jul-14", "1-Jul-15",
"1-Jul-16", "1-Jul-17", "1-Jul-18", 

Re: [R] subsetting ls() as per class...

2018-07-28 Thread akshay kulkarni
Dear Peter,
It's working. Thanks a lot.

yours sincerely,
AKSHAY M KULKARNI



Re: [R] subsetting ls() as per class...

2018-07-28 Thread William Dunlap via R-help
> objClasses <- unlist(eapply(.GlobalEnv, function(x)class(x)[1]))
> head(objClasses)
            f             E
   "function" "environment"
           df             h
     "tbl_df"    "function"
       myData             L
       "list"        "list"
> names(objClasses)[objClasses=="tbl_df"]
[1] "df"  "out"


Bill Dunlap
TIBCO Software
wdunlap tibco.com
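
Building on the eapply() idea above, the following sketch goes one step further and removes the matching objects. It works in a scratch environment (demo_env, a made-up name) instead of .GlobalEnv so that it is safe to run, and the objects are plain vectors tagged with "zoo" and "xts" class attributes as stand-ins, since the real packages may not be installed.

```r
demo_env <- new.env()

# Stand-in objects: only the class attribute matters for this sketch
assign("a.NS", structure(1:3, class = "zoo"), envir = demo_env)
assign("b.NS", structure(1:3, class = c("xts", "zoo")), envir = demo_env)
assign("keep_me", 1:3, envir = demo_env)

# One TRUE/FALSE per object: does it inherit from any of the target classes?
is_target <- unlist(eapply(demo_env, function(obj) inherits(obj, c("xts", "zoo"))))

# Remove the matching objects by name
rm(list = names(is_target)[is_target], envir = demo_env)

ls(demo_env)   # "keep_me"
```

Using inherits() instead of class(x)[1] also catches objects where the class of interest is not listed first.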



Re: [R] subsetting ls() as per class...

2018-07-28 Thread Henrik Bengtsson
The ll() function of R.oo returns a data.frame with various attributes that
you can subset on, e.g.

> subset(R.oo::ll(), data.class %in% c("zoo", "xts"))
       member data.class dimension objectSize
2          fz        zoo        10       1344
4  sample.xts        xts  c(180,4)      10128
5           x        zoo         5        528
6          x1        zoo         5        880
7          x2        zoo         5        496
9           y        zoo         5       1040
11          z        zoo    c(5,3)       1184
12         z0        zoo         0        448
13         z2        zoo    c(4,3)        904
14        z20        zoo    c(4,0)        616
15         z3        zoo         8        528
16         z4        zoo         5        592
17         z5        zoo         5        792

Henrik
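
For readers without R.oo, a similar listing can be approximated in base R. This is only a rough sketch of the idea, not a reimplementation of ll(): the column names imitate the output above, demo_env is a made-up scratch environment, and the objects are plain vectors carrying "zoo" and "xts" class attributes as stand-ins.

```r
demo_env <- new.env()
assign("a.NS", structure(1:3, class = "zoo"), envir = demo_env)
assign("b.NS", structure(1:3, class = c("xts", "zoo")), envir = demo_env)
assign("plain", c(1.5, 2.5), envir = demo_env)

obj_names <- ls(demo_env)

# One row per object: name, first class, and size in bytes
info <- data.frame(
  member = obj_names,
  data.class = vapply(obj_names,
                      function(n) class(get(n, envir = demo_env))[1],
                      character(1), USE.NAMES = FALSE),
  objectSize = vapply(obj_names,
                      function(n) as.numeric(object.size(get(n, envir = demo_env))),
                      numeric(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

picked <- subset(info, data.class %in% c("zoo", "xts"))
picked$member   # "a.NS" "b.NS"
```

The member column of the result can then be passed to rm(list = ..., envir = ...).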



Re: [R] subsetting ls() as per class...

2018-07-28 Thread Jeff Newmiller
You can extract the names into a character vector with ls and then use 
grep(..., value=TRUE) to select which ones you want to remove, and then pass 
that list to rm.

However, due to the way R handles memory you are unlikely to see much savings 
by doing this. I would recommend focusing on creating a script or series of 
scripts that can allow you to re-create your analysis, and then restarting R 
whenever you are ready to reduce memory usage. This will have the side benefit 
of leaving you with a verified-complete record of how your analysis was done.



Re: [R] subsetting ls() as per class...

2018-07-28 Thread Peter Langfelder
Looking at ?rm, my solution would be something like

rm(list = grep("\\.NS$", ls(), value = TRUE))

But test it since I have not tested it.

Peter
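
Since the one-liner above was posted untested, here is a small sketch that exercises the same pattern in a scratch environment so that nothing in the workspace is touched. The environment demo_env and the object names are made up for illustration.

```r
demo_env <- new.env()
assign("INFY.NS", 1, envir = demo_env)
assign("TCS.NS", 2, envir = demo_env)
assign("NSE.data", 3, envir = demo_env)   # contains "NS" but does not end in ".NS"

# "\\." matches a literal dot and "$" anchors the match at the end of the name
to_remove <- grep("\\.NS$", ls(demo_env), value = TRUE)

rm(list = to_remove, envir = demo_env)
ls(demo_env)   # "NSE.data"
```

Inspecting to_remove before calling rm() is a cheap safeguard when running this against the global environment.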




[R] subsetting ls() as per class...

2018-07-27 Thread akshay kulkarni
Dear members,
I am using R on an AWS Linux instance for my research. I want to remove
certain objects from the global environment to reduce my EBS cost. For
example, I want to remove all objects of class "xts" or "zoo". Is there any
way to automate this, instead of removing the objects one by one?

Basically, I want to subset ls() according to class, and then remove that
subset using the rm function.

I learned about mget on SO, but it is not working in my case.

Also, all the above objects end with ".NS". I learned that you can remove
objects starting with a certain pattern; is there any way to remove objects
ending in a certain pattern?

Many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI



Re: [R] subsetting lists....

2018-06-18 Thread MacQueen, Don via R-help
The unlist solution is quite clever.

But I will note that none of the solutions offered so far succeed if the input 
is, for example,

YH <- list(1:5, letters[1:3], 1:7)
iuhV <- c(2,2,4)

and the desire is to return a list whose elements are of the same types as the 
input list. Which would be the sensible thing to do if the input list mixes 
types.

(Note that the output structure was not specified in the original question, nor 
was it stated whether the input list could mix types)

unlist(YH, recursive = FALSE, use.names = FALSE)[cumsum( lengths(YH)) - 
lengths(YH) + iuhV]
[1] "2" "b" "4"

However,

> lapply( 1:length(YH), function(i) { YH[[i]][iuhV[i]]})
[[1]]
[1] 2

[[2]]
[1] "b"

[[3]]
[1] 4

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
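
As a sketch of another way to get the type-preserving result described above without an explicit index loop, Map() walks the list and the index vector in parallel and always returns a list. The name picked is illustrative; this variant was not proposed in the thread.

```r
YH <- list(1:5, letters[1:3], 1:7)
iuhV <- c(2, 2, 4)

# Map() pairs YH[[i]] with iuhV[i] and never simplifies, so each
# element keeps its own type
picked <- Map(function(el, i) el[i], YH, iuhV)

str(picked)
```

mapply("[", YH, iuhV, SIMPLIFY = FALSE) should behave the same way; the default SIMPLIFY = TRUE is what collapses mixed results to a character vector.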
 
 

On 6/18/18, 10:00 AM, "R-help on behalf of Berry, Charles" 
 wrote:



> On Jun 18, 2018, at 4:15 AM, akshay kulkarni  
wrote:
> 
> correctionI want the method without a for loop

Here are two. The first is more readable, but the second is 5 times faster.

mapply("[", YH, iuhV)

unlist(YH, recursive = FALSE, use.names = FALSE)[cumsum( lengths(YH)) - 
lengths(YH) + iuhV]

HTH,

Chuck

> 
> From: akshay kulkarni 
> Sent: Monday, June 18, 2018 4:25 PM
> To: R help Mailing list
> Subject: subsetting lists
> 
> dear members,
> I have a list YH and an index vector iuhV. I want to select element
> iuhV[1] from YH[[1]], iuhV[2] from YH[[2]], iuhV[3] from YH[[3]], ...,
> iuhV[n] from YH[[n]].
> 
> How can I do this?
> I searched SO and the internet without success.
> 
> Very many thanks for your time and effort.
> Yours sincerely,
> AKSHAY M KULKARNI
> 
>   [[alternative HTML version deleted]]
> 


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting lists....

2018-06-18 Thread Berry, Charles



> On Jun 18, 2018, at 4:15 AM, akshay kulkarni  wrote:
> 
> correction... I want the method without a for loop

Here are two. The first is more readable, but the second is 5 times faster.

mapply("[", YH, iuhV)

unlist(YH, recursive = FALSE, use.names = FALSE)[cumsum( lengths(YH)) - 
lengths(YH) + iuhV]
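
The second form works by flattening the list and adding each element's starting offset to the requested index. A small sketch of the arithmetic, using made-up numeric data (the values and the name `offsets` are assumptions for illustration):

```r
YH <- list(c(10, 20, 30), c(40, 50), c(60, 70, 80, 90))
iuhV <- c(2, 1, 4)

# number of elements that precede each list component in the flattened vector
offsets <- cumsum(lengths(YH)) - lengths(YH)   # 0 3 5
flat <- unlist(YH, recursive = FALSE, use.names = FALSE)

picked <- flat[offsets + iuhV]                 # 20 40 90
picked
```

Note that unlist() coerces everything to one type, so this is only appropriate when all components share a type.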

HTH,

Chuck

> 
> From: akshay kulkarni 
> Sent: Monday, June 18, 2018 4:25 PM
> To: R help Mailing list
> Subject: subsetting lists
> 
> dear members,
> I have a list YH and an index vector iuhV. I want to select element
> iuhV[1] from YH[[1]], iuhV[2] from YH[[2]], iuhV[3] from YH[[3]], ...,
> iuhV[n] from YH[[n]].
> 
> How can I do this?
> I searched SO and the internet without success.
> 
> Very many thanks for your time and effort.
> Yours sincerely,
> AKSHAY M KULKARNI
> 
>   [[alternative HTML version deleted]]
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting lists....

2018-06-18 Thread Eric Berger
 sapply( 1:length(YH), function(i) { YH[[i]][iuhV[i]]})
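
A slightly more defensive variant of the same idea, sketched under the assumption that every selected element is a single number: seq_along() behaves sensibly for an empty list (1:length(YH) would give c(1, 0)), and vapply() checks the type of each result. The example data are made up:

```r
YH <- list(c(1.5, 2.5), c(3.5, 4.5, 5.5))
iuhV <- c(2, 3)

# vapply() stops with an error if any result is not a single numeric value
res <- vapply(seq_along(YH), function(i) YH[[i]][iuhV[i]], numeric(1))
res   # 2.5 5.5
```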

On Mon, Jun 18, 2018 at 1:55 PM, akshay kulkarni 
wrote:

> dear members,
> I have a list YH and an index vector iuhV. I want to select element
> iuhV[1] from YH[[1]], iuhV[2] from YH[[2]], iuhV[3] from YH[[3]], ...,
> iuhV[n] from YH[[n]].
>
> How can I do this?
> I searched SO and the internet without success.
>
> Very many thanks for your time and effort.
> Yours sincerely,
> AKSHAY M KULKARNI
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subsetting lists....

2018-06-18 Thread akshay kulkarni
dear members,
I have a list YH and an index vector iuhV. I want to select element
iuhV[1] from YH[[1]], iuhV[2] from YH[[2]], iuhV[3] from YH[[3]], ...,
iuhV[n] from YH[[n]].

How can I do this?
I searched SO and the internet without success.

Very many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subsetting comparison problem

2018-03-11 Thread Neha Aggarwal
Hello All,
I am facing a unique problem and am unable to find any help in R help pages
or online. I will appreciate your help for the following problem:
I have 2 data-frames, samples below and there is an expected output

R Dataframe1:
    C1  C2  C3  C4 ... CN
R1   0   1   0   1
R2   1   0   1   1
R3   1   0   0   0
.
.
.
RN

U Dataframe2:
    C1  C2  C3  C4 ... CN
U1   1   1   0   1
U2   1   1   1   1


Expected Output:
U1 satisfies R1, R3
U2 satisfies R1, R2, R3

So this is a data frame comparison problem with a subset dimension.
There are 2 data frames, R and U, with the same column names. The columns
that belong to each row of data frame 1 are marked with 1s, and likewise
the columns that belong to each U row of data frame 2.

I have to find relationships between Rs and Us. So I start with each U row
in the U data frame (let's say row U1) and try to find all the rows in the R
data frame that are subsets of the U1 row.

I can't find a way to compare rows to see if one is a subset of
another... What can I try? Any pointers or packages would be a great help.
Please help.

Thanks
Neha

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting comparison problem

2018-03-11 Thread Jim Lemon
Hi Neha,
This might help:

R<-read.table(text="C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0",
header=TRUE)
U<-read.table(text="C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1",
header=TRUE)
# these are matrices - I think this will work for dataframes as well
for(ui in 1:dim(U)[1]) {
 for(ri in 1:dim(R)[1]) {
  if(sum(U[ui,]&R[ri,])==sum(R[ri,]))
   cat("R$",rownames(R)[ri]," subset of ","U$",rownames(U)[ui],"\n",sep="")
 }
}
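
The same subset test can also be done without explicit loops. The following is only a sketch, not part of the original suggestion: it assumes the data are strictly 0/1, and the names Rm, Um, violations and is_subset are made up for illustration.

```r
R <- read.table(text = "C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0", header = TRUE)
U <- read.table(text = "C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1", header = TRUE)

Rm <- as.matrix(R)
Um <- as.matrix(U)

# entry [r, u] counts the columns that are 1 in R row r but 0 in U row u
violations <- Rm %*% t(1 - Um)
# an R row is a subset of a U row when there are no such columns
is_subset <- violations == 0

for (u in colnames(is_subset))
  cat(u, "satisfies", paste(rownames(is_subset)[is_subset[, u]], collapse = ", "), "\n")
```

The matrix product does all row-by-row comparisons at once, which should scale better than the double loop when both tables are large.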

Jim

On Mon, Mar 12, 2018 at 1:59 PM, David Winsemius  wrote:
>
>> On Mar 11, 2018, at 3:32 PM, Neha Aggarwal  
>> wrote:
>>
>> Hello All,
>> I am facing a unique problem and am unable to find any help in R help pages
>> or online. I will appreciate your help for the following problem:
>> I have 2 data-frames, samples below and there is an expected output
>>
>> R Dataframe1:
>>     C1  C2  C3  C4 ... CN
>> R1   0   1   0   1
>> R2   1   0   1   1
>> R3   1   0   0   0
>> .
>> .
>> .
>> RN
>>
>> U Dataframe2:
>>     C1  C2  C3  C4 ... CN
>> U1   1   1   0   1
>> U2   1   1   1   1
>>
>>
>> Expected Output:
>> U1 satisfies R1, R3
>> U2 satisfies R1, R2, R3
>>
>
> I don't think you have communicated what sort of meaning is attached to the 
> word "satisfies".
>
> Here's a double loop that reports membership of the column names of each row 
> of U (Dataframe2) in each row of R (Dataframe1):
>
>  apply( Dataframe2, 1, function(x){
>      z  <- which(x==1);
>      z2 <- names(x)[z];
>      zlist = apply(Dataframe1, 1, function(y){
>          z3 <- which(y==1);
>          z4 <- names(y)[z3];
>          z4[ which(z4 %in% z2) ]});
>      zlist})
> $U1
> $U1$R1
> [1] "C2" "C4"
>
> $U1$R2
> [1] "C1" "C4"
>
> $U1$R3
> [1] "C1"
>
>
> $U2
> $U2$R1
> [1] "C2" "C4"
>
> $U2$R2
> [1] "C1" "C3" "C4"
>
> $U2$R3
> [1] "C1"
>
> --
> David.
>
>
>> So this is a comparison of dataframes problem, with a subset dimension.
>> There are 2 dataframe R and U. column names are same. There are certain
>> columns belonging to each row in dataframe 1, denoted as 1s, while there
>> are certain cols to each U denoted as 1s in each URow in dataframe2.
>>
>> I have to find relationships between Rs and Us. So i start with each U row
>> in U dataframe (lets say U1 row) and try to find all the rows in R
>> dataframe, which are subset of U1 row.
>>
>> I can't find a way to compare rows to see if one is a subset of
>> another... What can I try? Any pointers or packages would be a great help.
>> Please help.
>>
>> Thanks
>> Neha
>>
>>   [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'   
> -Gehm's Corollary to Clarke's Third Law
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting comparison problem

2018-03-11 Thread David Winsemius

> On Mar 11, 2018, at 3:32 PM, Neha Aggarwal  wrote:
> 
> Hello All,
> I am facing a unique problem and am unable to find any help in R help pages
> or online. I will appreciate your help for the following problem:
> I have 2 data-frames, samples below and there is an expected output
> 
> R Dataframe1:
>     C1  C2  C3  C4 ... CN
> R1   0   1   0   1
> R2   1   0   1   1
> R3   1   0   0   0
> .
> .
> .
> RN
> 
> U Dataframe2:
>     C1  C2  C3  C4 ... CN
> U1   1   1   0   1
> U2   1   1   1   1
> 
> 
> Expected Output:
> U1 satisfies R1, R3
> U2 satisfies R1, R2, R3
> 

I don't think you have communicated what sort of meaning is attached to the 
word "satisfies".

Here's a double loop that reports membership of the column names of each row of 
U (Dataframe2) in each row of R (Dataframe1):

 apply( Dataframe2, 1, function(x){
     z  <- which(x==1);
     z2 <- names(x)[z];
     zlist = apply(Dataframe1, 1, function(y){
         z3 <- which(y==1);
         z4 <- names(y)[z3];
         z4[ which(z4 %in% z2) ]});
     zlist})
$U1
$U1$R1
[1] "C2" "C4"

$U1$R2
[1] "C1" "C4"

$U1$R3
[1] "C1"


$U2
$U2$R1
[1] "C2" "C4"

$U2$R2
[1] "C1" "C3" "C4"

$U2$R3
[1] "C1"
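
If "satisfies" is taken to mean that every column marked 1 in an R row is also marked 1 in the U row (an assumption, since the question does not define it), the intersection above can be turned into a yes/no test. A sketch, with the result name satisfies chosen only for illustration:

```r
Dataframe1 <- read.table(text = "C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0", header = TRUE)
Dataframe2 <- read.table(text = "C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1", header = TRUE)

u_names <- rownames(Dataframe2)
satisfies <- lapply(setNames(u_names, u_names), function(u) {
  # columns set to 1 in this U row
  cols_u <- names(Dataframe2)[Dataframe2[u, ] == 1]
  # keep the R rows whose 1-columns are all among cols_u
  keep <- vapply(rownames(Dataframe1), function(r) {
    all(names(Dataframe1)[Dataframe1[r, ] == 1] %in% cols_u)
  }, logical(1))
  rownames(Dataframe1)[keep]
})
satisfies
```

Under that reading the result agrees with the expected output in the question (U1: R1, R3; U2: R1, R2, R3).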

-- 
David.


> So this is a comparison of dataframes problem, with a subset dimension.
> There are 2 dataframe R and U. column names are same. There are certain
> columns belonging to each row in dataframe 1, denoted as 1s, while there
> are certain cols to each U denoted as 1s in each URow in dataframe2.
> 
> I have to find relationships between Rs and Us. So i start with each U row
> in U dataframe (lets say U1 row) and try to find all the rows in R
> dataframe, which are subset of U1 row.
> 
> I can't find a way to compare rows to see if one is a subset of
> another... What can I try? Any pointers or packages would be a great help.
> Please help.
> 
> Thanks
> Neha
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting comparison problem

2018-03-11 Thread Jeff Newmiller

Responses inline.

On Sun, 11 Mar 2018, Neha Aggarwal wrote:


Hello All,
I am facing a unique problem and am unable to find any help in R help pages
or online. I will appreciate your help for the following problem:
I have 2 data-frames, samples below and there is an expected output

R Dataframe1:
    C1  C2  C3  C4 ... CN
R1   0   1   0   1
R2   1   0   1   1
R3   1   0   0   0
.
.
.
RN

U Dataframe2:
    C1  C2  C3  C4 ... CN
U1   1   1   0   1
U2   1   1   1   1


Expected Output:
U1 satisfies R1, R3
U2 satisfies R1, R2, R3

So this is a comparison of dataframes problem, with a subset dimension.
There are 2 dataframe R and U. column names are same. There are certain
columns belonging to each row in dataframe 1, denoted as 1s, while there
are certain cols to each U denoted as 1s in each URow in dataframe2.

I have to find relationships between Rs and Us. So i start with each U row
in U dataframe (lets say U1 row) and try to find all the rows in R
dataframe, which are subset of U1 row.

I can't find a way to compare rows to see if one is a subset of
another... What can I try? Any pointers or packages would be a great help.
Please help.

Thanks
Neha

[[alternative HTML version deleted]]


As the Posting Guide says (you have read it, haven't you?), please post 
plain text... the mailing list mangles your code with varying levels of 
damage as it tries to fix this problem for you. It also helps if you can 
pose your question in R code rather than pseudo-code and formatted data 
tables.


Your problem appears to be an outer join of binary subsets... I don't 
think this is a very common problem structure (in most cases you want to 
avoid outer joins if you can because they are computationally expensive), 
but you can read ?outer and ?expand.grid to see some ways to pair up all 
possible row indexes.  If you know that the number of rows in both inputs 
is <32, this problem can be optimized for speed and memory with the bitops 
package, or for larger size problems you can use the bit package. The 
below code shows the skeleton of logic with no such optimizations, and is 
likely the most practical solution for a one-off analysis:


##
r <- read.table( text=
"   C1  C2  C3  C4
R1  0   1   0   1
R2  1   0   1   1
R3  1   0   0   0
", header=TRUE )

u <- read.table( text=
"   C1  C2  C3  C4
U1  1   1   0   1
U2  1   1   1   1
", header=TRUE )

rmx <- as.matrix( r )
umx <- as.matrix( u )

result <- expand.grid( R = rownames( rmx )
 , U = rownames( umx )
 )

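# see how (U and R are columns of result, hence with()):
with( result, 1L - umx[ U, ] )  # 1 for every 0 in u
with( result, rmx[ R, ] )       # 1 for every 1 in r
with( result, ( 1L - umx[ U, ] ) * rmx[ R, ] ) # 1 where u has 0 and r has 1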

# do it:
# for every row, 0 where both conditions are true in any column
result$IN <- 1L - with( result
  , apply(   ( 1L - umx[ U, ] ) # any 0 column
   * rmx[ R, ]  # any 1 column
 , 1  # by rows
 , max
 )
  )
result
# show key pairings only
result[ as.logical( result$IN ), c( "U", "R" ) ]
##
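
For the bit-packed variant mentioned above, base R's bitwAnd() may be enough for small problems. The following is a rough sketch only, assuming strictly 0/1 data and at most 31 columns so that each row fits in one integer; the helper encode() and the names rb, ub and IN are made up for illustration, and the matrices are rebuilt here so the example stands alone.

```r
rmx <- matrix(c(0, 1, 0, 1,
                1, 0, 1, 1,
                1, 0, 0, 0), nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), paste0("C", 1:4)))
umx <- matrix(c(1, 1, 0, 1,
                1, 1, 1, 1), nrow = 2, byrow = TRUE,
              dimnames = list(c("U1", "U2"), paste0("C", 1:4)))

# pack each 0/1 row into one integer, column j contributing bit 2^(j-1)
encode <- function(m) as.integer(m %*% 2^(seq_len(ncol(m)) - 1))
rb <- encode(rmx)   # 10 13 1
ub <- encode(umx)   # 11 15

# r is a subset of u exactly when no bit of r is lost by AND-ing with u
IN <- outer(rb, ub, function(r, u) bitwAnd(r, u) == r)
dimnames(IN) <- list(rownames(rmx), rownames(umx))
IN
```

Each comparison is then a single integer operation rather than a scan across columns.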

---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subsetting comparison problem

2018-03-11 Thread Neha Aggarwal
Hello All,
I am facing a unique problem and am unable to find any help in R help pages
or online. I will appreciate your help for the following problem:
I have 2 data-frames, samples below and there is an expected output

R Dataframe1:
    C1  C2  C3  C4 ... CN
R1   0   1   0   1
R2   1   0   1   1
R3   1   0   0   0
.
.
.
RN

U Dataframe2:
    C1  C2  C3  C4 ... CN
U1   1   1   0   1
U2   1   1   1   1


Expected Output:
U1 satisfies R1, R3
U2 satisfies R1, R2, R3

So this is a data frame comparison problem with a subset dimension.
There are 2 data frames, R and U, with the same column names. The columns
that belong to each row of data frame 1 are marked with 1s, and likewise
the columns that belong to each U row of data frame 2.

I have to find relationships between Rs and Us. So I start with each U row
in the U data frame (let's say row U1) and try to find all the rows in the R
data frame that are subsets of the U1 row.

I can't find a way to compare rows to see if one is a subset of
another... What can I try? Any pointers or packages would be a great help.
Please help.

Thanks
Neha

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting

2016-02-24 Thread Val
Thank you for the info. I did solve it using
unlist lapply strsplit  functions.
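
For the archive, an approach along those lines might look like the sketch below. The data frame df and its column name id are assumptions, since the original data were not posted:

```r
df <- data.frame(id = c("S-2001-yy", "S-2004-xx", "F-2007-SS"),
                 stringsAsFactors = FALSE)

# split each value on "-" and take the second piece as the year
yr <- as.integer(unlist(lapply(strsplit(df$id, "-", fixed = TRUE), `[`, 2)))

res <- df[yr >= 2001 & yr <= 2004, , drop = FALSE]
res
```

drop = FALSE keeps the result a data frame even though there is only one column.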


On Wed, Feb 24, 2016 at 9:31 PM, Bert Gunter  wrote:

> Have you gone through any R tutorials yet? I didn't entirely
> understand your question (and so cannot answer), but this sounds like
> a basic subsetting/data wrangling task that you should know how to do
> if you have gone through a basic tutorial or two.
>
> See also ?subset, ?"[" (basic indexing) and possibly also the plyR,
> dplyr, or data.table packages that provide what some consider more
> convenient and/or faster interfaces to these sorts of tasks.
>
> See also: http://vita.had.co.nz/papers/tidy-data.pdf
>
> for a nice article on "tidying" data (using plyr/dplyr).
>
>
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Feb 24, 2016 at 6:57 PM, Val  wrote:
> > Hi all,
> >
> > One of the the columns of a data frame has a value such like
> >
> > S-2001-yy
> > S-2004-xx
> > F-2007-SS
> > and so on
> >
> > based on this column (variable) I want  subset a data frame  where the
> > middle value of this variable is between 2001 to 2004.
> > THE END RESULT THE DATA FRAME  WILL BE THIS.
> >
> > S-2001-yy
> > S-2004-xx
> >
> >
> > THANK YOU IN ADVANCE
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting

2016-02-24 Thread Ryan Derickson
A combination of subsetting and ?substr should get you close to a solution.
If the middle sequence you referenced isn't always the same distance from
the first character, you may have to involve regular expressions to find
"the middle".

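Both ideas can be sketched as follows. The vector id is a made-up stand-in for the column in question, and the regular expression assumes the pattern "text-digits-text":

```r
id <- c("S-2001-yy", "S-2004-xx", "F-2007-SS")

# fixed positions: characters 3 to 6 hold the year
yr_fixed <- as.integer(substr(id, 3, 6))

# regular expression: capture the digits between the first two hyphens
yr_regex <- as.integer(sub("^[^-]*-([0-9]+)-.*$", "\\1", id))

id[yr_regex >= 2001 & yr_regex <= 2004]
```

The substr() version breaks as soon as the first field is longer than one character; the regular expression does not depend on field widths.
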
On Wednesday, February 24, 2016, Bert Gunter  wrote:

> Have you gone through any R tutorials yet? I didn't entirely
> understand your question (and so cannot answer), but this sounds like
> a basic subsetting/data wrangling task that you should know how to do
> if you have gone through a basic tutorial or two.
>
> See also ?subset, ?"[" (basic indexing) and possibly also the plyR,
> dplyr, or data.table packages that provide what some consider more
> convenient and/or faster interfaces to these sorts of tasks.
>
> See also: http://vita.had.co.nz/papers/tidy-data.pdf
>
> for a nice article on "tidying" data (using plyr/dplyr).
>
>
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Feb 24, 2016 at 6:57 PM, Val >
> wrote:
> > Hi all,
> >
> > One of the the columns of a data frame has a value such like
> >
> > S-2001-yy
> > S-2004-xx
> > F-2007-SS
> > and so on
> >
> > based on this column (variable) I want  subset a data frame  where the
> > middle value of this variable is between 2001 to 2004.
> > THE END RESULT THE DATA FRAME  WILL BE THIS.
> >
> > S-2001-yy
> > S-2004-xx
> >
> >
> > THANK YOU IN ADVANCE
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting

2016-02-24 Thread Bert Gunter
Have you gone through any R tutorials yet? I didn't entirely
understand your question (and so cannot answer), but this sounds like
a basic subsetting/data wrangling task that you should know how to do
if you have gone through a basic tutorial or two.

See also ?subset, ?"[" (basic indexing) and possibly also the plyr,
dplyr, or data.table packages that provide what some consider more
convenient and/or faster interfaces to these sorts of tasks.

See also: http://vita.had.co.nz/papers/tidy-data.pdf

for a nice article on "tidying" data (using plyr/dplyr).



Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Feb 24, 2016 at 6:57 PM, Val  wrote:
> Hi all,
>
> One of the the columns of a data frame has a value such like
>
> S-2001-yy
> S-2004-xx
> F-2007-SS
> and so on
>
> based on this column (variable) I want  subset a data frame  where the
> middle value of this variable is between 2001 to 2004.
> THE END RESULT THE DATA FRAME  WILL BE THIS.
>
> S-2001-yy
> S-2004-xx
>
>
> THANK YOU IN ADVANCE
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subsetting

2016-02-24 Thread Val
Hi all,

One of the columns of a data frame has values like

S-2001-yy
S-2004-xx
F-2007-SS
and so on

Based on this column (variable) I want to subset the data frame to the rows
where the middle value of this variable is between 2001 and 2004.
The resulting data frame should be:

S-2001-yy
S-2004-xx


Thank you in advance

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting a square marix

2016-01-05 Thread Sarah Goslee
It really isn't clear what you want, and posting in HTML has mangled
what you did provide.

Please use dput() to provide sample data, and give us a clear idea of
what you want, ideally an example of what the output should look like.
Adding the R code you've tried to use is also a good idea.
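
As a small illustration of why dput() helps, here is a sketch with a made-up 2 x 2 matrix (the object m and its labels are assumptions, not the poster's data). The printed text can be pasted into a plain-text email and read back without loss:

```r
m <- matrix(c(0L, 1L, 1L, 0L), nrow = 2,
            dimnames = list(c("Sp1", "Sp2"), c("Sp1", "Sp2")))

dput(m)              # prints a structure(...) call that recreates m

# round trip through a file to show nothing is lost
f <- tempfile()
dput(m, file = f)
m2 <- dget(f)
identical(m, m2)
```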

Sarah

On Tue, Jan 5, 2016 at 4:06 AM, Tawanda Tarakini
 wrote:
> I have a global matrix (e.g. table below) of species feeding. I am trying
> to create specific matrix for specific sites. If for example a subset is to
> have sp1, sp3 and spp only these 3 species should be appearing in the
> subset (both column and rows).
>
> I have been checking online help but I seem not to get my scenario
>
>
>
>      Sp1  Sp2  Sp3  Sp4  Sp5  Sp6
> Sp1    0    0    1    0    0    0
> Sp2    1    0    0    0    1    0
> Sp3    0    0    0    1    0    0
> Sp4    0    1    0    1    0    0
> Sp5    0    0    1    0    0    0
> Sp6    0    0    0    1    1    0
>
> --
> Kind Regards
>
> Tawanda Tarakini
>
> Lecturer and Industrial attachment coordinator
> Department of Wildlife, Ecology and Conservation
> Chinhoyi University of Technology
> Bag 7724, Chinhoyi
> Cell: +263 775 321 722
> Alternative email: ttarak...@cut.ac.zw
>
> [[alternative HTML version deleted]]
>


-- 
Sarah Goslee
http://www.numberwright.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subsetting a square marix

2016-01-05 Thread Tawanda Tarakini
I have a global matrix (e.g. table below) of species feeding. I am trying
to create specific matrix for specific sites. If for example a subset is to
have sp1, sp3 and spp only these 3 species should be appearing in the
subset (both column and rows).

I have been checking online help but I seem not to get my scenario



     Sp1  Sp2  Sp3  Sp4  Sp5  Sp6
Sp1    0    0    1    0    0    0
Sp2    1    0    0    0    1    0
Sp3    0    0    0    1    0    0
Sp4    0    1    0    1    0    0
Sp5    0    0    1    0    0    0
Sp6    0    0    0    1    1    0

-- 
Kind Regards

Tawanda Tarakini

Lecturer and Industrial attachment coordinator
Department of Wildlife, Ecology and Conservation
Chinhoyi University of Technology
Bag 7724, Chinhoyi
Cell: +263 775 321 722
Alternative email: ttarak...@cut.ac.zw

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting a square marix

2016-01-05 Thread David L Carlson
Assuming I've reconstructed your data correctly:

> dta
    Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
Sp1   0   1   0   0   0   0
Sp2   0   0   0   1   0   0
Sp3   1   0   0   0   1   0
Sp4   0   0   1   1   0   1
Sp5   0   1   0   0   0   1
Sp6   0   0   0   0   0   0
> dput(dta)
structure(list(Sp1 = c(0L, 0L, 1L, 0L, 0L, 0L), Sp2 = c(1L, 0L, 
0L, 0L, 1L, 0L), Sp3 = c(0L, 0L, 0L, 1L, 0L, 0L), Sp4 = c(0L, 
1L, 0L, 1L, 0L, 0L), Sp5 = c(0L, 0L, 1L, 0L, 0L, 0L), Sp6 = c(0L, 
0L, 0L, 1L, 1L, 0L)), .Names = c("Sp1", "Sp2", "Sp3", "Sp4", 
"Sp5", "Sp6"), class = "data.frame", row.names = c("Sp1", "Sp2", 
"Sp3", "Sp4", "Sp5", "Sp6"))

The output of dput(dta) is what you should include in your plain text email.

As for the subset, your email indicated: sp1, sp3 and spp. But none of these 
are labels in your data set since R is case sensitive. Try for example:

> sub <- c("Sp1", "Sp3", "Sp5") 
> dta[sub, sub]
    Sp1 Sp3 Sp5
Sp1   0   0   0
Sp3   1   0   1
Sp5   0   0   0
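
If the requested labels may differ from the row names only in case, one option is to match them case-insensitively first. This is a sketch only: the matrix below reproduces the reconstructed data above, and the names want and idx are made up for illustration.

```r
sp <- paste0("Sp", 1:6)
# filled column by column, as in the dput() output above
dta <- matrix(c(0, 0, 1, 0, 0, 0,
                1, 0, 0, 0, 1, 0,
                0, 0, 0, 1, 0, 0,
                0, 1, 0, 1, 0, 0,
                0, 0, 1, 0, 0, 0,
                0, 0, 0, 1, 1, 0),
              nrow = 6, dimnames = list(sp, sp))

want <- c("sp1", "sp3", "sp5")
idx <- match(tolower(want), tolower(rownames(dta)))
stopifnot(!anyNA(idx))          # fail early if a label is unknown

sub_dta <- dta[idx, idx, drop = FALSE]
sub_dta
```

Using the same index for rows and columns keeps the subset square, and drop = FALSE keeps it a matrix even when only one species is requested.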

And definitely spend some time with the available free R tutorials so that you 
understand how R works.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
Sent: Tuesday, January 5, 2016 10:15 AM
To: Tawanda Tarakini
Cc: r-help
Subject: Re: [R] Subsetting a square marix

It really isn't clear what you want, and posting in HTML has mangled
what you did provide.

Please use dput() to provide sample data, and give us a clear idea of
what you want, ideally an example of what the output should look like.
Adding the R code you've tried to use is also a good idea.

Sarah

On Tue, Jan 5, 2016 at 4:06 AM, Tawanda Tarakini
<tawandatiz...@gmail.com> wrote:
> I have a global matrix (e.g. table below) of species feeding. I am trying
> to create specific matrix for specific sites. If for example a subset is to
> have sp1, sp3 and spp only these 3 species should be appearing in the
> subset (both column and rows).
>
> I have been checking online help but I seem not to get my scenario
>
>
>
>      Sp1  Sp2  Sp3  Sp4  Sp5  Sp6
> Sp1    0    0    1    0    0    0
> Sp2    1    0    0    0    1    0
> Sp3    0    0    0    1    0    0
> Sp4    0    1    0    1    0    0
> Sp5    0    0    1    0    0    0
> Sp6    0    0    0    1    1    0
>
> --
> Kind Regards
>
> Tawanda Tarakini
>
> Lecturer and Industrial attachment coordinator
> Department of Wildlife, Ecology and Conservation
> Chinhoyi University of Technology
> Bag 7724, Chinhoyi
> Cell: +263 775 321 722
> Alternative email: ttarak...@cut.ac.zw
>
> [[alternative HTML version deleted]]
>


-- 
Sarah Goslee
http://www.numberwright.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting dataframe by the nearest values of a vector elements

2015-11-10 Thread Harun Rashid via R-help
Hi Jean,
Here is part of my data. As you can see, I have cross-section point and 
corresponding elevation of a river. Now I want to select cross-section 
points by 50m interval. But the real cross-section data might not have 
exact points say 0, 50, 100,…and so on. Therefore, I need to take points 
closest to those values.

     cross_section elevation
  1:         5.608    12.765
  2:        11.694    10.919
  3:        14.784    10.274
  4:        20.437     7.949
  5:        22.406     7.180
101:       594.255     7.710
102:       595.957     7.717
103:       597.144     7.495
104:       615.925     7.513
105:       615.890     7.751

I checked for some suggestions [particularly here ] and finally did it
like this.

library(data.table)
intervals <- c(5,50,100,150,200,250,300,350,400,450,500,550,600)
dt = data.table(real.val = w$cross_section, w)
setattr(dt,'sorted','cross_section')
dt[J(intervals), roll = "nearest"]

And it gave me what I wanted.

dt[J(intervals), roll = "nearest"]
    cross_section real.val elevation
 1:             5    5.608    12.765
 2:            50   49.535     6.744
 3:           100  115.614     8.026
 4:           150  152.029     7.206
 5:           200  198.201     6.417
 6:           250  247.855     4.497
 7:           300  298.450    11.299
 8:           350  352.473    11.534
 9:           400  401.287    10.550
10:           450  447.768     9.371
11:           500  501.284     8.984
12:           550  550.650    16.488
13:           600  597.144     7.495
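
For comparison, the same nearest-value lookup can be sketched in base R with which.min(). The small data frame w below reuses a few of the values shown above and is otherwise made up; it is not the full data set:

```r
w <- data.frame(cross_section = c(5.608, 11.694, 49.535, 115.614, 152.029),
                elevation     = c(12.765, 10.919, 6.744, 8.026, 7.206))
intervals <- c(5, 50, 100, 150)

# for each target, the row whose cross_section is closest to it
nearest <- vapply(intervals,
                  function(v) which.min(abs(w$cross_section - v)),
                  integer(1))

cbind(target = intervals, w[nearest, ])
```

This compares every target against every row, so the data.table rolling join is likely the better choice for long cross-sections.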

I don't know whether there is a smarter way to accomplish this!
Thanks in advance.
Regards,
Harun

On 11/10/15 11:17 AM, David Winsemius wrote:

>> On Nov 9, 2015, at 9:19 AM, Adams, Jean  wrote:
>>
>> Harun,
>>
>> Can you give a simple example?
>>
>> If your cross_section looked like this
>> c(144, 179, 214, 39, 284, 109, 74, 4, 249)
>> and your other vector looked like this
>> c(0, 50, 100, 150, 200, 250, 300, 350)
>> what would you want your subset to look like?
>>
>> Jean
>>
>> On Mon, Nov 9, 2015 at 7:26 AM, Harun Rashid via R-help <
>> r-help@r-project.org> wrote:
>>
>>> Hello,
>>> I have a dataset with two columns 1. cross_section (range: 0~635), and
>>> 2. elevation. The dataset has more than 100 rows. Now I want to make a
>>> subset on the condition that the 'cross_section' column will pick up the
>>> nearest cell from another vector (say 0, 50,100,150,200,.,650).
>>> How can I do this? I would really appreciate a solution.
> If you want the "other vector" to define the "cell" boundaries, and using 
> Jean's example, it is a simple application of `findInterval`:
>
>> inp <- c(144, 179, 214, 39, 284, 109, 74, 4, 249)
>> mids <- c(0, 50, 100, 150, 200, 250, 300, 350)
>> findInterval( inp, c(mids) )
> [1] 3 4 5 1 6 3 2 1 5
>
> On the other hand ...
>
> To find the number of "closest point", this might help:
>
>
>> findInterval(inp, c( mids[1]-.001, head(mids,-1)+diff(mids)/2, 
>> tail(mids,1)+.001 ) )
> [1] 4 5 5 2 7 3 2 1 6
>
>
>
> —
> David Winsemius
> Alameda, CA, USA
>
​

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsetting dataframe by the nearest values of a vector elements

2015-11-09 Thread David Winsemius

> On Nov 9, 2015, at 9:19 AM, Adams, Jean  wrote:
> 
> Harun,
> 
> Can you give a simple example?
> 
> If your cross_section looked like this
> c(144, 179, 214, 39, 284, 109, 74, 4, 249)
> and your other vector looked like this
> c(0, 50, 100, 150, 200, 250, 300, 350)
> what would you want your subset to look like?
> 
> Jean
> 
> On Mon, Nov 9, 2015 at 7:26 AM, Harun Rashid via R-help <
> r-help@r-project.org> wrote:
> 
>> Hello,
>> I have a dataset with two columns 1. cross_section (range: 0~635), and
>> 2. elevation. The dataset has more than 100 rows. Now I want to make a
>> subset on the condition that the 'cross_section' column will pick up the
>> nearest cell from another vector (say 0, 50,100,150,200,.,650).
>> How can I do this? I would really appreciate a solution.

If you want the "other vector" to define the “cell” boundaries, and using 
Jean’s example, it is a simple application of `findInterval`:

> inp <- c(144, 179, 214, 39, 284, 109, 74, 4, 249)
> mids <- c(0, 50, 100, 150, 200, 250, 300, 350)

> findInterval( inp, c(mids) )
[1] 3 4 5 1 6 3 2 1 5

On the other hand ...

To find the number of "closest point", this might help:


> findInterval(inp, c( mids[1]-.001, head(mids,-1)+diff(mids)/2, 
> tail(mids,1)+.001 ) )
[1] 4 5 5 2 7 3 2 1 6
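
As a cross-check on the midpoint breaks above, here is a minimal sketch of a direct nearest-value lookup. It reuses inp and mids from this message; d, nearest_idx and rows_for_grid are illustrative names that do not appear in the thread. Ties go to the lower position because which.min() returns the first minimum. For this input, nearest_idx agrees with the second findInterval() call.

```r
inp  <- c(144, 179, 214, 39, 284, 109, 74, 4, 249)
mids <- c(0, 50, 100, 150, 200, 250, 300, 350)

# |inp[i] - mids[j]| for every pair; rows index inp, columns index mids
d <- abs(outer(inp, mids, "-"))

# for each element of inp: position in mids of the closest grid value
nearest_idx <- apply(d, 1, which.min)

# the reverse lookup: for each grid value, position in inp of the
# closest observation (one observation may serve two grid values)
rows_for_grid <- apply(d, 2, which.min)
```

With a two-column data frame (call it dataset, a hypothetical name), dataset[rows_for_grid, ] would then pull the rows whose cross_section lies nearest to each grid value.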



— 
David Winsemius
Alameda, CA, USA


Re: [R] Subsetting dataframe by the nearest values of a vector elements

2015-11-09 Thread jim holtman
Do you want the "closest" or what range it is in?  If you want the range,
then use 'cut':

> x <- c(144, 179, 214, 39, 284, 109, 74, 4, 249)
> range <- c(0, 50, 100, 150, 200, 250, 300, 350)
> result <- cut(x, breaks = range)
> cbind(x, as.character(result))
  x
 [1,] "144" "(100,150]"
 [2,] "179" "(150,200]"
 [3,] "214" "(200,250]"
 [4,] "39"  "(0,50]"
 [5,] "284" "(250,300]"
 [6,] "109" "(100,150]"
 [7,] "74"  "(50,100]"
 [8,] "4"   "(0,50]"
 [9,] "249" "(200,250]"
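
One detail worth checking with 'cut' is the lower boundary: by default the intervals are open on the left, so a value equal to the smallest break is not binned. The sketch below uses made-up values (x and breaks here are not the poster's data) to show that and two common options.

```r
x      <- c(0, 4, 50, 144, 350)
breaks <- c(0, 50, 100, 150, 200, 250, 300, 350)

# default: intervals are (lo, hi], so the value 0 falls outside every bin
default_bins <- cut(x, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(x, breaks = breaks, include.lowest = TRUE)

# labels = FALSE returns the bin number instead of a factor label
bin_number <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
```

Whether the lower boundary should be included depends on the data; cross_section values of exactly 0 would be a reason to set include.lowest.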



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Nov 9, 2015 at 12:19 PM, Adams, Jean  wrote:

> Harun,
>
> Can you give a simple example?
>
> If your cross_section looked like this
> c(144, 179, 214, 39, 284, 109, 74, 4, 249)
> and your other vector looked like this
> c(0, 50, 100, 150, 200, 250, 300, 350)
> what would you want your subset to look like?
>
> Jean
>
> On Mon, Nov 9, 2015 at 7:26 AM, Harun Rashid via R-help <
> r-help@r-project.org> wrote:
>
> > Hello,
> > I have a dataset with two columns 1. cross_section (range: 0~635), and
> > 2. elevation. The dataset has more than 100 rows. Now I want to make a
> > subset on the condition that the 'cross_section' column will pick up the
> > nearest cell from another vector (say 0, 50,100,150,200,.,650).
> > How can I do this? I would really appreciate a solution.
> > Regards,
> > Harun
> > --
> >
> > 
> > 
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Subsetting dataframe by the nearest values of a vector elements

2015-11-09 Thread Adams, Jean
Harun,

Can you give a simple example?

If your cross_section looked like this
c(144, 179, 214, 39, 284, 109, 74, 4, 249)
and your other vector looked like this
c(0, 50, 100, 150, 200, 250, 300, 350)
what would you want your subset to look like?

Jean

On Mon, Nov 9, 2015 at 7:26 AM, Harun Rashid via R-help <
r-help@r-project.org> wrote:

> Hello,
> I have a dataset with two columns 1. cross_section (range: 0~635), and
> 2. elevation. The dataset has more than 100 rows. Now I want to make a
> subset on the condition that the 'cross_section' column will pick up the
> nearest cell from another vector (say 0, 50,100,150,200,.,650).
> How can I do this? I would really appreciate a solution.
> Regards,
> Harun
> --
>
> 
> 
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] Subsetting dataframe by the nearest values of a vector elements

2015-11-09 Thread Harun Rashid via R-help
Hello,
I have a dataset with two columns 1. cross_section (range: 0~635), and 
2. elevation. The dataset has more than 100 rows. Now I want to make a 
subset on the condition that the 'cross_section' column will pick up the 
nearest cell from another vector (say 0, 50,100,150,200,.,650).
How can I do this? I would really appreciate a solution.
Regards,
Harun
-- 






[R] subsetting a data.frame based on a specific group of columns

2015-11-06 Thread Assa Yeroslaviz
Hi,

I have a data frame with multiple columns, which belong to several
groups, like this:

X1    X2   X3  Y1   Y2    Y3
1232  357  23  0    9871  72
0     71   9   811  795   743
43    919      0    76    14

I would like to filter the rows in which the sum within one group is lower
than a specific value. For example, I would like to set all the values in a
group of columns to zero if the sum in that group is less than 100.
In my example table I would like to set the values in the second row for
the three X-columns to 0, so that the table looks like this:

X1    X2   X3  Y1   Y2    Y3
1232  357  23  0    9871  72
0     0    0   811  795   743
43    919      0    0     0

The same applies to the Y-values in the last row.
Is there a more efficient way of doing this than going row by row and using the
apply function on each of the subgroups I have in the columns?

thanks
Assa



Re: [R] subsetting a data.frame based on a specific group of columns

2015-11-06 Thread jim holtman
Is this what you want:

> x <- read.table(text = "X1    X2   X3  Y1   Y2    Y3
+ 1232  357  23  0    9871  72
+ 0     71   9   811  795   743
+ 43    919      0    76    14", header = TRUE)
> x
    X1  X2   X3  Y1   Y2  Y3
1 1232 357   23   0 9871  72
2    0  71    9 811  795 743
3   43 919        0   76  14
>
> # create indices of columns that start with the same character
> indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
> names(indx) <- NULL  # remove names so output not messed up
>
> result <- lapply(indx, function(a){
+ row_sum <- rowSums(x[, a])
+ x[row_sum < 100, a] <- 0
+ x[, a]
+ })
> # combine back together
> do.call(cbind, result)
    X1  X2   X3  Y1   Y2  Y3
1 1232 357   23   0 9871  72
2    0   0    0 811  795 743
3   43 919        0    0   0


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz  wrote:

> Hi,
>
> I have a data frame with multiple columns, which are belong to several
> groups
> like that:
> X1X2X3Y1Y2Y3
> 1232357230987172
> 0719811795743
> 4391907614
>
> I would like to filter such rows out, where the sums in one group is lower
> than a specifc value. For example, I would like to set all the values in a
> group of cloums to zero, if the sum in one group is less than 100
> In my example table I would like to set the values in the second row for
> the three X-columns to 0, so that the table looks like that:
>
> X1X2X3Y1Y2Y3
> 1232357230987172
> 000811795743
> 43919000
>
> the same apply also for the Y-values in the last column.
> Is there a more efficient way of doing it than going row by row and use the
> apply function on each of the subgroups I have in the columns?
>
> thanks
> Assa
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] subsetting a data.frame based on a specific group of columns

2015-11-06 Thread Boris Steipe
Please learn to use dput() to post example data.

# This is your data:
data <- structure(c(1232, 0, 43, 357, 71, 919, 23, 9, , 0, 811, 0, 
9871, 795, 76, 72, 743, 14), .Dim = c(3L, 6L), .Dimnames = list(
NULL, c("X1", "X2", "X3", "Y1", "Y2", "Y3")))

data

# define groups and threshold explicitly
groupA <- c(1, 2, 3)
groupB <- c(4, 5, 6)
thrsh  <- 100


# Here's how you evaluate your condition on the member elements of your group
rowSums(data[ , groupA]) >= thrsh

# note that you can cast a logical TRUE/FALSE into a numeric 0/1
as.numeric(rowSums(data[ , groupA]) >= thrsh)

# ... which you can multiply with your data (*)
data[ , groupA] * as.numeric(rowSums(data[ , groupA]) >= thrsh)

#  now you could write this into your matrix
data[ , groupA] <- data[ , groupA] * as.numeric(rowSums(data[ , groupA]) >= thrsh)
# data[ , groupB] etc ... 

data

# ... but you would be repeating code, therefore better to write this
# as a function:

clearReadsBelowThreshold <- function(m, g, t) {
m[ , g] <- m[ , g] * as.numeric(rowSums(m[ , g]) >= t)
 return(m)
}

data <- clearReadsBelowThreshold(data, groupA, thrsh)
data <- clearReadsBelowThreshold(data, groupB, thrsh)

data




(*) Note that R would do this conversion implicitly but omitting
the conversion will cause confusion for those who read the code
later. 
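
To see the helper at work on more than one group without repeating the call by hand, here is a small sketch. The matrix m, the list groups and the counts in it are made up for illustration (they are not the poster's data); the function body is the one given above.

```r
# same helper as in the message above
clearReadsBelowThreshold <- function(m, g, t) {
  m[ , g] <- m[ , g] * as.numeric(rowSums(m[ , g]) >= t)
  return(m)
}

# made-up counts: two groups (A, B) with two replicates each, two rows
m <- matrix(c(60, 10, 70, 20, 5, 300, 40, 400), nrow = 2,
            dimnames = list(NULL, c("A1", "A2", "B1", "B2")))
groups <- list(A = c(1, 2), B = c(3, 4))
thrsh  <- 100

# apply the helper once per group; each pass only touches that group's columns
for (g in groups) {
  m <- clearReadsBelowThreshold(m, g, thrsh)
}
```

Here row 2 of group A (sum 30) and row 1 of group B (sum 45) fall below the threshold and are set to zero, while the other cells are unchanged.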



Cheers,
Boris





On Nov 6, 2015, at 8:53 AM, Assa Yeroslaviz  wrote:

> sorry, for the misunderstanding. here is a more elaborate description of
> what i would like to achieve.
> 
> I have a data set of counts from a RNA-Seq experiment and would like to
> filter reads with low counts. I don't want to set everything to 0
> automatically.
> 
> I would like to set each categorical group (e.g. condition) to 0, if and
> only if all replica in the group together have less than 100 reads.
> in my examples I used X and Y to represents the categories. Ususally they
> have a more distinct names like "control", "knockout1", "dKo" etc.
> 
> So what I really like to do is to check if the sum of all the "control"
> samples is lower than 100. If so, set all control sample to 0. This I would
> like to check *for each category* of every row of the data set.
> 
> I hope it is more clear now
> 
> thanks
> Assa
> 
> 
> On Fri, Nov 6, 2015 at 2:29 PM, jim holtman  wrote:
> 
>> Is this what you want:
>> 
>>> x <- read.table(text = "X1X2X3Y1Y2Y3
>> + 1232357230987172
>> + 0719811795743
>> + 4391907614", header = TRUE)
>>> x
>>X1  X2   X3  Y1   Y2  Y3
>> 1 1232 357   23   0 9871  72
>> 20  719 811  795 743
>> 3   43 919    0   76  14
>>> 
>>> # create indices of columns that start with the same character
>>> indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
>>> names(indx) <- NULL  # remove names so output not messed up
>>> 
>>> result <- lapply(indx, function(a){
>> + row_sum <- rowSums(x[, a])
>> + x[row_sum < 100, a] <- 0
>> + x[, a]
>> + })
>>> # combine back together
>>> do.call(cbind, result)
>>X1  X2   X3  Y1   Y2  Y3
>> 1 1232 357   23   0 9871  72
>> 20   00 811  795 743
>> 3   43 919    00   0
>> 
>> 
>> Jim Holtman
>> Data Munger Guru
>> 
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>> 
>> On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz  wrote:
>> 
>>> Hi,
>>> 
>>> I have a data frame with multiple columns, which are belong to several
>>> groups
>>> like that:
>>> X1X2X3Y1Y2Y3
>>> 1232357230987172
>>> 0719811795743
>>> 4391907614
>>> 
>>> I would like to filter such rows out, where the sums in one group is lower
>>> than a specifc value. For example, I would like to set all the values in a
>>> group of cloums to zero, if the sum in one group is less than 100
>>> In my example table I would like to set the values in the second row for
>>> the three X-columns to 0, so that the table looks like that:
>>> 
>>> X1X2X3Y1Y2Y3
>>> 1232357230987172
>>> 000811795743
>>> 43919000
>>> 
>>> the same apply also for the Y-values in the last column.
>>> Is there a more efficient way of doing it than going row by row and use
>>> the
>>> apply function on each of the subgroups I have in the columns?
>>> 
>>> thanks
>>> Assa
>>> 
>>>[[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
> 
>   [[alternative HTML version deleted]]
> 
> 

Re: [R] subsetting a data.frame based on a specific group of columns

2015-11-06 Thread jim holtman
I assume the solution is somewhat the same; you just have to define how to
determine what the "distinctive" names are to create the groupings.  My
solution assumed it was the first character.  If the group names end in a
unique sequence, you can use this to form the groups, or you can provide a
list of the first part of the names to match on to form the groups.  You
need to provide a reasonable subset of the data so that we can exactly
understand what the data is and how it should be grouped.
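
A rough sketch of grouping by name rather than by first character, under the assumption that every column is named as a group label followed by a replicate number. The data frame counts and its column names are hypothetical. Note that a group label that itself ends in a digit (such as "knockout1") would need a different rule, for example an explicit vector of group labels.

```r
# hypothetical count table: two conditions with two replicates each
counts <- data.frame(control1 = c(500, 10, 40),
                     control2 = c(300, 20, 70),
                     dKo1     = c(5, 800, 30),
                     dKo2     = c(60, 900, 80))

# strip the trailing replicate number to get one group label per column
grp   <- sub("[0-9]+$", "", colnames(counts))
thrsh <- 100

for (g in unique(grp)) {
  cols <- grp == g                                  # columns of this group
  low  <- rowSums(counts[, cols, drop = FALSE]) < thrsh
  counts[low, cols] <- 0                            # zero the group in low rows
}
```
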


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Nov 6, 2015 at 8:53 AM, Assa Yeroslaviz  wrote:

> sorry, for the misunderstanding. here is a more elaborate description of
> what i would like to achieve.
>
> I have a data set of counts from a RNA-Seq experiment and would like to
> filter reads with low counts. I don't want to set everything to 0
> automatically.
>
> I would like to set each categorical group (e.g. condition) to 0, if and
> only if all replica in the group together have less than 100 reads.
> in my examples I used X and Y to represents the categories. Ususally they
> have a more distinct names like "control", "knockout1", "dKo" etc.
>
> So what I really like to do is to check if the sum of all the "control"
> samples is lower than 100. If so, set all control sample to 0. This I would
> like to check *for each category* of every row of the data set.
>
> I hope it is more clear now
>
> thanks
> Assa
>
>
> On Fri, Nov 6, 2015 at 2:29 PM, jim holtman  wrote:
>
>> Is this what you want:
>>
>> > x <- read.table(text = "X1X2X3Y1Y2Y3
>> + 1232357230987172
>> + 0719811795743
>> + 4391907614", header = TRUE)
>> > x
>> X1  X2   X3  Y1   Y2  Y3
>> 1 1232 357   23   0 9871  72
>> 20  719 811  795 743
>> 3   43 919    0   76  14
>> >
>> > # create indices of columns that start with the same character
>> > indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
>> > names(indx) <- NULL  # remove names so output not messed up
>> >
>> > result <- lapply(indx, function(a){
>> + row_sum <- rowSums(x[, a])
>> + x[row_sum < 100, a] <- 0
>> + x[, a]
>> + })
>> > # combine back together
>> > do.call(cbind, result)
>> X1  X2   X3  Y1   Y2  Y3
>> 1 1232 357   23   0 9871  72
>> 20   00 811  795 743
>> 3   43 919    00   0
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz  wrote:
>>
>>> Hi,
>>>
>>> I have a data frame with multiple columns, which are belong to several
>>> groups
>>> like that:
>>> X1X2X3Y1Y2Y3
>>> 1232357230987172
>>> 0719811795743
>>> 4391907614
>>>
>>> I would like to filter such rows out, where the sums in one group is
>>> lower
>>> than a specifc value. For example, I would like to set all the values in
>>> a
>>> group of cloums to zero, if the sum in one group is less than 100
>>> In my example table I would like to set the values in the second row for
>>> the three X-columns to 0, so that the table looks like that:
>>>
>>> X1X2X3Y1Y2Y3
>>> 1232357230987172
>>> 000811795743
>>> 43919000
>>>
>>> the same apply also for the Y-values in the last column.
>>> Is there a more efficient way of doing it than going row by row and use
>>> the
>>> apply function on each of the subgroups I have in the columns?
>>>
>>> thanks
>>> Assa
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>



Re: [R] subsetting a data.frame based on a specific group of columns

2015-11-06 Thread Assa Yeroslaviz
Sorry for the misunderstanding. Here is a more elaborate description of
what I would like to achieve.

I have a data set of counts from an RNA-Seq experiment and would like to
filter reads with low counts. I don't want to set everything to 0
automatically.

I would like to set each categorical group (e.g. condition) to 0 if and
only if all replicates in the group together have fewer than 100 reads.
In my examples I used X and Y to represent the categories. Usually they
have more distinct names like "control", "knockout1", "dKo" etc.

So what I would really like to do is to check whether the sum of all the
"control" samples is lower than 100. If so, set all control samples to 0.
I would like to check this *for each category* in every row of the data set.

I hope it is clearer now.

thanks
Assa


On Fri, Nov 6, 2015 at 2:29 PM, jim holtman  wrote:

> Is this what you want:
>
> > x <- read.table(text = "X1X2X3Y1Y2Y3
> + 1232357230987172
> + 0719811795743
> + 4391907614", header = TRUE)
> > x
> X1  X2   X3  Y1   Y2  Y3
> 1 1232 357   23   0 9871  72
> 20  719 811  795 743
> 3   43 919    0   76  14
> >
> > # create indices of columns that start with the same character
> > indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
> > names(indx) <- NULL  # remove names so output not messed up
> >
> > result <- lapply(indx, function(a){
> + row_sum <- rowSums(x[, a])
> + x[row_sum < 100, a] <- 0
> + x[, a]
> + })
> > # combine back together
> > do.call(cbind, result)
> X1  X2   X3  Y1   Y2  Y3
> 1 1232 357   23   0 9871  72
> 20   00 811  795 743
> 3   43 919    00   0
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz  wrote:
>
>> Hi,
>>
>> I have a data frame with multiple columns, which are belong to several
>> groups
>> like that:
>> X1X2X3Y1Y2Y3
>> 1232357230987172
>> 0719811795743
>> 4391907614
>>
>> I would like to filter such rows out, where the sums in one group is lower
>> than a specifc value. For example, I would like to set all the values in a
>> group of cloums to zero, if the sum in one group is less than 100
>> In my example table I would like to set the values in the second row for
>> the three X-columns to 0, so that the table looks like that:
>>
>> X1X2X3Y1Y2Y3
>> 1232357230987172
>> 000811795743
>> 43919000
>>
>> the same apply also for the Y-values in the last column.
>> Is there a more efficient way of doing it than going row by row and use
>> the
>> apply function on each of the subgroups I have in the columns?
>>
>> thanks
>> Assa
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>



Re: [R] subsetting a dataframe

2015-06-09 Thread Ryan Derickson
Lots of ways to do this, I use %in% with bracket notation [row, column].
The empty column argument below returns all columns but you could have
conditional logic there as well.

df[df$rows %in% test_rows, ]
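
A short sketch of what "conditional logic" in either position could look like. The data frame is rebuilt from the question; keep_cols and the cutoff 450 are illustrative choices, not something from the thread.

```r
df <- data.frame(dd = c(1, 2, 3),
                 rows = c("A1", "A2", "A3"),
                 columns = c("B1", "B2", "B3"),
                 numbers = c(400, 500, 600))
test_rows <- c("A1", "A3")

# restrict the columns by name as well as the rows
keep_cols <- c("rows", "numbers")
sub1 <- df[df$rows %in% test_rows, keep_cols]

# combine the membership test with a second row condition
sub2 <- df[df$rows %in% test_rows & df$numbers > 450, ]
```
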



On Mon, Jun 8, 2015 at 6:44 PM, Bogdan Tanasa <tan...@gmail.com> wrote:

> Dear all,
>
> would appreciate your suggestions on subsetting a dataframe : please let's
> consider an example dataframe df:
>
> dd <- c(1,2,3)
> rows <- c("A1","A2","A3")
> columns <- c("B1","B2","B3")
> numbers <- c(400, 500, 600)
> df <- data.frame(dd, rows, columns, numbers)
>
> and a vector : test_rows <- c("A1","A3") ;
>
> how could I subset the dataframe df as a function of the vector test_rows,
> in such a way that only the rows of df (df$rows) that match the elements
> of test_rows (A1 and A3) are listed ?
>
> thank you very much,
>
> -- bogdan
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




[R] subsetting a dataframe

2015-06-08 Thread Bogdan Tanasa
Dear all,

would appreciate your suggestions on subsetting a dataframe : please let's
consider an example dataframe df:

dd <- c(1,2,3)
rows <- c("A1","A2","A3")
columns <- c("B1","B2","B3")
numbers <- c(400, 500, 600)
df <- data.frame(dd, rows, columns, numbers)

and a vector : test_rows <- c("A1","A3") ;

how could I subset the dataframe df as a function of the vector test_rows,
in such a way that only the rows of df (df$rows) that match the elements
of test_rows (A1 and A3) are listed ?

thank you very much,

-- bogdan



Re: [R] subsetting a dataframe

2015-06-08 Thread William Dunlap
Use is.element(elements,set), or its equivalent, elements %in% set:

df <- data.frame(dd = c(1, 2, 3),
                 rows = c("A1", "A2", "A3"),
                 columns = c("B1", "B2", "B3"),
                 numbers = c(400, 500, 600))
test_rows <- c("A1","A3")
df[ is.element(df$rows, test_rows), ]
#  dd rows columns numbers
#1  1   A1      B1     400
#3  3   A3      B3     600
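
One thing %in% does not do is follow the order of the lookup vector: the rows come back in the order they have in df. If the order of the requested values matters, match() is a possible alternative. A minimal sketch, where wanted, pos and ordered_subset are illustrative names and "A9" is deliberately absent from the data:

```r
df <- data.frame(rows = c("A1", "A2", "A3"),
                 numbers = c(400, 500, 600))
wanted <- c("A3", "A1", "A9")

# membership test: one logical per row of df, in df's own order
in_set <- df$rows %in% wanted

# match() gives the row position of each requested value, NA if absent
pos <- match(wanted, df$rows)

# drop the misses and index; rows now follow the order of 'wanted'
ordered_subset <- df[pos[!is.na(pos)], ]
```
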


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jun 8, 2015 at 3:44 PM, Bogdan Tanasa <tan...@gmail.com> wrote:

> Dear all,
>
> would appreciate your suggestions on subsetting a dataframe : please let's
> consider an example dataframe df:
>
> dd <- c(1,2,3)
> rows <- c("A1","A2","A3")
> columns <- c("B1","B2","B3")
> numbers <- c(400, 500, 600)
> df <- data.frame(dd, rows, columns, numbers)
>
> and a vector : test_rows <- c("A1","A3") ;
>
> how could I subset the dataframe df as a function of the vector test_rows,
> in such a way that only the rows of df (df$rows) that match the elements
> of test_rows (A1 and A3) are listed ?
>
> thank you very much,
>
> -- bogdan
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




[R] subsetting question

2015-05-20 Thread Dieter Anseeuw
Dear all,
I would like to do multiple actions on a subset of my data. Therefore, I want 
to create a for loop on the variable Date (actually a double for loop on yet 
another variable, but let's omit that for a moment).
I want to run down every level of Date and perform multiple actions on the 
data from a certain date. Here is my code:

for (i in 1:length(datums)){
meanweight <- mean(dataset1[dataset1$Date==datums[i],]$Weight)
...

However, this subsetting obviously doesn't work. How can I adjust my code so 
that R runs down all levels of Date in a for loop?
(I need the for loop, not tapply(), sapply(), ...)

Thanks in advance,
Dieter Anseeuw



Re: [R] subsetting question

2015-05-20 Thread MacQueen, Don
Assuming datums is a vector of the unique dates in Date... perhaps
  datums <- sort(unique(dataset1$Date))

I usually set it up like this

for (i in 1:length(datums) ) {

  crnt.date <- datums[i]
  tmpdat <- subset(dataset1, Date==crnt.date)
  cat(i, format(crnt.date), 'dim(tmpdat)',dim(tmpdat),'\n\n')

 ## use tmpdat for the multiple actions

}

The extra step of creating a subset helps one check that everything is
working as expected. It has no noticeable effect on performance with
datasets of the size I normally work with.
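
A variation on the same idea, sketched under the assumption that Date can be used as a grouping variable directly: split() does the subsetting once, and the for loop then walks over the pieces. The data below are made up, and pieces is an illustrative name.

```r
# made-up data in the shape described in the question
dataset1 <- data.frame(
  Date   = as.Date(c("2015-04-01", "2015-04-01", "2015-04-07", "2015-04-19")),
  Weight = 11:14
)

# one data frame per distinct date, named by the date
pieces <- split(dataset1, dataset1$Date)

# pre-allocate the result, then fill it inside the loop
meanweight <- numeric(length(pieces))
names(meanweight) <- names(pieces)

for (nm in names(pieces)) {
  tmpdat <- pieces[[nm]]
  ## use tmpdat for the multiple actions
  meanweight[nm] <- mean(tmpdat$Weight)
}
```
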

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/20/15, 3:03 AM, Dieter Anseeuw <dieter.anse...@inagro.be> wrote:

>Dear all,
>I would like to do multiple actions on a subset of my data. Therefore, I
>want to create a for loop on the variable Date (actually a double for
>loop on yet another variable, but let's omit that for a moment).
>I want to run down every level of Date and perform multiple actions on
>the data from a certain date. Here is my code:
>
>for (i in 1:length(datums)){
>meanweight <- mean(dataset1[dataset1$Date==datums[i],]$Weight)
>...
>
>However, this subsetting obviously doesn't work. How can I adjust my code
>so that R runs down all levels of Date in a for loop?
>(I need the for loop, not tapply(), sapply(), ...)
>
>Thanks in advance,
>Dieter Anseeuw
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



Re: [R] subsetting question

2015-05-20 Thread William Dunlap
Here is a self-contained example of what you might be trying to do.
You would get better answers if you supplied this yourself.

dataset1 <-
data.frame(Date=as.POSIXct(c("2015-04-01","2015-04-01","2015-04-07",
"2015-04-19")), Weight=11:14)
datums <- as.POSIXct(c("2015-04-01", "2015-04-08", "2015-04-19"))
# note that neither dataset1$Date nor datums is a subset of the other

for (i in 1:length(datums)){
    meanweight <- mean(dataset1[dataset1$Date==datums[i],]$Weight)
}

You said it 'obviously' doesn't work, but didn't say what was wrong with
what it did.  One obvious (to me) thing is that in each iteration you
overwrote the meanweight assigned in the previous iteration, so you end up
with only the last one calculated.  You can fix that by making meanweight a
vector and assigning elements of it in the loop.

meanWeight <- numeric(length(datums))
for (i in 1:length(datums)){
    meanWeight[i] <- mean(dataset1[dataset1$Date==datums[i],]$Weight)
}
data.frame(datums, meanWeight)
#      datums meanWeight
#1 2015-04-01       11.5
#2 2015-04-08        NaN
#3 2015-04-19       14.0

This looks to me like the appropriate result, but your self-contained
example would make me more convinced of that.

By the way, why is it important to you to use a for loop and not a function
like tapply() or aggregate()?  They can cause hassles with the data I gave
above because datums does not partition dataset1$Date, but I don't know what
your problem with them is.



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, May 20, 2015 at 3:03 AM, Dieter Anseeuw <dieter.anse...@inagro.be>
wrote:

> Dear all,
> I would like to do multiple actions on a subset of my data. Therefore, I
> want to create a for loop on the variable Date (actually a double for
> loop on yet another variable, but let's omit that for a moment).
> I want to run down every level of Date and perform multiple actions on
> the data from a certain date. Here is my code:
>
> for (i in 1:length(datums)){
> meanweight <- mean(dataset1[dataset1$Date==datums[i],]$Weight)
> ...
>
> However, this subsetting obviously doesn't work. How can I adjust my code
> so that R runs down all levels of Date in a for loop?
> (I need the for loop, not tapply(), sapply(), ...)
>
> Thanks in advance,
> Dieter Anseeuw
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




Re: [R] subsetting question

2015-05-20 Thread Ivan Calandra

Hi,

What about using functions like aggregate()?
Something like:
aggregate(Weight ~ Date, data=dataset1, FUN=mean)

If you need to do more things, you can create your own function for 'FUN'.
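
A minimal sketch of a user-defined FUN, assuming a made-up dataset1 with Date and Weight columns (the helper name summarise_weight is hypothetical):

```r
# Hypothetical data standing in for dataset1
dataset1 <- data.frame(
  Date   = as.Date(c("2015-04-01", "2015-04-01", "2015-04-19", "2015-04-19")),
  Weight = c(11, 12, 13, 15)
)

# A custom FUN that returns several summaries per date
summarise_weight <- function(w) c(n = length(w), mean = mean(w), sd = sd(w))

# One row per date; the Weight column becomes a matrix with columns n, mean, sd
perDate <- aggregate(Weight ~ Date, data = dataset1, FUN = summarise_weight)
```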

HTH,
Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra

On 20/05/15 12:03, Dieter Anseeuw wrote:

> Dear all,
> I would like to do multiple actions on a subset of my data. Therefore, I
> want to create a for loop on the variable Date (actually a double for
> loop on yet another variable, but let's omit that for a moment).
> I want to run down every level of Date and perform multiple actions on
> the data from a certain date. Here is my code:
>
> for (i in 1:length(datums)){
> meanweight <- mean(dataset1[dataset1$Date==datums[i],]$Weight)
> ...
>
> However, this subsetting obviously doesn't work. How can I adjust my code
> so that R runs down all levels of Date in a for loop?
> (I need the for loop, not tapply(), sapply(), ...)
>
> Thanks in advance,
> Dieter Anseeuw



[R] Subsetting from pareto distribution

2015-04-28 Thread W Z
I have a dataset of 20k records that is heavily right-skewed, like a Pareto
distribution. I'd like to pull a 1k subset of it with the same distribution.
Is there an R package that would do that?

Thanks.

[[alternative HTML version deleted]]



Re: [R] Subsetting from pareto distribution

2015-04-28 Thread David Winsemius

On Apr 28, 2015, at 12:20 PM, W Z wrote:

> I have a dataset of 20k records heavily right skewed as pareto
> distribution, I'd like to pull 1k subset of it with same distribution, any
> R package would do that?

Why not just:

 subdat <- dat[sample( nrow(dat), 1000), ] # if dataset is a dataframe

Or:

 subdat <- dat[sample( length(dat), 1000) ] # if dataset is a vector
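
A minimal sketch of this idea on simulated data, assuming a Pareto(shape = 2, scale = 1) vector generated by inverse-CDF sampling (the simulated dat and the seed are illustrative, not the poster's data). A simple random sample should roughly preserve the quantiles of the full data:

```r
set.seed(42)                        # only to make the sketch repeatable

# Simulated Pareto(shape = 2, scale = 1) values: x = 1 / U^(1/2)
dat <- 1 / runif(20000)^(1 / 2)

# Simple random sample without replacement
subdat <- dat[sample(length(dat), 1000)]

# Compare a few quantiles of the full data and the subsample
probs <- c(0.25, 0.5, 0.75, 0.9)
cmp <- round(rbind(full = quantile(dat, probs),
                   sub  = quantile(subdat, probs)), 2)
print(cmp)
```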


 
> Thanks.
>
>    [[alternative HTML version deleted]]

Do read the posting guide and the documentation for your mail client and learn 
how to post in plain text.

David Winsemius
Alameda, CA, USA



Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Aron Lindberg
Thanks Chuck and Rolf.

While Rolf’s code also works on the dput that I actually gave you (a smaller 
subset of the full dataset), it failed to work on the larger dataset, because 
there are further exceptions:

input[[i]]$content[[1]] is sometimes a list, sometimes a character vector, and 
sometimes input[[i]]$content simply returns list().

Chuck’s solution however bypasses this and works on the full dataset (which was 
8 MB, which is why I didn’t upload it as a gist).

Best,
Aron

-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io


Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Aron Lindberg
Hmm… Chuck’s solution may actually be problematic because there are several 
entries which at the deepest level are called “sha”, but that should not be 
included, such as:

input[[67]]$content[[1]]$commit$tree$sha

and

input[[67]]$content[[1]]$parents[[1]]$sha

It’s only the “sha” entries that fit the following subsetting pattern that 
should be included:

input[[i]]$content[[1]]$sha[1]

It’s getting thornier!

To be fair to Rolf’s solution (which probably can be updated to solve the 
problem), I’ve posted the complete dput here:

https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io


Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Bert Gunter
How can you expect a solution if you cannot specify the problem?

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll





Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Charles C. Berry

On Fri, 20 Feb 2015, Aron Lindberg wrote:


> Hmm…Chuck’s solution may actually be problematic because there are several
> entries which at the deepest level are called “sha”, but that should not be
> included, such as:
>
> input[[67]]$content[[1]]$commit$tree$sha
>
> and
>
> input[[67]]$content[[1]]$parents[[1]]$sha
>
> it’s only the “sha” that fit the following subsetting pattern that should be
> included:
>
> input[[i]]$content[[1]]$sha[1]

This should be straightforward. Look at what grepl() is doing.

And look at what names(unlist(input)) yields.

You can either write a regular expression to handle this (perhaps 
"content.sha$") or write other grepl() expressions to select (or get rid 
of) the desired (or unwanted) pattern.

See ?grepl and the page on regular expressions referenced there.
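
A minimal sketch of the regular-expression approach, using a small made-up nested list in place of the real dput. The toy data and the anchored pattern ^content\.sha$ are assumptions for illustration, not taken from the thread:

```r
# Toy stand-in for `input`; the structure mimics the cases discussed in the thread
input <- list(
  list(content = list(list(
    sha     = "aaa",
    commit  = list(tree = list(sha = "ttt")),
    parents = list(list(sha = "ppp"))
  ))),
  list(content = list("a character vector")),
  list(content = list())
)

# Names of the flattened vector encode the path, e.g. "content.commit.tree.sha"
flat <- unlist(input)

# Unanchored pattern: also picks up the nested commit/parents sha entries
all_shas <- flat[grepl("sha", names(flat))]

# Anchored pattern: keeps only the entries reached via content[[1]]$sha
top_shas <- flat[grepl("^content\\.sha$", names(flat))]
```

Inspecting names(flat) first is the quickest way to see which pattern the real data needs.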

HTH,

Chuck

Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread David Winsemius

On Feb 20, 2015, at 6:13 AM, Aron Lindberg wrote:

> Hmm…Chuck’s solution may actually be problematic because there are several
> entries which at the deepest level are called “sha”, but that should not be
> included, such as:
>
> input[[67]]$content[[1]]$commit$tree$sha
>
> and
>
> input[[67]]$content[[1]]$parents[[1]]$sha
>
> it’s only the “sha” that fit the following subsetting pattern that should be
> included:
>
> input[[i]]$content[[1]]$sha[1]
>
> It’s getting thornier!
>
> To be fair to Rolf’s solution (which probably can be updated to solve the
> problem), I’ve posted the complete dput here:
>
> https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

I didn't try on the larger example, but this works on the smaller one:

 get_shas <- function(input){
    x <- lapply(input, "[[", "content")
    y <- lapply(x, "[[", 1)
    z <- lapply(y, function(yy) if( length(names(yy)) && any(names(yy) == "sha") ){ yy[["sha"]] })
 }
 sha_lists <- get_shas(input)

It does deliver an entry for every leaf of the input-object which is either the 
value of "sha" or NULL. I think that is not a bad thing because it lets you 
figure out where the values are coming from.

 

David Winsemius
Alameda, CA, USA



Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread William Dunlap
The elNamed(x, name) function can simplify this code a bit.  The following
gives the same result as David W's get_shas() for the sample dataset provided:

   get_shas2 <- function (input) {
      lapply(input, function(el) elNamed(elNamed(el, "content")[[1]],
         "sha")[1])
   }

Bill Dunlap
TIBCO Software
wdunlap tibco.com


[R] Subsetting a list of lists using lapply

2015-02-19 Thread Aron Lindberg
Hi Everyone,

I'm working on a thorny subsetting problem involving list of lists. I've put a 
dput of the data here:

https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput

I can get one instance of the element I want this way:

> input[[67]]$content[[1]]$sha
[1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"

However, I need to use a lapply function to loop over all of the items of the 
list. I've tried something like this:

get_shas <- function(input){
    x <- sapply(input, "[[", "content")
    y <- sapply(x, "[[", "sha")
    return(y)
}

sha_lists <- lapply(commit_lists, get_shas)

However, this doesn't work. When I run each of the lapply commands manually 
it returns NULL for every list, and when I run the whole apply function it 
says: 

Error in FUN(X[[1L]], ...) : subscript out of bounds

I've tried reading the sections on lists and subsetting in Hadley's Advanced R, 
but I still cannot figure it out. Can anyone help or offer a pointer?

Best,
Aron

-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
[[alternative HTML version deleted]]


Re: [R] Subsetting a list of lists using lapply

2015-02-19 Thread Rolf Turner

On 20/02/15 08:45, Aron Lindberg wrote:

> Hi Everyone,
>
> I'm working on a thorny subsetting problem involving list of lists.

If you think this is thorny you ain't seen nothin' yet!

But note that you've got a list of lists of lists ... i.e. the nesting 
is at least 3 deep.

> I've put a dput of the data here:
>
> https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput

Thank you for creating a reproducible example.

> I can get one instance of the element I want this way:
>
> > input[[67]]$content[[1]]$sha
> [1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
>
> However, I need to use a lapply function to loop over all of the
> items of the list. I've tried something like this, but it doesn't
> work:
>
> get_shas <- function(input){
>     x <- sapply(input, "[[", "content")
>     y <- sapply(x, "[[", "sha")
>     return(y)
> }
>
> sha_lists <- lapply(commit_lists, get_shas)
>
> However, this doesn't work. When I run each of the lapply commands
> manually it returns NULL for every list, and when I run the whole
> apply function it says:
>
> Error in FUN(X[[1L]], ...) : subscript out of bounds
>
> I've tried reading the sections on lists and subsetting in Hadley's
> Advanced R, but I still cannot figure it out. Can anyone help or
> offer a pointer?

At least part of the problem is that for some values of i 
input[[i]]$content[[1]] is a list (with an entry named "sha") and 
sometimes it is a character vector.

I don't follow your function get_shas() completely, so I started from 
scratch:

foo <- function (x){
    sapply(x,function(y){
        z <- y$content[[1]]
        if(is.list(z)) z$sha else NA
    })
}

I find that foo(input) gives a vector of length 100, 81 entries of which 
are NA.  Entry number 67 at least agrees with what was shown in your email.
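
A minimal sketch of a guarded variant, assuming a toy list that covers the three cases reported elsewhere in this thread (content[[1]] is a list, content[[1]] is a character vector, content is list()). The helper name get_top_sha and the toy data are hypothetical:

```r
# Toy stand-in for `input`
input <- list(
  list(content = list(list(sha = "aaa",
                           commit = list(tree = list(sha = "ttt"))))),
  list(content = list("a character vector")),
  list(content = list())
)

get_top_sha <- function(el) {
  content <- el$content
  if (length(content) == 0) return(NA_character_)   # content is list()
  first <- content[[1]]
  if (!is.list(first) || is.null(first$sha)) return(NA_character_)
  as.character(first$sha[1])                        # top-level sha only
}

# vapply() guarantees one character value per element of input
shas <- vapply(input, get_top_sha, character(1))
```

Checking length(content) before indexing avoids the "subscript out of bounds" error that content[[1]] raises on an empty list.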


HTH

cheers,

Rolf Turner

--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619



Re: [R] Subsetting a list of lists using lapply

2015-02-19 Thread Charles Berry
Aron Lindberg aron.lindberg at case.edu writes:

> Hi Everyone,
>
> I'm working on a thorny subsetting problem involving list of lists. I've put
> a dput of the data here:
>
> https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput

IIUC, you want the value of every list element that is named "sha" and 
that name will only apply to atomic objects.

If so, this should do it. 

> input <- dget("/tmp/dpt")
> shas <- unlist( input, use.names=FALSE )[ grepl( "sha", names(unlist(input)))]
> input[[67]]$content[[1]]$sha
[1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
> which(input[[67]]$content[[1]]$sha == shas )
[1] 194


HTH,

Chuck



[R] Subsetting data with svyglm

2015-02-11 Thread Brennan O'Banion
I am aware that it is possible to specify a subset with a single
logical operator when constructing a model, such as:
svyglm(formula, design=data, subset=variable=="value").

What I can't figure out is how to specify a subset with two or more
logical operators:
svyglm(formula, design=data, subset=variable=="value a"|"value b").

Is it possible to specify a subset in this way using *glm without
having to, in my case, subset the original data, create a survey
design, and then fit a model?



Re: [R] Subsetting data with svyglm

2015-02-11 Thread Anthony Damico
hi brennan, survey design objects can be subsetted with the same subset()
syntax as data.frame objects, so following jeff's advice maybe you want

svyglm( formula , design = subset( surveydesign , variable %in% c( 'value a' , 'value b' ) ) )

for some examples of how to construct a survey design with public use data,
see http://github.com/ajdamico/usgsd




Re: [R] Subsetting data with svyglm

2015-02-11 Thread Jeff Newmiller
This seems like a fundamental misunderstanding on your part of how operators, 
and in particular logical expressions, work in computer languages. Consider 
some examples:

1+2 has a numeric answer because 1 and 2 are both numeric.
1+"a" has at the very least not a numeric answer because the values on either 
side of the + sign are not both numeric.
TRUE | FALSE has a logical type of answer because both sides of the logical 
or operator are logical.
However, you are expressing something like
TRUE | "a string", which might mean something but that something generally is 
not a logical type of answer.

Try
variable=="value a" | variable=="value b"
or
variable %in% c( "value a", "value b" )

You would probably find that the Introduction to R document that comes with R 
has some enlightening examples in it. You might also find Pat Burns' The R 
Inferno entertaining as well (search for it in your favorite search engine).
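
A minimal sketch of the two suggested forms on a made-up character vector (the values are illustrative). Both give the same logical index, while mixing a logical vector with a bare string under | is an error in R:

```r
variable <- c("value a", "value b", "value c", "value a")

# Each side of | is a complete logical comparison
keep_or <- variable == "value a" | variable == "value b"

# Equivalent and shorter when there are many values to match
keep_in <- variable %in% c("value a", "value b")

# The original attempt ORs a logical vector with a string, which R rejects
mixed <- tryCatch(variable == "value a" | "value b",
                  error = function(e) "error")
```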
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On February 11, 2015 8:42:58 PM EST, Brennan O'Banion 
<brennan.oban...@gmail.com> wrote:
>I am aware that it is possible to specify a subset with a single
>logical operator when constructing a model, such as:
>svyglm(formula, design=data, subset=variable=="value").
>
>What I can't figure out is how to specify a subset with two or more
>logical operators:
>svyglm(formula, design=data, subset=variable=="value a"|"value b").
>
>Is it possible to specify a subset in this way using *glm without
>having to, in my case, subset the original data, create a survey
>design, and then fit a model?



[R] Subsetting R 3.1.2

2014-12-05 Thread Dinesh Chowdhary
> x <- list(seq = 3:7, alpha = c("a", "b", "c"))
> x$alpha
[1] "a" "b" "c"

> x["alpha"]
$alpha
[1] "a" "b" "c"

> x[c(1,2)]
$seq
[1] 3 4 5 6 7

$alpha
[1] "a" "b" "c"

> x[c(1, alpha[2])]
$<NA>
NULL

$<NA>
NULL

How to access a character subset within a list?

Thank you for your effort...



Re: [R] Subsetting R 3.1.2

2014-12-05 Thread Lee, Chel Hee

Your question is not clear to me.

> x$alpha[1:2]
[1] "a" "b"
> x$alpha[2]
[1] "b"


Is this what you are looking for?  I hope this helps.
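
A short sketch of what is probably going on in the question, using the same list `x`. The index vector `idx` below is an assumption about the kind of mixed index the poster built; the point is only that c() coerces a number and a string to a single character vector, so the list is then looked up by names that do not exist.

```r
x <- list(seq = 3:7, alpha = c("a", "b", "c"))

# [[ ]] (or $) extracts the component itself, which can then be indexed.
second    <- x[["alpha"]][2]
first_two <- x$alpha[1:2]

# Single brackets keep the list wrapper.
sub_list <- x["alpha"]

# Mixing a number and a string in c() coerces everything to character,
# so the list is searched for the names "1" and "b", neither of which exists.
idx <- c(1, "b")
missing_both <- x[idx]

# Index the list with one subscript type at a time instead:
x[c("seq", "alpha")]   # by name
x[c(1, 2)]             # by position
```

In other words, first pick the list component, then subset the character vector inside it.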

Chel Hee Lee



[R] Subsetting data for split-sample validation, then repeating 1000x

2014-08-22 Thread Angela Boag
Hi all,

I'm doing some within-dataset model validation and would like to subset a
dataset 70/30 and fit a model to 70% of the data (the training data), then
validate it by predicting the remaining 30% (the testing data), and I would
like to do this split-sample validation 1000 times and average the
correlation coefficient and r2 between the training and testing data.

I have the following working for a single iteration, and would like to know
how to use either the replicate() or for-loop functions to average the 1000
'r2' and 'cor' outputs.

--

# create 70% training sample
A.samp <- sample(1:nrow(A),floor(0.7*nrow(A)), replace = TRUE)

# Fit model (I'm modeling native plant richness, 'nat.r')
A.model <- glmmadmb(nat.r ~ isl.sz + nr.mead, random = ~ 1 | site, family =
"poisson", data = A[A.samp,])

# Use the model to predict the remaining 30% of the data
A.pred <- predict(A.model, newdata = A[-A.samp,], type = "response")

# Correlation between predicted 30% and actual 30%
cor <- cor(A[-A.samp,]$nat.r, A.pred, method = "pearson")

# r2 between predicted and observed
lm.A <- lm(A.pred ~ A[-A.samp,]$nat.r)
r2 <- summary(lm.A)$r.squared

# print values
r2
cor

--

Thanks for your time!

Cheers,
Angela

--
Angela E. Boag
Ph.D. Student, Environmental Studies
CAFOR Project Researcher
University of Colorado, Boulder
Mobile: 720-212-6505



Re: [R] Subsetting data for split-sample validation, then repeating 1000x

2014-08-22 Thread David L Carlson
You can use replicate() or a for (i in 1:1000){} loop to do your replications, 
but you have other issues first. 

1. You are sampling with replacement which makes no sense at all. Your 70% 
sample will contain some observations multiple times and will use less than 70% 
of the data most of the time.

2. You compute r using cor() and r.squared using summary.lm(). Why? Once you 
have computed r, r*r or r^2 is equal to r.squared for the simple linear model 
you are using.
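
Both points can be checked on simulated data. This is only a sketch: `n`, `x` and `y` are made-up stand-ins, not the poster's data. Note also that with replace = TRUE the held-out set selected by negative indexing ends up larger than 30%, because fewer than 70% distinct rows are dropped.

```r
set.seed(1)

# Point 1: sampling row indices with and without replacement.
n <- 100
k <- floor(0.7 * n)
with_repl    <- sample(seq_len(n), k, replace = TRUE)
without_repl <- sample.int(n, k)

length(unique(with_repl))     # typically fewer than k distinct rows
length(unique(without_repl))  # always exactly k distinct rows

# Point 2: for a one-predictor linear model, R-squared equals the
# squared Pearson correlation.
x  <- rnorm(n)
y  <- 2 * x + rnorm(n)
r  <- cor(x, y)
r2 <- summary(lm(y ~ x))$r.squared
all.equal(r^2, r2)
```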

# To split your data, you need to sample without replacement, e.g.

train <- sample.int(nrow(A), floor(nrow(A)*.7))
test <- (1:nrow(A))[-train]

# Now run your analysis on A[train,] and test it on A[test,] 

# Fit model (I'm modeling native plant richness, 'nat.r')
A.model <- glmmadmb(nat.r ~ isl.sz + nr.mead, random = ~ 1 | site, family =
"poisson", data = A[train,])

# Correlation between predicted 30% and actual 30%
cor <- cor(A[test,]$nat.r, predict(A.model, newdata = A[test,], type = "response"))


-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



Re: [R] Subsetting data for split-sample validation, then repeating 1000x

2014-08-22 Thread David L Carlson
Combine your code into a function:

Plant <- function() {
  train <- sample.int(nrow(A), floor(nrow(A)*.7))
  test <- (1:nrow(A))[-train]
  A.model <- glmmadmb(nat.r ~ isl.sz + nr.mead, random = ~ 1 | site, family =
    "poisson", data = A[train,])
  cor(A[test,]$nat.r, predict(A.model, newdata = A[test,], type = "response"))
}

Test the function. It should return a single correlation and no errors or 
warnings.

Plant()

If not, debug and run it again. When it works:

Out <- replicate(1000, Plant())

Out should be a vector with 1000 correlation values.
hist(Out) # for a histogram of the correlation values
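
To get the averages asked about in the original question, take the mean of the replicated values. The sketch below runs end to end, but it rests on two assumptions: the data frame `A` is simulated here (the values are made up), and plain glm() stands in for glmmadmb(), so there is no random site effect. The helper name `one_split` is also hypothetical.

```r
set.seed(42)

# Simulated stand-in for the poster's data frame 'A' (values are made up).
A <- data.frame(isl.sz = runif(200, 1, 10), nr.mead = runif(200, 0, 5))
A$nat.r <- rpois(200, lambda = exp(0.5 + 0.15 * A$isl.sz + 0.1 * A$nr.mead))

# One split-sample validation; glm() is a stand-in for glmmadmb() here.
one_split <- function(dat, frac = 0.7) {
  train <- sample.int(nrow(dat), floor(nrow(dat) * frac))
  fit   <- glm(nat.r ~ isl.sz + nr.mead, family = poisson, data = dat[train, ])
  pred  <- predict(fit, newdata = dat[-train, ], type = "response")
  cor(dat[-train, "nat.r"], pred)
}

Out <- replicate(1000, one_split(A))

mean(Out)     # average correlation across the 1000 splits
mean(Out^2)   # average squared correlation
```

Passing the data frame as an argument keeps the function from depending on a global object, which makes it easier to test on a small subset first.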

David C


From: Angela Boag [mailto:angela.b...@colorado.edu] 
Sent: Friday, August 22, 2014 4:01 PM
To: David L Carlson
Subject: Re: [R] Subsetting data for split-sample validation, then repeating 
1000x

Hi David,
Thanks for the feedback. I actually sampled without replacement initially but 
it's been a while since I looked at this code and just changed it because I 
thought it made more sense logically, but you've reassured me that my original 
hunch was right.
The real issue I'm having is how to use either the replicate() or for(i in 
1:1000){} loop code to get the average r value of 1000 repetitions as my 
output. I'm not familiar with either tool, so any suggestions on what that code 
would look like would be very helpful.

Thanks!
Angela 


--
Angela E. Boag
Ph.D. Student, Environmental Studies
CAFOR Project Researcher
University of Colorado, Boulder
Mobile: 720-212-6505




Re: [R] subsetting to exclude different values for each subject in study

2014-05-27 Thread Monaly Mistry
Hi Arun,

Thank you for your help. I have a few questions though, if you don't mind.
I'm a bit confused about the following 2 lines of code:
col.tri.nb <- tri2nb(coords, row.names=ind)
lapply(col.tri.nb,function(x) ind[x])[1:5]
## From what I understand, the first line determines the neighbouring
## individuals, while the second line returns the neighbours
## for the first 5 individuals.
Also, I don't really understand what the last line of code that you resent
is doing.

Best,

Monaly.


On Sat, May 24, 2014 at 2:41 AM, arun smartpink...@yahoo.com wrote:

 Hi,
 Sorry, there is a mistake. XO[2,] should be:
 XO[2,] <- sapply(seq_along(col.tri.nb), function(i){ind1 <-
 as.character(ind[i]); ind2 <- as.character(ind[col.tri.nb[[i]]]);
 mean(abs(XO[1,ind1]-XO[1,ind2]))} )
 A.K.






 On Friday, May 23, 2014 12:56 PM, arun smartpink...@yahoo.com wrote:

 Hi Monaly,
 Maybe this helps:
 b <- 77:99
 ao1 <- ao[-b,]
 ##Your code:

 XO <- matrix( 0,6, 76, byrow=TRUE);XO
 abo <- ao$NestkastNummer[-b];abo  #removed values that were NA
 rownames(XO) = c("EB_score","avg","pop_size","pop_avg_score",
 "adj_pop_avg", "ind_pop_dif")
 colnames(XO) = abo
 t <- ao$COR_LOC;t
 i <- c(77:99)
 ti <- t[-i];ti
 XO[1,] = c(ti);XO


 library(deldir)
 library(spdep)
 mat <- cbind(lat=ao1$lat_xm, long=ao1$long_ym)
 coords <- coordinates(mat)
 ind <- ao1$NestkastNummer

 col.tri.nb <- tri2nb(coords, row.names=ind)
 lapply(col.tri.nb,function(x) ind[x])[1:5] ###
 [[1]]
 [1] 713 715 162 148 140 117

 [[2]]
 [1] 130 128 172  64 113 117

 [[3]]
 [1] 54 19 16 73 74

 [[4]]
 [1]   2  31 704  34 707

 [[5]]
 [1] 51 94 57 73 62

 XO[2,] <- sapply(seq_along(col.tri.nb),function(i)
 mean(abs(ind[i]-ind[col.tri.nb[[i]]])))

 A.K.



 On Friday, May 23, 2014 7:17 AM, Monaly Mistry monaly.mis...@gmail.com
 wrote:



 Hi Arun and Frede,

 So the dput() is below (it's the same data file as before), but below that
 is the code I used to make the tessellation.  Thanks for your help.

  dput(ao)
 structure(list(num = 1:99, FORM_CHK = c(20870L, 22018L, 30737L,
 22010L, 22028L, 36059L, 36063L, 36066L, 30587L, 30612L, 36056L,
 30376L, 35153L, 30435L, 30536L, 30486L, 30475L, 36053L, 36048L,
 36076L, 36045L, 36065L, 35772L, 36949L, 35702L, 36894L, 36080L,
 35542L, 35457L, 35533L, 36042L, 36925L, 36827L, 36008L, 35817L,
 36350L, 35985L, 35973L, 35801L, 36639L, 35810L, 35812L, 35807L,
 36351L, 35967L, 35944L, 37006L, 36345L, 36062L, 36077L, 35802L,
 35984L, 36043L, 35769L, 36360L, 36082L, 36071L, 36354L, 35771L,
 35754L, 36295L, 35746L, 36064L, 35779L, 35751L, 35752L, 35785L,
 35792L, 37011L, 36003L, 36040L, 36831L, 36031L, 36652L, 36992L,
 36965L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA, NA, NA, NA), RingNummerMan = structure(c(1L,
 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
 16L, 17L, 19L, 22L, 23L, 24L, 25L, 26L, 27L, 29L, 30L, 31L, 34L,
 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 46L, 47L, 48L,
 49L, 50L, 51L, 52L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 63L,
 65L, 67L, 69L, 70L, 73L, 74L, 75L, 76L, 78L, 79L, 80L, 81L, 82L,
 83L, 85L, 86L, 87L, 88L, 89L, 93L, 96L, 97L, 18L, 20L, 21L, 28L,
 32L, 33L, 45L, 53L, 62L, 64L, 66L, 68L, 71L, 72L, 77L, 84L, 90L,
 91L, 92L, 94L, 95L, 98L, 99L), .Label = c(AJ...75425, AL...62371,
 AR...11060, AR...29297, AR...29307, AR...29502, AR...29504,
 AR...29507, AR...30039, AR...30085, AR...30165, AR...30491,
 AR...30563, AR...30616, AR...30652, AR...30687, AR...30701,
 AR...30927, AR...30959, AR...30963, AR...30964, AR...30965,
 AR...30966, AR...30985, AR...30988, AR...40917, AR...40996,
 AR...45735, AR...45904, AR...45928, AR...47609, AR...65387,
 AR...65479, AR...65550, AR...65629, AR...65948, AR...86074,
 AR...86521, AR...86527, AR...90061, AR...90064, AR...90067,
 AR...90077, AR...90081, AR...90098, AR...90101, AR...90106,
 AR...90112, AR...90133, AR...90155, AR...90176, AR...90178,
 AR...90180, AR...90187, AR...90212, AR...90247, AR...90252,
 AR...90256, AR...90258, AR...90269, AR...90272, AR...90275,
 AR...90294, AR...90298, AR...90300, AR...90337, AR...90338,
 AR...90367, AR...90397, AR...90410, AR...90463, AR...90520,
 AR...90544, AR...90556, AR...90678, AR...90712, AR...90737,
 AR...90744, AR...90829, AR...90862, AR...90863, AR...90873,
 AR...90880, AR...90892, AR...90898, AR...90945, AR...90951,
 AR...90965, AR...90970, AR...90972, AU...15008, AU...15009,
 AU...15027, AU...15032, AU...15036, AU...15038, AU...15046,
 AU...15049, AU...15505), class = factor), year_score_taken =
 c(2006L,
 2008L, 2009L, 2008L, 2008L, 2011L, 2011L, 2011L, 2009L, 2009L,
 2011L, 2009L, 2010L, 2009L, 2009L, 2009L, 2009L, 2011L, 2011L,
 2011L, 2011L, 2011L, 2011L, 2012L, 2011L, 2012L, 2011L, 2010L,
 2010L, 2010L, 2011L, 2012L, 2012L, 2011L, 2011L, 2012L, 2011L,
 2011L, 2011L, 2012L, 2011L, 2011L, 2011L, 2012L, 2011L, 2011L,
 2013L, 2012L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L,
 2012L, 2011L, 2012L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
 2011L, 2011L, 

Re: [R] subsetting to exclude different values for each subject in study

2014-05-27 Thread arun
Hi Monaly,

According to the description of ?tri2nb
The function uses the ‘deldir’ package to convert a matrix of
 two-dimensional coordinates into a neighbours list of class ‘nb’
 with a list of integer vectors containing neighbour region number
 ids.


So, col.tri.nb is a list of length 76. 

> str(col.tri.nb)
List of 76
 $ : int [1:6] 11 26 33 46 57 75
 $ : int [1:6] 27 31 36 42 70 75
---

lst1 <- lapply(col.tri.nb,function(x) ind[x]) 
lst1[1:5] #returns 1st five elements
##Regarding the last piece of code:
#Check the output of

res <- sapply(seq_along(col.tri.nb), function(i) {
    ind1 <- as.character(ind[i])
    ind2 <- as.character(ind[col.tri.nb[[i]]])
    XOind1 <- XO[1, ind1]
    XOind2 <- XO[1, ind2]
    cat("list element=", i, "\n", "NestkastNummer=", ind1, "\n",
        "Neighbouring NestkastNummer=", ind2, "\n",
        "EBScore=", XOind1, "\n", "EBScore Neighbour element=", XOind2,
        "\n", "avg=", mean(abs(XOind1 - XOind2)), sep = " ", "\n")
    mean(abs(XOind1 - XOind2))
})
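
The same pattern on a toy example may make it easier to follow. Everything below is made up for illustration: `score` stands in for the first row of XO, and `nb` is a plain list with the same shape as the neighbour list (element i holds the positions of individual i's neighbours). The second loop is one possible way to handle the non-neighbour comparison from the start of the thread, assuming the focal individual is excluded along with its neighbours.

```r
# Toy scores for five individuals; names stand in for nest-box IDs (made up).
score <- c("10" = 2.0, "11" = 3.5, "12" = 1.0, "13" = 4.0, "14" = 2.5)

# Neighbour list shaped like an 'nb' object: element i holds the
# positions of individual i's neighbours.
nb <- list(c(2L, 3L), c(1L, 3L, 4L), c(1L, 2L), c(2L, 5L), 4L)

# Mean absolute difference between each individual and its neighbours.
avg_nb <- sapply(seq_along(nb), function(i) {
  mean(abs(score[i] - score[nb[[i]]]))
})

# Difference between each individual and the mean score of its
# non-neighbours (everyone except itself and its neighbours).
diff_non_nb <- sapply(seq_along(nb), function(i) {
  others <- setdiff(seq_along(score), c(i, nb[[i]]))
  abs(score[i] - mean(score[others]))
})

names(avg_nb) <- names(diff_non_nb) <- names(score)
avg_nb
diff_non_nb
```

If an individual were a neighbour of everyone else, `others` would be empty and the second value would come out as NaN, so that case would need separate handling.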



A.K.





Re: [R] subsetting to exclude different values for each subject in study

2014-05-23 Thread Monaly Mistry
,c(6,39,38,36,3)]))
  XO["avg", "84"] <-
    mean(abs((XO[1,"84"])-XO[1,c("82","91","90","88","86")]))
  XO["avg", "113"] <-
    mean(abs((XO[1,"113"])-XO[1,c("68","60","62","64","124","117","101")]))
  XO["avg", "62"] <- mean(abs((XO[1,"62"])-XO[1,c("60","64","113")]))
  XO["avg", "168"] <-
    mean(abs((XO[1,"168"])-XO[1,c("170","169","164","163")]))
  XO["avg", "23"] <-
    mean(abs((XO[1,"23"])-XO[1,c("9","11","22","79","42")]))
  XO["avg", "3"] <-
    mean(abs((XO[1,"3"])-XO[1,c("6","28","36","35","31","2")]))
  XO["avg", "117"] <-
    mean(abs((XO[1,"117"])-XO[1,c("101","113","124","130","133","140","68")]))
  XO["avg", "150"] <- mean(abs((XO[1,"150"])-XO[1,c("148")]))
  XO["pop_size",] <- 76
  XO["pop_avg_score",] <- mean(XO["EB_score",])
  for (i in XO){
    XO["adj_pop_avg",] <-
      ((XO["pop_avg_score",])*(XO["pop_size",])-(XO["EB_score",]))/((XO["pop_size",]-1))
    #here I ran a loop to get info
    XO["ind_pop_dif",] <- abs((XO["EB_score",]-XO["adj_pop_avg",]))}
  t.test(XO["avg",], XO["ind_pop_dif",], paired=TRUE)
  XO
  XO <- rbind(XO,0)
  rownames(XO) <- c("EB_score","avg","pop_size","pop_avg_score",
    "adj_pop_avg", "ind_pop_dif", "non_nei")
  XO["non_nei",] <- 0
  rowMeans(XO[,1:76])

  #This is the average observed discrepancy from individuals to neighbours
  #IOW on average how different is a focal bird in this year from
  #its neighbours
  obso=mean(XO["avg",])
  print(paste("Observed=", obso))
  XY[15,1] <- round(obso, digits=4)


  #This is the code I previously posted to find the difference in scores
  #between a single subject and its non-neighbours
  o <- (ao[,c(13,5)])
  o <- na.omit(o)
  o <- o[!o$NestkastNummer %in% c(176,140,162,713),]
  XO[7,1] <- abs((XO[1,"176"]-(mean(o[,"COR_LOC"]))))
 
  Best,
 
  Monaly.
 
 
  On Thu, May 22, 2014 at 5:08 PM, John Kane jrkrid...@inbox.com wrote:
 
   Re dput() etc
   https://github.com/hadley/devtools/wiki/Reproducibility
  
  
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
  
   What dput() does is take your data and output it in an ASCII format that
   lets the reader here create an exact duplicate of your database.

   R is not WYSIWYG. Often what you see on the screen does not tell the
   whole tale. R supports a number of different data types: vectors, matrices,
   data.frames, lists, arrays and others. This site gives a useful though
   not complete summary of many data types:
   http://www.statmethods.net/input/datatypes.html. When you have just
   created a new data set, or even when working with one that you have not
   worked with in some time, it is a good idea to do a str() and class()
   on the data object just to be sure that you are working with the data types
   you think you have. What looks like a column of numbers in a data.frame may
   actually be a set of factors or a set of character (text) data and
   you're left wondering why multiplying it by some number is not working.

   Here is a short example to illustrate. Just copy and paste in the code
   dat1  <- data.frame(aa = as.factor(1:5), bb = 1:5)
   dat1 # data looks identical on the screen
   5*dat1[,"aa"]  # oops
   5*dat1[, "bb"] # okay
   str(dat1)
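
A slightly extended version of the same example, as a sketch: it shows what the "oops" line returns and one common way to recover numbers from a factor whose labels are numeric. The names `res_factor` and `aa_num` are introduced here for illustration only.

```r
dat1 <- data.frame(aa = as.factor(1:5), bb = 1:5)

class(dat1$aa)   # "factor"
class(dat1$bb)   # "integer"

# Arithmetic on a factor is not meaningful: R warns and returns NA.
res_factor <- suppressWarnings(5 * dat1[, "aa"])

# Convert via character to recover the numbers the labels represent.
aa_num <- as.numeric(as.character(dat1$aa))
5 * aa_num
```

Going through as.character() matters: calling as.numeric() directly on a factor returns the internal level codes, which only happen to match the labels in this small example.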
  
  
   John Kane
   Kingston ON Canada
  
  
-Original Message-
From: monaly.mis...@gmail.com
Sent: Thu, 22 May 2014 16:31:39 +0100
To: smartpink...@yahoo.com, r-help@r-project.org
Subject: Re: [R] subsetting to exclude different values for each
 subject
in study
   
Hi,
   
Sorry I'm fairly new to R and I don't really understand using dput(),
when
you say reproducible example do you mean the code with the output?
   
Best,
   
Monaly.
   
   
On Thu, May 22, 2014 at 4:03 PM, arun smartpink...@yahoo.com
 wrote:
   
Hi,
   
It would be helpful if you provide a reproducible example using
 ?dput().
   
A.K.
   
   
   
   
On Thursday, May 22, 2014 10:15 AM, Monaly Mistry
monaly.mis...@gmail.com
wrote:
Hi,
   
I've written a code to determine the difference in score for a
 single
subject and its non-neighbours
   
o <- (ao[,c(13,5)]) ##this is the table with the relevant information
o <- na.omit(o)  ##omitted data with NA
o <- o[!o$NestkastNummer %in% c(176,140,162,713),] ##removed neighbours
XO[7,1] <- abs((XO[1,"176"]-(mean(o[,"COR_LOC"]))))  #difference between that
#individual and average non-neighbours scores
   
Since each subject has a different number of non-neighbours I was
wondering
if there is an efficient way of writing the code, instead of
 writing the
same code again and again (76 subjects) for each subject and its
non-neighbours.
   
   
Best,
   
Monaly.
   

Re: [R] subsetting to exclude different values for each subject in study

2014-05-23 Thread Frede Aakmann Tøgersen
Hi Monaly

I guess that if you made the neighborhood data available (using dput()), then 
Arun could easily show you how to do this automatically with only a couple of 
lines of code instead of the many lines you had to write by hand.

Have a nice day.

Yours sincerely / Med venlig hilsen


Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling

Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
fr...@vestas.com
http://www.vestas.com



 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Monaly Mistry
 Sent: 23. maj 2014 12:34
 To: arun; r-help@r-project.org
 Subject: Re: [R] subsetting to exclude different values for each subject in
 study

 Hi,

 I did use the library deldir. I didn't put that code in since I wasn't
 sure if it was really relevant to the question, as I just made the
 tessellations identifying which tessellation belonged to which individual.
 Following that, I recorded by hand which individuals were sharing a boundary
 with each other.

 Best,

 Monaly.


 On Fri, May 23, 2014 at 11:25 AM, arun smartpink...@yahoo.com wrote:

  Hi,
 
  I am not sure how you did that.  May be using library(deldir).  I didn't
  find that codes in your previous email.
 
  A.K.
 
  On Friday, May 23, 2014 12:42 AM, Monaly Mistry
 monaly.mis...@gmail.com
  wrote:
 
 
 
  Hi,
  Neighbours in this case were selected if they shared a boundary in the
  Voronoi tessellation.
 
  Best,
  Monaly
  On May 23, 2014 3:19 AM, arun smartpink...@yahoo.com wrote:
  
  
  
   HI Monaly,
   Thanks for the code and dput.  But, I have a doubt about how you are
  selecting the neigbours.  Is there another dataset with the information?
  Sorry, if I have missed something
   For e.g.
   ### average difference b/n neighbours for each individual
   XO[avg, 176]- mean(abs((XO[1,176])-XO[1,c(140,162,713)]))
  
  
   A.K.
  
  
   On Thursday, May 22, 2014 5:21 PM, Monaly Mistry 
  monaly.mis...@gmail.com wrote:
   Hi Everyone,
  
   I hope I did this correctly (I called my data frame ao), and thank you
   very much for the info about using dput(). I'm starting to understand all
   the different things that can be done in R and I appreciate all the advice.
   I must apologize in advance since my code is quite long, but hopefully it
   makes sense and there is an efficient way to do this.
  
   structure(list(num = 1:99, FORM_CHK = c(20870L, 22018L, 30737L,
   22010L, 22028L, 36059L, 36063L, 36066L, 30587L, 30612L, 36056L,
   30376L, 35153L, 30435L, 30536L, 30486L, 30475L, 36053L, 36048L,
   36076L, 36045L, 36065L, 35772L, 36949L, 35702L, 36894L, 36080L,
   35542L, 35457L, 35533L, 36042L, 36925L, 36827L, 36008L, 35817L,
   36350L, 35985L, 35973L, 35801L, 36639L, 35810L, 35812L, 35807L,
   36351L, 35967L, 35944L, 37006L, 36345L, 36062L, 36077L, 35802L,
   35984L, 36043L, 35769L, 36360L, 36082L, 36071L, 36354L, 35771L,
   35754L, 36295L, 35746L, 36064L, 35779L, 35751L, 35752L, 35785L,
   35792L, 37011L, 36003L, 36040L, 36831L, 36031L, 36652L, 36992L,
   36965L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
   NA, NA, NA, NA, NA, NA, NA, NA, NA), RingNummerMan =
 structure(c(1L,
   2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
   16L, 17L, 19L, 22L, 23L, 24L, 25L, 26L, 27L, 29L, 30L, 31L, 34L,
   35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 46L, 47L, 48L,
   49L, 50L, 51L, 52L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 63L,
   65L, 67L, 69L, 70L, 73L, 74L, 75L, 76L, 78L, 79L, 80L, 81L, 82L,
   83L, 85L, 86L, 87L, 88L, 89L, 93L, 96L, 97L, 18L, 20L, 21L, 28L,
   32L, 33L, 45L, 53L, 62L, 64L, 66L, 68L, 71L, 72L, 77L, 84L, 90L,
   91L, 92L, 94L, 95L, 98L, 99L), .Label = c(AJ...75425, AL...62371,
   AR...11060, AR...29297, AR...29307, AR...29502, AR...29504,
   AR...29507, AR...30039, AR...30085, AR...30165, AR...30491,
   AR...30563, AR...30616, AR...30652, AR...30687, AR...30701,
   AR...30927, AR...30959, AR...30963, AR...30964, AR...30965,
   AR...30966, AR...30985, AR...30988, AR...40917, AR...40996,
   AR...45735, AR...45904, AR...45928, AR...47609, AR...65387,
   AR...65479, AR...65550, AR...65629, AR...65948, AR...86074,
   AR...86521, AR...86527, AR...90061, AR...90064, AR...90067,
   AR...90077, AR...90081, AR...90098, AR...90101, AR...90106,
   AR...90112, AR...90133, AR...90155, AR...90176, AR...90178,
   AR...90180, AR...90187, AR...90212, AR...90247, AR...90252,
   AR...90256, AR...90258, AR...90269, AR...90272, AR...90275,
   AR...90294, AR...90298, AR...90300, AR...90337, AR...90338,
   AR...90367, AR...90397, AR...90410, AR...90463, AR...90520,
   AR...90544, AR...90556, AR...90678, AR...90712, AR...90737,
   AR...90744, AR...90829, AR...90862, AR...90863, AR...90873,
   AR...90880, AR...90892, AR

Re: [R] subsetting to exclude different values for each subject in study

2014-05-23 Thread Monaly Mistry
, 314.4300781, 117.5861334,
437.9708453, 95.41039954, 105.7453938, 235.5829892, 627.9704095,
177.0636713, 99.17481232, 396.6993402, 973.4739067, 1034.662528,
1046.77705, 221.278275, 27.24031031, 724.0652756, 942.6742674,
325.9970589, 261.933799, 116.7648206, 464.0478832, 532.6968545,
423.9399058, 656.8536222, 979.9076146, 221.2098377, 701.5473216,
709.8290013, 1120.559295, 345.5719307, 463.4318862, 429.6207308,
659.112262, 717.7684649, 533.3812884, 819.3388243, 600.9351721,
722.4910753, 1126.719223, 26.8297633), long_ym = c(385.4016022,
744.3388344, 1278.519267, 582.1054392, 1183.781188, 1313.545671,
1155.204087, 1008.093201, 812.6125238, 1045.899477, 474.135164,
887.4467064, 626.9169985, 700.9728169, 849.3068501, 799.1579293,
1418.180093, 598.1175046, 928.3664402, .83807, 367.2768291,
1318.32705, 501.4891137, 542.5200518, 1095.7148, 552.6387801,
636.2573659, 479.9172936, 1057.018971, 980.7392501, 739.0014835,
485.8106446, 371.9470232, 1365.91848, 942.3769994, 664.2784869,
887.335514, 669.5046549, 1156.983212, 893.8960158, 933.9261864,
783.4794517, 1191.342439, 975.8466709, 453.8976828, 55.70866057,
731.2178331, 973.6227733, 1002.199869, 920.5827929, 678.1778549,
1141.415921, 578.9919757, 710.2019861, 738.8902861, 936.706063,
480.8068625, 454.8984371, 771.1368166, 510.940689, 680.7353401,
1087.041598, 895.6751282, 641.8171157, 573.7658194, 651.9358502,
816.2819528, 819.6178023, 828.7357905, 801.8266126, 856.9792948,
415.0906484, 1086.374437, 737.4447458, 559.866446, 0, 423.6526577,
1166.990753, 957.8330951, 562.8687158, 564.7590286, 1339.676479,
197.5933584, 132.099559, 1205.686591, 246.6303384, 1106.500715,
597.3391415, 1389.380609, 1312.878499, 1155.760068, 1152.090634,
433.6602223, 1252.833235, 1028.88666, 522.3937678, 151.7810272,
796.3780665, 631.3647851), avg_pop_eb = c(23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359, 23.57103359, 23.57103359, 23.57103359,
23.57103359, 23.57103359)), .Names = c(num, FORM_CHK, RingNummerMan,
year_score_taken, COR_LOC, IndividuID, BroedJaar,
ManipulatieOuders,
LegBeginDag, LegBeginMaand, broodinfo, BroedselID,
NestkastNummer,
lat_xm, long_ym, avg_pop_eb), class = data.frame, row.names = c(NA,
-99L))


#Code for tessellation
library(deldir)
ao= read.table("C:/Users/Monaly/Desktop/2012_malenest.txt", header=TRUE)
a29= deldir(ao$lat_xm, ao$long_ym)
a30=tile.list(a29)
plot(a30, close=TRUE, main="2012 Male Nest", xlab="latitude (m)",
ylab="longitude (m)", wpoints="real", verbose=FALSE, num=TRUE, rw=c(0, 1200,
0, 2000))
text(ao$lat_xm, ao$long_ym, col=c(2,1,4), labels=round(ao$NestkastNummer, 3),
pos=2, offset=0.2, cex=0.7)  #this was to identify the points


On Fri, May 23, 2014 at 11:58 AM, Frede Aakmann Tøgersen
fr...@vestas.com wrote:

 Hi Monaly

 I guess that if you made the neighborhood data available (using dput())
 then Arun will easily show you how to do this automatically with only a
 couple of lines of code instead of the many lines you had to write by hand.

 Have a nice day.

 Yours sincerely / Med venlig hilsen


 Frede Aakmann Tøgersen
 Specialist, M.Sc., Ph.D.
 Plant Performance  Modeling

 Technology  Service Solutions
 T +45 9730 5135
 M +45 2547 6050
 fr...@vestas.com
 http://www.vestas.com



  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On Behalf Of Monaly Mistry
  Sent: 23. maj 2014 12:34
  To: arun; r-help@r-project.org
  Subject: Re: [R] subsetting to exclude different values for each subject
 in
  study
 
  Hi,
 
  I did use the library deldir, I didn't put that code in since I  wasn't
  sure if it was really relevant to the question as I just

Re: [R] subsetting to exclude different values for each subject in study

2014-05-23 Thread arun
 Monaly

I guess that if you made the neighborhood data available (using dput()) then 
Arun will easily show you how to do this automatically with only a couple of 
lines of code instead of the many lines you had to write by hand.

Have a nice day.

Yours sincerely / Med venlig hilsen


Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance  Modeling

Technology  Service Solutions
T +45 9730 5135
M +45 2547 6050
fr...@vestas.com
http://www.vestas.com




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Monaly Mistry
 Sent: 23. maj 2014 12:34
 To: arun; r-help@r-project.org
 Subject: Re: [R] subsetting to exclude different values for each subject in
 study

 Hi,


 I did use the library deldir. I didn't put that code in since I wasn't
 sure if it was really relevant to the question, as I just made the
 tessellations identifying which tessellation belonged to which individual.
 Following that, I recorded by hand which individuals were sharing a boundary
 with each other.

 Best,

 Monaly.


 On Fri, May 23, 2014 at 11:25 AM, arun smartpink...@yahoo.com wrote:

  Hi,
 
  I am not sure how you did that.  Maybe using library(deldir)?  I didn't
  find that code in your previous email.
 
  A.K.
 
  On Friday, May 23, 2014 12:42 AM, Monaly Mistry
 monaly.mis...@gmail.com
  wrote:
 
 
 
  Hi,
  Neighbours in this case were selected if they shared a boundary in the
  Voronoi tessellation.
 
  Best,
  Monaly
  On May 23, 2014 3:19 AM, arun smartpink...@yahoo.com wrote:
  
  
  
   HI Monaly,
   Thanks for the code and dput.  But I have a doubt about how you are
   selecting the neighbours.  Is there another dataset with the information?
   Sorry if I have missed something.
   For e.g.
   ### average difference b/n neighbours for each individual
   XO["avg", "176"] <- mean(abs(XO[1, "176"] - XO[1, c("140", "162", "713")]))
  
  
   A.K.
  
  
   On Thursday, May 22, 2014 5:21 PM, Monaly Mistry 
  monaly.mis...@gmail.com wrote:
   Hi Everyone,
  
   I hope I did this correctly (I called my data frame ao), and thank you
   very much for the info about using dput(). I'm starting to understand all
   the different things that can be done in R and I appreciate all the advice.
   I must apologize in advance since my code is quite long, but hopefully it
   makes sense and there is an efficient way to do this.
  
   structure(list(num = 1:99, FORM_CHK = c(20870L, 22018L, 30737L,
   22010L, 22028L, 36059L, 36063L, 36066L, 30587L, 30612L, 36056L,
   30376L, 35153L, 30435L, 30536L, 30486L, 30475L, 36053L, 36048L,
   36076L, 36045L, 36065L, 35772L, 36949L, 35702L, 36894L, 36080L,
   35542L, 35457L, 35533L, 36042L, 36925L, 36827L, 36008L, 35817L,
   36350L, 35985L, 35973L, 35801L, 36639L, 35810L, 35812L, 35807L,
   36351L, 35967L, 35944L, 37006L, 36345L, 36062L, 36077L, 35802L,
   35984L, 36043L, 35769L, 36360L, 36082L, 36071L, 36354L, 35771L,
   35754L, 36295L, 35746L, 36064L, 35779L, 35751L, 35752L, 35785L,
   35792L, 37011L, 36003L, 36040L, 36831L, 36031L, 36652L, 36992L,
   36965L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
   NA, NA, NA, NA, NA, NA, NA, NA, NA), RingNummerMan =
 structure(c(1L,
   2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
   16L, 17L, 19L, 22L, 23L, 24L, 25L, 26L, 27L, 29L, 30L, 31L, 34L,
   35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 46L, 47L, 48L,
   49L, 50L, 51L, 52L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 63L,
   65L, 67L, 69L, 70L, 73L, 74L, 75L, 76L, 78L, 79L, 80L, 81L, 82L,
   83L, 85L, 86L, 87L, 88L, 89L, 93L, 96L, 97L, 18L, 20L, 21L, 28L,
   32L, 33L, 45L, 53L, 62L, 64L, 66L, 68L, 71L, 72L, 77L, 84L, 90L,
   91L, 92L, 94L, 95L, 98L, 99L), .Label = c(AJ...75425, AL...62371,
   AR...11060, AR...29297, AR...29307, AR...29502, AR...29504,
   AR...29507, AR...30039, AR...30085, AR...30165, AR...30491,
   AR...30563, AR...30616, AR...30652, AR...30687, AR...30701,
   AR...30927, AR...30959, AR...30963, AR...30964, AR...30965,
   AR...30966, AR...30985, AR...30988, AR...40917, AR...40996,
   AR...45735, AR...45904, AR...45928, AR...47609, AR...65387,
   AR...65479, AR...65550, AR...65629, AR...65948, AR...86074,
   AR...86521, AR...86527, AR...90061, AR...90064, AR...90067,
   AR...90077, AR...90081, AR...90098, AR...90101, AR...90106,
   AR...90112, AR...90133, AR...90155, AR...90176, AR...90178,
   AR...90180, AR...90187, AR...90212, AR...90247, AR...90252,
   AR...90256, AR...90258, AR...90269, AR...90272, AR...90275,
   AR...90294, AR...90298, AR...90300, AR...90337, AR...90338,
   AR...90367, AR...90397, AR...90410, AR...90463, AR...90520,
   AR...90544, AR...90556, AR...90678, AR...90712, AR...90737,
   AR...90744, AR...90829, AR...90862, AR...90863, AR...90873,
   AR...90880, AR...90892, AR...90898

Re: [R] subsetting to exclude different values for each subject in study

2014-05-23 Thread arun
Hi,
Sorry, there is a mistake. XO[2,] should be:
XO[2,] <- sapply(seq_along(col.tri.nb), function(i){ind1 <- 
as.character(ind[i]); ind2 <- as.character(ind[col.tri.nb[[i]]]); 
mean(abs(XO[1,ind1]-XO[1,ind2]))} )
A.K.






On Friday, May 23, 2014 12:56 PM, arun smartpink...@yahoo.com wrote:

Hi Monaly,
Maybe this helps:
b <- 77:99
ao1 <- ao[-b,]
##Your code:

XO <- matrix(0, 6, 76, byrow=TRUE); XO
abo <- ao$NestkastNummer[-b]; abo  #removed values that were NA
rownames(XO) = c("EB_score", "avg", "pop_size", "pop_avg_score",
"adj_pop_avg", "ind_pop_dif")
colnames(XO) = abo
t <- ao$COR_LOC; t
i <- c(77:99)
ti <- t[-i]; ti
XO[1,] = c(ti); XO
  

library(deldir)
library(spdep)
mat <- cbind(lat=ao1$lat_xm, long=ao1$long_ym)
coords <- coordinates(mat)
ind <- ao1$NestkastNummer

col.tri.nb <- tri2nb(coords, row.names=ind)
lapply(col.tri.nb, function(x) ind[x])[1:5] ###
[[1]]
[1] 713 715 162 148 140 117

[[2]]
[1] 130 128 172  64 113 117

[[3]]
[1] 54 19 16 73 74

[[4]]
[1]   2  31 704  34 707

[[5]]
[1] 51 94 57 73 62

XO[2,] <- sapply(seq_along(col.tri.nb), function(i) 
mean(abs(ind[i]-ind[col.tri.nb[[i]]])))

A.K.
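
A minimal base-R sketch of the neighbour calculation above, run on made-up data so it can be tried without deldir or spdep. The scores, the nest box labels and the neighbour list nb are all hypothetical; nb is a plain list of integer index vectors, assumed here to be shaped like the object that tri2nb() returns.

```r
# Hypothetical scores, named by nest box number (not the real data)
scores <- c("176" = 15.1, "140" = 13.9, "162" = 30.1, "713" = 19.1, "64" = 16.9)

# Hypothetical neighbour list: element i holds the positions of i's neighbours
nb <- list(c(2L, 3L, 4L), c(1L, 3L), c(1L, 2L, 4L), c(1L, 3L, 5L), 4L)

# Mean absolute score difference between each individual and its neighbours
avg_nb_diff <- sapply(seq_along(nb), function(i) {
  mean(abs(scores[[i]] - scores[nb[[i]]]))
})
names(avg_nb_diff) <- names(scores)
avg_nb_diff
```

The point of the correction is that the differences are taken between scores, not between nest box numbers, so the lookup goes through the score vector in both places.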



On Friday, May 23, 2014 7:17 AM, Monaly Mistry monaly.mis...@gmail.com wrote:



Hi Arun and Frede,

So the dput() is below (it's the same data file as before), but below that is 
the code I used to make the tessellation.  Thanks for your help.

 dput(ao)
structure(list(num = 1:99, FORM_CHK = c(20870L, 22018L, 30737L, 
22010L, 22028L, 36059L, 36063L, 36066L, 30587L, 30612L, 36056L, 
30376L, 35153L, 30435L, 30536L, 30486L, 30475L, 36053L, 36048L, 
36076L, 36045L, 36065L, 35772L, 36949L, 35702L, 36894L, 36080L, 
35542L, 35457L, 35533L, 36042L, 36925L, 36827L, 36008L, 35817L, 
36350L, 35985L, 35973L, 35801L, 36639L, 35810L, 35812L, 35807L, 
36351L, 35967L, 35944L, 37006L, 36345L, 36062L, 36077L, 35802L, 
35984L, 36043L, 35769L, 36360L, 36082L, 36071L, 36354L, 35771L, 
35754L, 36295L, 35746L, 36064L, 35779L, 35751L, 35752L, 35785L, 
35792L, 37011L, 36003L, 36040L, 36831L, 36031L, 36652L, 36992L, 
36965L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), RingNummerMan = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 19L, 22L, 23L, 24L, 25L, 26L, 27L, 29L, 30L, 31L, 34L, 
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 46L, 47L, 48L, 
49L, 50L, 51L, 52L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 63L, 
65L, 67L, 69L, 70L, 73L, 74L, 75L, 76L, 78L, 79L, 80L, 81L, 82L, 
83L, 85L, 86L, 87L, 88L, 89L, 93L, 96L, 97L, 18L, 20L, 21L, 28L, 
32L, 33L, 45L, 53L, 62L, 64L, 66L, 68L, 71L, 72L, 77L, 84L, 90L, 
91L, 92L, 94L, 95L, 98L, 99L), .Label = c(AJ...75425, AL...62371, 
AR...11060, AR...29297, AR...29307, AR...29502, AR...29504, 
AR...29507, AR...30039, AR...30085, AR...30165, AR...30491, 
AR...30563, AR...30616, AR...30652, AR...30687, AR...30701, 
AR...30927, AR...30959, AR...30963, AR...30964, AR...30965, 
AR...30966, AR...30985, AR...30988, AR...40917, AR...40996, 
AR...45735, AR...45904, AR...45928, AR...47609, AR...65387, 
AR...65479, AR...65550, AR...65629, AR...65948, AR...86074, 
AR...86521, AR...86527, AR...90061, AR...90064, AR...90067, 
AR...90077, AR...90081, AR...90098, AR...90101, AR...90106, 
AR...90112, AR...90133, AR...90155, AR...90176, AR...90178, 
AR...90180, AR...90187, AR...90212, AR...90247, AR...90252, 
AR...90256, AR...90258, AR...90269, AR...90272, AR...90275, 
AR...90294, AR...90298, AR...90300, AR...90337, AR...90338, 
AR...90367, AR...90397, AR...90410, AR...90463, AR...90520, 
AR...90544, AR...90556, AR...90678, AR...90712, AR...90737, 
AR...90744, AR...90829, AR...90862, AR...90863, AR...90873, 
AR...90880, AR...90892, AR...90898, AR...90945, AR...90951, 
AR...90965, AR...90970, AR...90972, AU...15008, AU...15009, 
AU...15027, AU...15032, AU...15036, AU...15038, AU...15046, 
AU...15049, AU...15505), class = factor), year_score_taken = c(2006L, 
2008L, 2009L, 2008L, 2008L, 2011L, 2011L, 2011L, 2009L, 2009L, 
2011L, 2009L, 2010L, 2009L, 2009L, 2009L, 2009L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2012L, 2011L, 2012L, 2011L, 2010L, 
2010L, 2010L, 2011L, 2012L, 2012L, 2011L, 2011L, 2012L, 2011L, 
2011L, 2011L, 2012L, 2011L, 2011L, 2011L, 2012L, 2011L, 2011L, 
2013L, 2012L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 
2012L, 2011L, 2012L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2013L, 2011L, 2011L, 2012L, 2011L, 
2012L, 2012L, 2012L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), COR_LOC = c(15.13404, 
13.88054, 30.0969, 19.09152, 16.88054, 14.15718, 39.15718, 16.15718, 
16.13566, 23.07538, 39.15718, 24.56838, 12.13942, 21.4123, 19.06945, 
12.33264, 32.48872, 30.15718, 37.15718, 37.15718, 49.15718, 22.15718, 
18.50272, 23.69432, 24.9322, 47.29712, 41.15718, 21.47903, 38.6588, 
34.99572, 28.15718, 13.08614, 16.71908, 22.68894, 19.2616, 15.96234, 
22.83964, 13.89992, 14.2616, 18.17118, 24.2616, 22.2616, 13.2616, 
23.96234, 24.89992, 

[R] subsetting to exclude different values for each subject in study

2014-05-22 Thread Monaly Mistry
Hi,

I've written some code to determine the difference in score between a single
subject and its non-neighbours:

o <- ao[, c(13, 5)]  ##this is the table with the relevant information
o <- na.omit(o)  ##omitted data with NA
o <- o[!o$NestkastNummer %in% c(176,140,162,713),]  ##removed neighbours
XO[7,1] <- abs(XO[1,"176"] - mean(o[,"COR_LOC"]))  #difference between that
#individual and the average non-neighbours score

Since each subject has a different number of non-neighbours I was wondering
if there is an efficient way of writing the code, instead of writing the
same code again and again (76 subjects) for each subject and its
non-neighbours.


Best,

Monaly.
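
One way to avoid repeating the block for each of the 76 subjects is to keep the neighbours in a list and loop over it. This is only a sketch on made-up data: the scores and the neighbour list nb are hypothetical, and it assumes, as in the code above, that the focal individual is excluded together with its neighbours.

```r
# Hypothetical scores, named by nest box number (not the real data)
scores <- c("176" = 15.1, "140" = 13.9, "162" = 30.1, "713" = 19.1, "64" = 16.9)

# Hypothetical neighbour list: element i holds the positions of i's neighbours
nb <- list(c(2L, 3L, 4L), c(1L, 3L), c(1L, 2L, 4L), c(1L, 3L, 5L), 4L)

# For each individual: |own score - mean score of all non-neighbours|
non_nb_diff <- sapply(seq_along(nb), function(i) {
  # everyone except the focal individual and its neighbours
  others <- setdiff(seq_along(scores), c(i, nb[[i]]))
  abs(scores[[i]] - mean(scores[others]))
})
names(non_nb_diff) <- names(scores)
non_nb_diff
```

If an individual neighbours everyone else, others is empty and the result for that individual is NaN, so that case would need separate handling.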



Re: [R] subsetting to exclude different values for each subject in study

2014-05-22 Thread Monaly Mistry
Hi,

Sorry I'm fairly new to R and I don't really understand using dput(), when
you say reproducible example do you mean the code with the output?

Best,

Monaly.


On Thu, May 22, 2014 at 4:03 PM, arun smartpink...@yahoo.com wrote:

 Hi,

 It would be helpful if you provide a reproducible example using ?dput().

 A.K.




 On Thursday, May 22, 2014 10:15 AM, Monaly Mistry monaly.mis...@gmail.com
 wrote:
 Hi,

 I've written a code to determine the difference in score for a single
 subject and its non-neighbours

 o <- ao[, c(13, 5)]  ##this is the table with the relevant information
 o <- na.omit(o)  ##omitted data with NA
 o <- o[!o$NestkastNummer %in% c(176,140,162,713),]  ##removed neighbours
 XO[7,1] <- abs(XO[1,"176"] - mean(o[,"COR_LOC"]))  #difference between that
 #individual and the average non-neighbours score

 Since each subject has a different number of non-neighbours I was wondering
 if there is an efficient way of writing the code, instead of writing the
 same code again and again (76 subjects) for each subject and its
 non-neighbours.


 Best,

 Monaly.



Re: [R] subsetting to exclude different values for each subject in study

2014-05-22 Thread Bert Gunter
Follow the link at the bottom of this message!

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, May 22, 2014 at 8:31 AM, Monaly Mistry monaly.mis...@gmail.com wrote:
 Hi,

 Sorry I'm fairly new to R and I don't really understand using dput(), when
 you say reproducible example do you mean the code with the output?

 Best,

 Monaly.


 On Thu, May 22, 2014 at 4:03 PM, arun smartpink...@yahoo.com wrote:

 Hi,

 It would be helpful if you provide a reproducible example using ?dput().

 A.K.




 On Thursday, May 22, 2014 10:15 AM, Monaly Mistry monaly.mis...@gmail.com
 wrote:
 Hi,

 I've written a code to determine the difference in score for a single
 subject and its non-neighbours

 o <- ao[, c(13, 5)]  ##this is the table with the relevant information
 o <- na.omit(o)  ##omitted data with NA
 o <- o[!o$NestkastNummer %in% c(176,140,162,713),]  ##removed neighbours
 XO[7,1] <- abs(XO[1,"176"] - mean(o[,"COR_LOC"]))  #difference between that
 #individual and the average non-neighbours score

 Since each subject has a different number of non-neighbours I was wondering
 if there is an efficient way of writing the code, instead of writing the
 same code again and again (76 subjects) for each subject and its
 non-neighbours.


 Best,

 Monaly.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting to exclude different values for each subject in study

2014-05-22 Thread Monaly Mistry
 does not tell the whole
 tale. R supports a number of different data types: vectors, matrices,
 data.frames, lists, arrays and others. This site gives a useful though not
 complete summary of many data types
 http://www.statmethods.net/input/datatypes.html. When you have just
 created a new data set, or even when working with one that you have not
 worked with in some time it is a good idea to do a str() and class() on the
 data object just to be sure that you are working with the data types you
 think you have. What looks like a column of numbers in a data.frame may
 actually be a set of factors or a set of character (text) data and you're
 left wondering why multiplying it by some number is not working.

 Here is a short example to illustrate. Just copy and paste in the code
 dat1 <- data.frame(aa = as.factor(1:5), bb = 1:5)
 dat1 # data looks identical on the screen
 5*dat1[, "aa"]  # oops
 5*dat1[, "bb"]  # okay
 str(dat1)
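
As a small follow-on sketch (the values here are made up), this is one common way to get the numbers back out of a factor column once str() has shown that it is one. Converting the factor directly returns the internal level codes, so the conversion goes through character first.

```r
# A column that prints like numbers but is stored as a factor
dat1 <- data.frame(aa = as.factor(c(10, 20, 30)), bb = c(10, 20, 30))

codes  <- as.numeric(dat1$aa)                # internal level codes, not the values
values <- as.numeric(as.character(dat1$aa))  # the numbers as printed

codes
values
5 * values  # arithmetic now works as expected
```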


 John Kane
 Kingston ON Canada


  -Original Message-
  From: monaly.mis...@gmail.com
  Sent: Thu, 22 May 2014 16:31:39 +0100
  To: smartpink...@yahoo.com, r-help@r-project.org
  Subject: Re: [R] subsetting to exclude different values for each subject
  in study
 
  Hi,
 
  Sorry I'm fairly new to R and I don't really understand using dput(),
  when
  you say reproducible example do you mean the code with the output?
 
  Best,
 
  Monaly.
 
 
  On Thu, May 22, 2014 at 4:03 PM, arun smartpink...@yahoo.com wrote:
 
  Hi,
 
  It would be helpful if you provide a reproducible example using ?dput().
 
  A.K.
 
 
 
 
  On Thursday, May 22, 2014 10:15 AM, Monaly Mistry
  monaly.mis...@gmail.com
  wrote:
  Hi,
 
  I've written a code to determine the difference in score for a single
  subject and its non-neighbours
 
  o <- ao[, c(13, 5)]  ##this is the table with the relevant information
  o <- na.omit(o)  ##omitted data with NA
  o <- o[!o$NestkastNummer %in% c(176,140,162,713),]  ##removed neighbours
  XO[7,1] <- abs(XO[1,"176"] - mean(o[,"COR_LOC"]))  #difference between that
  #individual and the average non-neighbours score
 
  Since each subject has a different number of non-neighbours I was
  wondering
  if there is an efficient way of writing the code, instead of writing the
  same code again and again (76 subjects) for each subject and its
  non-neighbours.
 
 
  Best,
 
  Monaly.
 


Re: [R] subsetting to exclude different values for each subject in study

2014-05-22 Thread Monaly Mistry
])

 #This is the average observed discrepancy from individuals to neighbours
 #IOW on average how different a focal bird in this year is from
 #its neighbours
 obso=mean(XO["avg",])
 print(paste("Observed=", obso))
 XY[15,1] <- round(obso, digits=4)


 #This is the code I previously posted to find the difference in scores
 #between a single subject and its non-neighbours
 o <- ao[, c(13, 5)]
 o <- na.omit(o)
 o <- o[!o$NestkastNummer %in% c(176,140,162,713),]
 XO[7,1] <- abs(XO[1,"176"] - mean(o[,"COR_LOC"]))


 Best,

 Monaly.


 On Thu, May 22, 2014 at 5:08 PM, John Kane jrkrid...@inbox.com wrote:

  Re dput() etc
  https://github.com/hadley/devtools/wiki/Reproducibility
 
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 
  What dput() does is take your data and output it in an ASCII format that
  lets the reader here create an exact duplicate of your database.
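
A minimal sketch of that round trip, using a small made-up data frame (the column names and values are hypothetical). The text that dput() prints is what would be pasted into an email; evaluating that text on the other side rebuilds the object, including column types such as factors.

```r
# A small, made-up data frame standing in for the real data
df <- data.frame(id = 1:3,
                 score = c(15.1, 13.9, 30.1),
                 nest = factor(c("a", "b", "a")))

# Capture what dput() would print; this is the text to paste into a message
txt <- paste(capture.output(dput(df)), collapse = "\n")
cat(txt, "\n")

# The reader recreates the object by evaluating the pasted text
df2 <- eval(parse(text = txt))
str(df2)
```

For a large data set, posting dput(head(df, 20)) or a similar small slice is usually enough to show the structure.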
 
  R is not WYSIWYG. Often what you see on the screen does not tell the
whole
  tale. R supports a number of different data types: vectors, matrices,
  data.frames, lists, arrays and others. This site gives a useful though
not
  complete summary of many data types
  http://www.statmethods.net/input/datatypes.html. When you have just
  created a new data set, or even when working with one that you have not
  worked with in some time it is a good idea to do a str() and class() on
the
  data object just to be sure that you are working with the data types you
  think you have. What looks like a column of numbers in a data.frame may
  actually be a set of factors or a set of character (text) data and
you're
  left wondering why multiplying it by some number is not working.
 
  Here is a short example to illustrate. Just copy and paste in the code
   dat1 <- data.frame(aa = as.factor(1:5), bb = 1:5)
  dat1 # data looks identical on the screen
  5*dat1[, "aa"]  # oops
  5*dat1[, "bb"]  # okay
  str(dat1)
 
 
  John Kane
  Kingston ON Canada
 
 
   -Original Message-
   From: monaly.mis...@gmail.com
   Sent: Thu, 22 May 2014 16:31:39 +0100
   To: smartpink...@yahoo.com, r-help@r-project.org
   Subject: Re: [R] subsetting to exclude different values for each
subject
   in study
  
   Hi,
  
   Sorry I'm fairly new to R and I don't really understand using dput(),
   when
   you say reproducible example do you mean the code with the output?
  
   Best,
  
   Monaly.
  
  
   On Thu, May 22, 2014 at 4:03 PM, arun smartpink...@yahoo.com wrote:
  
   Hi,
  
   It would be helpful if you provide a reproducible example using
?dput().
  
   A.K.
  
  
  
  
   On Thursday, May 22, 2014 10:15 AM, Monaly Mistry
   monaly.mis...@gmail.com
   wrote:
   Hi,
  
   I've written a code to determine the difference in score for a single
   subject and its non-neighbours
  
   o <- ao[, c(13, 5)]  ##this is the table with the relevant information
   o <- na.omit(o)  ##omitted data with NA
   o <- o[!o$NestkastNummer %in% c(176,140,162,713),]  ##removed neighbours
   XO[7,1] <- abs(XO[1,"176"] - mean(o[,"COR_LOC"]))  #difference between that
   #individual and the average non-neighbours score
  
   Since each subject has a different number of non-neighbours I was
   wondering
   if there is an efficient way of writing the code, instead of writing
the
   same code again and again (76 subjects) for each subject and its
   non-neighbours.
  
  
   Best,
  
   Monaly.
  


[R] Subsetting data by ID with different constraints

2014-04-04 Thread Lib Gray
Hello,

I have a data set with many individuals all with multiple timed
observations, and I would like to subset the data to exclude later timed
observations.
However, I would like to exclude different amounts of data for each
individual. These individuals have two types of data: DV and dose. What I
would like to do is exclude later instances when one of the types of data
is no longer included.

The data is structured with an (approximate) 28 day cycle. Each individual
has a baseline DV, and on day 1, they receive their first dose. Around day
28, they will have their first DV observed. This means that an individual
should have one less dose data item than they have DV data items.

What I would like is to take the following:

ID  TIME  DV  DOSE  TYPE
1   0     0   NA    2
1   1     NA  100   1
1   27    0   NA    2
1   29    NA  100   1
1   54    2   NA    2
1   84    3   NA    2
1   100   3   NA    2
1   127   3   NA    2

2   0     0   NA    2
2   1     NA  120   1
2   28    4   NA    2
2   29    NA  120   1
2   56    8   NA    2
2   57    NA  100   1

3   0     2   NA    2
3   1     NA  80    1
3   28    5   NA    2
3   56    2   NA    1
3   84    1   NA    2

4   0     0   NA    2
4   1     NA  100   1
4   29    NA  100   1
4   57    NA  100   1
4   85    NA  100   1
...


And turn it into:

ID  TIME  DV  DOSE  TYPE
1   0     0   NA    2
1   1     NA  100   1
1   27    0   NA    2
1   29    NA  100   1
1   54    2   NA    2

2   0     0   NA    2
2   1     NA  120   1
2   28    4   NA    2
2   29    NA  120   1
2   56    8   NA    2

3   0     2   NA    2
3   1     NA  80    1
3   28    5   NA    2

4   0     0   NA    2
...


My thought for how to do this was to:

(1)  Subset the data by the maximum time an individual had an observed DV
(type=2). However, this will be a different time for every patient, and I
was unsure how to do this type of subsetting.

(2) After I had done that, I would want to take my new subsetted data and
determine the maximum time an individual had a dose. Then I would
determine the total rows of data a patient had up to their last dose data
time. Then I could subset the data by taking the first n+1 observations
for each individual, n=total rows of data a patient had up to their last
dose data time. This step I would hope I could determine from knowing how
to do step (1), if I can use table and max interchangeably.

Any help would be appreciated!
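
The two steps above can be sketched in base R by splitting the data by ID and trimming each piece separately. This is only a sketch under stated assumptions: the data frame below is retyped from the example in this message, trim_one is a hypothetical helper name, a "dose row" is taken to be any row with a non-missing DOSE, and every individual is assumed to have at least one TYPE 2 row.

```r
# Example data retyped from the message (columns ID, TIME, DV, DOSE, TYPE)
dat <- data.frame(
  ID   = c(rep(1, 8), rep(2, 6), rep(3, 5), rep(4, 5)),
  TIME = c(0, 1, 27, 29, 54, 84, 100, 127,
           0, 1, 28, 29, 56, 57,
           0, 1, 28, 56, 84,
           0, 1, 29, 57, 85),
  DV   = c(0, NA, 0, NA, 2, 3, 3, 3,
           0, NA, 4, NA, 8, NA,
           2, NA, 5, 2, 1,
           0, NA, NA, NA, NA),
  DOSE = c(NA, 100, NA, 100, NA, NA, NA, NA,
           NA, 120, NA, 120, NA, 100,
           NA, 80, NA, NA, NA,
           NA, 100, 100, 100, 100),
  TYPE = c(2, 1, 2, 1, 2, 2, 2, 2,
           2, 1, 2, 1, 2, 1,
           2, 1, 2, 1, 2,
           2, 1, 1, 1, 1)
)

# Hypothetical helper: trim the rows of one individual
trim_one <- function(d) {
  d <- d[order(d$TIME), ]
  # Step 1: drop everything after the last observed DV (TYPE == 2)
  last_dv <- max(d$TIME[d$TYPE == 2])
  d <- d[d$TIME <= last_dv, ]
  # Step 2: keep the rows up to the last dose, plus one more row
  dose_rows <- which(!is.na(d$DOSE))
  n <- if (length(dose_rows) > 0) max(dose_rows) else 0
  d[seq_len(min(n + 1, nrow(d))), ]
}

# Apply per individual and stack the pieces back together
trimmed <- do.call(rbind, lapply(split(dat, dat$ID), trim_one))
rownames(trimmed) <- NULL
trimmed
```

Because the cutoff is computed inside trim_one, each individual gets its own cutoff without any per-ID code.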



[R] Subsetting a dataframe by dynamic column name

2014-03-27 Thread Sneha Bishnoi
Hi all!

I am trying to drop columns from a data frame dynamically depending on user
input. The dataframe whose columns need to be dropped is called Finaldata
So here is what I do:

V is a dataframe with columns v1 and v2 as follows

   v1  v2
1   1  Shape
2   0   Length
3   0   Rate

v1 corresponds to user input, 1 if you want to drop the column, 0 otherwise
v2 corresponds to column names of the all the columns in Finaldata
I then use following code to drop columns

for (i in 1:3)
{
  if (V$v1[i] == 1)
  {
    print(V$v2[i])
    Finaldata <- subset(Finaldata, select = -c(V$v2[i]))
  }
}

However, v2, being of type character, is not accepted by subset.
I read that subset needs column names without quotes.
I tried stripping off the quotes with gsub and cat; however, it didn't help.
There are a lot of columns and I cannot do this individually for each
column.
How do I go about it?


Thanks!
SB



Re: [R] Subsetting a dataframe by dynamic column name

2014-03-27 Thread Sarah Goslee
There are many ways. You're making it overly complicated.

Here, in an actual reproducible example (as you were requested to submit):

V <- data.frame(v1=c(1,0,0), v2=c("Shape", "Length", "Rate"),
stringsAsFactors=FALSE)
Finaldata <- data.frame(Shape = runif(5), Length = runif(5), Rate = runif(5))

# assuming names in V are not in the same order as columns in Finaldata
# also assuming there might accidentally be names not in Finaldata
Finaldata[, colnames(Finaldata) %in% V$v2[V$v1 == 0]]

# more elegant?
Finaldata[, colnames(Finaldata) %in% V$v2[!V$v1]]

# not robust to errors in V
Finaldata[, V$v2[!V$v1]]

# assumes order of column names matches order of V
Finaldata[, !V$v1]

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Subsetting a dataframe by dynamic column name

2014-03-27 Thread Sneha Bishnoi
Hi Sarah,

Thanks! I do agree it's overcomplicated.
However, looking at the solutions, I think I did not state my problem
completely.
V provides choices for only a certain set of columns in Finaldata,
so v2 may not represent all columns of Finaldata.
I want to retain the columns not offered as a choice to users, plus the ones
the user chooses to keep.

Thanks!







-- 
Sneha Bishnoi
+14047235469
H. Milton Stewart School of Industrial & Systems Engineering
Georgia Tech



Re: [R] Subsetting a dataframe by dynamic column name

2014-03-27 Thread David Carlson
That only requires two small changes in Sarah's first solution:

> Finaldata[, !colnames(Finaldata) %in% V$v2[V$v1 == 1]]
      Length       Rate
1 0.53607323 0.01739951
2 0.15405615 0.11837908
3 0.04542388 0.53702702
4 0.15633703 0.68870041
5 0.35293973 0.38258981
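
A self-contained sketch of the same idea, for the case where V covers only some of the columns. The extra column `Id` and the helper names `drop_names` and `kept` are assumptions added for illustration; they are not part of the original data.

```r
# V lists only some columns; v1 == 1 marks a column the user wants dropped.
V <- data.frame(v1 = c(1, 0, 0),
                v2 = c("Shape", "Length", "Rate"),
                stringsAsFactors = FALSE)

set.seed(1)
# 'Id' is a hypothetical column that is not offered as a choice in V.
Finaldata <- data.frame(Id = 1:5, Shape = runif(5),
                        Length = runif(5), Rate = runif(5))

# Names flagged for removal; every other column is kept, listed in V or not.
drop_names <- V$v2[V$v1 == 1]
kept <- Finaldata[, !(colnames(Finaldata) %in% drop_names), drop = FALSE]
colnames(kept)
```

Negating the %in% test keeps any column that V does not mention, and drop = FALSE keeps the result a data frame even if only one column survives.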

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




Re: [R] Subsetting a dataframe by dynamic column name

2014-03-27 Thread Sneha Bishnoi
Thank you! Works like a charm!




Re: [R] Subsetting between two values (into a range)????

2014-03-10 Thread arun
Hi,
If 'dat' is the dataset:
Try

subset(dat, Start < MapInfo & End > MapInfo)

A.K.


Dear All,

I want to subset a column (MapInfo in the attached photo) in a csv
file when its values fall between the values in two other columns (Start
and End in the attached photo), using R 3.0.1. Thank you in advance for
any guidance.

M.H. Banabazi



Re: [R] Subsetting between two values (into a range)????

2014-03-10 Thread Duncan Murdoch

On 14-03-10 10:47 AM, arun wrote:
> Hi,
> If 'dat' is the dataset:
> Try
>
> subset(dat, Start < MapInfo & End > MapInfo)

A bit of advice I think I read in The Elements of Programming Style:
try to make complex conjunctions look like their mathematical
equivalents, and they'll be easier to read.  The mathematical way to
write the test above would be Start < MapInfo < End, so the programmatic
way to write it should be

Start < MapInfo & MapInfo < End

This makes the character of the interval completely obvious.
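
A small sketch of that test on made-up data. The values in `dat` are assumptions for illustration only, since the original attachment is not available.

```r
# Hypothetical data; the column names follow the question.
dat <- data.frame(MapInfo = c(5, 15, 25, 35),
                  Start   = c(0, 20, 20, 30),
                  End     = c(10, 30, 30, 34))

# Keep rows whose MapInfo lies strictly inside the open interval (Start, End).
inside <- subset(dat, Start < MapInfo & MapInfo < End)
inside$MapInfo
```

The two comparisons are joined with the vectorized `&`, so each row is tested against its own Start and End.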

Duncan Murdoch





[R] Subsetting a named list of parameters in mle

2014-02-15 Thread John Hodgson
I have a 7-parameter model to fit using mle.  I would like to generate
fits for all pairs of parameters (with the others fixed).

The following code looked like it should work:

library(stats4)

# dummy mll function for sake of example
mll = function (lnk=1.5,lnhs=-5,lnhi=-5,lnss=-5,lnsi=-5,lnws=-5,lnwi=-5)
lnk^2 + lnhs^2 + lnhi^2+ lnss^2 + lnsi^2+ lnws^2 + lnwi^2

fit=1:6
pars = list(lnk=1.5,lnhs=-5,lnhi=-5,lnss=-5,lnsi=-5,lnws=-5,lnwi=-5)

for (i in 2:7)  {
fit[i] = mle(mll,start=pars[c(1,i)],fixed=pars[-c(1,i)],method="Nelder")
}


but it gives the following error:

Error in fit[i] = mle(mll, start = pars[c(1, i)], fixed = pars[-c(1,
i)],  :
   incompatible types (from S4 to integer) in subassignment type fix

What have I missed?

Thanks for any suggestions

John Hodgson




Re: [R] Subsetting a named list of parameters in mle

2014-02-15 Thread Bert Gunter
fit is initialized as a vector of integers. How can you assign an mle
fit to an element of an integer vector?

Initialize fit as a list, use lapply, or whatever. Have you read An
Intro to R (ships with R) or another R (e.g. web) tutorial? This looks
like the sort of basic misunderstanding that one who has not made much
effort to learn how R works would make. If not, please do so before
posting further.
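
A minimal sketch of the list-based approach, reusing the dummy function from the question. The name `fits` and the i - 1 indexing are choices made here for illustration, not something taken from the thread.

```r
library(stats4)

# Dummy negative log-likelihood from the question.
mll <- function(lnk = 1.5, lnhs = -5, lnhi = -5, lnss = -5,
                lnsi = -5, lnws = -5, lnwi = -5) {
  lnk^2 + lnhs^2 + lnhi^2 + lnss^2 + lnsi^2 + lnws^2 + lnwi^2
}

pars <- list(lnk = 1.5, lnhs = -5, lnhi = -5, lnss = -5,
             lnsi = -5, lnws = -5, lnwi = -5)

# A list can hold S4 objects; an integer vector cannot.
fits <- vector("list", 6)
for (i in 2:7) {
  # Free parameters: lnk and parameter i; all others are held fixed.
  fits[[i - 1]] <- mle(mll, start = pars[c(1, i)],
                       fixed = pars[-c(1, i)], method = "Nelder-Mead")
}
names(fits) <- names(pars)[2:7]
```

Each element is then a complete mle object, so something like coef(fits[["lnhs"]]) should return the estimates for that pair.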

Cheers,
Bert



Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch






[R] Subsetting on multiple criteria (AND condition) in R

2014-01-14 Thread Jeff Johnson
I'm running the following to get what I would expect is a subset of
countries that are not equal to US AND COUNTRY is not in one of my
validcountries values.

non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
select = COUNTRY, na.rm=TRUE)

However, when I then do table(non_us) I get:
> table(non_us)
non_us
   AE AN AR AT AU BB BD BE BH BM BN BO BR BS CA CH CM CN CO CR CY DE DK DO EC ES
 0  3  0  2  1 31  4  1  1  1 45  1  1  4  5 86  3  1  8  1  2  1  8  2  1  2  4
FI FR GB GR GU HK ID IE IL IN IO IT JM JP KH KR KY LU LV MO MX MY NG NL NO NZ PA
 2  4 35  3  3 14  3  5  2  5  1  2  1 15  1 11  2  2  1  1 23  7  1  6  1  3  1
PE PG PH PR PT RO RU SA SE SG TC TH TT TW TZ US ZA
 2  1  1  8  1  1  1  1  1 18  1  1  2 11  1  0  3


Notice that US appears second to last. I expected it NOT to appear.

Do you know if I'm using incorrect syntax? Is the & symbol equivalent to
AND (notice I have 2 criteria for subsetting)? Also, is COUNTRY != "US"
valid syntax? I don't get errors, but then again I don't get what I expect
back.

Thanks in advance!



-- 
Jeff



Re: [R] Subsetting on multiple criteria (AND condition) in R

2014-01-14 Thread arun
Hi,
Try:
table(as.character(non_us[,"COUNTRY"]))
A.K.






Re: [R] Subsetting on multiple criteria (AND condition) in R

2014-01-14 Thread Marc Schwartz

Review the Details section of ?subset, where you will find the following:

  Factors may have empty levels after subsetting; unused levels are not
  automatically removed. See droplevels for a way to drop all unused levels
  from a data frame.


Your syntax is fine and the behavior is as expected.
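
A short sketch of what droplevels does in this situation. The data frame below is a made-up stand-in for mydf, so its values are assumptions.

```r
# Hypothetical stand-in for mydf with COUNTRY stored as a factor.
mydf <- data.frame(COUNTRY = factor(c("US", "CA", "GB", "US", "CA")))

non_us <- subset(mydf, COUNTRY != "US", select = COUNTRY)
levels(non_us$COUNTRY)   # "US" is still a level, so table() reports it with count 0

# Remove the levels that no longer occur in the subset.
non_us <- droplevels(non_us)
table(non_us$COUNTRY)
```

The rows were filtered correctly all along; only the factor's level set still remembered "US".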

Regards,

Marc Schwartz



Re: [R] Subsetting on multiple criteria (AND condition) in R

2014-01-14 Thread William Dunlap
Here is a reproducible example of your problem where you do not
want to see a table entry for "Medium".
   > tmp_df <- data.frame(Size=factor(rep(c("Small","Medium","Large"),1:3),
   +                      levels=c("Small","Medium","Large")))
   > non_medium <- subset(tmp_df, Size != "Medium", select=Size)
   > table(non_medium)
   non_medium
    Small Medium  Large
        1      0      3

The problem arises because, by default, when you take a subset of a factor
all the levels of the factor are retained and table(factor) makes an entry for
every level.  If you want to drop the unused levels in a factor (and retain the
order of the remaining levels) you can pass it through the factor function:
   > table(Size=factor(non_medium$Size))
   Size
   Small Large
       1     3

You can also subset the factor with the drop=TRUE argument to drop the unused
levels when you make the subset:
   > table(Size=tmp_df$Size[tmp_df$Size != "Medium", drop=TRUE])
   Size
   Small Large
       1     3

Some will say to use as.character on the factor or not to use factors at all.
That works if you are OK with the entries in the table being in alphabetic
order and not a semantic order of your choosing.

Bill Dunlap
TIBCO Software
wdunlap tibco.com




Re: [R] Subsetting on multiple criteria (AND condition) in R

2014-01-14 Thread Jeff Johnson
Thanks so much to Marc and to everyone who responded. Marc's suggestion of
droplevels gave me the desired result.

I'm new to figuring out how to post reproducible code. I'll try using the
set.seed and rnorm functions next time and hope that does the trick.
Thanks everyone!





-- 
Jeff



Re: [R] subsetting 3D array

2014-01-09 Thread arun
Hi Alex,
Try:
set.seed(345)
results <- array(sample(-5:5,120,replace=TRUE),dim=c(10,3,4))
indx <- !!apply(results,1,sum)
library(plyr)
results2 <- laply(lapply(seq(dim(results)[1]),function(i)
results[i,,])[indx],identity)
attr(results2,"dimnames") <- NULL
dim(results2)
#[1] 9 3 4

A.K.



I have a 3D array with 13,000 11x8 matrices.

> dim(results)
[1] 13000    11     8

Some matrices in the array add up to 0. For example

> sum(results[1,,])==0
[1] TRUE

I would like to remove these. How can I do this?



Re: [R] subsetting 3D array

2014-01-09 Thread Bert Gunter
Just use apply() and indexing instead!

results[,,apply(results,3,sum)&TRUE]
## will do it.

However, note that numerical error may make a hash of this. So safer
would be something like:

eps <- 1e-15 ## i.e. something small
results[,,abs(apply(results,3,sum))>eps]


Cheers,
Bert



Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch






Re: [R] subsetting 3D array

2014-01-09 Thread arun
I figured it out:
dim(results[apply(results,1,sum)&TRUE,,])
#[1] 9 3 4
A.K.




On , arun smartpink...@yahoo.com wrote:


dim(results[,,apply(results,3,sum)&TRUE])
#[1] 10  3  4
dim(results[,,abs(apply(results,3,sum))>eps])
#[1] 10  3  4

dim(results2)
#[1] 9 3 4
A.K.
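
Putting the two replies together, a base R sketch that filters along the first dimension with a tolerance. Zeroing slice 2 and the names `slice_sums` and `results2` are assumptions made here so the example is guaranteed to have something to remove.

```r
set.seed(345)
results <- array(sample(-5:5, 120, replace = TRUE), dim = c(10, 3, 4))
results[2, , ] <- 0   # force one slice to sum to zero, for illustration

# One sum per matrix along the first dimension.
slice_sums <- apply(results, 1, sum)

# Keep slices whose sum is not (numerically) zero; drop = FALSE keeps the
# result three-dimensional even if a single slice remains.
eps <- 1e-15
results2 <- results[abs(slice_sums) > eps, , , drop = FALSE]
dim(results2)
```

The margin passed to apply() has to match the dimension being indexed, which is why margin 3 left the array unchanged in the output quoted above.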




On Friday, January 10, 2014 12:56 AM, Bert Gunter gunter.ber...@gene.com 
wrote:
Just use apply() and indexing instead!

results[,,apply(results,3,sum)TRUE]
## will do it.

However, note that numerical error may make a hash of this. So safer
would be something like:

eps - 1e-15 ## i.e. something small
results[,,abs(apply(results,3,sum))eps]


Cheers,
Bert



Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch





On Thu, Jan 9, 2014 at 7:23 PM, arun smartpink...@yahoo.com wrote:
 Hi Alex,
 Try:
 set.seed(345)
 results- array(sample(-5:5,120,replace=TRUE),dim=c(10,3,4))
 indx - !!apply(results,1,sum)
 library(plyr)
 results2 - laply(lapply(seq(dim(results)[1]),function(i) 
 results[i,,])[indx],identity)
 attr(results2,dimnames) - NULL
  dim(results2)
 #[1] 9 3 4

 A.K.



 I have a 3D array with 13,000 11x8 matrices.

 dim(results
 [1] 13000    11     8

 Some matrices in the array add up to 0. For example

 sum(results[1,,])==0
 [1] TRUE

 I would like to remove these. How can I do this?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Subsetting vector with preserved order

2014-01-02 Thread arun
Hi,
Try ?match
 b[match(d,a)]
#[1] "Joe"  "Bob"  "Dick"
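
A short sketch contrasting the two subscripts, using the vectors from the question. The second half uses a made-up key vector d2 (not in the original post) to show what happens when a key is missing from a:

```r
a <- c("A", "B", "C", "D", "E")
b <- c("Tom", "Dick", "Harry", "Bob", "Joe")
d <- c("E", "D", "B")

in_order_of_b <- b[a %in% d]     # logical mask: keeps the order of b
in_order_of_d <- b[match(d, a)]  # integer positions: keeps the order of d
in_order_of_b   # "Dick" "Bob"  "Joe"
in_order_of_d   # "Joe"  "Bob"  "Dick"

# Hypothetical keys, one of which ("Z") does not occur in a
d2 <- c("E", "Z", "B")
idx <- match(d2, a)              # NA where there is no match
with_na <- b[idx]                # "Joe" NA "Bob"
without_na <- b[idx[!is.na(idx)]]  # "Joe" "Bob"
```

match() returns the position of the first match for each key, so this assumes the values in a are unique.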
A.K.


I have three vectors as follows: 

 a <- c('A','B','C','D','E') 
 b <- c('Tom','Dick','Harry','Bob','Joe') 
 d <- c('E','D','B') 

Subsetting b by using d on a, with b[a %in% d], gives the names in the order 
they appear in b: 

  b[a %in% d] 
 [1] "Dick" "Bob"  "Joe" 

But I'd like them to show in the order in d, as Joe Bob Dick. What is the 
easy way to do this? 

Thanks.



Re: [R] Subsetting vector with preserved order

2014-01-02 Thread Hervé Pagès

Hi

On 01/02/2014 04:04 PM, arun wrote:

Hi,
Try ?match
  b[match(d,a)]
#[1] "Joe"  "Bob"  "Dick"


Or use 'a' to put names on 'b':

   names(b) <- a
   b
      A       B       C       D       E
  "Tom"  "Dick" "Harry"   "Bob"   "Joe"

Then subset by names:

   b[d]
     E      D      B
 "Joe"  "Bob" "Dick"
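
The same name-based lookup can be written without modifying b in place. A small sketch, assuming the vectors a, b and d from the question; lookup and picked are names introduced here for illustration:

```r
a <- c("A", "B", "C", "D", "E")
b <- c("Tom", "Dick", "Harry", "Bob", "Joe")
d <- c("E", "D", "B")

lookup <- setNames(b, a)     # named copy; b itself is left unchanged
picked <- unname(lookup[d])  # drop the names if only the values are wanted
picked
# "Joe"  "Bob"  "Dick"
```

A character subscript picks the first element with a given name, so this also relies on the keys in a being unique.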

Cheers,
H.



A.K.


I have three vectors as follows:


a <- c('A','B','C','D','E')
b <- c('Tom','Dick','Harry','Bob','Joe')
d <- c('E','D','B')


Subsetting b by using d on a, with b[a %in% d], gives the names in the order 
they appear in b:


  b[a %in% d]
[1] "Dick" "Bob"  "Joe"


But I'd like them to show in the order in d, as Joe Bob Dick. What is the 
easy way to do this?

Thanks.




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [R] Subsetting Timestamped data

2013-10-07 Thread MacQueen, Don
Here is an approach using base R tools (not tested, so I hope I don't
embarrass myself!)

dayid <- format(data$TimeStamp, '%Y-%m-%d')
day.counts <- table(dayid)
good.days - names(day.counts)[day.counts == 48]
subset(data, dayid %in% good.days)
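
For anyone who wants to try these steps, here is a small self-contained sketch. The three days of half-hourly data are synthetic (an assumption for illustration, not the poster's file), and complete is a name introduced here:

```r
# Synthetic data: three days at 30-minute intervals, 48 rows per day
ts <- seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"),
          by = "30 min", length.out = 3 * 48)
data <- data.frame(TimeStamp = ts, value = seq_along(ts))
data <- data[-(50:52), ]          # remove three readings from the second day

dayid <- format(data$TimeStamp, "%Y-%m-%d")        # one label per calendar day
day.counts <- table(dayid)                         # readings per day
good.days <- names(day.counts)[day.counts == 48]   # days with all 48 readings
complete <- data[dayid %in% good.days, ]
nrow(complete)
# 96
```

format() uses the time zone stored with the POSIXct values, so the day boundaries follow that zone.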

This could be written in a one-liner, but it's much easier to understand
and to check if done step by step.

(And I'll indulge in a side comment ... as a matter of personal opinion, I
think it's beneficial to learn how to do basic data manipulation using
base R tools before delving into the use of more sophisticated functions
from various packages. This helps build R skills.)

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 10/4/13 8:03 AM, aj...@bath.ac.uk aj...@bath.ac.uk wrote:


Hi,

I have a data frame, data, containing two columns: one, the TimeStamp
(formatted using data$TimeStamp <-
as.POSIXct(as.character(data$TimeStamp), format = "%d/%m/%Y %H:%M") )
and two, the data value.

The data frame has been read from a .csv file and should contain 48
values for each day of the year (values sampled at 30 minute
intervals). However, there are only 15,948 observations i.e. only
approx 332 days worth of data. I therefore would like to remove any
days that do not contain the 48 values.

My question, how would I go about doing this?

Many thanks,

-A.




[R] Subsetting Timestamped data

2013-10-04 Thread aj409


Hi,

I have a data frame, data, containing two columns: one, the TimeStamp
(formatted using data$TimeStamp <-
as.POSIXct(as.character(data$TimeStamp), format = "%d/%m/%Y %H:%M") )
and two, the data value.


The data frame has been read from a .csv file and should contain 48  
values for each day of the year (values sampled at 30 minute  
intervals). However, there are only 15,948 observations i.e. only  
approx 332 days worth of data. I therefore would like to remove any  
days that do not contain the 48 values.


My question, how would I go about doing this?

Many thanks,

-A.



Re: [R] Subsetting Timestamped data

2013-10-04 Thread arun
Hi,

Maybe this helps:

set.seed(45)
df1 <- data.frame(datetime=as.POSIXct("2011-05-25",tz="GMT")+0:200*30*60,
 value=sample(1:40,201,replace=TRUE),value2=sample(45:90,201,replace=TRUE))
 df2 <- df1[ave(1:nrow(df1),as.Date(df1[,1]),FUN=length)==48,]
 dim(df2)
#[1] 192   3
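
A sketch of what the ave() step does, using deterministic values in place of the random ones above. The name n_per_day is introduced here for illustration:

```r
# 201 half-hourly timestamps: four full days (4 * 48 = 192) plus 9 extra rows
ts <- as.POSIXct("2011-05-25", tz = "GMT") + 0:200 * 30 * 60
df1 <- data.frame(datetime = ts, value = seq_along(ts))

# ave() returns a vector as long as its input, holding each row's group size
n_per_day <- ave(seq_len(nrow(df1)), as.Date(df1$datetime), FUN = length)

df2 <- df1[n_per_day == 48, ]     # keep only rows from complete days
dim(df2)
# 192 2
```

Because the result is aligned with the rows of df1, it can be used directly as a row filter, with no separate lookup of the complete days.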

#or
library(plyr)
df3 <- df1[ddply(df1,.(as.Date(datetime)),mutate,Ldt=length(datetime)==48)$Ldt,]
 identical(df3,df2)
#[1] TRUE


A.K.



- Original Message -
From: aj...@bath.ac.uk aj...@bath.ac.uk
To: r-help@r-project.org
Cc: 
Sent: Friday, October 4, 2013 11:03 AM
Subject: [R] Subsetting Timestamped data


Hi,

I have a data frame, data, containing two columns: one, the TimeStamp
(formatted using data$TimeStamp <-
as.POSIXct(as.character(data$TimeStamp), format = "%d/%m/%Y %H:%M") )
and two, the data value.

The data frame has been read from a .csv file and should contain 48  
values for each day of the year (values sampled at 30 minute  
intervals). However, there are only 15,948 observations i.e. only  
approx 332 days worth of data. I therefore would like to remove any  
days that do not contain the 48 values.

My question, how would I go about doing this?

Many thanks,

-A.





Re: [R] Subsetting isolating a group of values in a group of variables

2013-09-08 Thread arun
Hi Razi,

Using dat1:
dat1[apply(dat1[,2:4],1,function(x) any(x%in% vec1)),]
#  ID diag1 diag2 diag3 proc1 proc2 proc3
#2  2   k69   i80  u456  z456  z123  z456
#3  3   l91  i801  g678  u456  u123  u123
#4  4   i80   i90  h983  z123  z456  z456


#similarly, if the columns are from 18:93, change accordingly.
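
A sketch of the same row filter with the column range held in a variable, so that a range such as 18:93 can be dropped in. It uses the small example data from this thread; diag_cols and hit are names introduced here:

```r
dat1 <- read.table(text = "ID diag1 diag2 diag3 proc1 proc2 proc3
1 k23 i269 j123 u123 u456 u123
2 k69 i80 u456 z456 z123 z456
3 l91 i801 g678 u456 u123 u123
4 i80 i90 h983 z123 z456 z456", header = TRUE, stringsAsFactors = FALSE)
vec1 <- c("i80", "i90", "l91")

# Columns to search: by name pattern here, or a fixed range such as 18:93
diag_cols <- grep("^diag", names(dat1))

# One logical vector per column, combined with OR across columns
hit <- Reduce(`|`, lapply(dat1[diag_cols], `%in%`, vec1))
dat1[hit, ]
```

Working column by column avoids converting each row to a character vector, which may matter for a data frame with many rows.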

A.K.






Hi, thanks again. 

Just wondering, if you have a data frame dat.1 and you know the 
diag codes are in columns 18:93, is there any way to search for 
the vec1 codes while specifying the range of columns? 


Best Wishes 

Razi Zaidi 



- Original Message -
From: arun smartpink...@yahoo.com
To: R help r-help@r-project.org
Cc: 
Sent: Saturday, September 7, 2013 10:25 PM
Subject: Re: Subsetting isolating a group of values in a group of variables

Hi,
Using the same example:
str1 <- paste(colnames(dat1)[grepl("diag",colnames(dat1))],"%in%","vec1",collapse="|")
 subset(dat1,eval(parse(text=str1)))
#  ID diag1 diag2 diag3 proc1 proc2 proc3
#2  2   k69   i80  u456  z456  z123  z456
#3  3   l91  i801  g678  u456  u123  u123
#4  4   i80   i90  h983  z123  z456  z456
lapply(split(dat2,dat2$diag),function(x) {x1 <- x$code; str1 <-
paste(colnames(dat1)[grepl("diag",colnames(dat1))],"%in%","x1",collapse="|");
x2 <- subset(dat1,eval(parse(text=str1))); cbind(x2,Diag=x$diag)})
#$`Broken finger`
#  ID diag1 diag2 diag3 proc1 proc2 proc3  Diag
#1  1   k23  i269  j123  u123  u456  u123 Broken finger
#2  2   k69   i80  u456  z456  z123  z456 Broken finger
#
#$`Broken foot`
#  ID diag1 diag2 diag3 proc1 proc2 proc3    Diag
#1  1   k23  i269  j123  u123  u456  u123 Broken foot
#2  2   k69   i80  u456  z456  z123  z456 Broken foot
#
#$`Broken legs`
#  ID diag1 diag2 diag3 proc1 proc2 proc3    Diag
#1  1   k23  i269  j123  u123  u456  u123 Broken legs
#3  3   l91  i801  g678  u456  u123  u123 Broken legs
#
#$`Broken rib`
#  ID diag1 diag2 diag3 proc1 proc2 proc3   Diag
#3  3   l91  i801  g678  u456  u123  u123 Broken rib
#4  4   i80   i90  h983  z123  z456  z456 Broken rib
#
#$`Broken toe`
#  ID diag1 diag2 diag3 proc1 proc2 proc3   Diag
#2  2   k69   i80  u456  z456  z123  z456 Broken toe
#3  3   l91  i801  g678  u456  u123  u123 Broken toe
#4  4   i80   i90  h983  z123  z456  z456 Broken toe

##You can also use a larger dataset:
set.seed(48)
dat1New <- as.data.frame(matrix(sample(paste0(letters,
 sample(1:800,700,replace=TRUE)),90*1e5,replace=TRUE),ncol=90),
 stringsAsFactors=FALSE)
set.seed(185)
dat2New <- as.data.frame(matrix(sample(paste0(letters,
 sample(400:1200,700,replace=TRUE)),90*1e5,replace=TRUE),ncol=90),
 stringsAsFactors=FALSE)
dat3 <- cbind(ID=1:1e5,dat1New,dat2New)
colnames(dat3)[-1] <- c(paste0("diag",1:90),paste0("proc",1:90))

set.seed(1459)
Refdat <- data.frame(code=unique(unlist(dat1New)),
 diag=sample(c("Broken finger","Broken toe","Broken legs","Broken foot",
 "Broken rib","Broken nose","Broken elbow","Broken hip"),
 length(unique(unlist(dat1New))),replace=TRUE),stringsAsFactors=FALSE)

res <- lapply(split(Refdat,Refdat$diag),function(x) {x1 <- x$code; str1 <-
paste(colnames(dat3)[grepl("diag",colnames(dat3))],"%in%","x1",collapse="|");
x2 <- subset(dat3,eval(parse(text=str1))) })

sapply(split(Refdat,Refdat$diag),function(x) {x1 <- x$code; str1 <-
paste(colnames(dat3)[grepl("diag",colnames(dat3))],"%in%","x1",collapse="|");
x2 <- subset(dat3,eval(parse(text=str1)));nrow(x2) })
# Broken elbow Broken finger   Broken foot    Broken hip   Broken legs 
 #   7    10 4    10 7 
 # Broken nose    Broken rib    Broken toe 
  #  6 9    10 


A.K.






Thanks for the prompt reply arun, this really has helped. 

My actual data frame has diagnostic codes diag1, diag2, etc., which range from 
1 to 93. Is there any way to apply 
subset(dat1,diag1%in%vec1|diag2%in%vec1|diag3%in% vec1) so that I can 
search multiple columns in dat1 without specifying each column separately? 


- Original Message -
From: arun smartpink...@yahoo.com
To: R help r-help@r-project.org
Cc: 
Sent: Saturday, September 7, 2013 4:12 PM
Subject: Re: Subsetting isolating a group of values in a group of variables

Hi,

The expected output is not clear.
dat1 <- read.table(text="ID diag1 diag2 diag3 proc1 proc2 proc3
1 k23 i269 j123   u123  u456  u123
2 k69 i80 u456   z456  z123  z456
3 l91 i801 g678   u456  u123  u123
4 i80 i90 h983   z123  z456   z456",sep="",header=TRUE,stringsAsFactors=FALSE)

vec1 <- c("i80","i90","l91")


subset(dat1,diag1%in%vec1|diag2%in%vec1|diag3%in% vec1)
#  ID diag1 diag2 diag3 proc1 proc2 proc3
#2  2   k69   i80  u456  z456  z123  z456
#3  3   l91  i801  g678  u456  u123  u123
#4  4   i80   i90  h983  z123  z456  z456


##Creating another data frame with codes and diagnosis

dat2 <- data.frame(code=unique(unlist(dat1[,2:4])),diag=c(rep("Broken finger",2),
rep("Broken toe",2),rep("Broken legs",2),"Broken toe",rep("Broken foot",2),
rep("Broken rib",2)),stringsAsFactors=FALSE)



 lst1 <- lapply(split(dat2,dat2$diag), function(x) {x1 <- x$code;x2 <- 

Re: [R] Subsetting isolating a group of values in a group of variables

2013-09-07 Thread arun
Hi,

The expected output is not clear.
dat1 <- read.table(text="ID diag1 diag2 diag3 proc1 proc2 proc3
1 k23 i269 j123   u123  u456  u123
2 k69 i80 u456   z456  z123  z456
3 l91 i801 g678   u456  u123  u123
4 i80 i90 h983   z123  z456   z456",sep="",header=TRUE,stringsAsFactors=FALSE)

vec1 <- c("i80","i90","l91")


subset(dat1,diag1%in%vec1|diag2%in%vec1|diag3%in% vec1)
#  ID diag1 diag2 diag3 proc1 proc2 proc3
#2  2   k69   i80  u456  z456  z123  z456
#3  3   l91  i801  g678  u456  u123  u123
#4  4   i80   i90  h983  z123  z456  z456


##Creating another data frame with codes and diagnosis

dat2 <- data.frame(code=unique(unlist(dat1[,2:4])),diag=c(rep("Broken finger",2),
rep("Broken toe",2),rep("Broken legs",2),"Broken toe",rep("Broken foot",2),
rep("Broken rib",2)),stringsAsFactors=FALSE)



 lst1 <- lapply(split(dat2,dat2$diag), function(x) {x1 <- x$code;x2 <-
subset(dat1,diag1%in%x1|diag2%in%x1|diag3%in%x1);cbind(x2,Diag=x$diag)})
 lst1
#$`Broken finger`
 # ID diag1 diag2 diag3 proc1 proc2 proc3  Diag
#1  1   k23  i269  j123  u123  u456  u123 Broken finger
#2  2   k69   i80  u456  z456  z123  z456 Broken finger
#
#$`Broken foot`
 # ID diag1 diag2 diag3 proc1 proc2 proc3    Diag
#1  1   k23  i269  j123  u123  u456  u123 Broken foot
#2  2   k69   i80  u456  z456  z123  z456 Broken foot
#
#$`Broken legs`
 # ID diag1 diag2 diag3 proc1 proc2 proc3    Diag
#1  1   k23  i269  j123  u123  u456  u123 Broken legs
#3  3   l91  i801  g678  u456  u123  u123 Broken legs

#$`Broken rib`
 # ID diag1 diag2 diag3 proc1 proc2 proc3   Diag
#3  3   l91  i801  g678  u456  u123  u123 Broken rib
#4  4   i80   i90  h983  z123  z456  z456 Broken rib

#$`Broken toe`
#  ID diag1 diag2 diag3 proc1 proc2 proc3   Diag
#2  2   k69   i80  u456  z456  z123  z456 Broken toe
#3  3   l91  i801  g678  u456  u123  u123 Broken toe
#4  4   i80   i90  h983  z123  z456  z456 Broken toe
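
A sketch of the per-diagnosis split written without naming each diag column. The list code_map is a hypothetical mapping built by hand for illustration (its codes follow the example above), and by_diag is a name introduced here:

```r
dat1 <- read.table(text = "ID diag1 diag2 diag3 proc1 proc2 proc3
1 k23 i269 j123 u123 u456 u123
2 k69 i80 u456 z456 z123 z456
3 l91 i801 g678 u456 u123 u123
4 i80 i90 h983 z123 z456 z456", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical mapping: diagnosis name -> codes that represent it
code_map <- list("Broken toe"    = c("i80", "i90", "l91"),
                 "Broken finger" = c("k23", "k69"))

diag_cols <- grep("^diag", names(dat1))   # all diagnostic code columns

# For each diagnosis, keep the rows where any diag column holds one of its codes
by_diag <- lapply(code_map, function(codes) {
  hit <- Reduce(`|`, lapply(dat1[diag_cols], `%in%`, codes))
  dat1[hit, ]
})
by_diag[["Broken toe"]]$ID
# 2 3 4
```

The result is a named list with one data frame per diagnosis, in the same shape as lst1 but without the added Diag column.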


A.K.


Hello. 

I have a data frame structured like this: 

ID  diag1 diag2 diag3 proc1 proc2 proc3 
1   k23 i269 j123  u123  u456  u123 
2   k69 i80  u456   z456  z123  z456 
3   l91 i801 g678   u456  u123  u123 
4   i80 i90  h983   z123  z456   z456 

Each observation has a group of diagnostic codes (diag) and procedure 
codes (proc). 

A single diagnosis may be described by more than one code, e.g. broken toe may 
be coded by i80, i90, l91 or more. 

My aim is to subset all rows with any of the codes representing a 
single diagnosis. So I would like to use multiple values (i80, i90, l91 = 
broken toe) applied to specific columns, i.e. diag1, 2 and 3, to isolate 
those rows which contain any of the specified codes. 

Your help would be greatly appreciated.


