Re: [R] Need a vectorized way to avoid two nested FOR loops

2009-10-08 Thread Rama Ramakrishnan

Bert, Jim, Dimitris and Joris,

Thank you all very much for your prompt help and suggestions.

After trying the ideas out, I have decided to go with Bert's approach  
since it is by far the fastest of the lot.


Thanks again!

Rama Ramakrishnan


On Oct 8, 2009, at 12:49 PM, Bert Gunter wrote:



If I understand your intent, I believe you can get what you want much
faster (no interpreted loops, and linear time) by looking at this
slightly differently.

First of all, the choice of columns is unimportant, as indexing can be
used to create a data frame containing only the columns of interest. So
I think you can abstract your request to: group the rows of a data frame
so that all rows in a group "match."  Now the problem here is exactly
what you mean by "match." If the data are numeric, finite precision
arithmetic requires one to ask whether you mean **exactly equal** or
just equal within a tolerance. I shall assume the former, but the latter
is often what one wants. It is a little more difficult to handle, but
one way to do it with the present approach is to first round to a few
digits that represent the tolerance and then proceed with the rounded
values.
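The rounding idea above might be sketched as follows. This is only an illustration under assumed values: the data frame `z`, the tolerance of three decimal places, and the names `exact_id` and `approx_id` are hypothetical and do not come from the original post.

```r
## Hypothetical data: rows 1 and 2 differ only by a tiny amount in column a
z <- data.frame(a = c(1.0000001, 1.0000002, 1.5), b = c(2, 2, 3))

## Keys built from the raw values keep rows 1 and 2 apart
exact_id <- do.call(paste, c(z, sep = "+"))

## Round every column to 3 decimals (the assumed tolerance), then build keys
z_round <- as.data.frame(lapply(z, round, digits = 3))
approx_id <- do.call(paste, c(z_round, sep = "+"))

exact_id[1] == exact_id[2]     # FALSE: not exactly equal
approx_id[1] == approx_id[2]   # TRUE: equal after rounding
```

Rounding is a blunt tolerance: two values that sit on either side of a rounding boundary (for example 0.0004999 and 0.0005001 at three decimals) still land in different groups.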

As always (and as recommended by the posting guide!) a small
reproducible example is helpful:

## Create a data frame with groups of identical rows.

z <- data.frame(matrix(rnorm(60),ncol=3))[sample(20,50,replace=TRUE),]

## now create a factor column of "id's" in which identical rows
## have identical id's (a hash)

id <- factor(do.call(paste,c(z,sep="+")))

## The levels of the factor now "index" groups of rows that "match"
## They can be easily accessed in a variety of ways, e.g.

as.numeric(id)
## gives all rows of each group of matching rows
## the same integer index.

etc.
This all requires only linear time.
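To get from the group ids back to the original request (for each row, the other rows that match it), one possibility is to split the row positions by `id`. This is a sketch rather than part of the posted solution: the names `groups`, `grp` and `others` are made up for illustration, and `set.seed(1)` is added only to make the example repeatable.

```r
set.seed(1)
z <- data.frame(matrix(rnorm(60), ncol = 3))[sample(20, 50, replace = TRUE), ]
id <- factor(do.call(paste, c(z, sep = "+")))

## Row positions grouped by key; list elements follow the order of levels(id)
groups <- split(seq_len(nrow(z)), id)

## Integer code of each row's group, usable as a position in 'groups'
grp <- as.integer(id)

## For each row, every other row position in the same group
others <- lapply(seq_along(grp), function(i) setdiff(groups[[grp[i]]], i))
```

Rows that match nothing else get an empty integer vector in `others`.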

Hope this helps -- or my apologies if I have misinterpreted what was
requested.


Bert Gunter
Genentech Nonclinical Biostatistics



-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitris Rizopoulos
Sent: Thursday, October 08, 2009 6:28 AM
To: joris meys
Cc: r-help@r-project.org; Rama Ramakrishnan
Subject: Re: [R] Need a vectorized way to avoid two nested FOR loops

Another approach is:

n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n*6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

values <- do.call(paste, c(x[x.col], sep = "\r"))
out <- lapply(seq_along(values), function (i) {
    ind <- which(values == values[i])
    ind[!ind %in% i]
})
out


Best,
Dimitris


joris meys wrote:
Neat piece of code, Jim, but it still uses a nested loop. If you order
the matrix first, you only need one pass through the whole matrix to
find the information you need.

Of course, I don't take the ordering into account. If the ordering
algorithm doesn't work in linear time, then it doesn't really matter, I
guess: the limiting step would become the ordering algorithm.
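The order-then-single-pass idea could look roughly like this. It is a sketch under assumptions, not code from the thread: it reuses the test data from Jim's and Dimitris's examples, and the names `ord`, `m`, `changed`, `group`, `members` and `matches` are invented here. The call to `order()` is the step that is typically not linear.

```r
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

## Sort the row positions by the columns of interest
ord <- do.call(order, unname(as.list(x[x.col])))
m <- as.matrix(x[ord, x.col])

## Single pass: a new group starts wherever a sorted row differs from the previous one
changed <- rowSums(m[-1, , drop = FALSE] != m[-n, , drop = FALSE]) > 0
group_sorted <- cumsum(c(TRUE, changed))

## Put the group labels back into the original row order
group <- integer(n)
group[ord] <- group_sorted

## For each row, the other rows carrying the same group label
members <- split(seq_len(n), group)
matches <- lapply(seq_len(n), function(i) setdiff(members[[group[i]]], i))
```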

Kind regards
Joris



On Thu, Oct 8, 2009 at 2:24 PM, jim holtman wrote:

I answered the wrong question.  Here is the code to find all the
matches for each row:

n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
x
x.col <- c(1,3,5)

# match against all the other rows
x.match1 <- apply(x[, x.col], 1, function(a){
  .mat <- which(apply(x[, x.col], 1, function(z){
  all(a == z)
  }))
})

# remove matches to itself
x.match2 <- lapply(seq(length(x.match1)), function(z){
  x.match1[[z]][!(x.match1[[z]] %in% z)]
})
# x.match2 contains, for each row, the indices of the other rows that match it










On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan wrote:

Hi Friends,

I have a data frame d. Let vars be the column indices for a subset of
the columns in d (e.g., vars <- c(1,3,4,8))

For each row r in d, I want to collect all the other rows in d that
match the values in row r for just the columns in vars.

The naive way to do this is to have a for loop stepping through each
row in d, and within the loop have another loop going through all the
rows again, checking for equality. This is quadratic in the number of
rows and takes way too long. Is there a better, "vectorized" way to do
this?

Thanks in advance!

Rama Ramakrishnan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


[R] Need a vectorized way to avoid two nested FOR loops

2009-10-07 Thread Rama Ramakrishnan


Hi Friends,

I have a data frame d. Let vars be the column indices for a subset of  
the columns in d (e.g., vars <- c(1,3,4,8))


For each row r in d, I want to collect all the other rows in d that  
match the values in row r for just the columns in vars.


The naive way to do this is to have a for loop stepping through each  
row in d, and within the loop have another loop going through all the  
rows again, checking for equality. This is quadratic in the number of  
rows and takes way too long. Is there a better, "vectorized" way to do  
this?


Thanks in advance!

Rama Ramakrishnan



Re: [R] Efficient lookup on a two-dimensional table

2009-06-25 Thread Rama Ramakrishnan
Thanks, Gabor. Works great!


On Thu, Jun 25, 2009 at 10:38 AM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> Try this (shown for stated problem but generalizes by just adding
> additional arguments):
>
> mapply("[", list(x), ltrs, mnths)
>
>
> On Thu, Jun 25, 2009 at 10:24 AM, Rama Ramakrishnan
> wrote:
> > Follow-on question: is there a way to do this for higher-dimensional
> (i.e.
> > more than 2 dimensions) arrays?
> >
> >
> > On Thu, Jun 25, 2009 at 10:17 AM, Rama Ramakrishnan wrote:
> >
> >> That works!! Very nice way to do it! Thank you, Henrique!
> >> Rama Ramakrishnan
> >>
> >>
> >> On Thu, Jun 25, 2009 at 10:11 AM, Henrique Dallazuanna <
> www...@gmail.com>wrote:
> >>
> >>> Try this:
> >>>
> >>> y$values <- diag(x[y$ltrs, y$mnths])
> >>>
> >>> On Thu, Jun 25, 2009 at 11:02 AM, Rama Ramakrishnan wrote:
> >>>
> >>>> Dear R-Users,
> >>>> I need to lookup values from a 2-d table using the row names and
> column
> >>>> names as indices. I was wondering if there's a way to do this without
> an
> >>>> explicit loop.
> >>>>
> >>>> Example:
> >>>> #x is the 2-d table that holds the values
> >>>>
> >>>> x <- matrix(rnorm(26*12),nrow=26)
> >>>>
> >>>> rownames(x) <- letters
> >>>>
> >>>> colnames(x) <- month.name
> >>>>
> >>>>
> >>>> #y is a data frame that has the "keys" I want to use as indices into x
> >>>>
> >>>> y <- data.frame(ltrs=sample(letters,5),mnths=sample(month.name
> >>>> ,5),values=0)
> >>>>
> >>>>
> >>>> #I want to fill in the "values" column using the "ltrs" and "mnths"
> >>>> columns
> >>>> as keys to look up
> >>>>
> >>>> # the associated value from x
> >>>>
> >>>> #One way to do this is with a FOR loop
> >>>>
> >>>> for (i in 1:nrow(y)) {y$val[i] <- x[y$ltrs[i],y$mnths[i]]}
> >>>>
> >>>>
> >>>> My question: Is there a more efficient way (e.g., one without using an
> >>>> explicit loop) to do this?
> >>>>
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>>
> >>>> -Rama Ramakrishnan
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Henrique Dallazuanna
> >>> Curitiba-Paraná-Brasil
> >>> 25° 25' 40" S 49° 16' 22" O
> >>>
> >>
> >>
> >


Re: [R] Efficient lookup on a two-dimensional table

2009-06-25 Thread Rama Ramakrishnan
Thanks, David, that works too!


On Thu, Jun 25, 2009 at 10:30 AM, David Winsemius wrote:

>
> On Jun 25, 2009, at 10:24 AM, Rama Ramakrishnan wrote:
>
>  Follow-on question: is there a way to do this for higher-dimensional (i.e.
>> more than 2 dimensions) arrays?
>>
>
> The apply method I just posted generalizes to higher dimensional arrays.
>
> --
> DW
>
>
>>
>> On Thu, Jun 25, 2009 at 10:17 AM, Rama Ramakrishnan wrote:
>>
>>  That works!! Very nice way to do it! Thank you, Henrique!
>>> Rama Ramakrishnan
>>>
>>>
>>> On Thu, Jun 25, 2009 at 10:11 AM, Henrique Dallazuanna wrote:
>>>
>>>  Try this:
>>>>
>>>> y$values <- diag(x[y$ltrs, y$mnths])
>>>>
>>>> On Thu, Jun 25, 2009 at 11:02 AM, Rama Ramakrishnan wrote:
>>>>
>>>>  Dear R-Users,
>>>>> I need to lookup values from a 2-d table using the row names and column
>>>>> names as indices. I was wondering if there's a way to do this without
>>>>> an
>>>>> explicit loop.
>>>>>
>>>>> Example:
>>>>> #x is the 2-d table that holds the values
>>>>>
>>>>> x <- matrix(rnorm(26*12),nrow=26)
>>>>>
>>>>> rownames(x) <- letters
>>>>>
>>>>> colnames(x) <- month.name
>>>>>
>>>>>
>>>>> #y is a data frame that has the "keys" I want to use as indices into x
>>>>>
>>>>> y <- data.frame(ltrs=sample(letters,5),mnths=sample(month.name
>>>>> ,5),values=0)
>>>>>
>>>>>
>>>>> #I want to fill in the "values" column using the "ltrs" and "mnths"
>>>>> columns
>>>>> as keys to look up
>>>>>
>>>>>  snip
>>>>
>>>
>>>
>>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>



Re: [R] Efficient lookup on a two-dimensional table

2009-06-25 Thread Rama Ramakrishnan
Follow-on question: is there a way to do this for higher-dimensional (i.e.
more than 2 dimensions) arrays?
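
For arrays with more than two dimensions, one loop-free option is R's matrix indexing: a character matrix with one column per dimension selects one element per row, provided the array has dimnames. The sketch below uses a made-up 3-d array `a` and key table `keys`; neither appears in the thread.

```r
set.seed(42)

## Hypothetical 3-d array with dimnames on every dimension
a <- array(rnorm(3 * 4 * 2), dim = c(3, 4, 2),
           dimnames = list(c("a", "b", "c"), month.name[1:4], c("lo", "hi")))

## Hypothetical keys: one row per element to look up, one column per dimension
keys <- data.frame(ltrs  = c("c", "a"),
                   mnths = c("March", "January"),
                   lvl   = c("hi", "lo"),
                   stringsAsFactors = FALSE)

## Each row of the character matrix picks a single element of 'a'
vals <- a[as.matrix(keys)]
```

Keys that are missing from the dimnames raise an error, so they would need to be checked or filtered beforehand.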


On Thu, Jun 25, 2009 at 10:17 AM, Rama Ramakrishnan wrote:

> That works!! Very nice way to do it! Thank you, Henrique!
> Rama Ramakrishnan
>
>
> On Thu, Jun 25, 2009 at 10:11 AM, Henrique Dallazuanna 
> wrote:
>
>> Try this:
>>
>> y$values <- diag(x[y$ltrs, y$mnths])
>>
>> On Thu, Jun 25, 2009 at 11:02 AM, Rama Ramakrishnan wrote:
>>
>>> Dear R-Users,
>>> I need to lookup values from a 2-d table using the row names and column
>>> names as indices. I was wondering if there's a way to do this without an
>>> explicit loop.
>>>
>>> Example:
>>> #x is the 2-d table that holds the values
>>>
>>> x <- matrix(rnorm(26*12),nrow=26)
>>>
>>> rownames(x) <- letters
>>>
>>> colnames(x) <- month.name
>>>
>>>
>>> #y is a data frame that has the "keys" I want to use as indices into x
>>>
>>> y <- data.frame(ltrs=sample(letters,5),mnths=sample(month.name
>>> ,5),values=0)
>>>
>>>
>>> #I want to fill in the "values" column using the "ltrs" and "mnths"
>>> columns
>>> as keys to look up
>>>
>>> # the associated value from x
>>>
>>> #One way to do this is with a FOR loop
>>>
>>> for (i in 1:nrow(y)) {y$val[i] <- x[y$ltrs[i],y$mnths[i]]}
>>>
>>>
>>> My question: Is there a more efficient way (e.g., one without using an
>>> explicit loop) to do this?
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>> -Rama Ramakrishnan
>>>
>>>
>>
>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>
>



Re: [R] Efficient lookup on a two-dimensional table

2009-06-25 Thread Rama Ramakrishnan
That works!! Very nice way to do it! Thank you, Henrique!
Rama Ramakrishnan

On Thu, Jun 25, 2009 at 10:11 AM, Henrique Dallazuanna wrote:

> Try this:
>
> y$values <- diag(x[y$ltrs, y$mnths])
>
> On Thu, Jun 25, 2009 at 11:02 AM, Rama Ramakrishnan wrote:
>
>> Dear R-Users,
>> I need to lookup values from a 2-d table using the row names and column
>> names as indices. I was wondering if there's a way to do this without an
>> explicit loop.
>>
>> Example:
>> #x is the 2-d table that holds the values
>>
>> x <- matrix(rnorm(26*12),nrow=26)
>>
>> rownames(x) <- letters
>>
>> colnames(x) <- month.name
>>
>>
>> #y is a data frame that has the "keys" I want to use as indices into x
>>
>> y <- data.frame(ltrs=sample(letters,5),mnths=sample(month.name
>> ,5),values=0)
>>
>>
>> #I want to fill in the "values" column using the "ltrs" and "mnths"
>> columns
>> as keys to look up
>>
>> # the associated value from x
>>
>> #One way to do this is with a FOR loop
>>
>> for (i in 1:nrow(y)) {y$val[i] <- x[y$ltrs[i],y$mnths[i]]}
>>
>>
>> My question: Is there a more efficient way (e.g., one without using an
>> explicit loop) to do this?
>>
>>
>> Thanks in advance!
>>
>>
>> -Rama Ramakrishnan
>>
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



[R] Fwd: Efficient lookup on a two-dimensional table

2009-06-25 Thread Rama Ramakrishnan
Resending after fixing a mistake in the earlier email ... sorry for the
confusion.

**
Dear R-Users,
I need to lookup values from a 2-d table using the row names and column
names as indices. I was wondering if there's a way to do this without an
explicit loop.

Example:
#x is the 2-d table that holds the values

x <- matrix(rnorm(26*12),nrow=26)

rownames(x) <- letters

colnames(x) <- month.name


#y is a data frame that has the "keys" I want to use as indices into x

y <- data.frame(ltrs=sample(letters,5),mnths=sample(month.name,5),values=0)


#I want to fill in the "values" column using the "ltrs" and "mnths" columns
#as keys to look up the associated value from x

#One way to do this is with a FOR loop

for (i in 1:nrow(y)) {
  y$values[i] <- x[as.character(y$ltrs[i]), as.character(y$mnths[i])]
}


My question: Is there a more efficient way (e.g., one without using an
explicit loop) to do this?


Thanks in advance!


-Rama Ramakrishnan
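
One loop-free way to do the lookup asked about above, offered as a sketch rather than as any poster's answer, is to index `x` with a two-column character matrix of (row name, column name) pairs. The name `idx` and the `set.seed(1)` call are additions for illustration, and the approach assumes every key is present in the dimnames of `x`.

```r
set.seed(1)
x <- matrix(rnorm(26 * 12), nrow = 26, dimnames = list(letters, month.name))
y <- data.frame(ltrs = sample(letters, 5), mnths = sample(month.name, 5), values = 0)

## as.character() guards against the key columns being stored as factors
idx <- cbind(as.character(y$ltrs), as.character(y$mnths))

## One element of x per row of idx, with no explicit loop
y$values <- x[idx]
```

Unlike taking the diagonal of `x[y$ltrs, y$mnths]`, this never builds the intermediate matrix with one row and one column per key, so memory use grows linearly with the number of keys.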



[R] Efficient lookup on a two-dimensional table

2009-06-25 Thread Rama Ramakrishnan
Dear R-Users,
I need to lookup values from a 2-d table using the row names and column
names as indices. I was wondering if there's a way to do this without an
explicit loop.

Example:
#x is the 2-d table that holds the values

x <- matrix(rnorm(26*12),nrow=26)

rownames(x) <- letters

colnames(x) <- month.name


#y is a data frame that has the "keys" I want to use as indices into x

y <- data.frame(ltrs=sample(letters,5),mnths=sample(month.name,5),values=0)


#I want to fill in the "values" column using the "ltrs" and "mnths" columns
#as keys to look up the associated value from x

#One way to do this is with a FOR loop

for (i in 1:nrow(y)) {y$val[i] <- x[y$ltrs[i],y$mnths[i]]}


My question: Is there a more efficient way (e.g., one without using an
explicit loop) to do this?


Thanks in advance!


-Rama Ramakrishnan
