Re: [R] Regular expressions: offsets of groups

2010-09-30 Thread Titus von der Malsburg
Ok, we decided to have a shot at modifying gregexpr.  Let's see how it
works out.  If anybody is interested in discussing this please contact
me.  R-help doesn't seem like the right place for further discussion.
Is there a default place for discussing things like that?

Thanks everybody for your responses!

  Titus

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Titus von der Malsburg
Bill, Michael,

good to see I'm not the only one who sees potential for improvements
in the regexpr domain.  Adding a subpattern argument is certainly a
step in the right direction and would make my life much easier.
However, in my application I need to know not only the position of one
group but also the position of the overall match in the original
string.  The ideal solution would provide positions and match lengths
for the whole pattern and for all groups if desired.  Only this would
solve all related issues.  One possibility is to have a subpattern
argument that accepts a vector of numbers (0 refers to the whole
pattern):

   > gregexpr("a+(b+)", "abcdaabbc", subpattern=c(0,1))
 [[1]]:
 [[1]][[1]]:
 [1] 1 5
 attr(, "match.length"):
 [1] 2 4
 [[1]][[2]]:
 [1] 2 7
 attr(, "match.length"):
 [1] 1 2

A weakness of this solution is that the structure of the return values
changes if length(subpattern) > 1.  An alternative is to have a separate
function, say ggregexpr for group gregexpr, that returns a list of
lists as in the above example.  This function would always return
positions and match lengths of the whole pattern (group 0) and all
groups.  The original gregexpr could still have the subpattern
argument but it would only accept single numbers.  This way the return
format of gregexpr remains the same.
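
A minimal sketch of what such a function could return, assuming a later R
version in which gregexpr(perl = TRUE) attaches "capture.start" and
"capture.length" attributes to its result.  The name ggregexpr and the
list-of-lists layout follow the proposal above and are hypothetical, not an
existing API:

```r
# Hypothetical ggregexpr(): start positions and lengths for the whole match
# (group 0) and for every capture group, taken from one gregexpr() pass.
ggregexpr <- function(pattern, text) {
  lapply(gregexpr(pattern, text, perl = TRUE), function(m) {
    whole <- list(list(start  = as.integer(m),
                       length = as.integer(attr(m, "match.length"))))
    starts <- attr(m, "capture.start")
    lens   <- attr(m, "capture.length")
    # Patterns without groups may carry no capture attributes at all.
    n_groups <- if (is.null(starts)) 0L else ncol(starts)
    groups <- lapply(seq_len(n_groups), function(g) {
      list(start  = as.integer(starts[, g]),
           length = as.integer(lens[, g]))
    })
    c(whole, groups)
  })
}

res <- ggregexpr("a+(b+)", "abcdaabbc")
res[[1]][[1]]$start   # whole pattern: 1 5
res[[1]][[2]]$start   # group (b+):    2 7
```

The first list level indexes the input strings and the second the groups,
with element 1 standing for the whole pattern (group 0 in the proposal).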

Best,

  Titus


On Wed, Sep 29, 2010 at 2:42 AM, Michael Bedward
michael.bedw...@gmail.com wrote:
 Ah, that's interesting - thanks Bill. That's certainly on the right
 track for me (Titus, you too ?) especially if the subpattern argument
 accepted a vector of multiple group indices.

 As you say, this is straightforward in C. I'd be happy to (try to)
 make a patch for the R sources if there was some consensus on the best
 way to implement it, ie. as a new R function or by extending existing
 function(s).



Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Titus von der Malsburg
On Wed, Sep 29, 2010 at 1:58 PM, Michael Bedward
michael.bedw...@gmail.com wrote:
 How is your C coding ? Bill ? Anyone else ?  I could have a go at
 writing some prototype code to test in the next few days, though if
 someone else with decent C skills is itching to do it please speak up.

We have a skilled C- and R-programmer who could work on it. I'll talk to him.

   Titus



Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Titus von der Malsburg
On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward
michael.bedw...@gmail.com wrote:
 What Titus wants to do is akin to retrieving capturing groups from a
 Matcher object in Java.

Precisely.  Here's the description:

  
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int)

Gabor's lookbehind trick solves some special cases but it's not the
kind of general solution I'm looking for.  Let me explain what I'm
trying to achieve here.  I'm working on a package that provides tools
for processing and analyzing eye movements (we're doing reading
research).  In most situations, eye movements consist of fixations
where the eyes are relatively stationary and saccades, quick movements
between fixations.  A common way to represent eye movements is as
strings of symbols, where each symbol corresponds to a fixation on a
particular region.  AABC means two fixations on A followed by a fixation on
B and then C.  When people analyze eye movements it's often necessary
to find specific events in the eye movement record like: fixations on
the word C preceded by fixations on words D-F and followed by
fixations on words A-C.  This event can be specified using this
regexpr: [D-F]+(C)[A-C]+  The group (in parentheses) indicates the
substring for which I'd like to know the position in the overall
string.  Another application is the extraction of subsequences from a
sequence of fixations.  Note that in some situations people might have
to use more groups in their regexprs and that groups can be nested.
In this case the user would have to indicate for which group he/she
wants to know the offset.  I'm not an expert on regexpr engines but
I'm pretty sure the necessary information is available in the engine.
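
To make the use case concrete, here is a small sketch with an invented
scanpath and invented fixation durations.  It assumes an R version in which
regexpr(perl = TRUE) reports "capture.start" and "capture.length"; the
variable names are illustrative only:

```r
# One letter per fixation; durations (ms) are made up for illustration.
regions   <- c("A", "B", "D", "E", "E", "C", "B", "A", "A", "D")
durations <- c(210, 180, 250, 190, 220, 300, 170, 200, 230, 260)

scanpath <- paste(regions, collapse = "")          # "ABDEECBAAD"
m <- regexpr("[D-F]+(C)[A-C]+", scanpath, perl = TRUE)

# Offset and length of group 1, i.e. the fixations on C inside the event.
start  <- attr(m, "capture.start")[1, 1]
len    <- attr(m, "capture.length")[1, 1]
target <- seq(start, length.out = len)

durations[target]   # durations of the fixations matched by the group
```

Because each symbol corresponds to one fixation, the group offset can be
used directly as an index into any per-fixation vector.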

Gabor, I see you're the author of gsubfn (fantastic package!).  Do you
see a relatively simple way to expose information about group offsets
and their corresponding match lengths?  I think this could be useful
for other applications as well.  At least it seems Michael could use
it, too.  We can cook up something for ourselves but a general
solution would benefit the larger community.

   Titus



[R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
Dear list!

> gregexpr("a+(b+)", "abcdaabbc")
[[1]]
[1] 1 5
attr(,"match.length")
[1] 2 4

What I want is the offsets of the matches for the group (b+), i.e. 2
and 7, not the offsets of the complete matches.  Is there a way in R
to get that?

I know about gsubfn and strapply, but they only give me the strings
matched by groups, not their offsets.

I could write something myself that first takes the above matches
("ab" and "aabb") and then searches again using only the group (b+).
For this to work, I'd have to parse the regular expression and search
several times (> 2, for nested groups) instead of just once.  But I'm
sure there is a better way to do this.
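
For reference, the two-pass workaround described above can be sketched as
follows.  The inner pattern "b+" is split out of the full pattern by hand,
which is exactly the manual step that makes this approach unattractive:

```r
text <- "abcdaabbc"

# Pass 1: locate the complete matches of the full pattern.
m      <- gregexpr("a+(b+)", text)[[1]]
starts <- as.integer(m)
pieces <- substring(text, starts, starts + attr(m, "match.length") - 1)

# Pass 2: search each matched piece for the group and shift the
# piece-relative position back into coordinates of the original string.
inner        <- regexpr("b+", pieces)
group_start  <- starts + as.integer(inner) - 1
group_length <- attr(inner, "match.length")

group_start    # 2 7
group_length   # 1 2
```

This only works when the first occurrence of the inner pattern inside each
piece is the position the group actually matched, which does not hold in
general.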

Thanks for any suggestion!

   Titus



Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
Thank you Jim, but just as the solution that I discussed, your
proposal involves deconstructing the pattern and searching several
times.  I'm looking for a general and efficient solution.  Internally,
the regexpr engine has all necessary information after one pass
through the string.  What I need is an interface that exposes this
information.

  Titus

On Mon, Sep 27, 2010 at 6:43 PM, jim holtman jholt...@gmail.com wrote:
 try this:

 x <- gregexpr("a+(b+)", "abcdaabbcaaacaaab")
 justA <- gregexpr("a+", "abcdaabbcaaacaaab")
 # find matches in 'x' for 'justA'
 indx <- which(justA[[1]] %in% x[[1]])
 # now determine where 'b' starts
 justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx]
 [1]  2  7 17







 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?




Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna www...@gmail.com wrote:
 You've tried:

 gregexpr("b+", "abcdaabbc")

But this would match the third occurrence of b+ in "abcdaabbcbb".  But
in this example I'm only interested in b+ if it's preceded by a+.

  Titus



Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Try this zero width negative look behind expression:

 gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)
 [[1]]
 [1] 2 7
 attr(,"match.length")
 [1] 1 2

Thanks Gabor, but this gives me the same result as

  gregexpr("b+", "abcdaabbc", perl = TRUE)

which is wrong if the string is "abcdaabbcbbb".
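
The difference shows up on the longer string.  The sketch below compares
plain b+ with a positive lookbehind, (?<=a)b+.  That expression is a
substitute chosen for illustration, not the one quoted above, and it uses a
single a because PCRE does not accept an unbounded a+ inside a lookbehind:

```r
s <- "abcdaabbcbbb"

plain  <- gregexpr("b+", s, perl = TRUE)[[1]]
behind <- gregexpr("(?<=a)b+", s, perl = TRUE)[[1]]

as.integer(plain)              # 2 7 10: includes the run not preceded by a
as.integer(behind)             # 2 7
attr(behind, "match.length")   # 1 2
```

This covers the special case where the context before the group has a fixed
width.  It does not generalise to patterns such as [D-F]+(C)[A-C]+.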

  Titus



[R] melt on OSX ignores na.rm=T

2010-02-06 Thread Titus von der Malsburg
Hi list,

I run R on Linux and OSX.  On both systems I use R version 2.9.2 (2009-08-24)
and reshape version: 0.8.2 (2008-11-04).  When I do a melt with
na.rm=T on a data frame I get different results on these systems:

library(reshape)

x <- read.table(textConnection("char trial wn
p E10I13D0  4
r E10I13D0  4
a E10I13D0  4
c E10I13D0  4
t E10I13D0  4
i E10I13D0  4
c E10I13D0  4
e E10I13D0  4
d E10I13D0  4
, E10I13D0 NA"), head=T)

melt(x, measure.vars="char", na.rm=T)

On Linux I get:

  1 E10I13D0  4 char p
  2 E10I13D0  4 char r
  3 E10I13D0  4 char a
  4 E10I13D0  4 char c
  5 E10I13D0  4 char t
  6 E10I13D0  4 char i
  7 E10I13D0  4 char c
  8 E10I13D0  4 char e
  9 E10I13D0  4 char d

But on OSX I get:

  1  E10I13D0  4 char p
  2  E10I13D0  4 char r
  3  E10I13D0  4 char a
  4  E10I13D0  4 char c
  5  E10I13D0  4 char t
  6  E10I13D0  4 char i
  7  E10I13D0  4 char c
  8  E10I13D0  4 char e
  9  E10I13D0  4 char d
  10 E10I13D0 NA char ,


What's causing this glitch?  Is there a simple way to subset lines
that do not have any NAs?  I'm looking for a line that I can use for
all data.frames without modification.

As always: thanks a lot!

  Titus



Re: [R] melt on OSX ignores na.rm=T

2010-02-06 Thread Titus von der Malsburg
On Sat, Feb 6, 2010 at 8:23 PM, hadley wickham h.wick...@gmail.com wrote:
 The latest version of reshape is 0.8.3 - perhaps upgrading will fix
 your problem.

Thanks for your response, Hadley!

I just did the upgrade on the Linux system.  On OSX I was already at
0.8.3.  Now, I get the same result on both systems.  However, the
result includes the NAs although I said na.rm=T:

library(reshape)

x <- read.table(textConnection("char trial wn
p E10I13D0  4
r E10I13D0  4
a E10I13D0  4
c E10I13D0  4
t E10I13D0  4
i E10I13D0  4
c E10I13D0  4
e E10I13D0  4
d E10I13D0  4
, E10I13D0 NA"), head=T)

> melt(x, measure.vars="char", na.rm=T)
  trial wn variable value
1  E10I13D0  4 char p
2  E10I13D0  4 char r
3  E10I13D0  4 char a
4  E10I13D0  4 char c
5  E10I13D0  4 char t
6  E10I13D0  4 char i
7  E10I13D0  4 char c
8  E10I13D0  4 char e
9  E10I13D0  4 char d
10 E10I13D0 NA char ,

The documentation says "na.rm: Should NA values be removed from the
data set?".  Do I get something wrong?

  Titus



Re: [R] melt on OSX ignores na.rm=T

2010-02-06 Thread Titus von der Malsburg
Ok, I studied the source code of melt.data.frame.  With na.rm=T melt
operates normally except that it deletes rows from the molten
data.frame that have NAs in the value column.  NAs in the id.vars are
not touched.  This could be clearer in the documentation especially as
it seems that earlier versions of reshape behaved differently.
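
For dropping every row that contains an NA in any column, independent of
melt, base R's complete.cases can be used.  A minimal sketch on a shortened
version of the example data:

```r
x <- data.frame(char  = c("p", "r", ","),
                trial = "E10I13D0",
                wn    = c(4, 4, NA))

# Keep only rows that have no NA in any column; works for any data frame.
complete <- x[complete.cases(x), ]
nrow(complete)   # 2
```

na.omit(x) selects the same rows and additionally records which rows were
dropped in an attribute.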

Best,

  Titus



[R] estimating rho of Poisson distributed data

2010-01-15 Thread Titus von der Malsburg
Mean and variance of Poisson distributed data are specified by \rho.
How can I estimate \rho for a set of measurements in R?

Many thanks!

  Titus



Re: [R] estimating rho of Poisson distributed data

2010-01-15 Thread Titus von der Malsburg
On Fri, Jan 15, 2010 at 09:19:23AM -0500, David Winsemius wrote:

 On Jan 15, 2010, at 5:59 AM, Titus von der Malsburg wrote:

 Mean and variance of Poisson distributed data are specified by \rho.
 How can I estimate \rho for a set of measurements in R?

 rho <- mean(x)

Yeah, thanks :-)  I was looking for a general way to fit a
distribution.  Should've made that clear.  I'm surprised that nobody
is complaining because I called lambda rho!



Re: [R] estimating rho of Poisson distributed data

2010-01-15 Thread Titus von der Malsburg
On Fri, Jan 15, 2010 at 08:33:52AM -0700, Peter Ehlers wrote:
 Why would anyone complain? You're free to call it 'applesauce'
 if that suits you.

Good idea, I will do this from now on!  ;-)

 What do you mean by 'general way to fit a distribution'?
 Maximum likelihood might be one way.

Somebody else pointed me to fitdistr from MASS which does the job.
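
A short sketch of that approach on simulated counts (the data and the seed
are invented for illustration):

```r
library(MASS)

set.seed(1)
x <- rpois(200, lambda = 3)      # simulated counts, illustration only

fit <- fitdistr(x, "Poisson")    # maximum-likelihood fit
fit$estimate                     # estimate of lambda
mean(x)                          # the same value, as suggested earlier
```

For the Poisson distribution the maximum-likelihood estimate coincides with
the sample mean.  fitdistr becomes more useful for distributions without
such a closed form.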

Thanks for the reply!

  Titus



Re: [R] conditionally merging adjacent rows in a data frame

2009-12-09 Thread Titus von der Malsburg
On Wed, Dec 9, 2009 at 12:11 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Here are a couple of solutions.  The first uses by and the second sqldf:

Brilliant!  Now I have a whole collection of solutions.  I did a simple
performance comparison with a data frame that has 7929 lines.

The results were as follows (loading the required packages is not included in
the measurements):

 times <- c(0.248, 0.551, 41.080, 0.16, 0.190)
 names(times) <- c("aggregate","summaryBy","by+transform","sqldf","tapply")
 barplot(times, log="y", ylab="log(s)")

So sqldf clearly wins, followed by tapply and aggregate.  summaryBy is slower
than necessary because it computes both mean /and/ sum for both x and dur.
by+transform presumably suffers from the construction of many intermediate data
frames.

Are there any canonical places where R-recipes are collected?  If yes I would
write-up a summary.

These were the competitors:

 # Gray's and Nikhil's aggregate solution:

 aggregate.fixations1 <- function(d) {

   idx  <- c(TRUE,diff(d$roi)!=0)
   d2 <- d[idx,]

   idx  <- cumsum(idx)
   d2$dur <- aggregate(d$dur, list(idx), sum)[2]
   d2$x   <- aggregate(d$x, list(idx), mean)[2]

   d2
 }

 # Marek's summaryBy:

 library(doBy)

 aggregate.fixations2 <- function(d) {

   idx  <- c(TRUE,diff(d$roi)!=0)
   d2 <- d[idx,]

   d$idx  <- cumsum(idx)
   d2$r <- summaryBy(dur+x~idx, data=d, FUN=c(sum,
                     mean))[c("dur.sum", "x.mean")]
   d2
 }

 # Gabor's by+transform solution:

 aggregate.fixations3 <- function(d) {

   idx  <- cumsum(c(TRUE,diff(d$roi)!=0))

   d2 <- do.call(rbind, by(d, idx, function(x)
     transform(x, dur = sum(dur), x = mean(x))[1,,drop = FALSE ]))

   d2
 }

 # Gabor's sqldf solution:

 library(sqldf)

 aggregate.fixations4 <- function(d) {

   idx  <- c(TRUE,diff(d$roi)!=0)
   d2 <- d[idx,]

   d$idx  <- cumsum(idx)
   d2$r <- sqldf("select sum(dur), avg(x) x from d group by idx")

   d2
 }

 # Titus' solution using plain old tapply:

 aggregate.fixations5 <- function(d) {

   idx  <- c(TRUE,diff(d$roi)!=0)
   d2 <- d[idx,]

   idx  <- cumsum(idx)
   d2$dur <- tapply(d$dur, idx, sum)
   d2$x <- tapply(d$x, idx, mean)

   d2
 }



Re: [R] general question about functions

2009-12-08 Thread Titus von der Malsburg
http://www.rseek.org/

It is particularly useful to search the mailing list archives of
r-help with rSeek. No matter what kind of problem you have, somebody
has had it before and asked on r-help.

  Titus



[R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Titus von der Malsburg
Hi, I have a data frame and want to merge adjacent rows if some condition is
met.  There's an obvious solution using a loop but it is prohibitively slow
because my data frame is large.  Is there an efficient canonical solution for
that?

> head(d)
     rt dur tid  mood roi  x
55 5523 200   4  subj   9  5
56 5523  52   4  subj   7 31
57 5523 209   4  subj   4  9
58 5523 188   4  subj   4  7
70 4016 264   5 indic   9 51
71 4016 195   5 indic   4 14

The desired result would have consecutive rows with the same roi value merged.
dur values should be added and x values averaged, other values don't differ in
these rows and should stay the same.

> head(result)
     rt dur tid  mood roi  x
55 5523 200   4  subj   9  5
56 5523  52   4  subj   7 31
57 5523 397   4  subj   4  8
70 4016 264   5 indic   9 51
71 4016 195   5 indic   4 14

There's also a solution using reshape.  It uses an index for blocks

  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))

melts and then casts for every column using an appropriate fun.aggregate.
However, this is a bit cumbersome and also I'm not sure how to make sure that
I get the original order of rows.

Thanks for any suggestion.

  Titus



Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Titus von der Malsburg
On Tue, Dec 8, 2009 at 4:50 PM, Gray Calhoun gray.calh...@gmail.com wrote:
 I think there might be a problem with this approach if roi, tid, rt,
 and mood are the same for nonconsecutive rows.

True, but I can use the index of my reshape solution. Aggregate was
the crucial ingredient.  Thanks both!

For the record, this is the full solution:

> head(d)
     rt dur tid  mood roi  x
55 5523 200   4  subj   9  5
56 5523  52   4  subj   7 31
57 5523 209   4  subj   4  9
58 5523 188   4  subj   4  7
70 4016 264   5 indic   9 51
71 4016 195   5 indic   4 14

index  <- c(TRUE,diff(d$roi)!=0)
d2 <- d[index,]

index  <- cumsum(index)
d2$dur <- aggregate(d$dur, list(index=index), sum)[2]
d2$x   <- aggregate(d$x, list(index=index), mean)[2]
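
The same steps as a self-contained sketch on the six example rows.  It uses
tapply instead of aggregate so that the new columns are plain vectors, and
like the code above it assumes that a change in roi is the only thing that
separates blocks:

```r
d <- data.frame(rt   = c(5523, 5523, 5523, 5523, 4016, 4016),
                dur  = c(200, 52, 209, 188, 264, 195),
                tid  = c(4, 4, 4, 4, 5, 5),
                mood = c("subj", "subj", "subj", "subj", "indic", "indic"),
                roi  = c(9, 7, 4, 4, 9, 4),
                x    = c(5, 31, 9, 7, 51, 14),
                row.names = c("55", "56", "57", "58", "70", "71"))

first <- c(TRUE, diff(d$roi) != 0)   # TRUE where a new block of equal roi starts
block <- cumsum(first)               # block id for every row

result     <- d[first, ]                              # first row of each block
result$dur <- as.vector(tapply(d$dur, block, sum))    # durations are summed
result$x   <- as.vector(tapply(d$x, block, mean))     # positions are averaged
result
```

Rows 57 and 58 collapse into one row with dur 397 and x 8, which matches
the desired result given in the original question.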



Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Titus von der Malsburg
On Tue, Dec 8, 2009 at 5:19 PM, Nikhil Kaza nikhil.l...@gmail.com wrote:
 I suppose that is true, but the example data seem to suggest that it is
 sorted by rt.

I was not very clear on that.  Sorry.

 d$count <- 1
  a <- with(d, aggregate(subset(d, select=c(dur, x, count)),
 list(rt=rt,tid=tid,mood=mood,roi=roi), sum))
 a$x <- a$x/a$count

This is neat!

 But it would still be nice to get a generic way that uses different
 functions on different columns much like excel's pivot table.

I guess the most straight-forward thing would be to extend aggregate
to also accept instead of a FUN a list of FUNs where the first is
applied to value of the first column of x (the data frame), the second
to the second column, and so on.
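
A rough sketch of that idea as a separate helper rather than a change to
aggregate itself.  The function name aggregate_by_column and its arguments
are hypothetical:

```r
# Hypothetical helper: funs[[i]] is applied to column i of x within each
# group defined by 'by'.
aggregate_by_column <- function(x, by, funs) {
  stopifnot(length(funs) == ncol(x))
  cols <- Map(function(col, f) as.vector(tapply(col, by, f)), x, funs)
  as.data.frame(cols)
}

vals  <- data.frame(dur = c(200, 52, 209, 188), x = c(5, 31, 9, 7))
block <- c(1, 2, 3, 3)
agg   <- aggregate_by_column(vals, block, list(sum, mean))
agg   # dur is summed per block, x is averaged per block
```

The grouping variable is kept outside the data frame here so that every
column of x has exactly one function assigned to it.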

  Titus



[R] Sweave and license message when loading mclust package

2009-12-07 Thread Titus von der Malsburg
When loading mclust, it shows a license agreement message.  This
message shows up in my document when I use Sweave.  I did the
following:

<<echo=FALSE,include=FALSE>>=
  library(mclust)
@

Is this a problem with mclust, Sweave or with me?  How can it be fixed?

Thanks for any suggestions!

  Titus



[R] Splicing factors without losing levels

2009-06-09 Thread Titus von der Malsburg

Hi list!

An operation that I often need is splicing two vectors:

  > splice(1:3, 4:6)
  [1] 1 4 2 5 3 6

For numeric vectors I use this hack:

  splice <- function(x, y) {
    xy <- cbind(x, y)
    xy <- t(xy)
    dim(xy) <- length(x) * 2
    return(xy)
  }

So far, so good (?).  But I also need splicing for factors and I tried
this:

  splice <- function(x, y) {
    xy <- cbind(x, y)
    xy <- t(xy)
    dim(xy) <- length(x) * 2
    if (is.factor(x) && is.factor(y)) {
      xy <- as.factor(xy)
      levels(xy) <- levels(x)
    }
    return(xy)
  }

This, however, doesn't work because the level name to integer mapping
gets mixed up when copying the levels from x to xy.

My questions:

 1.) How can this be fixed?
 2.) What's the best way to do splicing of vectors and factors in R?
 (I couldn't find a predefined function for this although it seems to be
 such a basic and useful operation.)

Thanks!!

 Titus



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Titus von der Malsburg
On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
 For factors, you better convert them first back to character strings.
 
   splice <- function(x, y) {
   x <- levels(x)[x]
   y <- levels(y)[y]
   factor(as.vector(rbind(x, y)))
   }

Thank you very much, Thierry!

I failed to mention something important in my last mail: x and y have
the same levels.  (I assume that the integer to level name mapping of
a factor defines its class and that it only makes sense to combine
factors of the same class.)

Say

> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))

then

> x
[1] b b d d
Levels: a b c d

> as.integer(x)
[1] 2 2 4 4

but

> splice(x,x)
[1] b b b b d d d d
Levels: b d

> as.integer(splice(x,x))
[1] 1 1 1 1 2 2 2 2

I'd like to have a splice function that retains the level to label
mapping.  One candidate for a solution is:

splice <- function(x,y) {
  xy <- as.vector(rbind(x, y))
  if (is.factor(x) && is.factor(y))
    xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
  xy
}

However, this relies on assumptions about the implementation of
factors that are neither mentioned nor guaranteed in the man page:
Levels are underlyingly integers starting from one and going to
length(levels).  levels(x) gives me the labels of these integers in an
order corresponding to 1:length(levels(x)).

Without these assumptions I see no way to recover the integer to level
name mapping for levels that are defined in a factor but do not occur.

I'd be happy if somebody could clarify this issue!
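
One way to sidestep the integer codes entirely is to interleave the labels
and rebuild the factor with the original level set.  A sketch, assuming
both inputs have the same length and, for factors, identical levels.  The
name interleave is only a suggestion:

```r
# Hypothetical replacement for splice(): goes through the labels, so it
# makes no assumption about how factor codes are stored.
interleave <- function(x, y) {
  stopifnot(length(x) == length(y))
  if (is.factor(x) && is.factor(y)) {
    stopifnot(identical(levels(x), levels(y)))
    xy <- as.vector(rbind(as.character(x), as.character(y)))
    return(factor(xy, levels = levels(x)))   # unused levels are kept
  }
  as.vector(rbind(x, y))
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
levels(interleave(x, x))       # "a" "b" "c" "d"
as.integer(interleave(x, x))   # 2 2 2 2 4 4 4 4
interleave(1:3, 4:6)           # 1 4 2 5 3 6
```

Passing levels = levels(x) to factor() keeps levels that do not occur in
the data and preserves their order.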

  Titus



Re: [R] R command to join data.frames rows with identical keys?

2009-06-09 Thread Titus von der Malsburg
Have a look at the merge function.

  Merge two data frames by common columns or row names, or do other
 versions of database _join_ operations.


  Titus

On Tue, Jun 09, 2009 at 05:48:05AM -0700, Jason Rupert wrote:
 
 I've got two data.frames and, when certain keys match,  I would like to add 
 the column values from one data frame to the other data.frame.
 
 Below I list the two data.frames, i.e. neighborhoodInfo_df, and 
 schoolZone_df.  Based on the address key I would like to add the 
 schoolZone key to the neighborhoodInfo_df data.frame.  
 
 By any chance is there an R command to accomplish this in one or two steps.  
 I think I could do this in a for loop or something, but think there is 
 might be another way in R to accomplish it smarter.  
 
 Thanks for any info.  
 
 
 neighborhoodInfo1_df<-data.frame(address=c(101),squareFootage=c(2000),lotsize=c(0.75))
 
 neighborhoodInfo2_df<-data.frame(address=c(108),squareFootage=c(3000), 
 lotsize=c(1.25))
 
 neighborhoodInfo_df<-rbind(neighborhoodInfo1_df, neighborhoodInfo2_df)
   
   
 schoolZone1_df<-data.frame(address=c(101), schoolZone=c("Sherman"))
 
 schoolZone2_df<-data.frame(address=c(108), schoolZone=c("Baker"))   
 
 
 schoolZone_df<-rbind(schoolZone1_df, schoolZone2_df)
 



Re: [R] R command to join data.frames rows with identical keys?

2009-06-09 Thread Titus von der Malsburg


An example:

> schoolZone1_df <- data.frame(address=101, schoolZone="Sherman")
> schoolZone2_df <- data.frame(address=108, schoolZone="Baker")
> schoolZone_df <- rbind(schoolZone1_df, schoolZone2_df)
> schoolZone_df
  address schoolZone
1     101    Sherman
2     108      Baker
> neighborhoodInfo1_df <- data.frame(address=101, squareFootage=2000,
+                                    lotsize=0.75)
> neighborhoodInfo2_df <- data.frame(address=108, squareFootage=3000,
+                                    lotsize=1.25)
> neighborhoodInfo_df <- rbind(neighborhoodInfo1_df, neighborhoodInfo2_df)
> neighborhoodInfo_df
  address squareFootage lotsize
1     101          2000    0.75
2     108          3000    1.25
> merge(schoolZone_df, neighborhoodInfo_df, by="address")
  address schoolZone squareFootage lotsize
1     101    Sherman          2000    0.75
2     108      Baker          3000    1.25


  Titus



Re: [R] is it possible to combine multiple barplots?

2009-06-09 Thread Titus von der Malsburg

You can use barchart in package lattice.  Here's a rough sketch:

library(lattice)

dataA <- rep(1:4, c(3,2,2,4))
dataB <- rep(1:4, c(5,4,3,2))

da <- data.frame(table(dataA))
db <- data.frame(table(dataB))

da$cond <- "a"
db$cond <- "b"

colnames(da)[1] <- "data"
colnames(db)[1] <- "data"

d <- rbind(da, db)
barchart(Freq~data|cond, d)

Titus

On Tue, Jun 09, 2009 at 04:23:32PM +0200, Philipp Schmidt wrote:
 i am working with two sets of likert scale type (4 distinct values) data:
 
 dataA <- rep(1:4, c(3,2,2,4))
 dataB <- rep(1:4, c(5,4,3,2))
 
 i can now (bar)plot both of these separately and compare the distributions.
 
 plot(table(dataA), type='h')
 plot(table(dataB), type='h')
 
 is there a way to plot both of them in one plot, so that the bars for
 value 1 (dataA: 3, dataB: 5) would appear side by side, followed by
 the bars for value 2 etc.?
 
 thanks!
 
 best - P
 



Re: [R] is it possible to combine multiple barplots?

2009-06-09 Thread Titus von der Malsburg
On Tue, Jun 09, 2009 at 04:39:29PM +0200, Titus von der Malsburg wrote:
  is there a way to plot both of them in one plot, so that the bars for
  value 1 (dataA: 3, dataB: 5) would appear side by side, followed by
  the bars for value 2 etc.?

Oh, I see you want something different.  I should've read your message
more closely.

I found this example in the gallery:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=54

Maybe it's close enough to what you want to do.
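
A base-graphics sketch of the side-by-side layout asked for, independent of
the gallery example.  The row labels A and B and the axis labels are
arbitrary choices:

```r
dataA <- rep(1:4, c(3, 2, 2, 4))
dataB <- rep(1:4, c(5, 4, 3, 2))

# One row of counts per data set, one column per response value.
counts <- rbind(A = tabulate(dataA, nbins = 4),
                B = tabulate(dataB, nbins = 4))
colnames(counts) <- 1:4

# beside = TRUE draws the bars of each column next to each other
# instead of stacking them.
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "response", ylab = "count")
```

With beside = TRUE each response value gets a group of two bars, one per
data set.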

  Titus



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Titus von der Malsburg
On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
 This may seem like a minor point, but I think it is worthwhile using
 descriptive names for functions.

Makes sense.  I thought I've seen this use somewhere else (probably in
Lisp?).  What better name do you suggest for this operation?

  Titus



[R] What's the best way to tell a function about relevant fields in data frames

2009-05-12 Thread Titus von der Malsburg

Hi list,

I have a function that detects saccadic eye movements in a time series
of eye positions sampled at a rate of 250Hz.  This function needs
three vectors: x-coordinate, y-coordinate, trial-id.  This information
is usually contained in a data frame that also has some other fields.
The names of the fields are not standardized.

> head(eyemovements)
        time     x      y trial
51 880446504 53.18 375.73     1
52 880450686 53.20 375.79     1
53 880454885 53.35 376.14     1
54 880459060 53.92 376.39     1
55 880463239 54.14 376.52     1
56 880467426 54.46 376.74     1

There are now several possibilities for the signature of the function:

1. Passing the columns separately:

     detect(eyemovements$x, eyemovements$y, eyemovements$trial)

   or:

     with(eyemovements,
          detect(x, y, trial))

2. Passing the data frame plus the names of the fields:

     detect(eyemovements, "x", "y", "trial")

3. Passing the data frame plus a formula specifying the relevant
   fields:

     detect(eyemovements, ~x+y|trial)

4. Passing a formula and getting the data from the environment:

     with(eyemovements,
          detect(~x+y|trial))

I saw instances of all those variants (and others) in the wild.

Is there a canonical way to tell a function which fields in a data
frame are relevant?  What other alternatives are possible?  What are
the pros and cons of the alternatives?
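
To make variant 3 concrete, here is a minimal sketch of how a formula
such as ~x+y|trial could be mapped onto the columns of a data frame.
The helper select_fields and the column names px, py and id are
hypothetical and only serve as illustration; the actual saccade
detection is left out.

```r
# Hypothetical helper: extract the three relevant columns named in a
# one-sided formula of the form ~ x + y | trial and give them
# standardized names.
select_fields <- function(data, formula) {
  vars <- all.vars(formula)            # variable names in order of appearance
  stopifnot(length(vars) == 3, all(vars %in% names(data)))
  out <- data[vars]
  names(out) <- c("x", "y", "trial")
  out
}

# Toy data with non-standard column names.
samples <- data.frame(time = c(880446504, 880450686, 880454885),
                      px   = c(53.18, 53.20, 53.35),
                      py   = c(375.73, 375.79, 376.14),
                      id   = c(1, 1, 1))

std <- select_fields(samples, ~ px + py | id)
```

With this approach the user never has to rename columns; the function
does the standardization internally.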

Thanks, Titus



Re: [R] What's the best way to tell a function about relevant fields in data frames

2009-05-12 Thread Titus von der Malsburg

Hi Zeljko,

thanks for your suggestion!

On Tue, May 12, 2009 at 12:26:48PM +0200, Zeljko Vrba wrote:
> Why not simply rearrange your data frames to have standardized column names
> (see names() function), and write functions that operate on the standardized
> format?

Actually that's what I'm currently doing.  And if the code was only
for my personal use I would stick with this solution.  However, I want
to publish my stuff as a package and want to make its use as
convenient as possible for the users.  The drawbacks of the current
solution are: The users have to perform the additional processing step
and the users have to know the correct format of the data frame.

  Titus



[R] Applying functions to partitions

2009-02-16 Thread Titus von der Malsburg

Hi list!  I have a large matrix which I'd like to partition into blocks
and for each block I'd like to compute the mean.  Below is an example
in which each letter marks a block of the partition:

 a a a d g g 
 a a a d g g
 a a a d g g
 b b b e h h
 b b b e h h
 c c c f i i

I'm only interested in the resulting matrix of means.  How can this be
done efficiently?

Thanks!  Titus



Re: [R] Applying functions to partitions

2009-02-16 Thread Titus von der Malsburg
On Mon, Feb 16, 2009 at 01:45:52PM -0500, Stavros Macrakis wrote:
> How are the blocks defined? As a priori index ranges? By factors? By
> some property of i,j? Or...?

Ok, I should have been more specific.

The blocks are defined by factors.  There's a factor for the columns and
a factor for the rows.  In the example below the column factor would be
c(1,1,1,2,3,3) and the row factor c(1,1,1,2,2,3).  In the particular
case I'm working on the matrix is square and symmetric and there's only
one factor for both.

I can figure out ways to subset the matrix, similar to what Jorge
proposed, but I'm looking for a way to get the means more or less at
once because the matrix is pretty large and doing it block-wise is too
slow.
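
To illustrate what "at once" could look like, here is one possible
sketch based on rowsum(), which sums rows by group in a single call.
The 6x6 matrix is made up; row_f and col_f are the two factors from the
example.  This is just one way to do it and has not been timed on large
matrices.

```r
m     <- matrix(as.numeric(1:36), nrow = 6)   # toy matrix
row_f <- c(1, 1, 1, 2, 2, 3)                  # row factor
col_f <- c(1, 1, 1, 2, 3, 3)                  # column factor

# Sum over row groups, then over column groups.
block_sums <- t(rowsum(t(rowsum(m, row_f)), col_f))

# Number of cells in each block.
block_sizes <- outer(as.vector(table(row_f)), as.vector(table(col_f)))

block_means <- block_sums / block_sizes

# Cross-check against the straightforward (but slower) tapply version.
via_tapply <- tapply(m, list(row_f[row(m)], col_f[col(m)]), mean)
```

For a square matrix with a single factor, row_f and col_f are simply
the same vector.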

Thanks again!

 Titus


> On 2/16/09, Titus von der Malsburg malsb...@gmail.com wrote:
> >
> > Hi list!  I have a large matrix which I'd like to partition into blocks
> > and for each block I'd like to compute the mean.  Following a example
> > where each letter marks a block of the partition:
> >
> >  a a a d g g
> >  a a a d g g
> >  a a a d g g
> >  b b b e h h
> >  b b b e h h
> >  c c c f i i
> >
> > I'm only interested in the resulting matrix of means.  How can this be
> > done efficiently?
> >
> > Thanks!  Titus
 
 




Re: [R] Adjusting the Axis in a histogram to the prespecified breaks

2009-02-16 Thread Titus von der Malsburg
On Mon, Feb 16, 2009 at 11:08:24AM -0800, Christian Langkamp wrote:
> I could of course log the whole data
> set, but then explaining that transformation within a presentation is
> generally not a pleasant exercise.

You don't have to explain it.  Just calculate the hist of the log and
label the axis with 0, 1, 2, 4, 8, 16

See ?axis
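
A small sketch of the idea, assuming a base-2 log and made-up, strictly
positive data (zero has no finite logarithm, so the labels here start
at 1):

```r
set.seed(1)
x <- rlnorm(500, meanlog = 1)        # made-up, strictly positive data

# Histogram of the base-2 log, drawn without axes.
h <- hist(log2(x), axes = FALSE, main = "", xlab = "x")

# Tick positions on the log scale, labelled with the original values.
ticks <- 0:4
tick_labels <- 2^ticks               # 1 2 4 8 16
axis(1, at = ticks, labels = tick_labels)
axis(2)
```

The audience then reads the axis in the original units, and the log
transformation stays behind the scenes.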

  Titus



[R] Variables captured in closures get copied?

2009-02-11 Thread Titus von der Malsburg

Hi list!  I have a data frame called fix and a list of index vectors
called rois:

  > head(rois, 3)
  [[1]]
  [1] 2 1

  [[2]]
  [1] 3

  [[3]]
  [1]  6  7 28 26 27 24 25

The part that's causing the issue is the following line:

  lapply(rois, function(roi) fix$x[roi] <- 100)

So for every index vector I'd like to set the respective entries in the
data frame (fix) to 100.

I expected the data frame would be changed after lapply but instead it
remains unchanged.  I understand that when I pass an argument into a
function it gets passed as a value and not as a reference.  But here fix
is not an argument but captured in the closure.  So my questions are:
What's going on here and what is the idiomatic way of achieving my goal?
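
The behaviour can be reproduced with a toy data frame (the values below
are made up).  An assignment with <- inside a function creates a
modified local copy of fix, so the outer data frame is left untouched.
The sketch also shows one possible vectorized alternative that needs no
lapply at all.

```r
fix  <- data.frame(x = 1:10)          # toy stand-in for the real data
rois <- list(c(2, 1), 3, c(6, 7))     # toy index vectors

# The assignment modifies a copy of fix that is local to the function.
res <- lapply(rois, function(roi) fix$x[roi] <- 100)
unchanged <- identical(fix$x, 1:10)   # TRUE: the outer fix was not modified

# One alternative: index with all positions at once.
fix$x[unlist(rois)] <- 100
```

After the last line the entries at positions 1, 2, 3, 6 and 7 are 100
and all others keep their values.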

Thanks for any help!

 Titus



[R] Splitting a data frame with break points where factor changes value

2009-01-31 Thread Titus von der Malsburg
I have a data frame called s3.  This data frame has a column called
saccade which has two levels 1 and -1.

  > head(s3$saccade, 100)
 [1] NA NA NA NA -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 [26] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1
 [51]  1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 [76] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

How can I split this data frame into blocks such that a new block
begins when the value in s3$saccade changes?  Split doesn't seem to work
here.  It's important the solution is efficient because the data frame
is huge.
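
To pin down what is being asked for, here is a sketch of one possible
approach on made-up data: compute a run id that increases whenever the
value changes and split on that.  It assumes saccade is numeric and
that 0 never occurs, so 0 can stand in for NA; the names key, run_id
and blocks are chosen here for illustration.

```r
s3 <- data.frame(saccade = c(NA, NA, -1, -1, -1, 1, 1, -1, -1),
                 t       = 1:9)       # toy stand-in for the real data

v   <- s3$saccade
key <- ifelse(is.na(v), 0, v)         # treat NA as a value of its own

# TRUE wherever a new run starts; the cumulative sum numbers the runs.
run_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

blocks <- split(s3, run_id)
```

Here run_id is 1 1 2 2 2 3 3 4 4, so split() yields four blocks.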

Thanks!

 Titus



Re: [R] Multidimensional scalling

2009-01-31 Thread Titus von der Malsburg
Hi Tomek, have a look at R News, Volume 3/3, December 2003.  There
you'll find an article about the different algorithms that are
available in R.

  Titus

On Sat, Jan 31, 2009 at 01:36:29AM +0100, Tomek Wlodarski wrote:
> now I see that cmdscale is not the best option for my problem
> So I am wondering if you can advice me other method of MDS or
> different approach to my problem:
> I have matrix which describes distances between object and I would
> like to visualise this matrix onto 2D in such way that distance
> between each object on this map would be somehow proportional to
> distances between respective points in the matrix.



[R] melt stumbles over deleted columns

2009-01-22 Thread Titus von der Malsburg
I have a data frame that is the result of a cast (reshape) operation.  I
deleted the variable column and tried to melt the resulting data frame.
Depending on which method I use to delete the column I get different
error messages when melting:

> head(tinfos)
  vpn group trial_no item relation trial_type   rt variable  #
1 102     2        1 4351    diag1 distractor 8471    fix_d 27
2 102     2        2 1214       id     target 4072    fix_d 17
3 102     2        3 4213    diag1 distractor 7040    fix_d 27
4 102     2        4 1314       id     target 4370    fix_d 15
5 102     2        5 2655     vert distractor 4397    fix_d 17
6 102     2        6 3322    horiz distractor 6132    fix_d 26
> tinfos$variable <- NULL
> melt(tinfos)
Error: id variables not found in data: variable

Or:

> tinfos2 <- tinfos[,-match("variable",names(tinfos))]
> melt(tinfos2)
Error in `rownames<-`(`*tmp*`, value = character(0)) :
  attempt to set rownames on object with no dimensions
In addition: Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(8471L, 4072L, 7040L, 4370L,  :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(0L, 0L, 1L, 0L, 0L, 0L, 0L,  :
  invalid factor level, NAs generated

I figure there must be some internal inconsistency in the data frame
after deletion.  Does anybody have an idea how to fix that?

Thanks!

  Titus
