date:20080924

[Rd] splitting strings efficiently

2008-09-24 Thread Mark Kimpel

I have a very long list of strings. Each string actually contains multiple
values separated by a semi-colon. I need to turn each string into a vector
of the values delimited by the semi-colons. I know I can do this very
laboriously by using loops, nchar, and substr, but it is terribly slow. Is
there a basic R function that handles this situation? If not, is there
perhaps a faster way to do it than I currently am, which is to lapply the
following function? Thanks, Mark

###
string.tokenizer.func <- function(string, separator){
  new.vec <- NULL
  newString <- ""
  if(is.null(string)) {new.vec <- ""} else {
    for(i in 1:(nchar(string) + 1)){
      if(substr(string, i, i) == separator){
        new.vec <- c(new.vec, newString)
        newString <- ""
      } else {
        newString <- paste(newString, substr(string, i, i), sep="")
      }
    }
    new.vec <- c(new.vec, newString)
  }
  new.vec
}

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN 46074

(317) 490-5129 Work,  Mobile  VoiceMail
(317) 399-1219 Home
Skype: mkimpel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] splitting strings efficiently

2008-09-24 Thread Mark Kimpel

I knew there HAD to be a basic function, but 'help.search("split string")'
and 'help("string")' did not find it. Thanks for the help on this elementary
question.
Mark


On Wed, Sep 24, 2008 at 12:17 PM, Erik Iverson [EMAIL PROTECTED] wrote:

 ?strsplit
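
For readers following the pointer to strsplit, here is a minimal sketch of how it might be applied to the problem described. The sample data and the variable names (s, parts, nums, string.list, parts2) are hypothetical and not from the original post.

```r
# Hypothetical sample data: each string holds semicolon-separated values.
s <- c("12;13;14", "15;16;17")

# strsplit() is vectorised: one call returns a list with one character
# vector per input string, so no explicit loop over characters is needed.
parts <- strsplit(s, ";")
print(parts[[1]])

# If the values are numeric, convert each piece after splitting.
nums <- lapply(parts, as.numeric)
print(nums[[2]])

# If the strings are held in a list rather than a character vector,
# flatten first, since strsplit() expects a character vector.
string.list <- list("a;b", "c;d;e")   # hypothetical data
parts2 <- strsplit(unlist(string.list), ";")
print(lengths(parts2))
```

The result is a list rather than a matrix because the strings need not contain the same number of values.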


Re: [Rd] splitting strings efficiently

2008-09-24 Thread Gabor Grothendieck

Also one can create a text connection and read it using read.table, scan, etc.

s <- c("12;13;14", "15;16;17")

read.table(textConnection(s), sep = ";")
# or
scan(textConnection(s), sep = ";")
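
The two calls return differently shaped results, which may matter when choosing between them. The sketch below assumes the same two-element input and that every string has the same number of fields; the names con, tab and vals are illustrative only.

```r
s <- c("12;13;14", "15;16;17")

# read.table() keeps the row structure: one row per string, one column
# per field, returned as a data frame.
con <- textConnection(s)
tab <- read.table(con, sep = ";")
close(con)   # the connection was already open, so close it explicitly
print(dim(tab))

# scan() flattens everything into a single vector (numeric by default),
# so the boundaries between the original strings are lost.
con <- textConnection(s)
vals <- scan(con, sep = ";", quiet = TRUE)
close(con)
print(vals)
```

Unlike strsplit, neither form copes naturally with strings that have differing numbers of fields, unless extra arguments such as fill = TRUE are passed to read.table.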



Re: [Rd] function can permanently modify calling function via substitute?

2008-09-24 Thread Luke Tierney


On Wed, 24 Sep 2008, Peter Dalgaard wrote:


Perry de Valpine wrote:

Dear R-devel:

The following code seems to allow one function to permanently modify a
calling function.  I did not expect this would be allowed (short of
more creative gymnastics) and wonder if it is really intended.  (I can
see other ways to accomplish the intended task of this code [e.g. via
match.call instead of substitute below] that do not trigger the
problem, but I don't think that is the point.)

do.nothing <- function(blah) {force(blah)}

do.stuff.with.call <- function(mycall) {
  raw.mycall <- substitute(mycall);   # expected raw.mycall would be local
  print( sys.call() )

  # do.nothing( raw.mycall );  # See below re: commented lines.
  # .Call( "showNAMED", raw.mycall[[2]] )

  force( mycall );  # not relevant where (or whether) this is done
  raw.mycall[[2]] <- runif(1); # permanently modifies try.me on the first time only

  # .Call( "showNAMED", raw.mycall[[2]] )

  raw.mycall
}

gumbo <- function(x) {
  writeLines( paste( "gumbo : x = ",  x ) )
  return(x);
}

try.me <- function() {
  one.val <- 111;
  one.ans <- do.stuff.with.call( mycall = gumbo( x = one.val ) );
  one.ans
}

# after source()ing the above:

> deparse(try.me)
[1] "function () "
[2] "{"
[3] "    one.val <- 111"
[4] "    one.ans <- do.stuff.with.call(mycall = gumbo(x = one.val))"
[5] "    one.ans"
[6] "}"

> try.me()
do.stuff.with.call(mycall = gumbo(x = one.val))
gumbo : x = 0.396524668671191
gumbo(x = 0.396524668671191)

> deparse(try.me)
[1] "function () "
[2] "{"
[3] "    one.val <- 111"
[4] "    one.ans <- do.stuff.with.call(mycall = gumbo(x = 0.396524668671191))"
[5] "    one.ans"
[6] "}"

> try.me()
do.stuff.with.call(mycall = gumbo(x = 0.396524668671191))
gumbo : x = 0.396524668671191
gumbo(x = 0.0078618151601404)

> deparse(try.me)
[1] "function () "
[2] "{"
[3] "    one.val <- 111"
[4] "    one.ans <- do.stuff.with.call(mycall = gumbo(x = 0.396524668671191))"
[5] "    one.ans"
[6] "}"

So, after the first call of try.me(), do.stuff.with.call has
permanently replaced the name one.val in line 2 of try.me with a
numeric (0.396...).  Subsequent calls from try.me to
do.stuff.with.call now reflect that change, but do.stuff.with.call
does not modify the try.me object again. (Note this means one needs to
keep reloading try.me to investigate).
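
One way to test for this behaviour on a given R build, without comparing deparse output by eye, is sketched below. It is a reduced, hypothetical variant of the example above (the names modify.call, caller, before, after and unchanged are not from the original post), and it assumes that comparing deparse() output before and after the call is enough to detect a change to the caller's body.

```r
# Hypothetical reduced test case: capture the argument expression with
# substitute() and modify the captured call.
modify.call <- function(mycall) {
  raw <- substitute(mycall)   # unevaluated expression of the argument
  raw[[2]] <- 42              # replace the first argument of the captured call
  raw
}

caller <- function() {
  val <- 111
  modify.call(mycall = identity(x = val))
}

before <- deparse(caller)     # text of the caller before it is run
result <- caller()
after  <- deparse(caller)     # text of the caller after it is run

# TRUE means the caller's body was left alone; FALSE would indicate the
# in-place modification described in this report.
unchanged <- identical(before, after)
print(unchanged)
print(result)
```

On a build where the modification is confined to the local copy, unchanged should be TRUE and only the returned call should carry the new value.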

If this is a problem worth investigating, here are a couple of other
observations that may be relevant but are obviously speculative.

1. If the third line of do.stuff.with.call is uncommented (and try.me
also reloaded), the unexpected behavior does not occur.  Since
do.nothing is eponymous, I was surprised because I believed it should
not impact any other behavior.  Speculating with limited knowledge, I
thought this might implicate something that is supposed to stay
under-the-hood, such as the `call by value' illusion described in
the R internals documentation.

2. Poking slightly further, I looked at the NAMED values using this C
code via R CMD SHLIB and dyn.load:
#include <R.h>
#include <Rdefines.h>
SEXP showNAMED(SEXP obj) {
  Rprintf("%i\n", NAMED(obj));
  return(R_NilValue);
}
Uncommenting the .Call lines in do.stuff.with.call (with the
do.nothing line re-commented) reveals that on the first time
do.stuff.with.call is called from try.me, raw.mycall[[2]] has NAMED ==
1 both before and after the `[[<-` line.  On subsequent calls it has
NAMED == 2 before and NAMED == 1 after.  If I follow how NAMED is
used, this seems relevant.


Yes and no. This does sound like a bug and NAMED is likely involved, but I
don't think raw.mycall[[2]] is the thing to look at. More likely, the issue
is that raw.mycall itself has NAMED == 1 because otherwise [[<- assignment
would duplicate it first. This suggests that substitute has the bug.


Our extraction functions, like [[, bump up the NAMED value for
components to the value for the container (or to 2 -- doesn't look
like we are consistent here).  substitute() doesn't do that, and
perhaps could.  But arguably it is the point where the promise (from
which substitute gets the expression) is created that is the
extraction point. We could have mkPromise test for NAMED == 2 and bump
up if it isn't.  We could also have parse create all LANGSXPs with
NAMED == 2 but that leaves out programmatically created functions.
Either change fixes this bug; not sure which is the best one (or
whether we should do both).  Changing mkPromise is more conservative
and potentially a little more costly but probably not enough to
notice.

luke

--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  [EMAIL PROTECTED]
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu


Re: [Rd] splitting strings efficiently

2008-09-24 Thread Henrik Bengtsson

For strsplit(), note that fixed=TRUE is much faster.  /HB
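
A rough sketch of what fixed=TRUE changes, under the assumption that the separator is a literal string rather than a pattern. The test data and variable names (same, dotted, lit, big) are hypothetical, and timings will vary by machine and R version, so none are quoted here.

```r
s <- c("12;13;14", "15;16;17")

# For a separator with no special meaning in regular expressions, both
# forms give the same result; fixed = TRUE just skips the regex engine.
same <- identical(strsplit(s, ";"), strsplit(s, ";", fixed = TRUE))
print(same)

# For a separator that is a regex metacharacter, fixed = TRUE is also
# what makes it be treated literally.
dotted <- "a.b.c"
lit <- strsplit(dotted, ".", fixed = TRUE)[[1]]
print(lit)

# Informal timing comparison on a larger hypothetical input.
big <- rep(paste(1:50, collapse = ";"), 20000)
print(system.time(strsplit(big, ";")))
print(system.time(strsplit(big, ";", fixed = TRUE)))
```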

