Re: [R] extract fixed width fields from a string

2012-01-22 Thread Sam Steingold
 * Bert Gunter thagre.ore...@trar.pbz [2012-01-20 11:06:31 -0800]:
 On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold s...@gnu.org wrote:

 then I need to split the two strings by 6/8 characters -- how?
 This makes no sense to me. strsplit takes care of this.

I want to convert

> c("abcd","de","fghijk")
[1] "abcd"   "de"     "fghijk"

to

[1] "ab" "cd" "de" "fg" "hi" "jk"

i.e., split strings into substrings of a given length (2 in the above
example, 9 in my real problem).

actually, better yet, from

> data.frame(id=1:3,data=c("abcd","de","fghijk"))
  id   data
1  1   abcd
2  2     de
3  3 fghijk

to
  id data
1  1 ab
2  1 cd
3  2 de
4  3 fg
5  3 hi
6  3 jk
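
A minimal sketch of one way to get from the first data frame to the second using substring(), which is vectorized over start and stop positions. The helper name split_fixed is hypothetical, and lengths() assumes a reasonably recent R; sapply(pieces, length) would do the same job.

```r
# Split each string into consecutive chunks of `width` characters.
# The last chunk of a string may be shorter than `width`.
split_fixed <- function(x, width) {
  lapply(x, function(s) {
    n <- nchar(s)
    if (n == 0L) return(character(0))
    starts <- seq(1L, n, by = width)
    substring(s, starts, pmin(starts + width - 1L, n))
  })
}

df <- data.frame(id = 1:3, data = c("abcd", "de", "fghijk"),
                 stringsAsFactors = FALSE)
pieces <- split_fixed(df$data, 2)

# Repeat each id once per chunk and flatten the chunks.
long <- data.frame(id = rep(df$id, lengths(pieces)),
                   data = unlist(pieces),
                   stringsAsFactors = FALSE)
long
```

For the real problem the width would be 9 instead of 2.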

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://camera.org http://honestreporting.com http://memri.org
http://truepeace.org http://www.PetitionOnline.com/tap12009/ http://ffii.org
OK, so you're a Ph.D.  Just don't touch anything.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extract fixed width fields from a string

2012-01-22 Thread David Winsemius


On Jan 22, 2012, at 2:31 PM, Sam Steingold wrote:


* Bert Gunter thagre.ore...@trar.pbz [2012-01-20 11:06:31 -0800]:
On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold s...@gnu.org wrote:


then I need to split the two strings by 6/8 characters -- how?

This makes no sense to me. strsplit takes care of this.


I want to convert


c("abcd","de","fghijk")

[1] "abcd"   "de"     "fghijk"

to

[1] "ab" "cd" "de" "fg" "hi" "jk"

i.e., split strings into substrings of a given length (2 in the above
example, 9 in my real problem).


> unlist( strsplit( gsub("(..)", "\\1,", c("abcd","de","fghijk")), "," ) )
[1] "ab" "cd" "de" "fg" "hi" "jk"

Change .. to .{9} for your problem.
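
A hedged sketch of the same regex idea with the width as a parameter. The function name split_by_regex and the choice of a control character as the temporary separator are assumptions; the separator only has to be something that cannot occur in the data, which a comma may not guarantee.

```r
# Insert a marker after every `width` characters, then split on the marker.
split_by_regex <- function(x, width, sep = "\001") {
  pattern <- sprintf("(.{%d})", as.integer(width))
  marked <- gsub(pattern, paste0("\\1", sep), x)
  strsplit(marked, sep, fixed = TRUE)
}

unlist(split_by_regex(c("abcd", "de", "fghijk"), 2))

# Data containing commas is left intact because the marker is not a comma.
split_by_regex("a,b,cd", 3)[[1]]
```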



actually, better yet, from


data.frame(id=1:3,data=c("abcd","de","fghijk"))

 id   data
1  1   abcd
2  2     de
3  3 fghijk


> rep(1:3, lapply( strsplit( gsub("(..)", "\\1,",
+     c("abcd","de","fghijk")), "," ) , length)
+ )
[1] 1 1 2 3 3 3



> data.frame(id = rep(1:3, lapply( strsplit( gsub("(..)", "\\1,",
+     c("abcd","de","fghijk")), "," ) , length) ),
+  data= unlist( strsplit( gsub("(..)", "\\1,",
+     c("abcd","de","fghijk")), "," ) )
+   )
  id data
1  1   ab
2  1   cd
3  2   de
4  3   fg
5  3   hi
6  3   jk



to
 id data
1  1 ab
2  1 cd
3  2 de
4  3 fg
5  3 hi
6  3 jk


--

David Winsemius, MD
West Hartford, CT



Re: [R] extract fixed width fields from a string

2012-01-22 Thread Sam Steingold
 * Petr Savicky fniv...@pf.pnf.pm [2012-01-20 21:59:51 +0100]:

 Try the following.

   x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
   x <- strsplit(x, "")[[1]]
   digits <- 0:35
   names(digits) <- c(0:9, letters)
   y <- digits[x]

   # solution using gmp package
   library(gmp)
   b <- as.bigz(36)
   sum(y * b^(length(y):1 - 1))

   [1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994

thanks, here is what I wrote:

## convert a string to an integer in the given base
digits <- 0:63
names(digits) <- c(0:9, letters, toupper(letters), "-", "_")
string2int <- function (str, base=10) {
  d <- digits[strsplit(str,"")[[1]]]
  sum(d * base^(length(d):1 - 1))
}

and it appears to work.
however, I want to be able to apply it to all elements of a vector.
I can use lapply:

> unlist(lapply(c("100","12","213"),string2int))
[1] 100  12 213

but not directly:

> string2int(c("100","12","213"))
[1] 100
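
One possible way to make the function accept a whole character vector is to keep every element that strsplit() returns instead of only the first, and reduce each one separately. This is a sketch under that assumption; the digit alphabet (0-9, a-z, A-Z, "-", "_") follows the base-64 ordering described in this thread, and setNames()/vapply() are swapped in for the original assignments.

```r
digits <- setNames(0:63, c(0:9, letters, LETTERS, "-", "_"))

string2int <- function(str, base = 10) {
  # strsplit() returns one character vector per input string;
  # vapply() converts each of them to a single number.
  vapply(strsplit(str, ""), function(ch) {
    d <- digits[ch]
    sum(d * base^(rev(seq_along(d)) - 1))
  }, numeric(1), USE.NAMES = FALSE)
}

string2int(c("100", "12", "213"))
string2int(c("z", "10"), base = 36)
```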

thanks a lot for your help!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://honestreporting.com http://thereligionofpeace.com http://camera.org
http://www.memritv.org http://openvotingconsortium.org
A man paints with his brains and not with his hands.



Re: [R] extract fixed width fields from a string

2012-01-22 Thread Jorge I Velez
What is wrong with as.numeric()?

> as.numeric(c("100","12","213"))
[1] 100  12 213
> sum(as.numeric(c("100","12","213")))
[1] 325

HTH,
Jorge




Re: [R] extract fixed width fields from a string

2012-01-22 Thread Sam Steingold
 * Jorge I Velez wbetrvinair...@tznvy.pbz [2012-01-22 15:40:09 -0500]:

 What is wrong with as.numeric()?

 as.numeric(c("100","12","213"))
 [1] 100  12 213
 sum(as.numeric(c("100","12","213")))
 [1] 325

as.numeric handles only decimals; I need other bases too (36 and 64)


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://palestinefacts.org http://honestreporting.com http://mideasttruth.com
http://pmw.org.il http://www.PetitionOnline.com/tap12009/ http://jihadwatch.org
Bill Gates is not god and Microsoft is not heaven.



Re: [R] extract fixed width fields from a string

2012-01-22 Thread R. Michael Weylandt
To Jorge: not necessarily base 10.

To Sam: your problem is the sum() function which collapses vectors: if
this is really performance critical to your overall task, I'd look at
doing it with Rcpp+inline. Otherwise, it might be possible to do this
using matrix operations (i.e., split your strings characterwise,
assemble into a matrix, then multiply the matrix by c(b^2, b^1, b^0)
or somesuch and finally use rowSums/colSums) if you really want to
avoid apply()

Michael



Re: [R] extract fixed width fields from a string

2012-01-22 Thread Petr Savicky
On Sun, Jan 22, 2012 at 03:34:12PM -0500, Sam Steingold wrote:
  * Petr Savicky fniv...@pf.pnf.pm [2012-01-20 21:59:51 +0100]:
 
  Try the following.
 
x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
x <- strsplit(x, "")[[1]]
digits <- 0:35
names(digits) <- c(0:9, letters)
y <- digits[x]

# solution using gmp package
library(gmp)
b <- as.bigz(36)
sum(y * b^(length(y):1 - 1))

[1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994
 
 thanks, here is what I wrote:
 
 ## convert a string to an integer in the given base
 digits <- 0:63
 names(digits) <- c(0:9, letters, toupper(letters), "-", "_")
 string2int <- function (str, base=10) {
   d <- digits[strsplit(str,"")[[1]]]
   sum(d * base^(length(d):1 - 1))
 }

 and it appears to work.
 however, I want to be able to apply it to all elements of a vector.
 I can use lapply:

  unlist(lapply(c("100","12","213"),string2int))
 [1] 100  12 213

 but not directly:

  string2int(c("100","12","213"))
 [1] 100

Hi.

Here, you get the result only for the first string due
to [[1]] applied to strsplit(str,"").

As suggested by Michael, a matrix can be used, if
the input is a character vector, whose components
have the same character length (nchar).

  strings2int <- function (str, base=10) {
    m <- length(str)
    n <- unique(nchar(str))
    stopifnot(length(n) == 1) # test of all nchar() equal
    ch <- strsplit(str, "")
    ch <- unlist(ch)
    d <- matrix(digits[ch], nrow=m, ncol=n, byrow=TRUE)
    c(d %*% base^(n:1 - 1))
  }

  > strings2int(c("100","012","213","453"))
  [1] 100  12 213 453

  > strings2int(c("100","12","213","453"))
  Error: length(n) == 1 is not TRUE
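
If strings of unequal length have to be handled by the matrix approach, one option is to left-pad them with the zero digit first, since leading zeros do not change the value. A sketch under that assumption; strings2int_padded is a hypothetical name and strrep() assumes a reasonably recent R.

```r
digits <- setNames(0:63, c(0:9, letters, LETTERS, "-", "_"))

strings2int_padded <- function(str, base = 10) {
  n <- max(nchar(str))
  stopifnot(n > 0)
  # Left-pad every string with "0" up to the common width n.
  padded <- paste0(strrep("0", n - nchar(str)), str)
  ch <- unlist(strsplit(padded, ""))
  # One row per string, one column per digit position.
  d <- matrix(digits[ch], nrow = length(str), ncol = n, byrow = TRUE)
  as.vector(d %*% base^(n:1 - 1))
}

strings2int_padded(c("100", "12", "213", "453"))
```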

Petr.



Re: [R] extract fixed width fields from a string

2012-01-22 Thread Bert Gunter
?substr ## for extracting fixed width substrings
(along with some apply functions)

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] extract fixed width fields from a string

2012-01-20 Thread Sam Steingold
Hi,
I have a data frame with one column containing strings of the form
ABC...|XYZ...
where ABC etc are fields of 6 alphanumeric characters each
and XYZ etc are fields of 8 alphanumeric characters each;
| is a mandatory separator;
I do not know in advance how many fields of each kind each row will contain.
I need to extract these fields from the string.

=== How do I do that?

first I need to split the string in 2 on '|' - how?
then I need to split the two strings by 6/8 characters -- how?
then I need to convert each 6/8 character string into an integer base 36
or 64 (depending on the field) - how?

=== What do I do with them once I extract them?

First thing I want to do is to have a count table of them.
Then I thought of adding an extra column for each field value and
putting 0/1 there, e.g., frame
1,AB
2,BCD
will turn into
1,1,1,0,0
2,0,1,1,1
however this would work only if the number of different field values is
manageable.
What do people do?
Can I have a column of sets in a data frame?
Does R support the set data type?
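
A hedged sketch of what these steps could look like in base R: a data frame column can be a list, so each row can hold a vector that is treated as a set, and base R provides union(), intersect() and %in% for set-like operations on vectors. The column name fields and the toy values are made up for illustration.

```r
df <- data.frame(id = 1:2)
df$fields <- list(c("A", "B"), c("B", "C", "D"))  # list column: one set per row

# Count table over all field values.
counts <- table(unlist(df$fields))
counts

# 0/1 indicator columns, one per distinct field value.
values <- sort(unique(unlist(df$fields)))
indicator <- t(vapply(df$fields,
                      function(s) as.integer(values %in% s),
                      integer(length(values))))
colnames(indicator) <- values
cbind(id = df$id, indicator)

# Set operations work on the per-row vectors.
intersect(df$fields[[1]], df$fields[[2]])
```

As noted above, the indicator matrix is only practical when the number of distinct values is manageable; the list column and the count table do not have that limitation.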

Thanks!

PS. thanks to Sarah Goslee who answered my previous question in so much detail!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://camera.org http://openvotingconsortium.org http://iris.org.il
http://mideasttruth.com http://memri.org http://honestreporting.com
Don't take life too seriously, you'll never get out of it alive!



Re: [R] extract fixed width fields from a string

2012-01-20 Thread Sarah Goslee
Reproducible example, please. This doesn't make a whole lot of sense
otherwise.

On Fri, Jan 20, 2012 at 1:52 PM, Sam Steingold s...@gnu.org wrote:
 Hi,
 I have a data frame with one column containing string of the form 
 ABC...|XYZ...
 where ABC etc are fields of 6 alphanumeric characters each
 and XYZ etc are fields of 8 alphanumeric characters each;
 | is a mandatory separator;
 I do not know in advance how many fields of each kind will each row contain.
 I need to extract these fields from the string.

This is already a data frame, so you don't need to import it into R,
just process it?

 === How do I do that?

 first I need to split the string in 2 on '|' - how?

strsplit()

 then I need to split the two strings by 6/8 characters -- how?

substring() perhaps


 then I need to convert each 6/8 character string into an integer base 36
 or 64 (depending on the field) - how?

base 36? Really? How are you representing that? Somehow I think you
mean something other than what you said. Either way, please clarify.

 === What do I do with them once I extract them?

I don't know. Save them as a list, most likely.

 First thing I want to do is to have a count table of them.
 Then I thought of adding an extra column for each field value and
 putting 0/1 there, e.g., frame
 1,AB
 2,BCD

I thought we had integers at this point?

 will turn into
 1,1,1,0,0
 2,0,1,1,1
 however this would work only if the number of different field values is
 manageable.

But we have no idea, because you haven't told us.

 What do people do?
 Can I have a columns of sets in data frame?
 Does R support the set data type?

factor() seems to be what you're looking for.

 PS. thanks to Sarah Goslee who answered my previous question in so much 
 detail!

You're welcome, but you'd be even more welcome if you'd listened to
the parts of my reply about reproducible examples, clear problem
statements, and reading the posting guide.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] extract fixed width fields from a string

2012-01-20 Thread Bert Gunter
Sam:

On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold s...@gnu.org wrote:
 Hi,
 I have a data frame with one column containing string of the form 
 ABC...|XYZ...
 where ABC etc are fields of 6 alphanumeric characters each
 and XYZ etc are fields of 8 alphanumeric characters each;
 | is a mandatory separator;
 I do not know in advance how many fields of each kind will each row contain.
 I need to extract these fields from the string.

 === How do I do that?

 first I need to split the string in 2 on '|' - how?
?strsplit
strsplit(thecolumn, "|", fixed=TRUE)
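
A short illustration of why fixed=TRUE matters here: without it the split argument is interpreted as a regular expression, in which "|" is the alternation operator rather than a literal bar. The variable name thecolumn follows the call above; the second value is made up.

```r
thecolumn <- c("1288915200|070400905a0A118", "42|0101a")

# Literal split on the bar character.
parts <- strsplit(thecolumn, "|", fixed = TRUE)

# Equivalent regex form: the bar has to be escaped.
parts_regex <- strsplit(thecolumn, "\\|")

# First and second half of every row.
before <- vapply(parts, `[`, character(1), 1)
after  <- vapply(parts, `[`, character(1), 2)
```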

 then I need to split the two strings by 6/8 characters -- how?
This makes no sense to me. strsplit takes care of this.

 then I need to convert each 6/8 character string into an integer base 36
 or 64 (depending on the field) - how?
No clue. Depends on the encoding AFAICS.

-- Bert





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] extract fixed width fields from a string

2012-01-20 Thread Sam Steingold
On Fri, Jan 20, 2012 at 14:05, Sarah Goslee sarah.gos...@gmail.com wrote:
 then I need to convert each 6/8 character string into an integer base 36
 or 64 (depending on the field) - how?

 base 36?

10 decimal digits + 26 english characters = 36.
ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36
(case insensitive).
So, how do I convert the above long word to a bignum?
actually, my numbers will fit into int64, no bignum support is necessary.

thanks.

-- 
Sam Steingold http://sds.podval.org



Re: [R] extract fixed width fields from a string

2012-01-20 Thread Sam Steingold
On Fri, Jan 20, 2012 at 14:05, Sarah Goslee sarah.gos...@gmail.com wrote:
 Reproducible example, please. This doesn't make a whole lot of sense
 otherwise.

here is the string:
1288915200|070400905a0A118

I want the following data extracted from it:
1. the decimal number before |: 1288915200
2. the string after | split into 3 parts, each of length 9 bytes,
and then split into 3 more parts:
id: the first 6 bytes, int, base 36;
count: the next 2 bytes, int, base 10;
offset: the last 1 byte, int, base 64 (0-9a-zA-Z-_)
i.e., the above line is:
id=7, count=4, offset=0
id=9; count=5; offset=10
id=10; count=11; offset=8
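
A sketch of how the sample line could be parsed in base R. One assumption is inferred from the listed results rather than stated above: the sample appears to use 2-character ids (records of 5 characters: 2 for id, 2 for count, 1 for offset), so the id width is a parameter here; with 6-character ids it would be 6 and the records 9 characters long. The names digits64, to_int and parse_line are hypothetical.

```r
# Digit values for the alphabet 0-9a-zA-Z-_ (base 64 ordering as described).
digits64 <- setNames(0:63, c(0:9, letters, LETTERS, "-", "_"))

# Convert each string in `s` to a number in the given base.
to_int <- function(s, base) {
  vapply(strsplit(s, ""),
         function(ch) sum(digits64[ch] * base^(rev(seq_along(ch)) - 1)),
         numeric(1))
}

parse_line <- function(line, id_width = 2L) {
  halves <- strsplit(line, "|", fixed = TRUE)[[1]]
  rec_width <- id_width + 3L                     # id + 2-char count + 1-char offset
  starts <- seq(1L, nchar(halves[2]), by = rec_width)
  recs <- substring(halves[2], starts, starts + rec_width - 1L)
  data.frame(
    time   = as.numeric(halves[1]),
    id     = to_int(tolower(substr(recs, 1L, id_width)), 36),  # case-insensitive
    count  = as.numeric(substr(recs, id_width + 1L, id_width + 2L)),
    offset = to_int(substr(recs, rec_width, rec_width), 64)
  )
}

res <- parse_line("1288915200|070400905a0A118")
res
```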

thanks.

 On Fri, Jan 20, 2012 at 1:52 PM, Sam Steingold s...@gnu.org wrote:
 Hi,
 I have a data frame with one column containing string of the form 
 ABC...|XYZ...
 where ABC etc are fields of 6 alphanumeric characters each
 and XYZ etc are fields of 8 alphanumeric characters each;
 | is a mandatory separator;
 I do not know in advance how many fields of each kind will each row contain.
 I need to extract these fields from the string.

 This is already a data frame, so you don't need to import it into R,
 just process it?

yes.

 I don't know. Save them as a list, most likely.

can a column contain lists?

 First thing I want to do is to have a count table of them.
 Then I thought of adding an extra column for each field value and
 putting 0/1 there, e.g., frame
 1,AB
 2,BCD

 I thought we had integers at this point?

yes, A..D are placeholders for integers

 What do people do?
 Can I have a columns of sets in data frame?
 Does R support the set data type?

 factor() seems to be what you're looking for.

no, a column of factors will contain a single factor item in each row.
e.g.:
1 A
2 B
3 A
4 C
I want each row to contain a set of factor items:
1 AB
2 A
3 C
4 void


-- 
Sam Steingold http://sds.podval.org



Re: [R] extract fixed width fields from a string

2012-01-20 Thread jim holtman
Here is part of it.  This is the conversion of base 36 to numeric that is
case insensitive.  This makes use of mapping the alphabetics to
characters that start just after '9' and then doing the conversion.
You can extend it to base 64 using the same approach.


> base36ToInteger <- function (Str)
+ {
+ common <- chartr(
+ "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # input
+   , ":;<=>?@ABCDEFGHIJKLMNOPQRS:;<=>?@ABCDEFGHIJKLMNOPQRS"  # 'magic' translation
+   , Str
+   )
+ x <- as.numeric(charToRaw(common)) - 48
+ sum(x * 36 ^ rev(seq(length(x)) - 1))
+ }
> base36ToInteger('1')
[1] 1
> base36ToInteger('12')
[1] 38
> base36ToInteger('123')
[1] 1371
> base36ToInteger('1234')
[1] 49360
> base36ToInteger('12345')
[1] 1776965
> base36ToInteger('123456')
[1] 63970746
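
For bases up to 36, base R's strtoi() may be a shorter route: it is vectorized and accepts letters in either case. A sketch, with the caveat that it returns 32-bit integers, so values that do not fit come back as NA, and six base-36 digits can exceed that range. It does not cover base 64.

```r
ids <- c("1", "12", "123", "123456", "00000A")
vals <- strtoi(ids, base = 36L)
vals

# 36^6 - 1 is larger than .Machine$integer.max, so this overflows to NA.
too_big <- strtoi("zzzzzz", base = 36L)
is.na(too_big)
```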







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] extract fixed width fields from a string

2012-01-20 Thread Petr Savicky
On Fri, Jan 20, 2012 at 03:14:21PM -0500, Sam Steingold wrote:
 On Fri, Jan 20, 2012 at 14:05, Sarah Goslee sarah.gos...@gmail.com wrote:
  then I need to convert each 6/8 character string into an integer base 36
  or 64 (depending on the field) - how?
 
  base 36?
 
 10 decimal digits + 26 english characters = 36.
 ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36
 (case insensitive).
 So, how do I convert the above long word to a bignum?

Hi.

Try the following.

  x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
  x <- strsplit(x, "")[[1]]
  digits <- 0:35
  names(digits) <- c(0:9, letters)
  y <- digits[x]

  # solution using gmp package
  library(gmp)
  b <- as.bigz(36)
  sum(y * b^(length(y):1 - 1))

  [1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994

  # solution using Rmpfr package
  library(Rmpfr)
  b <- mpfr(36, precBits=500)
  sum(y * b^(length(y):1 - 1))

  [1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994

actually, my numbers will fit into int64, no bignum support is necessary.

The default R numeric data type is double precision,
which has a 53-bit significand, so every integer of
magnitude up to 2^53 is represented exactly.
The integer type is 32 bits.
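
A small check of what this means for the field sizes mentioned in this thread (6 base-36 digits, 8 base-64 digits). The variable names are made up for illustration.

```r
# Beyond 2^53 consecutive integers can no longer be told apart as doubles.
2^53 == 2^53 + 1

max_id     <- 36^6 - 1   # largest value of a 6-digit base-36 field
max_field8 <- 64^8 - 1   # largest value of an 8-digit base-64 field

max_id > .Machine$integer.max   # does not fit the 32-bit integer type
max_field8 < 2^53               # still exactly representable as a double
```

So plain doubles are enough for fields of these sizes, while the integer type is not.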

Hope this helps.

Petr Savicky.
