Re: [R] extract fixed width fields from a string
* Bert Gunter [2012-01-20 11:06:31 -0800]:

> On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold s...@gnu.org wrote:
>> then I need to split the two strings by 6/8 characters -- how?
>
> This makes no sense to me. strsplit takes care of this.

I want to convert

    c("abcd","de","fghijk")
    [1] "abcd"   "de"     "fghijk"

to

    [1] "ab" "cd" "de" "fg" "hi" "jk"

i.e., split strings into substrings of a given length (2 in the above example, 9 in my real problem).

actually, better yet, from

    data.frame(id=1:3,data=c("abcd","de","fghijk"))
      id   data
    1  1   abcd
    2  2     de
    3  3 fghijk

to

      id data
    1  1   ab
    2  1   cd
    3  2   de
    4  3   fg
    5  3   hi
    6  3   jk

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://camera.org http://honestreporting.com http://memri.org http://truepeace.org http://www.PetitionOnline.com/tap12009/ http://ffii.org
OK, so you're a Ph.D. Just don't touch anything.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] extract fixed width fields from a string
On Jan 22, 2012, at 2:31 PM, Sam Steingold wrote:

> I want to convert
>
>     c("abcd","de","fghijk")
>     [1] "abcd"   "de"     "fghijk"
>
> to
>
>     [1] "ab" "cd" "de" "fg" "hi" "jk"
>
> i.e., split strings into substrings of a given length (2 in the above example, 9 in my real problem).

    unlist( strsplit( gsub("(..)", "\\1,", c("abcd","de","fghijk")), "," ) )
    [1] "ab" "cd" "de" "fg" "hi" "jk"

Change .. to .{9} for your problem.

> actually, better yet, from
>
>     data.frame(id=1:3,data=c("abcd","de","fghijk"))
>       id   data
>     1  1   abcd
>     2  2     de
>     3  3 fghijk

    rep(1:3, lapply( strsplit( gsub("(..)", "\\1,", c("abcd","de","fghijk")), "," ), length) )
    [1] 1 1 2 3 3 3

    data.frame(id = rep(1:3, lapply( strsplit( gsub("(..)", "\\1,", c("abcd","de","fghijk")), "," ), length) ),
               data = unlist( strsplit( gsub("(..)", "\\1,", c("abcd","de","fghijk")), "," ) ) )
      id data
    1  1   ab
    2  1   cd
    3  2   de
    4  3   fg
    5  3   hi
    6  3   jk

> to
>
>       id data
>     1  1   ab
>     2  1   cd
>     3  2   de
>     4  3   fg
>     5  3   hi
>     6  3   jk

--
David Winsemius, MD
West Hartford, CT
Re: [R] extract fixed width fields from a string
* Petr Savicky [2012-01-20 21:59:51 +0100]:

> Try the following.
>
>     x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
>     x <- strsplit(x, "")[[1]]
>     digits <- 0:35
>     names(digits) <- c(0:9, letters)
>     y <- digits[x]
>
>     # solution using gmp package
>     library(gmp)
>     b <- as.bigz(36)
>     sum(y * b^(length(y):1 - 1))
>     [1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994

thanks, here is what I wrote:

    ## convert a string to an integer in the given base
    digits <- 0:63
    names(digits) <- c(0:9, letters, toupper(letters), "-", "_")
    string2int <- function (str, base=10) {
      d <- digits[strsplit(str,"")[[1]]]
      sum(d * base^(length(d):1 - 1))
    }

and it appears to work. however, I want to be able to apply it to all elements of a vector. I can use lapply:

    unlist(lapply(c("100","12","213"),string2int))
    [1] 100  12 213

but not directly:

    string2int(c("100","12","213"))
    [1] 100

thanks a lot for your help!

--
Sam Steingold (http://sds.podval.org/)
Re: [R] extract fixed width fields from a string
What is wrong with as.numeric()?

    as.numeric(c("100","12","213"))
    [1] 100  12 213
    sum(as.numeric(c("100","12","213")))
    [1] 325

HTH,
Jorge

On Sun, Jan 22, 2012 at 3:34 PM, Sam Steingold wrote:

> thanks, here is what I wrote:
>
>     ## convert a string to an integer in the given base
>     string2int <- function (str, base=10) {
>       d <- digits[strsplit(str,"")[[1]]]
>       sum(d * base^(length(d):1 - 1))
>     }
>
> and it appears to work. however, I want to be able to apply it to all elements of a vector. I can use lapply:
>
>     unlist(lapply(c("100","12","213"),string2int))
>     [1] 100  12 213
>
> but not directly:
>
>     string2int(c("100","12","213"))
>     [1] 100
Re: [R] extract fixed width fields from a string
* Jorge I Velez [2012-01-22 15:40:09 -0500]:

> What is wrong with as.numeric()?
>
>     as.numeric(c("100","12","213"))
>     [1] 100  12 213
>     sum(as.numeric(c("100","12","213")))
>     [1] 325

as.numeric handles only decimal numbers; I need other bases too (36 and 64).

--
Sam Steingold (http://sds.podval.org/)
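For the base-36 half of that requirement, base R's strtoi() may already be enough. The lines below are a small sketch, not something proposed in the thread: strtoi() accepts bases from 2 to 36 and treats letters case-insensitively, but it returns the 32-bit integer type, so it cannot cover base 64, and a six-character base-36 value can exceed the integer range (the documentation describes such values as returned as NA).

```r
# strtoi() is vectorised over its input and handles bases 2..36
ids <- strtoi(c("0a", "zz", "7"), base = 36L)
ids

# The largest six-digit base-36 value does not fit the 32-bit integer type,
# which is why the digit-table approach in the thread works with doubles instead
36^6 - 1 > .Machine$integer.max
```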
Re: [R] extract fixed width fields from a string
To Jorge: not necessarily base 10.

To Sam: your problem is the sum() function, which collapses vectors. If this is really performance-critical to your overall task, I'd look at doing it with Rcpp + inline. Otherwise, it might be possible to do this using matrix operations (i.e., split your strings characterwise, assemble them into a matrix, multiply the matrix by c(b^2, b^1, b^0) or some such, and finally use rowSums/colSums) if you really want to avoid apply().

Michael

On Sun, Jan 22, 2012 at 3:40 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:

> What is wrong with as.numeric()?
>
>     as.numeric(c("100","12","213"))
>     [1] 100  12 213
>
> On Sun, Jan 22, 2012 at 3:34 PM, Sam Steingold wrote:
>
>>     string2int <- function (str, base=10) {
>>       d <- digits[strsplit(str,"")[[1]]]
>>       sum(d * base^(length(d):1 - 1))
>>     }
>>
>> and it appears to work. however, I want to be able to apply it to all elements of a vector.
Re: [R] extract fixed width fields from a string
On Sun, Jan 22, 2012 at 03:34:12PM -0500, Sam Steingold wrote:

> thanks, here is what I wrote:
>
>     ## convert a string to an integer in the given base
>     digits <- 0:63
>     names(digits) <- c(0:9, letters, toupper(letters), "-", "_")
>     string2int <- function (str, base=10) {
>       d <- digits[strsplit(str,"")[[1]]]
>       sum(d * base^(length(d):1 - 1))
>     }
>
> and it appears to work. however, I want to be able to apply it to all elements of a vector. I can use lapply:
>
>     unlist(lapply(c("100","12","213"),string2int))
>     [1] 100  12 213
>
> but not directly:
>
>     string2int(c("100","12","213"))
>     [1] 100

Hi.

Here, you get the result only for the first string because of the [[1]] applied to strsplit(str, ""). As suggested by Michael, a matrix can be used if the input is a character vector whose components all have the same number of characters (nchar).

    strings2int <- function (str, base=10) {
      m <- length(str)
      n <- unique(nchar(str))
      stopifnot(length(n) == 1)  # test of all nchar() equal
      ch <- strsplit(str, "")
      ch <- unlist(ch)
      d <- matrix(digits[ch], nrow=m, ncol=n, byrow=TRUE)
      c(d %*% base^(n:1 - 1))
    }

    strings2int(c("100","012","213","453"))
    [1] 100  12 213 453

    strings2int(c("100","12","213","453"))
    Error: length(n) == 1 is not TRUE

Petr.
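The matrix version above requires all strings to have the same length. When they do not, one option is to keep the per-string computation and loop with vapply(). The following is a minimal sketch under that assumption; string2int_vec is a hypothetical name, and the digit table assumes the order 0-9, a-z, A-Z, -, _ described elsewhere in the thread.

```r
# Digit table: names are the 64 digit characters, values are 0..63
digits <- setNames(0:63, c(0:9, letters, LETTERS, "-", "_"))

# Convert every element of a character vector, whatever its length
string2int_vec <- function(str, base = 10) {
  vapply(strsplit(str, ""), function(ch) {
    d <- digits[ch]
    # reject characters outside the table or not valid in this base
    stopifnot(!anyNA(d), all(d < base))
    sum(d * base^(rev(seq_along(d)) - 1))
  }, numeric(1))
}

dec <- string2int_vec(c("100", "12", "213"))
b36 <- string2int_vec(c("0a", "zz"), base = 36)
```

The result is a double vector, so values stay exact only up to 2^53. Because the table is case-sensitive, base-36 input should be lower-cased first (for example with tolower()).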
Re: [R] extract fixed width fields from a string
?substr  ## for extracting fixed-width substrings (along with some apply functions)

-- Bert

On Sun, Jan 22, 2012 at 11:31 AM, Sam Steingold s...@gnu.org wrote:

> I want to convert
>
>     c("abcd","de","fghijk")
>     [1] "abcd"   "de"     "fghijk"
>
> to
>
>     [1] "ab" "cd" "de" "fg" "hi" "jk"
>
> i.e., split strings into substrings of a given length (2 in the above example, 9 in my real problem).
>
> actually, better yet, from
>
>     data.frame(id=1:3,data=c("abcd","de","fghijk"))
>
> to
>
>       id data
>     1  1   ab
>     2  1   cd
>     3  2   de
>     4  3   fg
>     5  3   hi
>     6  3   jk

--
Bert Gunter
Genentech Nonclinical Biostatistics
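The ?substr pointer can be turned into a short helper. What follows is only a sketch of one way to do it: split_fixed is a hypothetical name, and the code uses the vectorised substring() together with lapply() to cut each string into consecutive pieces of a given width.

```r
# Cut each string into consecutive chunks of `width` characters
split_fixed <- function(x, width) {
  lapply(x, function(s) {
    n <- nchar(s)
    if (n == 0) return(character(0))
    starts <- seq(1, n, by = width)
    substring(s, starts, pmin(starts + width - 1, n))
  })
}

x <- c("abcd", "de", "fghijk")
pieces <- split_fixed(x, 2)

# Long format: repeat each id once per chunk taken from its string
long <- data.frame(id = rep(seq_along(x), lengths(pieces)),
                   data = unlist(pieces),
                   stringsAsFactors = FALSE)
long
```

For the 9-character records mentioned in the thread the call would be split_fixed(x, 9). A trailing piece shorter than the width is kept as is rather than dropped.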
[R] extract fixed width fields from a string
Hi,

I have a data frame with one column containing strings of the form

    ABC...|XYZ...

where ABC etc are fields of 6 alphanumeric characters each and XYZ etc are fields of 8 alphanumeric characters each; | is a mandatory separator; I do not know in advance how many fields of each kind each row will contain. I need to extract these fields from the string.

=== How do I do that?

first I need to split the string in 2 on '|' - how?
then I need to split the two strings by 6/8 characters -- how?
then I need to convert each 6/8 character string into an integer base 36 or 64 (depending on the field) - how?

=== What do I do with them once I extract them?

First thing I want to do is to have a count table of them.
Then I thought of adding an extra column for each field value and putting 0/1 there, e.g., frame

    1,AB
    2,BCD

will turn into

    1,1,1,0,0
    2,0,1,1,1

however this would work only if the number of different field values is manageable.

What do people do?
Can I have a column of sets in a data frame?
Does R support the set data type?

Thanks!

PS. thanks to Sarah Goslee who answered my previous question in so much detail!

--
Sam Steingold (http://sds.podval.org/)
Re: [R] extract fixed width fields from a string
Reproducible example, please. This doesn't make a whole lot of sense otherwise.

On Fri, Jan 20, 2012 at 1:52 PM, Sam Steingold s...@gnu.org wrote:

> Hi, I have a data frame with one column containing strings of the form ABC...|XYZ... where ABC etc are fields of 6 alphanumeric characters each and XYZ etc are fields of 8 alphanumeric characters each; | is a mandatory separator; I do not know in advance how many fields of each kind each row will contain. I need to extract these fields from the string.

This is already a data frame, so you don't need to import it into R, just process it?

> === How do I do that?
> first I need to split the string in 2 on '|' - how?

strsplit()

> then I need to split the two strings by 6/8 characters -- how?

substring() perhaps

> then I need to convert each 6/8 character string into an integer base 36 or 64 (depending on the field) - how?

base 36? Really? How are you representing that? Somehow I think you mean something other than what you said. Either way, please clarify.

> === What do I do with them once I extract them?

I don't know. Save them as a list, most likely.

> First thing I want to do is to have a count table of them. Then I thought of adding an extra column for each field value and putting 0/1 there, e.g., frame
>
>     1,AB
>     2,BCD

I thought we had integers at this point?

> will turn into
>
>     1,1,1,0,0
>     2,0,1,1,1
>
> however this would work only if the number of different field values is manageable.

But we have no idea, because you haven't told us.

> What do people do? Can I have a column of sets in a data frame? Does R support the set data type?

factor() seems to be what you're looking for.

> PS. thanks to Sarah Goslee who answered my previous question in so much detail!

You're welcome, but you'd be even more welcome if you'd listened to the parts of my reply about reproducible examples, clear problem statements, and reading the posting guide.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
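The count table and the 0/1 columns asked about in the original post can both be obtained from a long-format table with table(). This is a rough sketch on made-up data that mirrors the 1,AB / 2,BCD example; the data frame and its column names are hypothetical. It also shows one way to hold a set of values per row, using a list column.

```r
# Hypothetical long-format data: one row per (row id, field value) pair
long <- data.frame(id = c(1, 1, 2, 2, 2),
                   value = c("A", "B", "B", "C", "D"),
                   stringsAsFactors = FALSE)

# Count table: how often each field value occurs overall
counts <- table(long$value)

# Indicator matrix: one row per id, one 0/1 column per distinct value
ind <- unclass(table(long$id, long$value))
ind

# A "set per row": a list column holding the values that belong to each id
wide <- data.frame(id = 1:2)
wide$fields <- unname(split(long$value, long$id))
```

The indicator matrix grows by one column per distinct value, so it is practical only while the number of distinct values stays manageable, as the original post anticipates.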
Re: [R] extract fixed width fields from a string
Sam:

On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold s...@gnu.org wrote:

> Hi, I have a data frame with one column containing strings of the form ABC...|XYZ... where ABC etc are fields of 6 alphanumeric characters each and XYZ etc are fields of 8 alphanumeric characters each; | is a mandatory separator; I do not know in advance how many fields of each kind each row will contain. I need to extract these fields from the string.
>
> === How do I do that?
> first I need to split the string in 2 on '|' - how?

?strsplit

    strsplit(thecolumn, "|", fixed=TRUE)

> then I need to split the two strings by 6/8 characters -- how?

This makes no sense to me. strsplit takes care of this.

> then I need to convert each 6/8 character string into an integer base 36 or 64 (depending on the field) - how?

No clue. Depends on the encoding AFAICS.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
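A short illustration of the strsplit() call above, as a sketch on assumed data. The first string is the example that appears later in the thread and the second is made up. The fixed=TRUE argument matters because otherwise the pattern is read as a regular expression, in which | means alternation rather than a literal bar.

```r
# Hypothetical column contents; the second row is invented for illustration
thecolumn <- c("1288915200|070400905a0A118", "1288915300|0905a")

# Split each row into the part before and the part after the separator
parts <- strsplit(thecolumn, "|", fixed = TRUE)

# Pull the two halves out of the resulting list
before <- vapply(parts, `[`, character(1), 1)
after  <- vapply(parts, `[`, character(1), 2)
```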
Re: [R] extract fixed width fields from a string
On Fri, Jan 20, 2012 at 14:05, Sarah Goslee sarah.gos...@gmail.com wrote:

>> then I need to convert each 6/8 character string into an integer base 36 or 64 (depending on the field) - how?
>
> base 36?

10 decimal digits + 26 English letters = 36.
ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36 (case insensitive).
So, how do I convert the above long word to a bignum?

actually, my numbers will fit into int64, no bignum support is necessary.

thanks.

--
Sam Steingold http://sds.podval.org
Re: [R] extract fixed width fields from a string
On Fri, Jan 20, 2012 at 14:05, Sarah Goslee sarah.gos...@gmail.com wrote:

> Reproducible example, please. This doesn't make a whole lot of sense otherwise.

here is the string:

    1288915200|070400905a0A118

I want the following data extracted from it:

1. the decimal number before |: 1288915200
2. the string after | split into 3 parts, each of length 9 bytes, and then each part split into 3 more parts:
   id: the first 6 bytes, int, base 36;
   count: the next 2 bytes, int, base 10;
   offset: the last 1 byte, int, base 64 (0-9a-zA-Z-_)

i.e., the above line is:

    id=7, count=4, days=0
    id=9; count=5; offset=10
    id=10; count=11; offset=8

thanks.

> This is already a data frame, so you don't need to import it into R, just process it?

yes.

> I don't know. Save them as a list, most likely.

can a column contain lists?

>> First thing I want to do is to have a count table of them. Then I thought of adding an extra column for each field value and putting 0/1 there, e.g., frame 1,AB 2,BCD
>
> I thought we had integers at this point?

yes, A..D are placeholders for integers

>> What do people do? Can I have a column of sets in a data frame? Does R support the set data type?
>
> factor() seems to be what you're looking for.

no, a column of factors will contain a single factor item in each row. e.g.:

    1 A
    2 B
    3 A
    4 C

I want each row to contain a set of factor items:

    1 AB
    2 A
    3 C
    4 void

--
Sam Steingold http://sds.podval.org
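The extraction described in this message could be sketched as follows. Everything here is an assumption layered on the description: parse_row and to_int are hypothetical names, and the digit order is taken to be 0-9, a-z, A-Z, -, _ as listed for the offset field. The example string has 15 characters after the separator, so the call below reads it with two-character ids (records of 5 characters) instead of the stated 6; that reading is a guess which happens to reproduce the three records listed in the message.

```r
# Assumed digit order for base 64; base 36 uses the first 36 entries
alphabet64 <- c(0:9, letters, LETTERS, "-", "_")

# Convert a single string to a number in the given base
to_int <- function(s, base) {
  d <- match(strsplit(s, "")[[1]], alphabet64) - 1
  stopifnot(!anyNA(d), all(d < base))
  sum(d * base^(rev(seq_along(d)) - 1))
}

# Parse "time|records" where each record is id + count + offset
parse_row <- function(row, id_width = 6, count_width = 2, offset_width = 1) {
  halves <- strsplit(row, "|", fixed = TRUE)[[1]]
  rec_width <- id_width + count_width + offset_width
  starts <- seq(1, nchar(halves[2]), by = rec_width)
  recs <- substring(halves[2], starts, starts + rec_width - 1)
  data.frame(
    time   = as.numeric(halves[1]),
    # ids are described as case-insensitive, so lower-case them first
    id     = vapply(tolower(substr(recs, 1, id_width)), to_int,
                    numeric(1), base = 36, USE.NAMES = FALSE),
    count  = as.numeric(substr(recs, id_width + 1, id_width + count_width)),
    offset = vapply(substr(recs, rec_width, rec_width), to_int,
                    numeric(1), base = 64, USE.NAMES = FALSE)
  )
}

res <- parse_row("1288915200|070400905a0A118", id_width = 2)
res
```

With real 9-character records the default widths apply and the id_width argument can be left out.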
Re: [R] extract fixed width fields from a string
Here is part of it. This is a case-insensitive conversion of base 36 to numeric. It works by mapping the letters to the characters that start just after '9' and then doing the conversion. You can extend it to base 64 using the same approach.

    base36ToInteger <- function (Str)
    {
        common <- chartr(
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"    # input
          , ":;<=>?@ABCDEFGHIJKLMNOPQRS:;<=>?@ABCDEFGHIJKLMNOPQRS"    # 'magic' translation
          , Str
          )
        x <- as.numeric(charToRaw(common)) - 48
        sum(x * 36 ^ rev(seq(length(x)) - 1))
    }

    base36ToInteger('1')
    [1] 1
    base36ToInteger('12')
    [1] 38
    base36ToInteger('123')
    [1] 1371
    base36ToInteger('1234')
    [1] 49360
    base36ToInteger('12345')
    [1] 1776965
    base36ToInteger('123456')
    [1] 63970746

On Fri, Jan 20, 2012 at 3:25 PM, Sam Steingold s...@gnu.org wrote:

> here is the string:
>
>     1288915200|070400905a0A118
>
> I want the following data extracted from it:
>
> 1. the decimal number before |: 1288915200
> 2. the string after | split into 3 parts, each of length 9 bytes, and then each part split into 3 more parts:
>    id: the first 6 bytes, int, base 36;
>    count: the next 2 bytes, int, base 10;
>    offset: the last 1 byte, int, base 64 (0-9a-zA-Z-_)

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] extract fixed width fields from a string
On Fri, Jan 20, 2012 at 03:14:21PM -0500, Sam Steingold wrote:

> 10 decimal digits + 26 English letters = 36.
> ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36 (case insensitive).
> So, how do I convert the above long word to a bignum?

Hi.

Try the following.

    x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
    x <- strsplit(x, "")[[1]]
    digits <- 0:35
    names(digits) <- c(0:9, letters)
    y <- digits[x]

    # solution using gmp package
    library(gmp)
    b <- as.bigz(36)
    sum(y * b^(length(y):1 - 1))
    [1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994

    # solution using Rmpfr package
    library(Rmpfr)
    b <- mpfr(36, precBits=500)
    sum(y * b^(length(y):1 - 1))
    [1] 7045519072280024341066246294410591724807773749367607882253153084991978813070206061584038994

> actually, my numbers will fit into int64, no bignum support is necessary.

The default R numeric data type is double precision, which has a 53-bit significand, so every integer up to 2^53 is represented exactly; beyond that, not all integers are representable. The integer type is 32 bits.

Hope this helps.

Petr Savicky.
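The precision limits just described can be checked directly. The lines below are a small sketch; the field sizes (six base-36 digits, eight base-64 digits) are taken from the descriptions in the thread and are otherwise assumptions.

```r
# Doubles represent every integer exactly up to 2^53; above that, gaps appear
2^53 == 2^53 + 1                    # TRUE: 2^53 + 1 rounds back to 2^53

# Largest values of the fields described in the thread
max_id36 <- 36^6 - 1                # six base-36 digits
max_f64  <- 64^8 - 1                # eight base-64 digits, i.e. 2^48 - 1

max_id36 > .Machine$integer.max     # TRUE: too large for the 32-bit integer type
max_f64 < 2^53                      # TRUE: still exact as a double

# Number of base-36 digits that always fit exactly in a double
floor(53 * log(2) / log(36))
```

So plain doubles are sufficient for these field sizes, and gmp or Rmpfr are needed only for much longer strings such as the 61-character example.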