How long is it taking? Can you send me the code that you are using?
Another technique is to recode your characters into numbers and store
them as integers. You can then sample the values and reconstruct the
output. Here is a faster way:
# create some test data -- might be read in with readLines
# use 'raw' class for the data.
sdata <- sapply(1:10, function(x){
    # encode the characters as numbers
    charToRaw(paste(sample(LETTERS, 50, TRUE), collapse=''))
})
# now create 10 samples of size 10 and write them to files
for (i in 1:10){
    x <- sample(sdata, 10, TRUE)
    # convert back to characters
    writeLines(rawToChar(x), con=paste('file', i, sep=''))
}
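To get 50-character lines in the output and avoid one write command per replicate, the same raw-vector idea can be wrapped in a small function. This is only a sketch: the helper name resample_lines, the Rep file names, the temporary output directory, and the A/C/G/T test data are assumptions for illustration, not taken from your data.

```r
# Hypothetical helper: draw one bootstrap replicate of n characters from a
# raw vector and return it as lines of at most `width` characters.
resample_lines <- function(codes, n, width = 50) {
  x <- sample(codes, n, replace = TRUE)   # resample bytes with replacement
  s <- rawToChar(x)                       # back to one long string
  starts <- seq(1, n, by = width)         # first position of each output line
  substring(s, starts, pmin(starts + width - 1, n))
}

# Stand-in data (assumption); real data might be read in with readLines
set.seed(1)
lines <- replicate(10, paste(sample(c("A", "C", "G", "T"), 50, TRUE),
                             collapse = ""))
codes <- charToRaw(paste(lines, collapse = ""))

# One loop writes every replicate; file names are built from the counter
outdir <- tempdir()
for (i in 1:3) {
  out <- resample_lines(codes, n = 120)
  writeLines(out, con = file.path(outdir, paste("Rep", i, ".txt", sep = "")))
}
```

For the real data, n would be set to the number of characters needed per replicate and the loop bound to the number of replicates.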
On Sun, Mar 30, 2008 at 6:06 PM, Suraaga Kulkarni
[EMAIL PROTECTED] wrote:
Jim,
Thanks very much. I am very new to R and am trying to understand your code.
It works perfectly on your sample data of course. I tried your code on my
data. While it works, it takes too much time to generate each replicate.
At present I'm outputting the replicates with only 2000 resampled
characters. I actually need to resample something like 1-5 million
characters. I work with the human genome, and need to generate 500
bootstrap replicates of a scaled down version (about 2%) of each chromosome
by means of resampling with replacement.
Sorry about the cryptic code but I thought my initial description of the
problem explained it. In any case, your guess was correct.
Let me see if I can rework your code to suit my purposes. In the meanwhile,
if you have any other suggestions, I'll be happy to hear them.
Thanks again for the prompt response.
S.
On Sun, Mar 30, 2008 at 6:15 PM, jim holtman [EMAIL PROTECTED] wrote:
Here is one way of doing it. I would suggest that you read in the
data with readLines and then combine it into one single string so that
you can use substring on it. Since you did not provide
commented, minimal, self-contained, reproducible code, I will take a
guess at what your data looks like:
# create some test data -- might be read in with readLines
# 10 lines of strings with 50 characters
sdata <- sapply(1:10, function(x){
    paste(sample(LETTERS, 50, TRUE), collapse='')
})
# put into one large string so you can do substring on it
sdata <- paste(sdata, collapse='')
# now create 10 samples of size 20 and write in files (file1, file2, ... file10)
for (i in 1:10){
    x <- sample(nchar(sdata), 20)
    writeLines(paste(substring(sdata, x, x), collapse=''),
        con=paste('file', i, sep=''))
}
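Note that sample(nchar(sdata), 20) draws positions without replacement; for bootstrap resampling, replace=TRUE would be needed. As an alternative to substring, and assuming the whole string fits in memory as a character vector, the string can be split into single characters once and then indexed. This is a sketch only; the names chars, idx, and rep1 and the test data are illustrative.

```r
set.seed(42)
# Stand-in for the real data (assumption): one long string of 500 letters
sdata <- paste(sample(LETTERS, 500, TRUE), collapse = "")

# Split once into a vector with one element per character
chars <- strsplit(sdata, "")[[1]]

# Draw 20 positions with replacement and glue the characters back together
idx <- sample(length(chars), 20, replace = TRUE)
rep1 <- paste(chars[idx], collapse = "")
```

No spaces need to be inserted into the input for this to work, and none appear in the output.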
On Sun, Mar 30, 2008 at 3:41 PM, Suraaga Kulkarni
[EMAIL PROTECTED] wrote:
Hi,
I need to resample characters from a dataset that consists of an extremely
long string that is written over hundreds of thousands of lines, each of
length 50 characters. I am currently doing this by first inserting a space
after each character in the dataset and then using the following commands:
y <- as.matrix(read.table("data.txt"), stringsAsFactors=FALSE)
bstrap <- sample(length(y), 10, TRUE)
write(y[bstrap], file="Rep1.txt", ncolumns=50, append=FALSE)
bstrap <- sample(length(y), 10, TRUE)
write(y[bstrap], file="Rep2.txt", ncolumns=50, append=FALSE)
bstrap <- sample(length(y), 10, TRUE)
.
.
.
and so on for 500 reps.
I think there should be a better way of doing this. My specific questions:
1. Is there a way to avoid inserting spaces between the characters before
calling the sample command (because I don't want spaces between the
resampled characters in the output either; see number 2 below)?
2. If I have no choice but to insert the spaces in my data before
resampling, is there a way to output the resampled data without spaces
(simply as 50-character-long strings, one below the other)? I tried inserting
the following command: strip.white=TRUE in the write command line, but it
gave me an error as it did not understand the command.
3. Finally, since I have to get 500 such resampled reps from each dataset
(and there are over 20 such huge datasets), is there a way around having to
write a separate write command for each rep?
Any suggestions will be greatly appreciated.
Thanks,
S.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?