> On Jun 10, 2015, at 2:21 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
>
>> On Jun 10, 2015, at 7:39 AM, Liz Hare <dogg...@earthlink.net> wrote:
>>
>> Hi R-Experts,
>>
>> I have a data.frame like this:
>>
>>> head(map)
>>   chr snp   poscm   posbp    dist
>> 1   1  M1 2.99043 3249189      NA
>> 2   1  M2 3.06457 3273096 0.07414
>> 3   1  M3 3.17018 3307151 0.10561
>> 4   1  M4 3.20892 3319643 0.03874
>> 5   1  M5 3.28120 3342947 0.07228
>> 6   1  M6 3.29624 3347798 0.01504
>>
>> I need to split this into chunks of 250 rows (there will usually be a last
>> chunk with < 250 rows).
>>
>> If I only had to extract one 250-line chunk, it would be easy:
>>
>> map1 <- map[1:250, ]
>>
>> or using subset().
>>
>> I tried to make it a loop iterating through num and using beg and nd for
>> starting and ending indices, but I couldn’t figure out how to reference all
>> the variables I needed in this:
>>
>>> chunks
>>     beg   nd let num
>> 1     1  250   a   1
>> 2   251  500   b   2
>> 3   501  750   c   3
>> 4   751 1000   d   4
>> 5  1001 1250   e   5
>> 6  1251 1500   f   6
>> 7  1501 1750   g   7
>> 8  1751 2000   h   8
>> 9  2001 2250   i   9
>> 10 2251 2500   j  10
>> …
>>
>> Remembering that loops are not always the best answer in R, I looked at
>> other options like split, following this example but not being able to adapt
>> it from a vector to a data.frame version
>> http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
>> (Yes, I’ve reviewed the language documentation). I checked out ddply and
>> data.table, but couldn’t find a way to use them with index positions instead
>> of column values.
>>
>> Thanks,
>> Liz
>
>
> Hi,
>
> map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)

Shoot, typo in the above, it should be ‘map’, not ‘x’:

map.split <- split(map, (as.numeric(rownames(map)) - 1) %/% 250)

Marc

>
> That will create a list of data frames, each a subset of ‘map’ with 250
> records except, of course, for the last one, which may have fewer.
>
> Essentially, you are creating a grouping variable by taking the numeric row
> names, subtracting 1, and integer-dividing (%/%) by the length of the chunks
> that you want. For example, using the built-in ‘iris’ dataset, which has
> 150 rows:
>
>> (as.numeric(rownames(iris)) - 1) %/% 50
>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  [34] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>  [67] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> [100] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>
> iris.split <- split(iris, (as.numeric(rownames(iris)) - 1) %/% 50)
>
>> length(iris.split)
> [1] 3
>
>> lapply(iris.split, nrow)
> $`0`
> [1] 50
>
> $`1`
> [1] 50
>
> $`2`
> [1] 50
>
>
>> lapply(iris.split, head)
> $`0`
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> 2          4.9         3.0          1.4         0.2  setosa
> 3          4.7         3.2          1.3         0.2  setosa
> 4          4.6         3.1          1.5         0.2  setosa
> 5          5.0         3.6          1.4         0.2  setosa
> 6          5.4         3.9          1.7         0.4  setosa
>
> $`1`
>    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
> 51          7.0         3.2          4.7         1.4 versicolor
> 52          6.4         3.2          4.5         1.5 versicolor
> 53          6.9         3.1          4.9         1.5 versicolor
> 54          5.5         2.3          4.0         1.3 versicolor
> 55          6.5         2.8          4.6         1.5 versicolor
> 56          5.7         2.8          4.5         1.3 versicolor
>
> $`2`
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 101          6.3         3.3          6.0         2.5 virginica
> 102          5.8         2.7          5.1         1.9 virginica
> 103          7.1         3.0          5.9         2.1 virginica
> 104          6.3         2.9          5.6         1.8 virginica
> 105          6.5         3.0          5.8         2.2 virginica
> 106          7.6         3.0          6.6         2.1 virginica
>
>
>
> Regards,
>
> Marc Schwartz
>
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
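
One caveat on the split() approach in this thread: it builds the grouping variable from the row names, so it relies on the row names still being the default 1..n. If the data frame has been reordered, subsetted, or given character row names, a grouping built from row positions may be safer. Below is a minimal sketch of that variant, not something posted in the thread; the helper name split_rows and its arguments are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not from the thread): split a data frame into
# consecutive chunks of at most `size` rows, by row position.
split_rows <- function(df, size) {
  stopifnot(is.data.frame(df), size >= 1)
  # Zero-based row position, integer-divided by the chunk size:
  # rows 1..size -> group 0, rows size+1..2*size -> group 1, and so on.
  grp <- (seq_len(nrow(df)) - 1) %/% size
  split(df, grp)
}

# iris has 150 rows, so chunks of 40 give three full chunks and a
# shorter final chunk.
chunks <- split_rows(iris, 40)
sapply(chunks, nrow)

# Row names that are no longer 1..n (here, after reversing the rows)
# do not affect the grouping, because it uses positions, not names.
shuffled <- iris[rev(seq_len(nrow(iris))), ]
shuffled.chunks <- split_rows(shuffled, 40)
rownames(shuffled.chunks[["0"]])[1:3]
```

As with the original one-liner, the list elements are named "0", "1", ... after the group values, so a single chunk can be pulled out with, for example, chunks[["0"]].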