Agreed, but I’d like to parallelize it. Eventually I’ll have too much data to scan from one server. I also need suspend/resume: if each task only handles a small chunk (say 10MB at a time), I can suspend and resume the scan and track progress as well.
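Here is a rough sketch of the token() splitting I have in mind. It assumes the cluster uses Murmur3Partitioner, so tokens are signed 64-bit values; with another partitioner the bounds would differ. TokenRanges, split() and cql() are names I made up for illustration, and the table and key names passed to cql() are placeholders. It only builds the per-chunk CQL strings; running them through the driver is left out.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of any driver.
// Assumption: Murmur3Partitioner, i.e. tokens span Long.MIN_VALUE..Long.MAX_VALUE.
final class TokenRanges {
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    // Number of distinct signed 64-bit values: 2^64.
    private static final BigInteger SPAN = BigInteger.ONE.shiftLeft(64);

    /** Splits the whole token space into n contiguous, non-overlapping [start, end] ranges. */
    static List<long[]> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger count = BigInteger.valueOf(n);
        List<long[]> ranges = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // BigInteger avoids overflow while computing the chunk boundaries.
            BigInteger start = MIN.add(SPAN.multiply(BigInteger.valueOf(i)).divide(count));
            BigInteger end = MIN.add(SPAN.multiply(BigInteger.valueOf(i + 1L)).divide(count))
                    .subtract(BigInteger.ONE);
            ranges.add(new long[] { start.longValueExact(), end.longValueExact() });
        }
        return ranges;
    }

    /** Builds the CQL for one chunk; both bounds are inclusive. */
    static String cql(String table, String partitionKey, long[] range) {
        return String.format(Locale.ROOT,
                "SELECT * FROM %s WHERE token(%s) >= %d AND token(%s) <= %d",
                table, partitionKey, range[0], partitionKey, range[1]);
    }
}
```

Something like TokenRanges.split(1024) would give the 1024 chunks, and each task would run TokenRanges.cql("mytable", "id", chunk) for its own chunk. Recording which chunk indexes have finished is what would give me suspend/resume and progress tracking.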

On Sat, Sep 27, 2014 at 2:52 PM, DuyHai Doan <[email protected]> wrote:

> Use the java driver and paging feature:
> http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)
>
> 1) Do a "SELECT * FROM" without any selection
> 2) Set fetchSize to a sensible value
> 3) Execute the query and get an iterator from the ResultSet
> 4) Iterate
>
> On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton <[email protected]> wrote:
>
>> I need a way to do a full table scan across all of our data.
>>
>> Can’t I just use token() for this?
>>
>> This way I could split up our entire keyspace into say 1024 chunks, and
>> then have one activemq task work with range 0, then range 1, etc… that way
>> I can easily just map() my whole table.
>>
>> and since it’s token() I should (generally) read a contiguous range from
>> a given table.
>>
>> --
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>> <http://spinn3r.com>
