Task's map reading more records than CFIF's inputSplitSize

2011-09-07 Thread Mck
Cassandra-0.8.4 w/ ByteOrderedPartitioner

CFIF's inputSplitSize=196608

3 map tasks (out of 4013) are still running after reading 25 million rows each.
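
For scale, here is a rough check of how far past the configured split size such a task is. It is plain arithmetic on the two numbers quoted in this message; the class and method names are made up for illustration and are not part of Cassandra or Hadoop.

```java
// Hypothetical helper for illustration only; not part of Cassandra or Hadoop.
final class SplitOverrun {
    /** Returns how many times larger rowsRead is than the configured split size. */
    static double overrunFactor(long rowsRead, long inputSplitSize) {
        if (inputSplitSize <= 0) {
            throw new IllegalArgumentException("inputSplitSize must be positive");
        }
        return (double) rowsRead / inputSplitSize;
    }

    public static void main(String[] args) {
        long inputSplitSize = 196_608L; // CFIF's inputSplitSize from this thread
        long rowsRead = 25_000_000L;    // rows a straggling map task had read so far
        System.out.println("overrun factor: " + overrunFactor(rowsRead, inputSplitSize));
    }
}
```

That works out to roughly 127 times the configured split size, which seems far too large to be explained by the split size being an estimate.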

Could this be a bug in StorageService.getSplits(..)?

With this data I've had a general headache using tokens that are
longer than usual (and trying to move nodes around to balance the ring).

nodetool ring gives:

Address         Status State   Load            Owns    Token
                                                       Token(bytes[76118303760208547436305468318170713656])
152.90.241.22   Up     Normal  270.46 GB       33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
152.90.241.24   Up     Normal  247.89 GB       33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
152.90.241.23   Up     Normal  1.1 TB          33.33%  Token(bytes[76118303760208547436305468318170713656])


~mck



Re: Task's map reading more records than CFIF's inputSplitSize

2011-09-07 Thread Jonathan Ellis
getSplits looks pretty foolproof to me but I guess we'd need to add
more debug logging to rule out a bug there for sure.

I guess the main alternative would be a bug in the recordreader paging.
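
To make the paging suspicion concrete, here is a minimal, self-contained sketch of what paging through one split looks like, with a guard that fails loudly once more rows than expected come back. It is a simulation over an in-memory TreeMap, not Cassandra's actual record reader; PagedRangeReader, readSplit and maxRows are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical simulation for illustration; this is NOT Cassandra's record reader.
final class PagedRangeReader {
    /**
     * Reads every key in (startExclusive, endInclusive], at most batchSize keys per page.
     * Throws IllegalStateException as soon as more than maxRows keys have been read.
     */
    static List<String> readSplit(NavigableMap<String, String> store,
                                  String startExclusive, String endInclusive,
                                  int batchSize, long maxRows) {
        List<String> keys = new ArrayList<>();
        String cursor = startExclusive;
        while (true) {
            // Next page: keys strictly after the cursor, never past the split's end.
            NavigableMap<String, String> remaining =
                    store.subMap(cursor, false, endInclusive, true);
            if (remaining.isEmpty()) {
                break; // split exhausted
            }
            int taken = 0;
            for (String key : remaining.keySet()) {
                if (taken == batchSize) {
                    break; // page is full; fetch the next one from the new cursor
                }
                keys.add(key);
                cursor = key; // the next page resumes after the last key seen
                taken++;
                if (keys.size() > maxRows) {
                    throw new IllegalStateException(
                            "read " + keys.size() + " rows, expected at most " + maxRows);
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            store.put(i < 10 ? "k0" + i : "k" + i, "row" + i);
        }
        List<String> keys = readSplit(store, "k09", "k59", 7, 50);
        System.out.println(keys.size() + " rows, " + keys.get(0) + " to " + keys.get(keys.size() - 1));
    }
}
```

If the end bound were ignored or the cursor failed to advance, a loop like this would keep returning rows well past the split, which is the kind of symptom described in this thread. A guard such as maxRows is one way extra debug logging could catch it.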

On Wed, Sep 7, 2011 at 6:35 AM, Mck m...@apache.org wrote:
> Cassandra-0.8.4 w/ ByteOrderedPartitioner
>
> CFIF's inputSplitSize=196608
>
> 3 map tasks (out of 4013) are still running after reading 25 million rows each.
>
> Could this be a bug in StorageService.getSplits(..)?
>
> With this data I've had a general headache using tokens that are
> longer than usual (and trying to move nodes around to balance the ring).
>
> nodetool ring gives:
>
> Address         Status State   Load            Owns    Token
>                                                        Token(bytes[76118303760208547436305468318170713656])
> 152.90.241.22   Up     Normal  270.46 GB       33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
> 152.90.241.24   Up     Normal  247.89 GB       33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
> 152.90.241.23   Up     Normal  1.1 TB          33.33%  Token(bytes[76118303760208547436305468318170713656])
>
> ~mck





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Task's map reading more records than CFIF's inputSplitSize

2011-09-07 Thread Mick Semb Wever

> > 3 map tasks (out of 4013) are still running after reading 25 million rows each.
> > Could this be a bug in StorageService.getSplits(..)?
>
> getSplits looks pretty foolproof to me but I guess we'd need to add
> more debug logging to rule out a bug there for sure.
>
> I guess the main alternative would be a bug in the recordreader paging.

Entered https://issues.apache.org/jira/browse/CASSANDRA-3150

~mck

-- 
“People only see what they're prepared to see.” - Ralph Waldo Emerson 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |

