Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jon Graham
Hello,

I am running 32-bit Linux, kernel version 2.6.27.24. My original data set was
copied from a 64-bit Cassandra cluster to a 32-bit Cassandra cluster. I am
trying to load balance the data on the 32-bit cluster.

Does the CASSANDRA-795 issue also apply to 32-bit Linux on the 0.5.0
release?

Thanks,
Jon



Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jonathan Ellis
From some googling, this is a different JRE bug than the one addressed
by 795: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253145.
It is marked fixed in JDK 6u18, so try upgrading to that.

-Jonathan





Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jon Graham
Thanks Jonathan,

My 32-bit Java version is 1.6.0_13-b03, so I'll try a Java upgrade.
This tracks well with the -tmp- Data file size being exactly max int.

Jon
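
As a rough illustration of the version check implied here (the bug is marked fixed in JDK 6u18, and the reported runtime is 1.6.0_13-b03), the sketch below parses the old "1.<major>.0_<update>-b<build>" version string and compares it against the fixed update. The function names are hypothetical, and the sketch assumes the fix is carried forward into every later update and major release.

```python
import re


def java_major_and_update(version):
    """Parse an old-style Java version such as '1.6.0_13-b03' into (6, 13)."""
    match = re.fullmatch(r"1\.(\d+)\.0_(\d+)(?:-b\d+)?", version)
    if match is None:
        raise ValueError("unrecognized Java version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_transfer_fix(version, fixed_in=(6, 18)):
    """True if the version is at or past the update where the bug is marked fixed."""
    return java_major_and_update(version) >= fixed_in


if __name__ == "__main__":
    # The version reported in this thread predates update 18.
    print(has_transfer_fix("1.6.0_13-b03"))
```

The tuple comparison treats any later major version as already containing the fix, which is an assumption rather than something the bug report in this thread states.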




Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jon Graham
Hello Everyone,

Jonathan,  Thanks for your advice :-)

I have started a loadbalance operation on a busy cassandra node.

The http://wiki.apache.org/cassandra/Operations web page indicates that
nodetool streams can be used to monitor the status of the load balancing
operation.

I can't seem to find the nodetool command for cassandra 0.5.0.

Is this a separate package/tool?

Thanks,
Jon





Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
nodetool is the 0.6 replacement for nodeprobe. The stream info is new
in that version.

(0.6 beta release is linked from
http://wiki.apache.org/cassandra/GettingStarted)

-Jonathan





Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jon Graham
Jonathan, Thanks for the quick reply.

About 30 minutes after starting a loadbalance operation, I can see 3
ColumnFamily -tmp- files (Data, Filter and Index) on a lightly loaded node.
The Data file has a size of 2,147,483,647 bytes (max signed int) on the node
being loaded. I hope I didn't run out of int space during the load balance
operation.
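
The file size quoted above can be checked against the 32-bit signed integer limit with a few lines. This is only an arithmetic sanity check under the assumption that the size is counted in bytes; it says nothing about where in the streaming path the limit is actually hit. The helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def fits_in_int32(value):
    """True if value can be stored in a 32-bit signed integer."""
    return -(2**31) <= value <= INT32_MAX


if __name__ == "__main__":
    observed_size = 2147483647  # size of the -tmp- Data file reported above
    print(observed_size == INT32_MAX)
    print(fits_in_int32(observed_size + 1))
```

A file that stops growing at exactly this size is consistent with a 2 GB boundary somewhere between the sender and the receiver, which is why the exact byte count is worth noting.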

The file sizes don't seem to be changing in the column family folder of the
newly loaded node. The system/LocationInfo files haven't changed in 20
minutes. The 3 ColumnFamily files still have the -tmp- name.

Can I tell whether the load balancing operation is still running OK or
whether it has terminated?

Is there a rough computation to determine how long the process should take?

If the load balancing is successful, will the cluster ring
information reflect the load balancing changes?

Thanks,
Jon



Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jon Graham
Thanks Jonathan.

It seems like the load balance operation isn't moving. I haven't seen any
data file time changes in 2 hours and no location file time
changes in over an hour.

I can see TCP port 7000 open on the node where I ran the loadbalance
command. It is connected to port 39033 on the node receiving the data. The
CPU usage on both systems is very low. There are about 10 million records on
the node where the load balance command was issued.

My six node Cassandra ring consists of tokens for nodes 1-6 of:  0
(ascii 0x30)  6  B  H  O (the letter O)  T

The load balance target node initially had a token of 'H' (using ordered
partitioning). The source node has a token of 0 (ascii 0x30). Most of the data
on the source node has keys starting with '/'. Slash (ascii 0x2F) falls
between tokens T and 0 in my ring, so most of the data landed on the node
with token 0, with replicas on the next 2 nodes. My token space is badly
divided for the data I have already inserted.
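
The placement described above can be reproduced with a small model of an order-preserving ring. This is a simplified sketch, not Cassandra code: it assumes each node owns the keys greater than the previous token and up to its own token, that keys past the last token wrap around to the first node, and that the replicas are simply the next nodes around the ring. The function name and the sample keys are made up for illustration.

```python
from bisect import bisect_left

# Tokens for nodes 1-6 as listed in this message, in ring order.
TOKENS = sorted(["0", "6", "B", "H", "O", "T"])


def replicas_for(key, tokens=TOKENS, replication_factor=3):
    """Return the tokens of the nodes that would hold key in this model."""
    # First token >= key; an index past the end wraps to the first node.
    start = bisect_left(tokens, key) % len(tokens)
    return [tokens[(start + i) % len(tokens)] for i in range(replication_factor)]


if __name__ == "__main__":
    # '/' is 0x2F, one below '0' (0x30), so these keys sort before every token.
    for key in ["/home/a", "/var/b", "Cat", "Zed"]:
        print(key, replicas_for(key))
```

In this model every key beginning with '/' maps to the nodes with tokens 0, 6 and B, which matches the observation that nodes 1, 2 and 3 hold nearly all of the data.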

Does the initial token value of the load balance target node selected by
Cassandra need to be cleared or set to a specific value beforehand to
accommodate the load balance data transfer?

Would I have better luck decommissioning nodes 4, 5, 6 and bootstrapping
them one at a time with better initial token values?

Would the existing data on nodes 1, 2, 3 be moved to the bootstrapped nodes?

I am looking for a good way to move/split/re-balance data from nodes 1, 2, 3
to nodes 4, 5, 6 while achieving a better token space distribution.
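
One possible way to pick better tokens for an order-preserving ring is to sort a sample of the real keys and take evenly spaced quantiles as tokens. The sketch below is only an illustration under that assumption; it is not a Cassandra tool, the function name and sample keys are hypothetical, and the resulting tokens would still have to be assigned to the nodes by whatever mechanism the cluster version provides.

```python
def suggest_tokens(sample_keys, node_count):
    """Pick node_count tokens that split the distinct sample keys evenly.

    Each token is the largest key of its slice, so in a ring where a node
    owns keys up to and including its token, each node gets one slice.
    """
    ordered = sorted(set(sample_keys))
    if len(ordered) < node_count:
        raise ValueError("need at least as many distinct keys as nodes")
    tokens = []
    for i in range(1, node_count + 1):
        # Index of the last key in the i-th of node_count equal slices.
        last_in_slice = (i * len(ordered)) // node_count - 1
        tokens.append(ordered[last_in_slice])
    return tokens


if __name__ == "__main__":
    sample = ["/a/%02d" % n for n in range(12)]  # made-up keys starting with '/'
    print(suggest_tokens(sample, 6))
```

The quality of the split depends on how representative the sample is. Keys that sort after the last token would wrap around to the first node, so the sample should include keys from the whole range in use.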

Thanks for your help,
Jon


On Mon, Mar 1, 2010 at 11:55 AM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Mar 1, 2010 at 1:44 PM, Jon Graham sjclou...@gmail.com wrote:
  Can I tell whether the load balancing operation is still running OK or
  whether it has terminated?
 
  Is there a rough computation to determine how long the process should
  take?

 Not really, although you can guess from cpu/io usage.  This is much
 improved in 0.6.

  If the load balancing is successful, will the cluster ring
  information reflect the load balancing changes?

 Yes.



Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 3:18 PM, Jon Graham sjclou...@gmail.com wrote:
 Thanks Jonathan.

 It seems like the load balance operation isn't moving. I haven't seen any
 data file time changes in 2 hours and no location file time
 changes in over an hour.

 I can see a tcp port # 7000 opened on the node where I ran the loadbalance
 command. It is connected to
 port 39033 on the node receiving the data. The CPU usage on both systems is
 very low. There are about 10
 million records on the node where the load balance command was issued.

Did you check logs for exceptions?

 My six node Cassandra ring consists of tokens for nodes 1-6 of:  0
 (ascii 0x30)  6  B  H  O (the letter O)  T

 The load balance target node initially had a token of 'H' (using ordered
 partitioning). The source node has a token of 0 (ascii 0x30). Most of the data
 on the source node has keys starting with '/'. Slash falls between tokens T
 and 0 in my ring, so most of the data landed on the node with token 0, with
 replicas on the next 2 nodes. My token space is badly divided for the data I
 have already inserted.

 Does the initial token value of the load balance target node selected by
 Cassandra need to be cleared or set to a specific value beforehand to
 accommodate the load balance data transfer?

No.

 Would I have better luck decommissioning nodes 4, 5, 6 and bootstrapping
 them one at a time with better initial token values?

LoadBalance is basically sugar for decommission + bootstrap, so no.

 I am looking for a good way to move/split/re-balance data from nodes 1, 2, 3
 to nodes 4, 5, 6 while achieving a better token space distribution.

I would upgrade to the 0.6 beta and try loadbalance again.

-Jonathan


Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jon Graham
Hello,

I did find these exceptions. I issued the loadbalance command on node
192.168.2.10.

 INFO [MESSAGING-SERVICE-POOL:3] 2010-03-01 10:34:40,764 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:55973 remote=/192.168.2.13:7000]
 WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-03-01 10:34:40,964 MessagingService.java (line 555) Running on default stage - beware
 WARN [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/192.168.2.10:40758 remote=/192.168.2.13:7000]
 WARN [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 485) Exception was generated at : 03/01/2010 10:34:40 on thread MESSAGING-SERVICE-POOL:1
Reached an EOL or something bizzare occured. Reading from: /192.168.2.13BufferSizeRemaining: 16
java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16
    at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
    at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
    at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
    at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
 INFO [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:40758 remote=/192.168.2.13:7000]
 INFO [MESSAGE-STREAMING-POOL:1] 2010-03-01 10:35:23,171 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:56728 remote=/192.168.2.13:7000]
 INFO [MESSAGE-STREAMING-POOL:1] 2010-03-01 10:35:23,221 FileStreamTask.java (line 79) Exception was generated at : 03/01/2010 10:35:23 on thread MESSAGE-STREAMING-POOL:1
Value too large for defined data type
java.io.IOException: Value too large for defined data type
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
    at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
    at org.apache.cassandra.net.TcpConnection.stream(TcpConnection.java:226)
    at org.apache.cassandra.net.FileStreamTask.run(FileStreamTask.java:55)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

I can certainly upgrade to 0.6 and try a loadbalance there. Do you
still think that is advisable?

All of my key/value entries are well under 1024 bytes, but I have millions of
them.

Do you think I have a data corruption problem?

Thanks,
Jon



Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham sjclou...@gmail.com wrote:
 Reached an EOL or something bizzare occured. Reading from: /192.168.2.13
 BufferSizeRemaining: 16

This one is harmless

 java.io.IOException: Value too large for defined data type
     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
     at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
     at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
     at org.apache.cassandra.net.TcpConnection.stream(TcpConnection.java:226)
     at org.apache.cassandra.net.FileStreamTask.run(FileStreamTask.java:55)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
     at java.lang.Thread.run(Unknown Source)

This one is killing you.

Are you on Windows?  If so,
https://issues.apache.org/jira/browse/CASSANDRA-795 should fix it.
That's in both 0.5.1 and 0.6 beta.

-Jonathan


Re: Adjusting Token Spaces and Rebalancing Data

2010-02-24 Thread Jonathan Ellis
nodeprobe loadbalance and/or nodeprobe move

http://wiki.apache.org/cassandra/Operations

On Wed, Feb 24, 2010 at 6:17 PM, Jon Graham sjclou...@gmail.com wrote:
 Hello,

 I have a 6-node Cassandra 0.5.0 cluster
 using org.apache.cassandra.dht.OrderPreservingPartitioner with replication
 factor 3.

 I mistakenly set my tokens to the wrong values, and all the data is being
 stored on the first node (with replicas on the second and third nodes).

 Does Cassandra have any tools to reset the token values and re-distribute
 the data?

 Thanks for your help,
 Jon