Adding and removing node procedures

2014-06-10 Thread ng
I just wanted to verify the procedures to add and remove nodes in my
environment; please feel free to comment or advise.



I have a 3-node cluster (N1, N2, N3) with vnodes (num_tokens = 256) configured on
each node. All are in one data center.

1. Procedure to change node hardware or replace with new node machines
(N1, N2 and N3) to (N11, N21 and N31)


nodetool -h node2 decommission
Bootstrap N21
nodetool repair
nodetool -h node1 decommission
Bootstrap N11
nodetool repair
nodetool -h node3 decommission
Bootstrap N31
nodetool repair
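
Between each decommission/bootstrap pair it may be worth confirming the ring
state before moving on. A minimal check sequence, assuming the node names used
above are placeholders for real hostnames:

# confirm the decommissioned node has left the ring and no streams are pending
nodetool -h node1 status
nodetool -h node1 netstats

# after the replacement finishes bootstrapping, confirm it shows as UN (Up/Normal)
nodetool -h node1 status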

---
2. Procedure for changing a 3-node cluster to a 2-node cluster
(N1, N2 and N3)  to (N1, N3)

nodetool -h node2 decommission
Physically get rid of Node2
---
3. Procedure for adding a new node

(N1, N2 and N3)  to (N1, N2, N3, N4)
Bootstrap N4
nodetool repair
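
A sketch of what the "Bootstrap N4" step typically involves, assuming vnodes as
configured above; the yaml keys are standard but the addresses are placeholders,
and the cleanup on the existing nodes reclaims ranges that moved to N4:

# on N4, edit cassandra.yaml before the first start:
#   cluster_name:   same value as N1-N3
#   num_tokens:     256
#   seeds:          "address-of-N1"
#   listen_address: N4's own address
# then start Cassandra on N4 and wait until it shows as UN in nodetool status

# on N1, N2 and N3, remove data for ranges that now belong to N4
nodetool -h node1 cleanup
nodetool -h node2 cleanup
nodetool -h node3 cleanup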

---
4. Procedure to remove dead node/crashed node.
(node n2 unable to start)
(n1,n2, n3) to (n1,n3)

Shut down N2 if possible
nodetool removenode xx_hostid_Of_N2_xx
nodetool repair
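
The host ID for removenode can be read from nodetool status on a surviving node,
and the removal can be monitored while it streams; a sketch (the host ID is a
placeholder):

# the Host ID column of nodetool status gives the UUID of the dead N2
nodetool -h node1 status

# stream N2's replicas to the remaining owners, then check progress
nodetool -h node1 removenode host-id-of-N2
nodetool -h node1 removenode status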

---
5. Procedure to remove dead node/crashed node and replace with N21.
(node n2 unable to start)
(n1,n2, n3) to (n1,n3, n21)

Shut down N2 if possible
nodetool removenode xx_hostid_Of_N2_xx
Bootstrap N21
nodetool repair
---


Thanks in advance for pointing out any mistakes, and for any advice.


Snapshot the data with 3 node and replicationfactor=3

2014-06-04 Thread ng
Is there any reason you would want to take a snapshot of a column family on
each node when the cluster consists of 3 nodes and the keyspace has a
replication factor of 3?


I am thinking of taking a snapshot of the CF on only one node.

For restore, I will follow the steps below:

1. drop and recreate the CF on node1
2. copy the snapshotted files to node1's data directory for the CF
3. perform nodetool refresh on node1
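
As a concrete sketch of steps 2 and 3, assuming a keyspace ks1, a column family
cf1, the default data directory and a snapshot named mysnap (all of these names
are hypothetical):

# 2. copy the snapshot files into node1's live data directory for the CF
cp /var/lib/cassandra/data/ks1/cf1/snapshots/mysnap/* /var/lib/cassandra/data/ks1/cf1/

# 3. tell Cassandra on node1 to load the newly placed sstables
nodetool -h node1 refresh ks1 cf1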



Any suggestions/advice?

ng


Re: Snapshot the data with 3 node and replicationfactor=3

2014-06-04 Thread ng
I am not worried about eventually consistent data. I just want rough data that
is a close approximation.
ng


On Wed, Jun 4, 2014 at 2:49 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jun 4, 2014 at 1:26 PM, ng pipeli...@gmail.com wrote:

 Is there any reason you would like to take snapshot of column family on
 each node when cluster consists of 3 nodes with keyspace on replication
 factor =3?


 Unless all read/write occurs with CL.ALL (which is an availability
 problem), there is a nonzero chance of any given write not being on any
 given node at any given time.

 =Rob




Cassandra snapshot

2014-06-02 Thread ng
I need to make sure that all the data is in SSTables before taking the snapshot.

I am thinking of
nodetool cleanup
nodetool repair
nodetool flush
nodetool snapshot
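
The same sequence, sketched with an explicit keyspace and snapshot tag (both
names are hypothetical):

nodetool cleanup ks1
nodetool repair ks1
nodetool flush ks1
nodetool snapshot -t pre_backup ks1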

Am I missing anything else?

Thanks in advance for the responses/suggestions.

ng


Backup Solution

2014-05-14 Thread ng
I want to discuss the question asked by Rene last year again.


http://www.mail-archive.com/user%40cassandra.apache.org/msg28465.html

Is the following a good backup solution?
Create two data-centers:
- A live data-center with multiple nodes (commodity hardware) (6 nodes with
replication factor of 3). Clients
connect to this cluster with LOCAL_QUORUM.
- A backup data-center with 1 node (with fast SSDs). Clients do not connect
to this cluster. Cluster only used for creating and storing snapshots.
Advantages:
- No snapshots and bulk network I/O (transfer snapshots) needed on the live
cluster. Also no need to take snapshot on each node.
- Clients are not slowed down because writes to the backup data-center are
async.
- On the backup cluster snapshots are made on a regular basis. This again
does not affect the live cluster.
- The back-up cluster does not need to process client requests/reads, so we
need fewer machines for the backup cluster than for the live cluster.
Are there any disadvantages with this approach?

I don't see any issue with it. It is a backup solution, not a replication
solution. Both DCs can be in physically the same location/network. Copies of
the snapshots can be placed in a separate shared location on a daily basis from
the backup DC node.

I must be missing something... please advise.
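
For reference, the dual-DC layout described above is expressed in the keyspace
definition roughly as below; the keyspace and data-center names are hypothetical
and must match the names your snitch reports:

cqlsh <<'EOF'
-- 3 replicas in the live DC (clients use LOCAL_QUORUM there), 1 async replica
-- in the backup DC used only for snapshots
CREATE KEYSPACE myks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'LIVE': 3, 'BACKUP': 1};
EOF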


Datacenter understanding question

2014-05-13 Thread ng
If I have a configuration of two data centers with one node each, and the
replication factor is also 1, will these 2 nodes be mirrored/replicated?
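
Whether the two nodes mirror each other depends on the replication strategy,
not just the bare replication factor; a sketch of the two cases (keyspace and
DC names are hypothetical):

cqlsh <<'EOF'
-- one replica per data center: the two nodes do hold the same data
CREATE KEYSPACE ks_mirrored WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};

-- one replica in the whole cluster: the two nodes do not mirror each other
CREATE KEYSPACE ks_single WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 1};
EOF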


Row_key from sstable2json to actual value of the key

2014-04-03 Thread ng
sstable2json tomcat-t5-ic-1-Data.db -e
gives me

0021
001f
0020


How do I convert this (hex) to the actual value of the column, so I can do the
query below?

select * from tomcat.t5 where c1='converted value';

Thanks in advance for the help.
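
How the hex maps back depends entirely on the key validator / type of c1; if
the keys above are plain integers, a throwaway shell conversion shows the
values (a sketch under that assumption only):

# 0x0021 = 33, 0x001f = 31, 0x0020 = 32 when the key type is an integer
for k in 0021 001f 0020; do echo $((16#$k)); done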


Exporting column family data to csv

2014-04-02 Thread ng
I want to export all the data of a particular column family from the Cassandra
cluster to a text file.

I tried

copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

It gave me a timeout error.

I tried the below in cassandra.yaml:

request_timeout_in_ms: 1000
read_request_timeout_in_ms: 1000
range_request_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 1000

I still have no luck. Any advice on how to achieve this? I am NOT limited to
the copy command.  What is the best way to achieve this? Thanks in advance for
the help.
ng
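
One thing worth noting: those settings are in milliseconds, so 1000 is only one
second and is well below the shipped defaults; raising them, rather than
lowering them, is what gives a long COPY more headroom. A sketch of the
cassandra.yaml change (the values are arbitrary):

request_timeout_in_ms: 60000
read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000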


Re: Exporting column family data to csv

2014-04-02 Thread ng
Thanks for the reply.  Most of the solutions provided on the web involve
some kind of 'where' clause in the data extract, exporting the next set
until done. I have a column family with no timestamp and no other column I
can use to filter the data. One other solution provided was to use
pagination, but I could not find any example anywhere on the web that
achieves this. This cannot be that hard! I must be missing something.
Please advise.
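
For reference, one way to page without a timestamp column is to walk the table
by partition-key token; a rough sketch, assuming c1 is the partition key of
tomcat.t5 (names taken from the earlier message, LIMIT chosen arbitrarily):

# first page
echo "SELECT token(c1), c1 FROM tomcat.t5 LIMIT 1000;" | cqlsh

# following pages: substitute the last token value returned by the previous page
echo "SELECT token(c1), c1 FROM tomcat.t5 WHERE token(c1) > LAST_TOKEN LIMIT 1000;" | cqlsh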

On Wednesday, April 2, 2014, Viktor Jevdokimov viktor.jevdoki...@adform.com
wrote:


 http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C9AF3ADEDDFED4DDEA840B8F5C6286BBA@vig.local%3E




 http://stackoverflow.com/questions/18872422/rpc-timeout-error-while-exporting-data-from-cql



 Google for more.




Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
 Experience Adform DNA http://vimeo.com/76421547

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.

 *From:* ng [mailto:pipeli...@gmail.com]

 *Sent:* Wednesday, April 2, 2014 6:04 PM
 *To:* user@cassandra.apache.org
 *Subject:* Exporting column family data to csv




 I want to export all the data of particular column family to the text file
 from Cassandra cluster.

 I tried

 copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

 It gave me timeout error

 I tried below in Cassandra.yaml

 request_timeout_in_ms: 1000
 read_request_timeout_in_ms: 1000
 range_request_timeout_in_ms: 1000
 truncate_request_timeout_in_ms: 1000

 I still have no luck. Any advise how to achieve this? I am NOT limited to
 copy command.  What is the best way to achieve this? Thanks in advance for
 the help.

 ng


Re: Intermittent long application pauses on nodes

2014-02-27 Thread Frank Ng
: [GC pause (G1 Evacuation
 Pause) (young)
 Desired survivor size 37748736 bytes, new threshold 15 (max 15)
 - age   1:   17213632 bytes,   17213632 total
 - age   2:   19391208 bytes,   36604840 total
 , 0.1664300 secs]
   [Parallel Time: 163.9 ms, GC Workers: 2]
  [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max:
 222346218.3, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff:
 1.7, Sum: 13.7]
  [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7,
 Sum: 42.6]
 [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22,
 Sum: 120]
  [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum:
 46.5]
  [Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff:
 0.1, Sum: 224.6]
  [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum:
 0.1]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
 Sum: 0.1]
  [GC Worker Total (ms): Min: 163.8, Avg: 163.8, Max: 163.8, Diff:
 0.0, Sum: 327.6]
  [GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max:
 222346382.1, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 2.1 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 1.1 ms]
  [Ref Enq: 0.0 ms]
  [Free CSet: 0.4 ms]
   [Eden: 524.0M(524.0M)-0.0B(476.0M) Survivors: 44.0M-68.0M Heap:
 3518.5M(8192.0M)-3018.5M(8192.0M)]
 Heap after GC invocations=4074 (full 1):
 garbage-first heap   total 8388608K, used 3090914K [0x0005f5c0,
 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 17 young (69632K), 17 survivors (69632K)
 compacting perm gen  total 28672K, used 27428K [0x0007f5c0,
 0x0007f780, 0x0008)
   the space 28672K,  95% used [0x0007f5c0, 0x0007f76c9108,
 0x0007f76c9200, 0x0007f780)
 No shared spaces configured.
 }
 [Times: user=0.35 sys=0.00, real=27.58 secs]
 222346.219: G1IncCollectionPause [ 111  0
  0]  [ 0 0 0 0 27586]  0

 And the total thime for which application threads were stopped is
 27.58 seconds.

 CMS behaves in a similar manner. We thought it would be GC, waiting
 for mmapped files being read from disk (the thread cannot reach safepoint
 during this operation), but it doesn't explain the huge time.

 We'll try jhiccup to see if it provides any additional information.
 The test was done on mixed aws/openstack environment, openjdk 1.7.0_45,
 cassandra 1.2.11. Upgrading to 2.0.x is no option for us.

 regards,

 ondrej cernos


 On Fri, Feb 14, 2014 at 8:53 PM, Frank Ng fnt...@gmail.com wrote:

 Sorry, I have not had a chance to file a JIRA ticket.  We have not
 been able to resolve the issue.  But since Joel mentioned that 
 upgrading to
 Cassandra 2.0.X solved it for them, we may need to upgrade.  We are
 currently on Java 1.7 and Cassandra 1.2.8



 On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright kwri...@nanigans.com
  wrote:

 You’re running 2.0.* in production?  May I ask what C* version and
 OS?  Any hardware details would be appreciated as well.  Thx!

 From: Joel Samuelsson samuelsson.j...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, February 13, 2014 at 11:39 AM

 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Intermittent long application pauses on nodes

 We have had similar issues and upgrading C* to 2.0.x and Java to
 1.7 seems to have helped our issues.


 2014-02-13 Keith Wright kwri...@nanigans.com:

 Frank did you ever file a ticket for this issue or find the root
 cause?  I believe we are seeing the same issues when attempting to
 bootstrap.

 Thanks

 From: Robert Coli rc...@eventbrite.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Monday, February 3, 2014 at 6:10 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Intermittent long application pauses on nodes

 On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:


 It's possible that this is a JVM issue, but if so there may be
 some remedial action we can take anyway. There are some more flags we
 should add, but we can discuss that once you open a ticket. If you 
 could
 include the strange JMX error as well, that might be helpful.


 It would be appreciated if you could inform this thread of the
 JIRA ticket number, for the benefit of the community and google 
 searchers.
 :)

 =Rob












Re: Intermittent long application pauses on nodes

2014-02-27 Thread Frank Ng
We have swap disabled.  Can death by paging still happen?


On Thu, Feb 27, 2014 at 11:32 AM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 That sounds a lot like death by paging.


 On 27 February 2014 16:29, Frank Ng fnt...@gmail.com wrote:

 I just caught that a node was down based on running nodetool status on a
 different node.  I tried to ssh into the downed node at that time and it
 was very slow logging on.  Looking at the gc.log file, there was a ParNew
 that only took 0.09 secs.  Yet the overall application threads stop time is
 315 seconds (5 minutes).  Our cluster is handling a lot of read requests.

 If there were network hiccups, would that cause a delay in the Cassandra
 process when it tries to get to a safepoint?  I assume Cassandra has
 threads running with lots of network activity and maybe taking a long time
 to reach a safepoint.

 thanks,
 Frank


 On Fri, Feb 21, 2014 at 4:24 AM, Joel Samuelsson 
 samuelsson.j...@gmail.com wrote:

 What happens if a ParNew is triggered while CMS is running? Will it wait
 for the CMS to finish? If so, that would be the explanation of our long
 ParNew above.

 Regards,
 Joel


 2014-02-20 16:29 GMT+01:00 Joel Samuelsson samuelsson.j...@gmail.com:

 Hi Frank,

 We got a (quite) long GC pause today on 2.0.5:
  INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line
 116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is
 4253024256
  INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line
 116) GC for ConcurrentMarkSweep: 3703 ms for 2 collections, 434394920 used;
 max is 4253024256

 Unfortunately it's a production cluster so I have no additional
 GC-logging enabled. This may be an indication that upgrading is not the
 (complete) solution.

 Regards,
 Joel


 2014-02-17 13:41 GMT+01:00 Benedict Elliott Smith 
 belliottsm...@datastax.com:

 Hi Ondrej,

 It's possible you were hit by the problems in this thread before, but
 it looks potentially like you may have other issues. Of course it may be
 that on G1 you have one issue and CMS another, but 27s is extreme even for
 G1, so it seems unlikely. If you're hitting these pause times in CMS and
 you get some more output from the safepoint tracing, please do contribute
 as I would love to get to the bottom of that, however is it possible 
 you're
 experiencing paging activity? Have you made certain the VM memory is 
 locked
 (and preferably that paging is entirely disabled, as the bloom filters and
 other memory won't be locked, although that shouldn't cause pauses during
 GC)

 Note that mmapped file accesses and other native work shouldn't in
 anyway inhibit GC activity or other safepoint pause times, unless there's 
 a
 bug in the VM. These threads will simply enter a safepoint as they return
 to the VM execution context, and are considered safe for the duration they
 are outside.




 On 17 February 2014 12:30, Ondřej Černoš cern...@gmail.com wrote:

 Hi,

 we tried to switch to G1 because we observed this behaviour on CMS
 too (a 27-second pause in G1 is quite a strong reason not to use it). Pauses with
 with
 CMS were not easily traceable - JVM stopped even without stop-the-world
 pause scheduled (defragmentation, remarking). We thought the
 go-to-safepoint waiting time might have been involved (we saw waiting for
 safepoint resolution) - especially because access to mmapped files is not
 preemptive, afaik, but it doesn't explain tens of seconds waiting times,
 even slow IO should read our sstables into memory in much less time. We
 switched to G1 out of desperation - and to try different code paths - not
 that we'd thought it was a great idea. So I think we were hit by the
 problem discussed in this thread, just the G1 report wasn't very clear,
 sorry.

 regards,
 ondrej



 On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 Ondrej,

 It seems like your issue is much less difficult to diagnose: your
 collection times are long. At least, the pause you printed the time for 
 is
 all attributable to the G1 pause.

 Note that G1 has not generally performed well with Cassandra in our
 testing. There are a number of changes going in soon that may change 
 that,
 but for the time being it is advisable to stick with CMS. With tuning 
 you
 can no doubt bring your pauses down considerably.


 On 17 February 2014 10:17, Ondřej Černoš cern...@gmail.com wrote:

 Hi all,

 we are seeing the same kind of long pauses in Cassandra. We tried
 to switch CMS to G1 without positive result. The stress test is read 
 heavy,
 2 datacenters, 6 nodes, 400reqs/sec on one datacenter. We see spikes in
 latency on 99.99 percentil and higher, caused by threads being stopped 
 in
 JVM.

 The GC in G1 looks like this:

 {Heap before GC invocations=4073 (full 1):
 garbage-first heap   total 8388608K, used 3602914K
 [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 142 young (581632K), 11 survivors (45056K

Re: Intermittent long application pauses on nodes

2014-02-14 Thread Frank Ng
Sorry, I have not had a chance to file a JIRA ticket.  We have not been
able to resolve the issue.  But since Joel mentioned that upgrading to
Cassandra 2.0.X solved it for them, we may need to upgrade.  We are
currently on Java 1.7 and Cassandra 1.2.8



On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright kwri...@nanigans.com wrote:

 You're running 2.0.* in production?  May I ask what C* version and OS?
  Any hardware details would be appreciated as well.  Thx!

 From: Joel Samuelsson samuelsson.j...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, February 13, 2014 at 11:39 AM

 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Intermittent long application pauses on nodes

 We have had similar issues and upgrading C* to 2.0.x and Java to 1.7 seems
 to have helped our issues.


 2014-02-13 Keith Wright kwri...@nanigans.com:

 Frank did you ever file a ticket for this issue or find the root cause?
  I believe we are seeing the same issues when attempting to bootstrap.

 Thanks

 From: Robert Coli rc...@eventbrite.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Monday, February 3, 2014 at 6:10 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Intermittent long application pauses on nodes

 On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:


 It's possible that this is a JVM issue, but if so there may be some
 remedial action we can take anyway. There are some more flags we should
 add, but we can discuss that once you open a ticket. If you could include
 the strange JMX error as well, that might be helpful.


 It would be appreciated if you could inform this thread of the JIRA
 ticket number, for the benefit of the community and google searchers. :)

 =Rob





Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
All,

We've been having intermittent long application pauses (version 1.2.8) and are
not sure if it's a Cassandra bug.  During these pauses, there are dropped
messages in the cassandra log file along with the node seeing other nodes
as down.  We've turned on gc logging and the following is an example of a
long stopped or pause event in the gc.log file.

2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application
threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application
threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application
threads were stopped: 0.005470 seconds

As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
pause.  There were no GC log events between those 2 log statements.  Since
there's no GC logs in between, something else must be causing the long stop
time to reach a safepoint.

Could there be a Cassandra thread that is taking a long time to reach a
safepoint and what is it trying to do? Along with the node seeing other
nodes as down in the cassandra log file, the StatusLogger shows 1599
Pending in ReadStage and 9 Pending in MutationStage.

There is mention of Cassandra's batch revocation of biased locks as a possible cause
(not GC) via:
http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

We have JNA, no swap, and the cluster runs fine besides this intermittent
long pause that can cause a node to appear down to other nodes.  Any ideas
as the cause of the long pause above? It seems not related to GC.

thanks.


Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Thanks for the update.  Our logs indicated that there were 0 pending for
CompactionManager at that time.  Also, there were no nodetool repairs
running at that time.  The log statements above state that the application
had to stop to reach a safepoint.  Yet, it doesn't say what is requesting
the safepoint.


On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang 
shaochuan.w...@bloomreach.com wrote:

 We had similar latency spikes when pending compactions can't keep it up or
 repair/streaming taking too much cycles.


 On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8)
 and not sure if it's a cassandra bug.  During these pauses, there are
 dropped messages in the cassandra log file along with the node seeing other
 nodes as down.  We've turned on gc logging and the following is an example
 of a long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there's no GC logs in between, something else must be causing the long stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach a
 safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of cassandra batch revoke bias locks as a possible cause
 (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine besides there
 intermittent long pause that can cause a node to appear down to other
 nodes.  Any ideas as the cause of the long pause above? It seems not
 related to GC.

 thanks.





Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Benedict,
Thanks for the advice.  I've tried turning on PrintSafepointStatistics.
However, that info is only sent to the STDOUT console.  The cassandra
startup script closes STDOUT when it finishes, so nothing is shown for
safepoint statistics once it's done starting up.  Do you know how to start
up cassandra, send all stdout to a log file, and tell cassandra not to
close stdout?

Also, we have swap turned off as recommended.

thanks
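
One hedged workaround, if running in the foreground is acceptable, is to add
the suggested flags to cassandra-env.sh and bypass the init script so the JVM's
stdout stays attached and can be redirected (the log path is arbitrary):

# conf/cassandra-env.sh: append the suggested safepoint flags
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"

# -f keeps Cassandra in the foreground; redirect its stdout to a file
cassandra -f > /var/log/cassandra/stdout.log 2>&1 &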


On Wed, Jan 29, 2014 at 3:39 PM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 Frank,


 The same advice for investigating holds: add the VM flags 
 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1   (you 
 could put something above 1 there, to reduce the amount of logging, since a 
 pause of 52s will be pretty obvious even if aggregated with lots of other 
 safe points; the count is the number of safepoints to aggregate into one log 
 message)


 52s is a very extreme pause, and I would be surprised if revoke bias could 
 cause this. I wonder if the VM is swapping out.



 On 29 January 2014 19:02, Frank Ng fnt...@gmail.com wrote:

 Thanks for the update.  Our logs indicated that there were 0 pending for
 CompactionManager at that time.  Also, there were no nodetool repairs
 running at that time.  The log statements above state that the application
 had to stop to reach a safepoint.  Yet, it doesn't say what is requesting
 the safepoint.


 On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang 
 shaochuan.w...@bloomreach.com wrote:

 We had similar latency spikes when pending compactions can't keep it up
 or repair/streaming taking too much cycles.


 On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8)
 and not sure if it's a cassandra bug.  During these pauses, there are
 dropped messages in the cassandra log file along with the node seeing other
 nodes as down.  We've turned on gc logging and the following is an example
 of a long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there's no GC logs in between, something else must be causing the long stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach a
 safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of cassandra batch revoke bias locks as a possible
 cause (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine besides there
 intermittent long pause that can cause a node to appear down to other
 nodes.  Any ideas as the cause of the long pause above? It seems not
 related to GC.

 thanks.







Fat Client Commit Log

2012-06-22 Thread Frank Ng
Hi All,

We are using the Fat Client and notice that there are files written to the
commit log directory on the Fat Client.  Does anyone know what these files
are storing? Are these hinted handoff data?  The Fat Client has no files in
the data directory, as expected.

thanks


Re: user Digest of: get.23021

2012-04-26 Thread Frank Ng
I am having the same issue in 1.0.7 with leveled compaction.  It seems that
the repair is flaky.  It either completes relatively fast in a TEST
environment (7 minutes) or gets stuck trying to receive a merkle tree from
a peer that is already sending it the merkle tree.

The only solution is to restart cassandra.  But that's not good.

On Thu, Apr 26, 2012 at 2:12 PM, user-h...@cassandra.apache.org wrote:


 user Digest of: get.23021

 Topics (messages 23021 through 23021)

 repair waiting for something
23021 by: Igor





  Hi,

 10 nodes, cassandra 1.0.3, several DCs. The weekly nodetool repair is stuck
 for an unusually long time for node 10.254.237.2.

 output log on this node:
  INFO 11:19:42,045 Starting repair command #1, repairing 5 ranges.
  INFO 11:19:42,053 [repair #040aae00-28a1-11e1--e378018944ff] new
 session: will sync *localhost/10.254.237.2, /10.254.221.2, /10.253.2.2, /
 10.254.217.2, /10.254.94.2* on range
 (85070591730234615865843651857942052864,85070591730234615865843651857942052865]
 for meter.[eventschema, schema, ids, transaction]
  INFO 11:19:42,055 [repair #040aae00-28a1-11e1--e378018944ff] requests
 for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2,
 localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
  INFO 11:19:42,063 Enqueuing flush of 
 Memtable-eventschema@1509399856(18748/23435
 serialized/live bytes, 4 ops)
  INFO 11:19:42,063 Writing Memtable-eventschema@1509399856(18748/23435
 serialized/live bytes, 4 ops)
  INFO 11:19:42,072 Completed flushing
 /spool1/cassandra/data/meter/eventschema-hb-40-Data.db (4745 bytes)
  INFO 11:19:42,073 Discarding obsolete commit
 log:CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1324019623060.log)
  INFO 11:19:42,076 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from localhost/10.254.237.2
  INFO 11:19:42,102 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from /10.254.221.2
  INFO 11:19:42,128 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from /10.254.217.2
  INFO 11:19:42,228 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from /10.253.2.2

 And nothing after that for long time. So node sent request for trees to
 other nodes and received all but from the 10.254.94.2*

 *On that 10.254.94.2 node:
 INFO 11:19:42,083 [repair #040aae00-28a1-11e1--e378018944ff] Sending
 completed merkle tree to /10.254.237.2 for (meter,eventschema)

  So the merkle tree was lost somewhere. Will this waiting break somehow, or do
  I need to restart the node?




Re: repair waiting for something

2012-04-26 Thread Frank Ng
I am having the same issue in 1.0.7 with leveled compaction.  It seems that
the repair is flaky.  It either completes relatively fast in a TEST
environment (7 minutes) or gets stuck trying to receive a merkle tree from
a peer that is already sending it the merkle tree.

The only solution is to restart cassandra.  But that's not good.


Re: Repair Process Taking too long

2012-04-12 Thread Frank Ng
I also noticed that if I use the -pr option, the repair process went down
from 30 hours to 9 hours.  Is the -pr option safe to use if I want to run
repair processes in parallel on nodes that are not replication peers?

thanks
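
For what it's worth, the periodic-repair pattern being asked about amounts to
something like the sketch below, run against every node of the cluster
(hostnames are placeholders):

# primary-range repair on each node in turn (or in parallel on non-peers)
for h in node1 node2 node3; do nodetool -h "$h" repair -pr; done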

On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng berryt...@gmail.com wrote:

 Thank you for confirming that the per node data size is most likely
 causing the long repair process.  I have tried a repair on smaller column
 families and it was significantly faster.

 On Wed, Apr 11, 2012 at 9:55 PM, aaron morton aa...@thelastpickle.comwrote:

 If you have 1TB of data it will take a long time to repair. Every bit of
 data has to be read and a hash generated. This is one of the reasons we
 often suggest that around 300 to 400Gb per node is a good load in the
 general case.

 Look at nodetool compactionstats .Is there a validation compaction
 running ? If so it is still building the merkle  hash tree.

 Look at nodetool netstats . Is it streaming data ? If so all hash trees
 have been calculated.

 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12/04/2012, at 2:16 AM, Frank Ng wrote:

 Can you expand further on your issue? Were you using Random Patitioner?

 thanks

 On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach leim...@gmail.comwrote:

 I had this happen when I had really poorly generated tokens for the
 ring.  Cassandra seems to accept numbers that are too big.  You get hot
 spots when you think you should be balanced and repair never ends (I think
 there is a 48 hour timeout).


 On Tuesday, April 10, 2012, Frank Ng wrote:

 I am not using tier-sized compaction.


 On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.comwrote:

 Data size, number of nodes, RF?

 Are you using size-tiered compaction on any of the column families
 that hold a lot of your data?

 Do your cassandra logs say you are streaming a lot of ranges?
 zgrep -E (Performing streaming repair|out of sync)


 On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote:

  On 04/10/2012 07:16 PM, Frank Ng wrote:

 Short answer - yes.
 But you are asking wrong question.


 I think both processes are taking a while.  When it starts up,
 netstats and compactionstats show nothing.  Anyone out there successfully
 using ext3 and their repair processes are faster than this?

  On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

 Hi

 You can check with nodetool  which part of repair process is slow -
 network streams or verify compactions. use nodetool netstats or
 compactionstats.


 On 04/10/2012 05:16 PM, Frank Ng wrote:

 Hello,

 I am on Cassandra 1.0.7.  My repair processes are taking over 30
 hours to complete.  Is it normal for the repair process to take this 
 long?
  I wonder if it's because I am using the ext3 file system.

 thanks







 --
 Jonathan Rhone
 Software Engineer

 *TinyCo*
 800 Market St., Fl 6
 San Francisco, CA 94102
 www.tinyco.com








Re: Repair Process Taking too long

2012-04-12 Thread Frank Ng
Thanks for the clarification.  I'm running repairs as in case 2 (to avoid
deleted data coming back).

On Thu, Apr 12, 2012 at 10:59 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, Apr 12, 2012 at 4:06 PM, Frank Ng buzzt...@gmail.com wrote:
  I also noticed that if I use the -pr option, the repair process went down
  from 30 hours to 9 hours.  Is the -pr option safe to use if I want to run
  repair processes in parallel on nodes that are not replication peers?

 There is pretty much two use case for repair:
 1) to rebuild a node: if say a node has lost some data due to a hard
 drive corruption or the like and you want to to rebuild what's missing
 2) the periodic repairs to avoid problem with deleted data coming back
 from the dead (basically:
 http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair)

 In case 1) you want to run 'nodetool repair' (without -pr) against the
 node to rebuild.
 In case 2) (which I suspect is the case you're talking about now), you *want*
 to use 'nodetool repair -pr' on *every* node of the cluster. I.e.
 that's the most efficient way to do it. The only reason not to use -pr
 in this case would be that it's not available because you're using an
 old version of Cassandra. And yes, it's is safe to run with -pr in
 parallel on nodes that are not replication peers.

 --
 Sylvain


 
  thanks
 
 
  On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng berryt...@gmail.com wrote:
 
  Thank you for confirming that the per node data size is most likely
  causing the long repair process.  I have tried a repair on smaller
 column
  families and it was significantly faster.
 
  On Wed, Apr 11, 2012 at 9:55 PM, aaron morton aa...@thelastpickle.com
  wrote:
 
  If you have 1TB of data it will take a long time to repair. Every bit
 of
  data has to be read and a hash generated. This is one of the reasons we
  often suggest that around 300 to 400Gb per node is a good load in the
  general case.
 
  Look at nodetool compactionstats .Is there a validation compaction
  running ? If so it is still building the merkle  hash tree.
 
  Look at nodetool netstats . Is it streaming data ? If so all hash trees
  have been calculated.
 
  Cheers
 
 
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 12/04/2012, at 2:16 AM, Frank Ng wrote:
 
  Can you expand further on your issue? Were you using Random Patitioner?
 
  thanks
 
  On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach leim...@gmail.com
  wrote:
 
  I had this happen when I had really poorly generated tokens for the
  ring.  Cassandra seems to accept numbers that are too big.  You get
 hot
  spots when you think you should be balanced and repair never ends (I
 think
  there is a 48 hour timeout).
 
 
  On Tuesday, April 10, 2012, Frank Ng wrote:
 
  I am not using tier-sized compaction.
 
 
  On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.com
  wrote:
 
  Data size, number of nodes, RF?
 
  Are you using size-tiered compaction on any of the column families
  that hold a lot of your data?
 
  Do your cassandra logs say you are streaming a lot of ranges?
  zgrep -E (Performing streaming repair|out of sync)
 
 
  On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote:
 
  On 04/10/2012 07:16 PM, Frank Ng wrote:
 
  Short answer - yes.
  But you are asking wrong question.
 
 
  I think both processes are taking a while.  When it starts up,
  netstats and compactionstats show nothing.  Anyone out there
 successfully
  using ext3 and their repair processes are faster than this?
 
  On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua
 wrote:
 
  Hi
 
  You can check with nodetool  which part of repair process is slow
 -
  network streams or verify compactions. use nodetool netstats or
  compactionstats.
 
 
  On 04/10/2012 05:16 PM, Frank Ng wrote:
 
  Hello,
 
  I am on Cassandra 1.0.7.  My repair processes are taking over 30
  hours to complete.  Is it normal for the repair process to take
 this long?
   I wonder if it's because I am using the ext3 file system.
 
  thanks
 
 
 
 
 
 
 
  --
  Jonathan Rhone
  Software Engineer
 
  TinyCo
  800 Market St., Fl 6
  San Francisco, CA 94102
  www.tinyco.com
 
 
 
 
 
 



Re: Repair Process Taking too long

2012-04-11 Thread Frank Ng
Can you expand further on your issue? Were you using the Random Partitioner?

thanks

On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach leim...@gmail.com wrote:

 I had this happen when I had really poorly generated tokens for the ring.
  Cassandra seems to accept numbers that are too big.  You get hot spots
 when you think you should be balanced and repair never ends (I think there
 is a 48 hour timeout).


 On Tuesday, April 10, 2012, Frank Ng wrote:

 I am not using tier-sized compaction.


 On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.comwrote:

 Data size, number of nodes, RF?

 Are you using size-tiered compaction on any of the column families that
 hold a lot of your data?

 Do your cassandra logs say you are streaming a lot of ranges?
 zgrep -E (Performing streaming repair|out of sync)


 On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote:

  On 04/10/2012 07:16 PM, Frank Ng wrote:

 Short answer - yes.
 But you are asking wrong question.


 I think both processes are taking a while.  When it starts up, netstats
 and compactionstats show nothing.  Anyone out there successfully using ext3
 and their repair processes are faster than this?

  On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

 Hi

 You can check with nodetool  which part of repair process is slow -
 network streams or verify compactions. use nodetool netstats or
 compactionstats.


 On 04/10/2012 05:16 PM, Frank Ng wrote:

 Hello,

 I am on Cassandra 1.0.7.  My repair processes are taking over 30
 hours to complete.  Is it normal for the repair process to take this 
 long?
  I wonder if it's because I am using the ext3 file system.

 thanks







 --
 Jonathan Rhone
 Software Engineer

 *TinyCo*
 800 Market St., Fl 6
 San Francisco, CA 94102
 www.tinyco.com





Re: Repair Process Taking too long

2012-04-11 Thread Frank Ng
Thank you for confirming that the per node data size is most likely causing
the long repair process.  I have tried a repair on smaller column families
and it was significantly faster.

On Wed, Apr 11, 2012 at 9:55 PM, aaron morton aa...@thelastpickle.comwrote:

 If you have 1TB of data it will take a long time to repair. Every bit of
 data has to be read and a hash generated. This is one of the reasons we
 often suggest that around 300 to 400Gb per node is a good load in the
 general case.

 Look at nodetool compactionstats .Is there a validation compaction running
 ? If so it is still building the merkle  hash tree.

 Look at nodetool netstats . Is it streaming data ? If so all hash trees
 have been calculated.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12/04/2012, at 2:16 AM, Frank Ng wrote:

 Can you expand further on your issue? Were you using Random Patitioner?

 thanks

 On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach leim...@gmail.com wrote:

 I had this happen when I had really poorly generated tokens for the ring.
  Cassandra seems to accept numbers that are too big.  You get hot spots
 when you think you should be balanced and repair never ends (I think there
 is a 48 hour timeout).


 On Tuesday, April 10, 2012, Frank Ng wrote:

 I am not using tier-sized compaction.


 On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.comwrote:

 Data size, number of nodes, RF?

 Are you using size-tiered compaction on any of the column families that
 hold a lot of your data?

 Do your cassandra logs say you are streaming a lot of ranges?
 zgrep -E (Performing streaming repair|out of sync)


 On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote:

  On 04/10/2012 07:16 PM, Frank Ng wrote:

 Short answer - yes.
 But you are asking wrong question.


 I think both processes are taking a while.  When it starts up,
 netstats and compactionstats show nothing.  Anyone out there successfully
 using ext3 and their repair processes are faster than this?

  On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

 Hi

 You can check with nodetool  which part of repair process is slow -
 network streams or verify compactions. use nodetool netstats or
 compactionstats.


 On 04/10/2012 05:16 PM, Frank Ng wrote:

 Hello,

 I am on Cassandra 1.0.7.  My repair processes are taking over 30
 hours to complete.  Is it normal for the repair process to take this 
 long?
  I wonder if it's because I am using the ext3 file system.

 thanks







 --
 Jonathan Rhone
 Software Engineer

 *TinyCo*
 800 Market St., Fl 6
 San Francisco, CA 94102
 www.tinyco.com







Repair Process Taking too long

2012-04-10 Thread Frank Ng
Hello,

I am on Cassandra 1.0.7.  My repair processes are taking over 30 hours to
complete.  Is it normal for the repair process to take this long?  I wonder
if it's because I am using the ext3 file system.

thanks


Re: Repair Process Taking too long

2012-04-10 Thread Frank Ng
I think both processes are taking a while.  When it starts up, netstats and
compactionstats show nothing.  Is anyone out there successfully using ext3 with
repair processes faster than this?

On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

 Hi

 You can check with nodetool  which part of repair process is slow -
 network streams or verify compactions. use nodetool netstats or
 compactionstats.


 On 04/10/2012 05:16 PM, Frank Ng wrote:

 Hello,

 I am on Cassandra 1.0.7.  My repair processes are taking over 30 hours to
 complete.  Is it normal for the repair process to take this long?  I wonder
 if it's because I am using the ext3 file system.

 thanks





Re: Repair Process Taking too long

2012-04-10 Thread Frank Ng
I have 12 nodes with approximately 1TB load per node.  The RF is 3.  I am
considering moving to ext4.

I checked the ranges and the numbers go from 1 to the 9000s.

On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.com wrote:

 Data size, number of nodes, RF?

 Are you using size-tiered compaction on any of the column families that
 hold a lot of your data?

 Do your cassandra logs say you are streaming a lot of ranges?
 zgrep -E (Performing streaming repair|out of sync)


 On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote:

  On 04/10/2012 07:16 PM, Frank Ng wrote:

 Short answer - yes.
 But you are asking wrong question.


 I think both processes are taking a while.  When it starts up, netstats
 and compactionstats show nothing.  Anyone out there successfully using ext3
 and their repair processes are faster than this?

  On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

 Hi

 You can check with nodetool  which part of repair process is slow -
 network streams or verify compactions. use nodetool netstats or
 compactionstats.


 On 04/10/2012 05:16 PM, Frank Ng wrote:

 Hello,

 I am on Cassandra 1.0.7.  My repair processes are taking over 30 hours
 to complete.  Is it normal for the repair process to take this long?  I
 wonder if it's because I am using the ext3 file system.

 thanks







 --
 Jonathan Rhone
 Software Engineer

 *TinyCo*
 800 Market St., Fl 6
 San Francisco, CA 94102
 www.tinyco.com




Re: Repair Process Taking too long

2012-04-10 Thread Frank Ng
I am not using size-tiered compaction.


On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.com wrote:

 Data size, number of nodes, RF?

 Are you using size-tiered compaction on any of the column families that
 hold a lot of your data?

 Do your cassandra logs say you are streaming a lot of ranges?
 zgrep -E (Performing streaming repair|out of sync)


 On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote:

  On 04/10/2012 07:16 PM, Frank Ng wrote:

 Short answer - yes.
 But you are asking wrong question.


 I think both processes are taking a while.  When it starts up, netstats
 and compactionstats show nothing.  Anyone out there successfully using ext3
 and their repair processes are faster than this?

  On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

 Hi

 You can check with nodetool  which part of repair process is slow -
 network streams or verify compactions. use nodetool netstats or
 compactionstats.


 On 04/10/2012 05:16 PM, Frank Ng wrote:

 Hello,

 I am on Cassandra 1.0.7.  My repair processes are taking over 30 hours
 to complete.  Is it normal for the repair process to take this long?  I
 wonder if it's because I am using the ext3 file system.

 thanks







 --
 Jonathan Rhone
 Software Engineer

 *TinyCo*
 800 Market St., Fl 6
 San Francisco, CA 94102
 www.tinyco.com