Retrieving all row keys of a CF

2015-01-16 Thread Ruchir Jha
We have a column family that has about 800K rows and, on average, about a
million columns per row. I am interested in getting all the row keys in this
column family, and I am using the following Astyanax code snippet to do this.

This query never finishes (ran it for 2 days but did not finish).


This query, however, works with CFs that have far fewer columns. This
leads me to believe that there might be an API that retrieves only the row
keys and does not depend on the number of columns in the CF. Any
suggestions are appreciated.



I am running Cassandra 2.0.9 and this is a 4 node cluster.



keyspace.prepareQuery(this.wideRowTables.get(group))
    .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
    .getAllRows()
    .setRowLimit(1000)
    .setRepeatLastToken(false)
    .withColumnRange(new RangeBuilder().setLimit(1).build())
    .executeWithCallback(new RowCallback<String, T>() {

        @Override
        public boolean failure(ConnectionException e) {
            return true;
        }

        @Override
        public void success(Rows<String, T> rows) {
            // iterating over rows here
        }
    });
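
A possible alternative, offered here only as a hedged sketch (it is not from the original post): if the CF is also reachable over CQL on 2.0, SELECT DISTINCT pages through just the partition keys and never materializes the columns of each row. The keyspace, table, and key-column names below are placeholders, and "session" is assumed to be an open DataStax Java driver Session.

import java.nio.ByteBuffer;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Sketch: iterate partition keys only, 1000 per page. All names are placeholders.
void printRowKeys(Session session) {
    Statement stmt = new SimpleStatement("SELECT DISTINCT key FROM my_keyspace.my_wide_cf");
    stmt.setFetchSize(1000);                            // page size: 1000 keys per round trip
    for (Row row : session.execute(stmt)) {
        ByteBuffer rowKey = row.getBytesUnsafe("key");  // one partition key per result row
        System.out.println(rowKey);
    }
}

Because only the partition key column is requested, the cost of this scan should not grow with the number of columns per row.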


ConnectionException while trying to connect with Astyanax over Java driver

2014-10-06 Thread Ruchir Jha
All,

I am trying to use the new Astyanax-over-Java-driver to connect to
Cassandra version 1.2.12.

The following settings are turned on in cassandra.yaml:

start_rpc: true
native_transport_port: 9042
start_native_transport: true

Code to connect:

final Supplier<List<Host>> hostSupplier = new Supplier<List<Host>>() {

    @Override
    public List<Host> get()
    {
        List<Host> hosts = new ArrayList<Host>();
        for (String hostPort :
                StringUtil.getSetFromDelimitedString(seedHosts, ","))
        {
            String[] pair = hostPort.split(":");
            Host host = new Host(pair[0],
                    Integer.valueOf(pair[1]).intValue());
            host.setRack("rack1");
            hosts.add(host);
        }
        return hosts;
    }
};

// get keyspace
AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
    .forCluster(clusterName)
    .forKeyspace(keyspace)
    .withHostSupplier(hostSupplier)
    .withAstyanaxConfiguration(
        new AstyanaxConfigurationImpl()
            .setDiscoveryType(NodeDiscoveryType.DISCOVERY_SERVICE)
            .setDiscoveryDelayInSeconds(6)
            .setCqlVersion("3.0.0")
            .setTargetCassandraVersion("1.2.12"))
    .withConnectionPoolConfiguration(
        new JavaDriverConfigBuilder().withPort(9042)
            .build())
    .buildKeyspace(CqlFamilyFactory.getInstance());

context.start();

Exception in Cassandra server logs:

 WARN [New I/O server boss #1 ([id: 0x6815d6c5, /0.0.0.0:9042])] 2014-10-06 11:11:37,826 Slf4JLogger.java (line 82) Failed to accept a connection.
java.lang.NoSuchMethodError: org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.<init>(IZ)V
        at org.apache.cassandra.transport.Frame$Decoder.<init>(Frame.java:147)
        at org.apache.cassandra.transport.Server$PipelineFactory.getPipeline(Server.java:232)
        at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.registerAcceptedChannel(NioServerSocketPipelineSink.java:276)
        at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:246)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)


I also tried using the Java Driver 2.1.1, but I see a NoHostAvailableException,
and I suspect the underlying cause is the same as when connecting with the
Astyanax-over-Java-driver client.
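
A NoSuchMethodError like the one above usually points to an incompatible Netty jar being picked up first on the server's classpath. A generic way to confirm which jar a class actually comes from (a hedged sketch, not something from this thread) is to ask the JVM for its code source, using the class name taken from the stack trace:

// Sketch: print the jar the Netty frame decoder was loaded from. Run this with the same
// classpath as the Cassandra server process.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        Class<?> decoder = Class.forName(
                "org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder");
        System.out.println(decoder.getProtectionDomain().getCodeSource().getLocation());
    }
}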


Re: ConnectionException while trying to connect with Astyanax over Java driver

2014-10-06 Thread Ruchir Jha
That exception is on the Cassandra server and not on the client.

On Mon, Oct 6, 2014 at 2:10 PM, DuyHai Doan doanduy...@gmail.com wrote:

 java.lang.NoSuchMethodError - a jar dependency issue, probably. Did you try
 to create an issue on the Astyanax GitHub repo?





OpsCenter_rollups*

2014-08-25 Thread Ruchir Jha
Hi,

I see a lot of activity around the OpsCenter_rollups CFs in the logs. Why
is there so much OpsCenter work happening? Is there a way to disable it,
and what's the impact?

Ruchir.


Re: Compression during bootstrap

2014-08-18 Thread Ruchir Jha
On Wednesday, August 13, 2014, Robert Coli  wrote:

 On Wed, Aug 13, 2014 at 5:53 AM, Ruchir Jha ruchir@gmail.com wrote:

 We are adding nodes currently, and it seems like compression is falling
 behind. I judge that by the fact that the new node, which has a 4.5 TB disk,
 fills up to 100% while it is bootstrapping. Can we avoid this problem with
 the LZ4 compressor because of better compression, or do we just need a
 bigger disk?


 2TB per node is a lot of data. 4.5 would be a huge amount of data.
 Sure. Do you mean we should have started adding nodes before we got here?



 Do you mean compaction is falling behind? Do you run 'setcompactionthroughput 0'
 while bootstrapping new nodes?

I did a nodetool getcompactionthroughput and I got 0 MB/s. It seems like that
just disables compaction throttling, which seems like a good thing in my
scenario. Is that correct?
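
As an aside, and only as a hedged sketch: the value that nodetool getcompactionthroughput reports can also be read over JMX, where 0 means compaction throttling is disabled. The node address and the default JMX port 7199 below are assumptions.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: read the live compaction throughput setting of one node over JMX.
public class CompactionThroughput {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://10.10.20.27:7199/jmxrmi"); // node/port are assumptions
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            System.out.println("compaction throughput MB/s (0 = unthrottled): "
                    + mbs.getAttribute(ss, "CompactionThroughputMbPerSec"));
        } finally {
            jmxc.close();
        }
    }
}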


 I don't think compression is involved here? Why do you think it does?

 This is why I thought compression is involved:

http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_about_config_compress_c.html


Side note: we also increased the number of concurrent compactors from 3 to 10,
because we had a lot of idle CPU lying around, but that's not helping; every
time we start bootstrapping we still hit 4.5 TB and then run out of disk
space.


=Rob




Compression during bootstrap

2014-08-13 Thread Ruchir Jha
Hello,

We are currently at C* 1.2 and are using the SnappyCompressor for all our
CFs. Total data size is at 24 TB, and it's a 12-node cluster. Avg node size
is 2 TB.

We are adding nodes currently, and it seems like compression is falling
behind. I judge that by the fact that the new node, which has a 4.5 TB disk,
fills up to 100% while it is bootstrapping. Can we avoid this problem with
the LZ4 compressor because of better compression, or do we just need a
bigger disk?

The reason we started with 4.5 TB was that we assumed a new node would not
need more than 2 times the average data size while it is bootstrapping. Is
that a weak assumption?

Ruchir.
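
One hedged note, not from the thread: in Cassandra, LZ4 is mainly a faster compressor; its compression ratio is generally similar to Snappy's, so switching compressors alone may not shrink the on-disk footprint much. If you do switch, it is a per-table schema change, sketched below with placeholder names; existing SSTables keep their old compressor until they are recompacted or rewritten with nodetool upgradesstables.

import com.datastax.driver.core.Session;

// Sketch: switch one table to LZ4 compression via CQL. "session" is an assumed open
// DataStax Java driver Session; the keyspace/table names are placeholders.
static void switchToLz4(Session session) {
    session.execute("ALTER TABLE my_keyspace.my_cf "
            + "WITH compression = {'sstable_compression': 'LZ4Compressor'}");
}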


Re: Node bootstrap

2014-08-12 Thread Ruchir Jha
Still having issues with node bootstrapping. The new node just died: because
it full GCed, the nodes it had active streams with noticed it was down. After
the full GC finished, the new node printed this log:

ERROR 02:52:36,259 Stream failed because /10.10.20.35 died or was
restarted/removed (streams may still be active in background, but further
streams won't be started)

Here 10.10.20.35 is an existing node the new node was streaming from. A
similar log was printed for every other node in the cluster. Why did the
new node just exit after the full GC pause?

We have heap dumps enabled on full GCs, and these are the top offenders on
the new node. A new entry that I noticed is the CompressionMetadata chunks.
Is there anything I can do to optimize that?

 num     #instances         #bytes  class name
----------------------------------------------
   1:      42508421     4818885752  [B
   2:      65860543     3161306064  java.nio.HeapByteBuffer
   3:     124361093     2984666232  org.apache.cassandra.io.compress.CompressionMetadata$Chunk
   4:      29745665     1427791920  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   5:      29810362      953931584  org.apache.cassandra.db.Column
   6:         31623      498012768  [Lorg.apache.cassandra.io.compress.CompressionMetadata$Chunk;
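
A hedged aside, not advice from the thread: the compression metadata kept in memory holds roughly one CompressionMetadata$Chunk entry per chunk_length_kb of uncompressed SSTable data, so raising the chunk length from the 64 KB default reduces how many of these objects exist. A sketch with placeholder names; it only affects SSTables written or compacted after the change, and larger chunks can make small reads more expensive.

import com.datastax.driver.core.Session;

// Sketch: raise the compression chunk length so fewer CompressionMetadata$Chunk entries
// are needed per SSTable. "session" is an assumed open Session; names are placeholders.
static void raiseChunkLength(Session session) {
    session.execute("ALTER TABLE my_keyspace.my_cf WITH compression = "
            + "{'sstable_compression': 'SnappyCompressor', 'chunk_length_kb': 256}");
}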




Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Thanks Patricia for your response!

On the new node, I just see a lot of the following:

INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
Writing Memtable
INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
(line 262) Compacted 12 sstables to

So basically it is just busy flushing and compacting. Would you have any
ideas on why the disk usage blew up by 2x? My understanding was that if
initial_token is left empty on the new node, it just contacts the heaviest
node and bisects its token range. The heaviest node is around 2.1 TB,
and the new node is already at 4 TB. Could this be because compaction is
falling behind?

Ruchir


On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla patri...@thelastpickle.com
wrote:

 Ruchir,

 What exactly are you seeing in the logs? Are you running major compactions
 on the new bootstrapping node?

 With respect to the seed list, it is generally advisable to use 3 seed
 nodes per AZ / DC.

 Cheers,







 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com



Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Yes num_tokens is set to 256. initial_token is blank on all nodes including
the new one.


On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote:

 My understanding was that if initial_token is left empty on the new node,
 it just contacts the heaviest node and bisects its token range.


 If you are using vnodes and you have num_tokens set to 256 the new node
 will take token ranges dynamically. What is the configuration of your other
 nodes, are you setting num_tokens or initial_token on those?


 Mark



Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also, not sure if this is relevant, but I just noticed the nodetool tpstats
output:

Pool Name      Active   Pending   Completed   Blocked   All time blocked
FlushWriter         0         0        1136         0                512

Looks like about 50% of flushes are blocked.



Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
nodetool status:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.20.27  1.89 TB  256     25.4%             76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
UN  10.10.20.62  1.83 TB  256     25.5%             84b47313-da75-4519-94f3-3951d554a3e5  rack1
UN  10.10.20.47  1.87 TB  256     24.7%             bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
UN  10.10.20.45  1.7 TB   256     22.6%             8d6bce33-8179-4660-8443-2cf822074ca4  rack1
UN  10.10.20.15  1.86 TB  256     24.5%             01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
UN  10.10.20.31  1.87 TB  256     24.9%             1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
UN  10.10.20.35  1.86 TB  256     25.8%             17cb8772-2444-46ff-8525-33746514727d  rack1
UN  10.10.20.51  1.89 TB  256     25.0%             0343cd58-3686-465f-8280-56fb72d161e2  rack1
UN  10.10.20.19  1.91 TB  256     25.5%             30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
UN  10.10.20.39  1.93 TB  256     26.0%             b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
UN  10.10.20.52  1.81 TB  256     25.4%             6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
UN  10.10.20.22  1.89 TB  256     24.8%             46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1


Note: The new node is not part of the above list.

nodetool compactionstats:

pending tasks: 1649
  compaction typekeyspace   column family   completed
total  unit  progress
   Compaction   iprod   customerorder  1682804084
  17956558077 bytes 9.37%
   Compactionprodgatecustomerorder  1664239271
 1693502275 bytes98.27%
   Compaction  qa_config_bkupfixsessionconfig_hist
 2443   27253 bytes 8.96%
   Compactionprodgatecustomerorder_hist
 1770577280  5026699390 bytes35.22%
   Compaction   iprodgatecustomerorder_hist
 2959560205312350192622 bytes 0.95%




On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 Ok so you have num_tokens set to 256 for all nodes with initial_token
 commented out, this means you are using vnodes and the new node will
 automatically grab a list of tokens to take over responsibility for.

 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 FlushWriter   0 0   1136 0
 512

 Looks like about 50% of flushes are blocked.


 This is a problem as it indicates that the IO system cannot keep up.

 Just ran this on the new node:
 nodetool netstats | grep Streaming from | wc -l
 10


 This is normal as the new node will most likely take tokens from all nodes
 in the cluster.

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 Seeds are only used when a new node is bootstrapping into the cluster and
 needs a set of ips to contact and discover the cluster, so this would have
 no impact on data sizes or streaming. In general it would be considered
 best practice to have a set of 2-3 seeds from each data center, with all
 nodes having the same seed list.


 What is the current output of 'nodetool compactionstats'? Could you also
 paste the output of nodetool status keyspace?

 Mark



 On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote:

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com wrote:

 Just ran this on the new node:

 nodetool netstats | grep Streaming from | wc -l
 10

 Seems like the new node is receiving data from 10 other nodes. Is that
 expected in a vnodes enabled environment?

 Ruchir.




Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also, Mark, to your comment on my tpstats output: below is my iostat output,
and iowait is at 4.59%, which suggests no IO pressure, but we are still
seeing the bad flush performance. Should we try increasing the flush
writers?


Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014  _x86_64_  (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.80   10.25    0.65    4.59    0.00   78.72

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             103.83      9630.62     11982.60   3231174328   4020290310
dm-0             13.57       160.17        81.12     53739546     27217432
dm-1              7.59        16.94        43.77      5682200     14686784
dm-2            5792.76     32242.66     45427.12  10817753530  15241278360
sdb             206.09     22789.19     33569.27   7646015080  11262843224




Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is
set to 0, which I believe disables throttling.

Also, here is the iostat -x 5 5 output:


Device:         rrqm/s    wrqm/s     r/s       w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda              10.00   1450.35   50.79     55.92   9775.97   12030.14    204.34      1.56   14.62   1.05  11.21
dm-0              0.00      0.00    3.59     18.82    166.52     150.35     14.14      0.44   19.49   0.54   1.22
dm-1              0.00      0.00    2.32      5.37     18.56      42.98      8.00      0.76   98.82   0.43   0.33
dm-2              0.00      0.00  162.17   5836.66  32714.46   47040.87     13.30      5.57    0.90   0.06  36.00
sdb               0.40   4251.90  106.72    107.35  23123.61   35204.09    272.46      4.43   20.68   1.29  27.64

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.64   10.75    1.81   13.50    0.00   59.29

Device:         rrqm/s    wrqm/s     r/s       w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda              15.40   1344.60   68.80    145.60   4964.80   11790.40     78.15      0.38    1.80   0.80  17.10
dm-0              0.00      0.00   43.00   1186.20   2292.80    9489.60      9.59      4.88    3.90   0.09  11.58
dm-1              0.00      0.00    1.60      0.00     12.80       0.00      8.00      0.03   16.00   2.00   0.32
dm-2              0.00      0.00  197.20  17583.80  35152.00  140664.00      9.89   2847.50  109.52   0.05  93.50
sdb              13.20  16552.20  159.00    742.20  32745.60  129129.60    179.62     72.88   66.01   1.04  93.42

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.51   19.77    1.97    5.02    0.00   57.73

Device:         rrqm/s    wrqm/s     r/s       w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda              16.20    523.40   60.00    285.00   5220.80    5913.60     32.27      0.25    0.72   0.60  20.86
dm-0              0.00      0.00    0.80      1.40     32.00      11.20     19.64      0.01    3.18   1.55   0.34
dm-1              0.00      0.00    1.60      0.00     12.80       0.00      8.00      0.03   21.00   2.62   0.42
dm-2              0.00      0.00  339.40   5886.80  66219.20   47092.80     18.20    251.66  184.72   0.10  63.48
sdb               1.00   5025.40  264.20    209.20  60992.00   50422.40    235.35      5.98   40.92   1.23  58.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.59   16.34    2.03    9.01    0.00   56.04

Device:         rrqm/s    wrqm/s     r/s       w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda               5.40    320.00   37.40    159.80   2483.20    3529.60     30.49      0.10    0.52   0.39   7.76
dm-0              0.00      0.00    0.20      3.60      1.60      28.80      8.00      0.00    0.68   0.68   0.26
dm-1              0.00      0.00    0.00      0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
dm-2              0.00      0.00  287.20  13108.20  53985.60  104864.00     11.86    869.18   48.82   0.06  76.96
sdb               5.20  12163.40  238.20    532.00  51235.20   93753.60    188.25     21.46   23.75   0.97  75.08



On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy mark.re...@boxever.com wrote:

 Hi Ruchir,

 The large number of blocked flushes and the number of pending compactions
 would still indicate IO contention. Can you post the output of
 'iostat -x 5 5'?

 If you do in fact have spare IO, there are several configuration options
 you can tune such as increasing the number of flush writers and
 compaction_throughput_mb_per_sec

 Mark



Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also, right now the top command shows that we are at 500-700% CPU, and we
have 23 total processors, which means we have a lot of idle CPU left over,
so throwing more threads at compaction and flush should alleviate the
problem?



Node bootstrap

2014-08-04 Thread Ruchir Jha
I am trying to bootstrap the thirteenth node into a 12-node cluster where the
average data size per node is about 2.1 TB. The bootstrap streaming has
been going on for 2 days now, and the disk usage on the new node is already
above 4 TB and still growing. Is this because the new node is running major
compactions while the streaming is going on?

One thing I noticed that seemed off: the seeds property in the yaml
of the 13th node comprises nodes 1..12, whereas the seeds property on the
existing 12 nodes consists of all the other nodes except the thirteenth
node. Is this an issue?

Any other insight is appreciated.

Ruchir.


Full GC in cassandra

2014-07-28 Thread Ruchir Jha
Really curious to know what's causing the spike in Columns and
DeletedColumns below:


2014-07-28T09:30:27.471-0400: 127335.928: [Full GC 127335.928: [Class Histogram:
 num     #instances         #bytes  class name
----------------------------------------------
   1:     132626060     6366050880  java.nio.HeapByteBuffer
   2:      28194918     3920045528  [B
   3:      78124737     3749987376  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:      67650128     2164804096  org.apache.cassandra.db.Column
   5:      16315310      522089920  org.apache.cassandra.db.DeletedColumn
   6:          6818      392489608  [I
   7:       2844374      273059904  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
   8:       5727000       22908  java.util.TreeMap$Entry
   9:        767742      182921376  [J
  10:       2932832      140775936  edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder
  11:       2844375        9102  edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch
  12:       4145131       66322096  java.util.concurrent.atomic.AtomicReference
  13:        437874       64072392  [C
  14:       2660844       63860256  java.util.concurrent.ConcurrentSkipListMap$Node
  15:          4920       62849864  [[B
  16:       1632063       52226016  edu.stanford.ppl.concurrent.SnapTreeMap


Re: Full GC in cassandra

2014-07-28 Thread Ruchir Jha
Also, we do subsequent updates (at least 4) for each piece of data that we
write.


On Mon, Jul 28, 2014 at 10:36 AM, Ruchir Jha ruchir@gmail.com wrote:

 Doing about 5K writes / second. Avg Data Size = 1.6 TB / node. Total Data
 Size = 21 TB.

 And this is the nodetool cfstats output for one of our busiest column
 families:

   SSTable count: 10
 Space used (live): 43239294899
 Space used (total): 43239419603
 SSTable Compression Ratio: 0.2954468408497778
 Number of Keys (estimate): 63729152
 Memtable Columns Count: 1921620
 Memtable Data Size: 257680020
 Memtable Switch Count: 9
 Read Count: 6167
 Read Latency: NaN ms.
 Write Count: 770984
 Write Latency: 0.098 ms.
 Pending Tasks: 0
 Bloom Filter False Positives: 370
 Bloom Filter False Ratio: 0.0
 Bloom Filter Space Used: 80103200
 Compacted row minimum size: 180
 Compacted row maximum size: 3311
 Compacted row mean size: 2631
 Average live cells per slice (last five minutes): 73.0
 Average tombstones per slice (last five minutes): 13.0



 On Mon, Jul 28, 2014 at 10:14 AM, Mark Reddy mark.re...@boxever.com
 wrote:

 What is your data size and number of columns in Cassandra? Do you do many
 deletions?








Re: UnavailableException

2014-07-14 Thread Ruchir Jha
Mark,

Here you go:

nodetool status:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns  Host ID                               Rack
UN  10.10.20.15  1.62 TB  256     8.1%  01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
UN  10.10.20.19  1.66 TB  256     8.3%  30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
UN  10.10.20.35  1.62 TB  256     9.0%  17cb8772-2444-46ff-8525-33746514727d  rack1
UN  10.10.20.31  1.64 TB  256     8.3%  1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
UN  10.10.20.52  1.59 TB  256     9.1%  6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
UN  10.10.20.27  1.66 TB  256     7.7%  76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
UN  10.10.20.22  1.66 TB  256     8.9%  46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
UN  10.10.20.39  1.68 TB  256     8.0%  b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
UN  10.10.20.45  1.49 TB  256     7.7%  8d6bce33-8179-4660-8443-2cf822074ca4  rack1
UN  10.10.20.47  1.64 TB  256     7.9%  bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
UN  10.10.20.62  1.59 TB  256     8.2%  84b47313-da75-4519-94f3-3951d554a3e5  rack1
UN  10.10.20.51  1.66 TB  256     8.9%  0343cd58-3686-465f-8280-56fb72d161e2  rack1


Astyanax connection settings:

seeds   :12
maxConns   :16
maxConnsPerHost:16
connectTimeout :2000
socketTimeout  :6
maxTimeoutCount:16
maxBlockedThreadsPerHost:16
maxOperationsPerConnection:16
DiscoveryType: RING_DESCRIBE
ConnectionPoolType: TOKEN_AWARE
DefaultReadConsistencyLevel: CL_QUORUM
DefaultWriteConsistencyLevel: CL_QUORUM



On Fri, Jul 11, 2014 at 5:04 PM, Mark Reddy mark.re...@boxever.com wrote:

 Can you post the output of nodetool status and your Astyanax connection
 settings?











Re: UnavailableException

2014-07-14 Thread Ruchir Jha
Yes, the line is "Datacenter: datacenter1", which matches my CREATE
KEYSPACE command. As for the NodeDiscoveryType, we will follow that advice,
but I don't believe it to be the root of my issue here, because the nodes
start up at least 6 hours before the UnavailableException, and as far as
adding nodes is concerned, we would only do it after hours.
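
For reference, a hedged sketch (not from the thread) of what Chris's suggestion below could look like, reusing the option names already listed in this thread; the seed list itself would still be supplied through the host supplier or connection pool configuration as before:

import com.netflix.astyanax.AstyanaxConfiguration;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.ConsistencyLevel;

// Sketch: discovery pinned to the configured seed list (NONE) instead of RING_DESCRIBE,
// keeping the same pool type and consistency levels shown in the settings above.
AstyanaxConfiguration config = new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.NONE)
        .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)
        .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM)
        .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM);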


On Mon, Jul 14, 2014 at 2:34 PM, Chris Lohfink clohf...@blackbirdit.com
wrote:

 If you list all 12 nodes in the seeds list, you can try using
 NodeDiscoveryType.NONE instead of RING_DESCRIBE.

 It's been recommended that way by some anyway, so if you add nodes to the
 cluster your app won't start using them until all bootstrapping and
 everything's settled down.

 Chris


UnavailableException

2014-07-11 Thread Ruchir Jha
We have a 12-node cluster, and we are consistently seeing this exception being
thrown during peak write traffic. We have a replication factor of 3 and a write
consistency level of QUORUM. Also note there is no unusual or full GC activity
during this time. Appreciate any help.

Sent from my iPhone

Re: UnavailableException

2014-07-11 Thread Ruchir Jha
Here's the complete stack trace:

com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=ny4lpcas5.fusionts.corp(10.10.20.47):9160, latency=22784(42874), attempts=3]UnavailableException()
        at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
        at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
        at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
        at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:123)
Caused by: UnavailableException()
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20841)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:129)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:126)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
        ... 12 more



On Fri, Jul 11, 2014 at 9:11 AM, Prem Yadav ipremya...@gmail.com wrote:

 Please post the full exception.


 On Fri, Jul 11, 2014 at 1:50 PM, Ruchir Jha ruchir@gmail.com wrote:

 We have a 12 node cluster and we are consistently seeing this exception
 being thrown during peak write traffic. We have a replication factor of 3
 and a write consistency level of QUORUM. Also note there is no unusual Or
 Full GC activity during this time. Appreciate any help.

 Sent from my iPhone





Re: UnavailableException

2014-07-11 Thread Ruchir Jha
This is how we create our keyspace. We just ran this command once through a
cqlsh session on one of the nodes, so I don't quite understand what you mean
by "check that your DC names match up".

CREATE KEYSPACE prod WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'datacenter1': '3'
};
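
One hedged way to double-check the DC-name question (a sketch, not from the thread): each node publishes the data center and rack its snitch assigned to it in system.local, which can be compared against the name used in the keyspace's NetworkTopologyStrategy options ('datacenter1' above).

import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Sketch: read the DC/rack this node's snitch reports. "session" is an assumed open
// DataStax Java driver Session connected to the node being checked.
static void printDcAndRack(Session session) {
    Row local = session.execute("SELECT data_center, rack FROM system.local").one();
    System.out.println("data_center=" + local.getString("data_center")
            + " rack=" + local.getString("rack"));
}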



On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink clohf...@blackbirdit.com
wrote:

 What replication strategy are you using? If using NetworkTopologyStrategy,
 double check that your DC names match up (case sensitive).

 Chris








Re: TTransportException (java.net.SocketException: Broken pipe)

2014-07-09 Thread Ruchir Jha
We have these precise settings but are still seeing the broken pipe exception 
in our gc logs. Any clues?

Sent from my iPhone

 On Jul 8, 2014, at 1:17 PM, Bhaskar Singhal bhaskarsing...@yahoo.com wrote:
 
 Thanks Mark. Yes, 1024 is the limit. I haven't changed it to match the
 recommended production settings.
 
 But I am wondering why Cassandra needs to keep 3000+ commit log segment
 files open.
 
 Regards,
 Bhaskar
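
(On the open-segment question above: how much commit log Cassandra keeps on disk is
bounded by a couple of cassandra.yaml settings; a sketch, with values that are only
illustrative and version-dependent:

    # cassandra.yaml
    commitlog_segment_size_in_mb: 32     # size of each segment file
    commitlog_total_space_in_mb: 4096    # once the total exceeds this, the oldest dirty
                                         # memtables are flushed so segments can be recycled

If flushing cannot keep up with a heavy ingest of 1 MB values, segments pile up, and
each live segment is another entry in lsof, which feeds directly into the "Too many
open files" errors below.)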
 
 
 On Tuesday, 8 July 2014 1:50 PM, Mark Reddy mark.re...@boxever.com wrote:
 
 
 Hi Bhaskar,
 
 Can you check your limits using 'ulimit -a'? The default is 1024, which needs 
 to be increased if you have not done so already.
 
 Here you will find a list of recommended production settings: 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
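
For reference, a minimal sketch of checking and raising the open-file limit on a
typical Linux install (the 100000 value and exact file locations are illustrative;
the linked guide has the recommended numbers):

    $ ulimit -n                          # per-process open-file limit for this shell
    1024

    # /etc/security/limits.conf (or a drop-in under /etc/security/limits.d/)
    cassandra  -  nofile  100000

    # after restarting Cassandra, confirm the running process picked the limit up
    $ cat /proc/<cassandra pid>/limits | grep 'open files'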
 
 
 Mark
 
 On Tue, Jul 8, 2014 at 5:30 AM, Bhaskar Singhal bhaskarsing...@yahoo.com 
 wrote:
 Hi,
 
 I am using Cassandra 2.0.7 (with default settings and a 16 GB heap on a quad-core
 Ubuntu server with 32 GB RAM) and trying to ingest 1 MB values using
 cassandra-stress. It works fine for a while (about 1600 seconds), but after
 ingesting around 120 GB of data I start getting the following error:
 Operation [70668] retried 10 times - error inserting key 0070668 
 ((TTransportException): java.net.SocketException: Broken pipe)
 
 The Cassandra server is still running, but I see the errors below in system.log.
 
 ERROR [COMMIT-LOG-ALLOCATOR] 2014-07-07 22:39:23,617 CassandraDaemon.java 
 (line 198) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.lang.NoClassDefFoundError: org/apache/cassandra/db/commitlog/CommitLog$4
 at 
 org.apache.cassandra.db.commitlog.CommitLog.handleCommitError(CommitLog.java:374)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:116)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.cassandra.db.commitlog.CommitLog$4
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
 Caused by: java.io.FileNotFoundException: 
 /path/2.0.7/cassandra/build/classes/main/org/apache/cassandra/db/commitlog/CommitLog$4.class
  (Too many open files)
 at java.io.FileInputStream.open(Native Method)
 at java.io.FileInputStream.init(FileInputStream.java:146)
 at 
 sun.misc.URLClassPath$FileLoader$1.getInputStream(URLClassPath.java:1086)
 at sun.misc.Resource.cachedInputStream(Resource.java:77)
 at sun.misc.Resource.getByteBuffer(Resource.java:160)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:436)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
 ERROR [FlushWriter:7] 2014-07-07 22:39:24,924 CassandraDaemon.java (line 198) 
 Exception in thread Thread[FlushWriter:7,5,main]
 FSWriteError in 
 /cassandra/data4/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-593-Filter.db
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
 at 
 org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
 at 
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:417)
 at 
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.FileNotFoundException: 
 /cassandra/data4/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-593-Filter.db 
 (Too many open files)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.init(FileOutputStream.java:221)
 at java.io.FileOutputStream.init(FileOutputStream.java:110)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:466)
 ... 9 more
 
 There are around 9685 open files by the Cassandra server process (using 
 lsof), 

A

2014-05-30 Thread Ruchir Jha


Sent from my iPhone


Re: clearing tombstones?

2014-05-16 Thread Ruchir Jha
I tried to do this; however, the doubling in disk space is not temporary
as you state in your note. What am I missing?


On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
ober...@civicscience.comwrote:

 So, if I was impatient and just wanted to make this happen now, I could:

 1.) Change GCGraceSeconds of the CF to 0
 2.) run nodetool compact (*)
 3.) Change GCGraceSeconds of the CF back to 10 days

 Since I have ~900M tombstones, even if I miss a few due to impatience, I
 don't care *that* much as I could re-run my clean up tool against the now
 much smaller CF.

 (*) A long, long time ago I seem to recall reading advice along the lines of "don't
 ever run nodetool compact", but I can't remember why.  Is there any bad long-term
 consequence?  Short term there are several:
 -a heavy operation
 -temporary 2x disk space
 -one big SSTable afterwards
 But moving forward, everything is OK, right?  The CommitLog/Memtable-to-SSTables flow,
 minor compactions that merge SSTables, etc.  The only flaw I can think of
 is that it will take forever until the minor compactions build up enough SSTables
 to consider including the big SSTable in a compaction, making it likely
 I'll have to self-manage compactions.
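
A hedged sketch of what the three steps above look like in practice (keyspace and
table names are placeholders, and gc_grace_seconds = 0 is only safe if you are sure
no replica is down or behind, since a node that missed the deletes can resurrect the
data once the tombstones are purged):

    cqlsh> ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 0;
    $ nodetool compact my_ks my_cf       # major compaction, run on each node
    cqlsh> ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 864000;   # back to the 10-day default

For a Thrift-only column family the equivalent change should be possible from
cassandra-cli (UPDATE COLUMN FAMILY ... WITH gc_grace = 0), if I recall the attribute
name correctly.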



 On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy mark.re...@boxever.comwrote:

 Correct, a tombstone will only be removed after gc_grace period has
 elapsed. The default value is set to 10 days which allows a great deal of
 time for consistency to be achieved prior to deletion. If you are
 operationally confident that you can achieve consistency via anti-entropy
 repairs within a shorter period you can always reduce that 10 day interval.


 Mark


 On Fri, Apr 11, 2014 at 3:16 PM, William Oberman 
 ober...@civicscience.com wrote:

 I'm seeing a lot of articles about a dependency between removing
 tombstones and GCGraceSeconds, which might be my problem (I just checked,
 and this CF has GCGraceSeconds of 10 days).


 On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli 
 tbarbu...@gmail.comwrote:

 Compaction should take care of it; for me it never worked, so I run
 nodetool compact on every node; that does it.


 2014-04-11 16:05 GMT+02:00 William Oberman ober...@civicscience.com:

 I'm wondering what will clear tombstoned rows: nodetool cleanup,
 nodetool repair, or time (as in, just wait)?

 I had a CF that was more or less storing session information.  After
 some time, we decided that one piece of this information was pointless to
 track (and was 90%+ of the columns, and in 99% of those cases was ALL
 columns for a row).   I wrote a process to remove all of those columns
 (which again in a vast majority of cases had the effect of removing the
 whole row).

 This CF had ~1 billion rows, so I expect to be left with ~100m rows.
  After I did this mass delete, everything was the same size on disk (which
 I expected, knowing how tombstoning works).  It wasn't 100% clear to me
 what to poke to cause compactions to clear the tombstones.  First I tried
 nodetool cleanup on a candidate node.  But, afterwards the disk usage was
 the same.  Then I tried nodetool repair on that same node.  But again, 
 disk
 usage is still the same.  The CF has no snapshots.

 So, am I misunderstanding something?  Is there another operation to
 try?  Do I have to just wait?  I've only done cleanup/repair on one 
 node.
  Do I have to run one or the other over all nodes to clear tombstones?

 Cassandra 1.2.15 if it matters,

 Thanks!

 will










Re: Cassandra JVM GC defaults question

2014-04-23 Thread Ruchir Jha
Lowering CMSInitiatingOccupancyFraction below 75 will lead to more
frequent CMS cycles (more GC interference) and will impact write
performance. If you're not sensitive to that impact, your expectation is
correct; however, make sure your flush_largest_memtables_at is always
set to less than or equal to the occupancy fraction (the yaml value is a
fraction of the heap, the JVM flag a percentage).
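
For concreteness, the knobs under discussion live in two files in the stock layout;
a sketch showing the default values quoted in this thread (illustrative, not a
tuning recommendation):

    # conf/cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

    # conf/cassandra.yaml
    flush_largest_memtables_at: 0.75     # fraction of the heap at which the largest
                                         # memtables are flushed defensively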

On 4/23/14, Ken Hancock ken.hanc...@schange.com wrote:
 I'm in the process of trying to tune the GC and I'm far from an expert in
 this area, so hoping someone can tell me I'm either out in left field or
 on-track.

 Cassandra's default GC settings are (abbreviated):
 +UseConcMarkSweepGC
 +CMSInitiatingOccupancyFraction=75
 +UseCMSInitiatingOccupancyOnly

 Also in cassandra.yaml:
 flush_largest_memtables_at: 0.75

 Since the new heap is relatively small, if I'm understanding this correctly
 CMS will normally not kick in until the old generation is at roughly 75% of the
 heap (75% of heap size minus new, new being relatively small compared to the
 overall heap).  With these two settings so close together, both would seem to
 trigger at nearly the same point, which might be undesirable as the flushing
 would also create more GC pressure (in addition to FlushWriter blocking if
 multiple tables are queued for flushing because of this).

 Clearly more heap will give us more peak running room, but would lowering
 the CMSInitiatingOccupancyFraction also help, at the expense of some
 added CPU for more frequent, smaller collections?

 Mikio Braun's blog had some interesting tests in this area:
 http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html



GC histogram analysis

2014-04-16 Thread Ruchir Jha
Hi,

I am trying to investigate ParNew promotion failures that happen routinely in
production. As part of this exercise, I enabled
-XX:+PrintClassHistogramBeforeFullGC and saw the following output. As you can
see, there are a ton of Columns, ExpiringColumns and DeletedColumns before the
GC ran, and these numbers go down significantly right after it. Why are there
so many expiring and deleted columns?
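
(For anyone who wants to reproduce this, the HotSpot flags involved can be appended
in cassandra-env.sh; a sketch, assuming a JDK recent enough to support them:

    # conf/cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"

The before/after histograms then show up alongside the GC output around each full
collection.)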



Before GC:

 num     #instances         #bytes  class name
----------------------------------------------
   1:     113539896     5449915008  java.nio.HeapByteBuffer
   2:      15979061     2681431488  [B
   3:      36364545     1745498160  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:      23583282      754665024  org.apache.cassandra.db.Column
   5:       8745428      209890272  java.util.concurrent.ConcurrentSkipListMap$Node
   6:       5062619      202504760  org.apache.cassandra.db.ExpiringColumn
   7:         45261      198998216  [I
   8:       1801535      172947360  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
   9:       1473677      169570040  [J
  10:       4713304      113119296  java.lang.Double
  11:       3246729      103895328  org.apache.cassandra.db.DeletedColumn

After GC:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      11807204     1505962728  [B
   2:      12525536      601225728  java.nio.HeapByteBuffer
   3:       8839073      424275504  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:       8194496      262223872  org.apache.cassandra.db.Column
  ...                               (intermediate rows omitted; one referenced org.apache.cassandra.cache.KeyCacheKey)
  17:        432119       17284760  org.apache.cassandra.db.ExpiringColumn
  21:        351096       11235072  org.apache.cassandra.db.DeletedColumn


Re: GC histogram analysis

2014-04-16 Thread Ruchir Jha
No we don't. 

Sent from my iPhone

 On Apr 16, 2014, at 9:21 AM, Mark Reddy mark.re...@boxever.com wrote:
 
 Do you delete and/or set TTLs on your data?
 
 
 On Wed, Apr 16, 2014 at 2:14 PM, Ruchir Jha ruchir@gmail.com wrote:
 Hi,
 
 I am trying to investigate ParNew promotion failures that happen routinely in
 production. As part of this exercise, I enabled
 -XX:+PrintClassHistogramBeforeFullGC and saw the following output. As you can
 see, there are a ton of Columns, ExpiringColumns and DeletedColumns before the
 GC ran, and these numbers go down significantly right after it. Why are there
 so many expiring and deleted columns?
 
 Before GC:
 
  num     #instances         #bytes  class name
 ----------------------------------------------
    1:     113539896     5449915008  java.nio.HeapByteBuffer
    2:      15979061     2681431488  [B
    3:      36364545     1745498160  edu.stanford.ppl.concurrent.SnapTreeMap$Node
    4:      23583282      754665024  org.apache.cassandra.db.Column
    5:       8745428      209890272  java.util.concurrent.ConcurrentSkipListMap$Node
    6:       5062619      202504760  org.apache.cassandra.db.ExpiringColumn
    7:         45261      198998216  [I
    8:       1801535      172947360  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
    9:       1473677      169570040  [J
   10:       4713304      113119296  java.lang.Double
   11:       3246729      103895328  org.apache.cassandra.db.DeletedColumn
 
 After GC:
 
  num     #instances         #bytes  class name
 ----------------------------------------------
    1:      11807204     1505962728  [B
    2:      12525536      601225728  java.nio.HeapByteBuffer
    3:       8839073      424275504  edu.stanford.ppl.concurrent.SnapTreeMap$Node
    4:       8194496      262223872  org.apache.cassandra.db.Column
   ...                               (intermediate rows omitted; one referenced org.apache.cassandra.cache.KeyCacheKey)
   17:        432119       17284760  org.apache.cassandra.db.ExpiringColumn
   21:        351096       11235072  org.apache.cassandra.db.DeletedColumn