Re: secondary indexes TTL - strange issues

2012-09-17 Thread Roland Gude
Issue created.

Will attach debug logs asap
CASSANDRA-4670: https://issues.apache.org/jira/browse/CASSANDRA-4670

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, 17 September 2012 03:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

 Data gets inserted and is accessible via index query for some time. At some point 
in time the indexes are completely empty and start filling again (while new data 
enters the system).
If you can reproduce this please create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA .

If you can include DEBUG level logs that would be helpful.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 10:08 PM, Roland Gude 
roland.g...@ez.no wrote:


I am not sure it is compacting an old file: the same thing happens every time I 
rebuild the index. New files appear, get compacted, and vanish.

We have set up a new smaller cluster with fresh data. The same thing happens here 
as well. Data gets inserted and is accessible via index query for some time. At 
some point in time the indexes are completely empty and start filling again (while 
new data enters the system).

I am currently testing with SizeTiered on both the fresh set and the imported 
set.

For the fresh set (which is significantly smaller) first results imply that the 
issue is not happening with SizeTieredCompaction - I have not yet tested 
everything that comes to mind and will update if something new comes up.

As for the failing query, it is from the cli:
get EventsByItem where 0003--1000--=utf8('someValue');
0003--1000-- is a TUUID we use as a marker for a 
TimeSeries.
(and equivalent queries with astyanax and hector as well)

This is a cf with the issue:

create column family EventsByItem
  with column_type = 'Standard'
  and comparator = 'TimeUUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and read_repair_chance = 0.5
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
{column_name : '--1000--',
validation_class : BytesType,
index_name : 'ebi_mandatorIndex',
index_type : 0},
{column_name : '0002--1000--',
validation_class : BytesType,
index_name : 'ebi_itemidIndex',
index_type : 0},
{column_name : '0003--1000--',
validation_class : BytesType,
index_name : 'ebi_eventtypeIndex',
index_type : 0}]
  and compression_options={sstable_compression:SnappyCompressor, 
chunk_length_kb:64};

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, 14 September 2012 10:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,].  78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s.  Time: 1,272,883ms.
There are a lot of weird things here.
It could be levelled compaction compacting an older file for the first time. 
But that would be a guess.

Rebuilding the index gives us back the data for a couple of minutes - then it 
vanishes again.
Are you able to do a test with SizeTieredCompaction?

Are you able to replicate the problem with a fresh testing CF and some test 
data?

If it's only a problem with imported data, can you provide a sample of the 
failing query? And maybe the CF definition?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 2:46 AM, Roland Gude 
roland.g...@ez.no wrote:



Hi,

we have been running a system on Cassandra 0.7 heavily relying on secondary 
indexes for columns with TTL.
This has been working like a charm, but we are trying hard to move forward with 
Cassandra and are struggling at this point:

When we put our data into a new cluster (any 1.1.x version - currently 1.1.5), 
rebuild the indexes and run our system, everything seems to work well - until at 
some point in time index queries do not return any data at all anymore (note 
that the TTL will not expire for several months).
Rebuilding the index gives us back the data for a couple of minutes - then it 
vanishes again.

What seems strange is that compaction apparently is very aggressive:

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line
221) Compacted to 

Re: nodetool cfstats and compression

2012-09-17 Thread aaron morton
Yes. 
It is the space taken up on disk, including compaction. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:30 AM, Jim Ancona j...@anconafamily.com wrote:

 Do the row size stats reported by 'nodetool cfstats' include the
 effect of compression?
 
 Thanks,
 
 Jim



Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread aaron morton
 Does minor compaction delete expired column-tombstones when the row is
 also present in another table which is
No. 
Compaction is per Column Family. 

Tombstones will be expired by Minor Compaction if all fragments of the row are 
contained in the SSTables being compacted. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:32 AM, Rene Kochen rene.koc...@schange.com wrote:

 Hi all,
 
 Does minor compaction delete expired column-tombstones when the row is
 also present in another table which is not subject to the minor
 compaction?
 
 Example:
 
 Say there are 5 SStables:
 
 - Customers_0 (10 MB)
 - Customers_1 (10 MB)
 - Customers_2 (10 MB)
 - Customers_3 (10 MB)
 - Customers_4 (30 MB)
 
 A minor compaction is triggered which will compact the similar sized
 tables 0 to 3. In these tables is a customer record with key C1 with
 an expired column tombstone. Customer C1 is also present in table 4.
 Will the minor compaction delete the column (i.e. will the tombstone
 be present in the newly created table)?
 
 Thanks,
 
 Rene



Re: Disk configuration in new cluster node

2012-09-17 Thread aaron morton
  4 drives for data and 1 drive for commitlog, 
How are you configuring the drives ? It's normally best to present one big data 
volume, e.g. using raid 0, and put the commit log on say the system mirror.

 will the node balance out the load on the drives, or is it agnostic to usage 
 of drives underlying data directories?
It will not. 
There is a feature coming in v1.2 to add better support for JBOD 
configurations. 

A word of warning. If you put more than 300GB to 400GB per node you may end up 
experiencing some issues such as repair, compaction or disaster recovery taking a 
long time. These are simply soft limits that provide a good rule of thumb for 
HDD based systems with 1 GigE networking.   

Hope that helps. 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 7:39 AM, Casey Deccio ca...@deccio.net wrote:

 I'm building a new cluster (to replace the broken setup I've written about 
 in previous posts) that will consist of only two nodes.  I understand that 
 I'll be sacrificing high availability of writes if one of the nodes goes 
 down, and I'm okay with that.  I'm more interested in maintaining high 
 consistency and high read availability.  So I've decided to use a write-level 
 consistency of ALL and read-level consistency of ONE.
 
 My first question is about the drives in this setup.  If I initially set up 
 the system with, say, 4 drives for data and 1 drive for commitlog, and later 
 I decide to add more capacity to the node by adding more drives for data 
 (adding the new data directory entries in cassandra.yaml), will the node 
 balance out the load on the drives, or is it agnostic to usage of drives 
 underlying data directories?
 
 My second question has to do with RAID striping.  Would it be more useful to 
 stripe the disk with the commitlog or the disks with the data?  Of course, 
 with a single striped volume for data directories, it would be more difficult 
 to add capacity to the node later, as I've suggested above.
 
 Casey



Re: Disk configuration in new cluster node

2012-09-17 Thread Robin Verlangen
 A word of warning. If you put more than 300GB to 400GB per node you may
end up experiencing some issues  ... 

I think this is probably the solution to your multiple disk problem. You
could easily use a single disk to store the data on, and one disk for the
commitlog. No issues with JBOD, RAID or whatever. If you want to improve
throughput you might consider a RAID-0 setup.

Best regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



2012/9/17 aaron morton aa...@thelastpickle.com

  4 drives for data and 1 drive for commitlog,

 How are you configuring the drives ? It's normally best to present one big
 data volume, e.g. using raid 0, and put the commit log on say the system
 mirror.

 will the node balance out the load on the drives, or is it agnostic to
 usage of drives underlying data directories?

 It will not.
 There is a feature coming in v1.2 to add better support for JBOD
 configurations.

 A word of warning. If you put more than 300GB to 400GB per node you may
  end up experiencing some issues such as repair, compaction or disaster recovery
 taking a long time. These are simply soft limits that provide a good rule
 of thumb for HDD based systems with 1 GigE networking.

 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/09/2012, at 7:39 AM, Casey Deccio ca...@deccio.net wrote:

 I'm building a new cluster (to replace the broken setup I've written
 about in previous posts) that will consist of only two nodes.  I understand
 that I'll be sacrificing high availability of writes if one of the nodes
 goes down, and I'm okay with that.  I'm more interested in maintaining high
 consistency and high read availability.  So I've decided to use a
 write-level consistency of ALL and read-level consistency of ONE.

 My first question is about the drives in this setup.  If I initially set
 up the system with, say, 4 drives for data and 1 drive for commitlog, and
 later I decide to add more capacity to the node by adding more drives for
 data (adding the new data directory entries in cassandra.yaml), will the
 node balance out the load on the drives, or is it agnostic to usage of
 drives underlying data directories?

 My second question has to do with RAID striping.  Would it be more useful
 to stripe the disk with the commitlog or the disks with the data?  Of
 course, with a single striped volume for data directories, it would be more
 difficult to add capacity to the node later, as I've suggested above.

 Casey





Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread Rene Kochen
Okay, thanks!

So a column tombstone will only be removed if all row fragments are
present in the tables being compacted.

I have a row called Index which contains columns like page0,
page1, page2, etc. Every several minutes, new columns are created
and old ones deleted. The problem is that I now have an Index row in
several SSTables, but the column tombstones are never deleted. And
reading the Index row (and all its column tombstones) takes longer
and longer.

If I do a major compaction, all tombstones are deleted and reading the
index row takes one millisecond again (and the garbage-collection
issues caused by the tombstones go away).

Is it not advised to use rows with many new column creates/deletes
(because of how minor compactions work)?

Thanks!

Rene

2012/9/17 aaron morton aa...@thelastpickle.com:
 Does minor compaction delete expired column-tombstones when the row is
 also present in another table which is

 No.
 Compaction is per Column Family.

 Tombstones will be expired by Minor Compaction if all fragments of the row
 are contained in the SSTables being compacted.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/09/2012, at 6:32 AM, Rene Kochen rene.koc...@schange.com wrote:

 Hi all,

 Does minor compaction delete expired column-tombstones when the row is
 also present in another table which is not subject to the minor
 compaction?

 Example:

 Say there are 5 SStables:

 - Customers_0 (10 MB)
 - Customers_1 (10 MB)
 - Customers_2 (10 MB)
 - Customers_3 (10 MB)
 - Customers_4 (30 MB)

 A minor compaction is triggered which will compact the similar sized
 tables 0 to 3. In these tables is a customer record with key C1 with
 an expired column tombstone. Customer C1 is also present in table 4.
 Will the minor compaction delete the column (i.e. will the tombstone
 be present in the newly created table)?

 Thanks,

 Rene




Re: Query advice to prevent node overload

2012-09-17 Thread André Cruz
On Sep 17, 2012, at 3:04 AM, aaron morton aa...@thelastpickle.com wrote:

 I have a schema that represents a filesystem and one example of a Super CF 
 is:
 This may help with some ideas
 http://www.datastax.com/dev/blog/cassandra-file-system-design
 
 In general we advise to avoid Super Columns if possible. They are often 
 slower, and the sub columns are not indexed. Meaning all the sub columns have 
 to be read into memory. 
 
 
 So if I set column_count = 1, as I have now, but fetch 1000 dirs (rows) 
 and each one happens to have 1 files (columns) the dataset is 1000x1.
 This is the way the query works internally. Multiget is simply a collection 
 of independent gets. 
 
  
 The multiget() is more efficient, but I'm having trouble trying to limit the 
 size of the data returned in order to not crash the cassandra node.
 Often less is more. I would only ask for a few 10's of rows at a time, or try 
 to limit the size of the returned query to a few MB's. Otherwise a lot of 
 data gets dragged through cassandra, the network and finally Python. 
 
 You may want to consider a CF like the inode CF in the article above, where 
 the parent dir is a column with a secondary index. 

Thanks Aaron! I will take your points into consideration.

Best regards,
André



Re: Repair: Issue in netstats

2012-09-17 Thread B R
Sorry for the delay; been out of the loop.

Could this problem be due to running repair on a node upgraded to 1.0.11
while the other node in the cluster is still at 0.8.x ?

On Fri, Sep 7, 2012 at 9:11 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 That obviously shouldn't happen and I don't remember any open ticket
 related to that. You might want to open a ticket on jira
 (https://issues.apache.org/jira/browse/CASSANDRA).

 --
 Sylvain

 On Fri, Sep 7, 2012 at 10:50 AM, B R software.research.w...@gmail.com
 wrote:
  We have upgraded a 0.8 cluster to 1.0.11. After upgrading the first node
 and
  running upgradesstables, we have run a routine repair operation, This
  operation has been running for a long time and does not seem to be
  progressing.
 
  Running netstats has shown unexpected values for percentages as shown
 below.
  Any clue as to what the issue could be?
 
  bin/nodetool -h 172.16.0.34 netstats
  Mode: NORMAL
  Streaming to: /172.16.0.29
 /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=116
  progress=19946657796608/334406146 - 5964800%
 /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=116
  progress=0/179880575 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=12
  progress=0/1448134 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=116
  progress=0/350403675 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=89
  progress=0/27569594 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=1
  progress=0/95043 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16617-Data.db sections=1
  progress=0/232800 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16612-Data.db sections=1
  progress=0/82705 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=116
  progress=0/724836994 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=116
  progress=0/401797714 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=2
  progress=0/301297 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=3
  progress=0/829914 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=2
  progress=0/288460 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=13
  progress=0/1954639 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=8
  progress=0/1187649 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=1
  progress=0/141714 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=116
  progress=0/390168999 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=111
  progress=13620592201686/303748754 - 4484163%
 /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=110
  progress=0/162808076 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=10
  progress=0/1922996 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=111
  progress=0/350744309 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=87
  progress=0/24364920 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=2
  progress=0/228764 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=111
  progress=0/720722886 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=111
  progress=0/364643588 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=4
  progress=0/963207 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=2
  progress=0/360024 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=1
  progress=0/72842 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=11
  progress=0/1381176 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=13
  progress=0/3266736 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=2
  progress=0/639705 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=111
  progress=0/358443928 - 0%
   Nothing streaming from /172.16.0.29
  Pool NameActive   Pending  Completed
  Commandsn/a 0 19
  Responses   n/a 02050444
 
  Regards.



Re: Query advice to prevent node overload

2012-09-17 Thread André Cruz
On Sep 17, 2012, at 3:04 AM, aaron morton aa...@thelastpickle.com wrote:

 I have a schema that represents a filesystem and one example of a Super CF 
 is:
 This may help with some ideas
 http://www.datastax.com/dev/blog/cassandra-file-system-design

Could you explain the usage of the sentinel? Which nodes have it? I 
understand that it should be used for recursive dir listings, to restrict the 
nodes returned to the /tmp/ dir, but I'm not sure I understand how it 
works

Thanks,
André

Re: Many ParNew collections

2012-09-17 Thread Rene Kochen
Thanks Aaron,

I found the problem. It's in this thread: minor compaction and delete
expired column-tombstones.

The problem was that I have one big row called Index which contains
many tombstones. Reading all these tombstones caused the memory
issues.

I think nodes 1 and 3 have had enough minor compactions so that the
tombstones were removed. The second node still contains several old
SSTables and it takes some time before the whole thing is compacted
again.

Thanks,

Rene

2012/9/17 aaron morton aa...@thelastpickle.com:
 The second node (the one suffering from many GC) has a high read
 latency compared to the others. Another thing is that the compacted
 row maximum size is bigger than on the other nodes.

 Node 2 also:
 * has about 220MB of data, while the others have about 45MB
 * has about 1 Million keys while the others have about 0.3 Million

 - Should the other nodes also have that wide row,

 yes. Are you running repair ? What CL are you using ?

 - Could repeatedly reading a wide row cause parnew problems?

 Maybe. Are you reading the whole thing ?
 It's only 22MB, it's big but not huge.

 I would:

 * ensure repair is running and completing, this may even out the data load.
 * determine if GC is associated with compactions, repair or general activity.
 * if Gc is associated with compactions the simple thing is to reduce
 concurrent_compactions and in_memory_compaction_limit in the yaml. Note this
 is often a simple / quick fix that can increase IO load and slow down
 compaction. The harder thing is to tune the JVM memory settings (the
 defaults often do a good job).

 Hope that helps.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 14/09/2012, at 10:41 PM, Rene Kochen rene.koc...@schange.com wrote:

 Thanks Aaron,

 At another production site the exact same problems occur (also after
 ~6 months). Here I have a very small cluster of three nodes with
 replication factor = 3.
 One of the three nodes begins to have many long ParNew collections and high CPU
 load. I upgraded to Cassandra 1.0.11, but the GC problem still
 continues on that node.

 If I look at the CFStats of the three nodes, there is one CF which is
 different:

 Column Family: Logs
 SSTable count: 1
 Space used (live): 47606705
 Space used (total): 47606705
 Number of Keys (estimate): 338176
 Memtable Columns Count: 22297
 Memtable Data Size: 51542275
 Memtable Switch Count: 1
 Read Count: 189441
 Read Latency: 0,768 ms.
 Write Count: 123411
 Write Latency: 0,035 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 0
 Bloom Filter False Ratio: 0,0
 Bloom Filter Space Used: 721456
 Key cache capacity: 20
 Key cache size: 56685
 Key cache hit rate: 0.9132482658217008
 Row cache: disabled
 Compacted row minimum size: 73
 Compacted row maximum size: 263210
 Compacted row mean size: 94

 Column Family: Logs
 SSTable count: 3
 Space used (live): 233688199
 Space used (total): 233688199
 Number of Keys (estimate): 1191936
 Memtable Columns Count: 20147
 Memtable Data Size: 47067518
 Memtable Switch Count: 1
 Read Count: 188473
 Read Latency: 4031,791 ms.
 Write Count: 120412
 Write Latency: 0,042 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 234
 Bloom Filter False Ratio: 0,0
 Bloom Filter Space Used: 2603808
 Key cache capacity: 20
 Key cache size: 5153
 Key cache hit rate: 1.0
 Row cache: disabled
 Compacted row minimum size: 73
 Compacted row maximum size: 25109160
 Compacted row mean size: 156

 Column Family: Logs
 SSTable count: 1
 Space used (live): 47714798
 Space used (total): 47714798
 Number of Keys (estimate): 338176
 Memtable Columns Count: 29046
 Memtable Data Size: 66585390
 Memtable Switch Count: 1
 Read Count: 196048
 Read Latency: 1,466 ms.
 Write Count: 127709
 Write Latency: 0,034 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 8
 Bloom Filter False Ratio: 0,00847
 Bloom Filter Space Used: 720496
 Key cache capacity: 20
 Key cache size: 54166
 Key cache hit rate: 0.9833443960960739
 Row cache: disabled
 Compacted row minimum size: 73
 Compacted row maximum size: 263210
 Compacted row mean size: 95

 The second node (the one suffering from many GC) has a high read
 latency compared to the others. Another thing is that the compacted
 row maximum size is bigger than on the other nodes.

 What puzzles me:

 - Should the other nodes also have that wide row, because the
 replication factor is three and I only have three nodes? I must say
 that the wide row is probably the index row which has columns
 added/removed continuously. Maybe the other nodes lost much data
 because of compactions?
 - Could repeatedly reading a wide row cause parnew problems?

 Thanks!

 Rene

 2012/8/17 aaron morton aa...@thelastpickle.com:

 - Cassandra 0.7.10

 You _really_ should look at getting up to 1.1 :) Memory management is much
 better and the JVM heap requirements are less.

 However, there is one node with high read latency and far too many
 ParNew collections (compared 

Re: Repair: Issue in netstats

2012-09-17 Thread Sylvain Lebresne
On Mon, Sep 17, 2012 at 11:06 AM, B R software.research.w...@gmail.com wrote:
 Could this problem be due to running repair on a node upgraded to 1.0.11 but
 the other node in the cluster is still at 0.8.x ?

Yes, repair (like all operations requiring streaming) doesn't work
correctly across major Cassandra versions. The first thing you should do is
to finish the upgrade of the nodes.

--
Sylvain


 On Fri, Sep 7, 2012 at 9:11 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 That obviously shouldn't happen and I don't remember any open ticket
 related to that. You might want to open a ticket on jira
 (https://issues.apache.org/jira/browse/CASSANDRA).

 --
 Sylvain

 On Fri, Sep 7, 2012 at 10:50 AM, B R software.research.w...@gmail.com
 wrote:
  We have upgraded a 0.8 cluster to 1.0.11. After upgrading the first node
  and
   running upgradesstables, we have run a routine repair operation. This
  operation has been running for a long time and does not seem to be
  progressing.
 
  Running netstats has shown unexpected values for percentages as shown
  below.
   Any clue as to what the issue could be?
 
  bin/nodetool -h 172.16.0.34 netstats
  Mode: NORMAL
  Streaming to: /172.16.0.29
 /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db
  sections=116
  progress=19946657796608/334406146 - 5964800%
 /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db
  sections=116
  progress=0/179880575 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=12
  progress=0/1448134 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db
  sections=116
  progress=0/350403675 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=89
  progress=0/27569594 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=1
  progress=0/95043 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16617-Data.db sections=1
  progress=0/232800 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16612-Data.db sections=1
  progress=0/82705 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db
  sections=116
  progress=0/724836994 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db
  sections=116
  progress=0/401797714 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=2
  progress=0/301297 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=3
  progress=0/829914 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=2
  progress=0/288460 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=13
  progress=0/1954639 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=8
  progress=0/1187649 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=1
  progress=0/141714 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db
  sections=116
  progress=0/390168999 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db
  sections=111
  progress=13620592201686/303748754 - 4484163%
 /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db
  sections=110
  progress=0/162808076 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=10
  progress=0/1922996 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db
  sections=111
  progress=0/350744309 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=87
  progress=0/24364920 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=2
  progress=0/228764 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db
  sections=111
  progress=0/720722886 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db
  sections=111
  progress=0/364643588 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=4
  progress=0/963207 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=2
  progress=0/360024 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=1
  progress=0/72842 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=11
  progress=0/1381176 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=13
  progress=0/3266736 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=2
  progress=0/639705 - 0%
 /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db
  sections=111
  progress=0/358443928 - 0%
   Nothing streaming from /172.16.0.29
  Pool NameActive   Pending  Completed
  Commandsn/a 0 19
  Responses   n/a 02050444
 
  Regards.




Re: cassandra/hadoop BulkOutputFormat failures

2012-09-17 Thread Brian Jeltema
As suggested, it was a version-skew problem. 

Thanks.

Brian

On Sep 14, 2012, at 11:34 PM, Jeremy Hanna wrote:

 A couple of guesses:
 - are you mixing versions of Cassandra?  Streaming differences between 
 versions might throw this error.  That is, are you bulk loading with one 
 version of Cassandra into a cluster that's a different version?
 - (shot in the dark) is your cluster overwhelmed for some reason?
 
 If the temp dir hasn't been cleaned up yet, you are able to retry, fwiw.
 
 Jeremy
 
 On Sep 14, 2012, at 1:34 PM, Brian Jeltema brian.jelt...@digitalenvoy.net 
 wrote:
 
 I'm trying to do a bulk load from a Cassandra/Hadoop job using the 
 BulkOutputFormat class.
 It appears that the reducers are generating the SSTables, but are failing to 
 load them into the cluster:
 
 12/09/14 14:08:13 INFO mapred.JobClient: Task Id : 
 attempt_201208201337_0184_r_04_0, Status : FAILED
 java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, 
 /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4] 
   at 
 org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
   at 
 org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
   at 
 org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
   at 
 org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)   
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)  
 
 A brief look at the BulkOutputFormat class shows that it depends on 
 SSTableLoader. My Hadoop cluster
 and my Cassandra cluster are co-located on the same set of machines. I 
 haven't found any stated restrictions,
 but does this technique only work if the Hadoop cluster is distinct from the 
 Cassandra cluster? Any suggestions
 on how to get past this problem?
 
 Thanks in advance.
 
 Brian
 
 



Cassandra Messages Dropped

2012-09-17 Thread Michael Theroux
Hello,

While under load, we have occasionally been seeing messages dropped errors in 
our cassandra log.  Doing some research, I understand this is part of 
Cassandra's design to shed load, and we should look at the tpstats-like output 
to determine what should be done to resolve the situation.  Typically, you will 
see lots of messages blocked or pending, and that might be an indicator that a 
specific part of hardware needs to be improved/tuned/upgraded.  

However, looking at the output we are getting, I'm finding it difficult to see 
what needs to be tuned, as it looks to me like Cassandra is handling the load within 
the mutation stage:

INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 658) 
119 MUTATION messages dropped in last 5000ms
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) 
Pool NameActive   Pending   Blocked
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) 
ReadStage 3 3 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
RequestResponseStage  0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
ReadRepairStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
MutationStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
ReplicateOnWriteStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
GossipStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
AntiEntropyStage  0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
MigrationStage0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
StreamStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
MemtablePostFlusher   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
FlushWriter   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
MiscStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
commitlog_archiver0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
InternalResponseStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
AntiEntropySessions   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) 
HintedHandoff 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) 
CompactionManager 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) 
MessagingServicen/a   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) 
Cache Type Size Capacity   
KeysToSave Provider
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) 
KeyCache2184533  2184533
  all 
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) 
RowCache  00
  all  org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) 
ColumnFamilyMemtable ops,data
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) 
system.NodeIdInfo 0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) 
system.IndexInfo  0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) 
system.LocationInfo   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) 
system.Versions   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) 
system.schema_keyspaces   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) 
system.Migrations 0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) 
system.schema_columnfamilies 0,0
 

Astyanax error

2012-09-17 Thread A J
Hello,

I am trying to retrieve a list of column names (that are defined as
Integer) from a CF with the RowKey as Integer as well. (I don't care about
the column values, which are just nulls.)

Following is snippet of my Astyanax code. I am getting 0 columns but I
know the key that I am querying contains a few hundred columns. Any
idea what part of the code below is incorrect ?

Thanks.

Astyanax code:

ColumnFamily<Integer, Integer> CF1 =
new ColumnFamily<Integer, Integer>(
"CF1", // Column Family Name
IntegerSerializer.get(),   // Key Serializer
IntegerSerializer.get());  // Column Serializer

//Reading data
int NUM_EVENTS = 9;

StopWatch clock = new StopWatch();
clock.start();
for (int i = 0; i < NUM_EVENTS; ++i) {
ColumnList<Integer> result = keyspace.prepareQuery(CF1)
.getKey(1919)
.execute().getResult();
System.out.println("results are: " + result.size());
}
clock.stop();
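
In case it helps narrow this down, here is how I would page through the same
row explicitly (a rough, untested sketch assuming the standard Astyanax
RowQuery/RangeBuilder API; the names here are made up, not my real code):

import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.query.RowQuery;
import com.netflix.astyanax.util.RangeBuilder;

// Page through all columns of row 1919 in chunks of 100 columns.
RowQuery<Integer, Integer> query = keyspace.prepareQuery(CF1)
    .getKey(1919)
    .autoPaginate(true)
    .withColumnRange(new RangeBuilder().setLimit(100).build());

ColumnList<Integer> page;
int total = 0;
while (!(page = query.execute().getResult()).isEmpty()) {
    total += page.size();   // count what actually comes back per page
}
System.out.println("total columns: " + total);

If this also returns zero columns, then the problem is probably in the key or
the serializers rather than in how the query is expressed.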



CF definition:
===
[default@ks1] describe CF1;
ColumnFamily: CF1
  Key Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.IntegerType


Re: Disk configuration in new cluster node

2012-09-17 Thread Casey Deccio
On Mon, Sep 17, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:

  4 drives for data and 1 drive for commitlog,

 How are you configuring the drives ? It's normally best to present one big
 data volume, e.g. using raid 0, and put the commit log on say the system
 mirror.


Given the advice to use a single RAID 0 volume, I think that's what I'll
do.  By system mirror, you are referring to the volume on which the OS is
installed?  Should the volume with the commit log also have multiple disks
in a RAID 0 volume?  Alternatively, would a RAID 1 setup be reasonable for
the system volume/OS, so the system itself can be resilient to disk
failure, or would that kill commit performance?

Any preference to hardware RAID 0 vs. using something like mdadm?

A word of warning. If you put more than 300GB to 400GB per node you may end up
 experiencing some issues such as repair, compaction or disaster recovery
 taking a long time. These are simply soft limits that provide a good rule
 of thumb for HDD based systems with 1 GigE networking.


Hmm.  My hope was to be able to run a minimal number of nodes and maximize
their capacity because it doesn't make sense in my case to build or
maintain a large cluster.  I wanted to run a two-node setup (RF=1, RCL=ONE,
WCL=ALL), each with several disks having large capacity, totaling 10 - 12
TB.  Is this (another) bad idea?

Casey


Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread Josep Blanquer
We've run exactly into the same problem recently. Some specific keys in a
couple CFs accumulate a fair amount of column churn over time.

Pre Cassandra 1.x we scheduled full compactions often to purge them.
However, when we moved to 1.x we adopted the recommended practice of
avoiding full compactions. The problem took a while to manifest itself, but
over the course of several weeks (a few months) of not doing full compactions
the load on those services slowly increased...and despite having
everything monitored, it was not trivial to find out that it was the
accumulation of tombstones on 'some' keys, for 'some' CFs in the cluster
that was really causing long latencies and CPU spikes (high CPU is a
typical signature when there is a fair amount of tombstones in the SSTables).

Is there any JIRA or enhancement to perhaps be able to detect when certain
column tombstones can be deleted in minor compactions? The new introduction
of SSTable min-max timestamps might help? or perhaps there are new ones
coming up that I'm not aware of 

I'm saying this because there is absolutely no way (that I know of) to find
out or monitor when Cassandra encounters many column tombstones when doing
searches. That alone could help detect these cases so one can change the
data model and/or realize that needs full compactions. For example a new
metric at the CF level that tracks % of tombstones read per row (ideally a
histogram based on row size), or perhaps spit something out in the logs (a
la mysql slowquery log) when a wide row is read and a certain % of
tombstone columns are encountered...this alone can be a huge help in at
least detecting the latent problem.

...what we had to do to fully debug and understand the issue was to build
some tools that scanned SSTables and provided some of those stats. In a
large cluster that is painful to do.

Anyway, just wanted to chime in the thread to provide our input in the
matter.

Cheers,

Josep M.

On Mon, Sep 17, 2012 at 2:01 AM, Rene Kochen
rene.koc...@emea.schange.comwrote:

  Okay, thanks!

 So a column tombstone will only be removed if all row fragments are
 present in the tables being compacted.

 I have a row called Index which contains columns like page0,
 page1, page2, etc. Every several minutes, new columns are created
 and old ones deleted. The problem is that I now have an Index row in
 several SSTables, but the column tombstones are never deleted. And
 reading the Index row (and all its column tombstones) takes longer
 and longer.

 If I do a major compaction, all tombstones are deleted and reading the
 index row takes one millisecond again (and all the garbage-collect
 issues because of this).

 Is it not advised to use rows with many new column creates/deletes
 (because of how minor compactions work)?

 Thanks!

 Rene

 2012/9/17 aaron morton aa...@thelastpickle.com:
  Does minor compaction delete expired column-tombstones when the row is
  also present in another table which is
 
  No.
  Compaction is per Column Family.
 
  Tombstones will be expired by Minor Compaction if all fragments of the
 row
  are contained in the SSTables being compacted.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 15/09/2012, at 6:32 AM, Rene Kochen rene.koc...@schange.com wrote:
 
  Hi all,
 
  Does minor compaction delete expired column-tombstones when the row is
  also present in another table which is not subject to the minor
  compaction?
 
  Example:
 
  Say there are 5 SStables:
 
  - Customers_0 (10 MB)
  - Customers_1 (10 MB)
  - Customers_2 (10 MB)
  - Customers_3 (10 MB)
  - Customers_4 (30 MB)
 
  A minor compaction is triggered which will compact the similar sized
  tables 0 to 3. In these tables is a customer record with key C1 with
  an expired column tombstone. Customer C1 is also present in table 4.
  Will the minor compaction delete the column (i.e. will the tombstone
  be present in the newly created table)?
 
  Thanks,
 
  Rene
 
 



Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread Sylvain Lebresne
 Is there any JIRA or enhancement to perhaps be able to detect when certain
 column tombstones can be deleted in minor compactions? The new introduction
 of SSTable min-max timestamps might help? or perhaps there are new ones
 coming up that I'm not aware of 

https://issues.apache.org/jira/browse/CASSANDRA-4671

--
Sylvain


persistent compaction issue (1.1.4 and 1.1.5)

2012-09-17 Thread Michael Kjellman
Hi All,

I have an issue where each one of my nodes (currently all running at 1.1.5) is 
reporting around 30,000 pending compactions. I understand that a pending 
compaction doesn't necessarily mean it is a scheduled task; however, I'm confused 
why this behavior is occurring. It is the same on all nodes: the count occasionally 
drops by 5k pending compaction tasks and then returns to 25,000-35,000 compaction 
tasks pending.

I have tried a repair operation/scrub operation on two of the nodes and while 
compactions initially happen the number of pending compactions does not 
decrease.

Any ideas? Thanks for your time.

Best,
michael


'Like' us on Facebook for exclusive content and other resources on all 
Barracuda Networks solutions.
Visit http://barracudanetworks.com/facebook




Bloom Filters in Cassandra

2012-09-17 Thread Bill Hastings
How are bloom filters used in Cassandra? Is my understanding correct
that there is one per SSTable, encapsulating which keys are in the
SSTable? Please advise.


Is Cassandra right for me?

2012-09-17 Thread Marcelo Elias Del Valle
Hello,

 I am new to Cassandra and I am unsure whether Cassandra is the right
technology to use in the architecture I am defining. Also, I saw a
presentation which said that if I don't have rows with more than a hundred
columns in Cassandra, either I am doing something wrong or I shouldn't be
using Cassandra. Therefore, it might be the case that I am doing something
wrong. If you could help me find the answers to these questions by
giving any feedback, it would be highly appreciated.
 Here is my need and what I am thinking in using Cassandra for:

   - I need to support a high volume of writes per second. I might have a
   billion writes per hour.
   - I need to write non-structured data that will be processed later by
   hadoop processes to generate structured data from it. Later, I index the
   structured data using SOLR or SOLANDRA, so the data can be consulted by my
   end user application. Is Cassandra recommended for that, or should I be
   thinking of writing directly to HDFS files, for instance? What's the main
   advantage I get from storing data in a nosql service like Cassandra, when
   compared to storing files in HDFS?
   - Usually I will write json data associated with an ID and my hadoop
   processes will process this data to write data to a database. I have two
   doubts here:
  - If I don't need to perform complicated queries in Cassandra, should
  I store the json-like data just as a column value? I am afraid of doing
  something wrong here, as I would just need to store the json file and
  another 5 or 6 fields to query the files later. (A rough sketch of what
  I mean is below, after this list.)
  - Does it make sense to you to use hadoop to process data from
  Cassandra and store the results in a database, like HBase? Once I have
  structured data, is there any reason I should use Cassandra instead of
  HBase?
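
Just to make the column-value question concrete, this is roughly what I have
in mind (only a sketch; the column family, keyspace object and field names
here are made-up examples, using the Astyanax client):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

// Hypothetical CF: row key = event id, one column holds the raw json blob,
// a few extra columns hold the fields I would need to query on later.
ColumnFamily<String, String> CF_RAW_EVENTS = new ColumnFamily<String, String>(
    "RawEvents", StringSerializer.get(), StringSerializer.get());

void storeRawEvent(Keyspace keyspace, String eventId, String json,
                   String userId, long timestamp) throws Exception {
    MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(CF_RAW_EVENTS, eventId)
        .putColumn("payload", json, null)    // the raw json, stored as a plain column value
        .putColumn("user_id", userId, null)  // queryable fields stored next to it
        .putColumn("ts", timestamp, null);
    m.execute();
}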

 I am sorry if the questions are too basic. I have been watching a lot
of videos and reading a lot of documentation about Cassandra, but honestly,
the more I read the more questions I have.

Thanks in advance.

Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


are counters stable enough for production?

2012-09-17 Thread Bartłomiej Romański
Hi,

Does anyone have any experience with using Cassandra counters in production?

We rely heavily on them and recently we've had a few very serious
problems. Our counter values suddenly became a few times higher than
expected. From the business point of view this is a disaster :/ Also,
there are a few open major bugs related to them, some of them open for
quite a long time (months).

We are seriously considering going back to other solutions (e.g. SQL
databases). We simply cannot afford incorrect counter values. We can
tolerate losing a few increments from time to time, but we cannot
tolerate having counters suddenly 3 times higher or lower than the
expected values.

What is the current status of counters? Should I consider them a
production-ready feature and assume we just had some bad luck? Or should I
rather consider them an experimental feature and look for some other
solution?

Do you have any experiences with them? Any comments would be very
helpful for us!

Thanks,
Bartek


Re: Query advice to prevent node overload

2012-09-17 Thread aaron morton
 Could you explain the usage of the sentinel?
Queries that use a secondary index must include an equality clause. That's what the 
sentinel is there for…

 select filename from inode where filename > '/tmp' and filename < '/tmq' and 
 sentinel = 'x';

Cheers 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/09/2012, at 9:17 PM, André Cruz andre.c...@co.sapo.pt wrote:

 On Sep 17, 2012, at 3:04 AM, aaron morton aa...@thelastpickle.com wrote:
 
 I have a schema that represents a filesystem and one example of a Super CF 
 is:
 This may help with some ideas
 http://www.datastax.com/dev/blog/cassandra-file-system-design
 
 Could you explain the usage of the sentinel? Which nodes have it? I 
 understand that it should be used for recursive dir listings, to restrict the 
 nodes returned to the /tmp/ dir, but I'm not sure I understand how it 
 works
 
 Thanks,
 André



Re: Cassandra Messages Dropped

2012-09-17 Thread aaron morton
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
 MemtablePostFlusher   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
 FlushWriter   1 5 0
Looks suspiciously like 
http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E

What version are you on ? 

Are there any ERROR log messages before this ? 

Are you seeing MutationStage back up ? 

Are you seeing log messages from GCInspector ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:16 AM, Michael Theroux mthero...@yahoo.com wrote:

 Hello,
 
 While under load, we have occasionally been seeing messages dropped errors 
 in our cassandra log.  Doing some research, I understand this is part of 
 Cassandra's design to shed load, and we should look at the tpstats-like 
 output to determine what should be done to resolve the situation.  Typically, 
 you will see lots of messages blocked or pending, and that might be an 
 indicator that a specific part of hardware needs to be 
 improved/tuned/upgraded.  
 
 However, looking at the output we are getting, I'm finding it difficult to 
 see what needs to be tuned, as it looks to me cassandra is handling the load 
 within the mutation stage:
 
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 
 658) 119 MUTATION messages dropped in last 5000ms
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) 
 Pool NameActive   Pending   Blocked
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) 
 ReadStage 3 3 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
 RequestResponseStage  0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
 ReadRepairStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
 MutationStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
 ReplicateOnWriteStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
 GossipStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
 AntiEntropyStage  0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
 MigrationStage0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
 StreamStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
 MemtablePostFlusher   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
 FlushWriter   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
 MiscStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
 commitlog_archiver0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
 InternalResponseStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
 AntiEntropySessions   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) 
 HintedHandoff 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) 
 CompactionManager 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) 
 MessagingServicen/a   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) 
 Cache Type Size Capacity   
 KeysToSave Provider
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) 
 KeyCache2184533  2184533  
 all 
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) 
 RowCache  00  
 all  org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) 
 ColumnFamilyMemtable ops,data
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) 

Re: Stream definition is lost after server restart

2012-09-17 Thread Ishan Thilina
Sorry,

Forgot to mention that I'm using Cassandra 1.1.3
--
Thank you..!
-
071-6372089

Ishan's info: www.ishans.info
My notes: www.siblog.ishans.info
Ishan's way: www.blog.ishans.info
-



On Mon, Sep 17, 2012 at 9:32 PM, Ishan Thilina is...@ishans.info wrote:

 Hi all,

 I am currently working on a project which uses Cassandra. I have a task
 running on my server which will periodically look at a certain set of
 pre-defined data (of the server) and write it to Cassandra. The
 procedure for this is as follows.

 1. I give a name and a version to the task.

 2. I configure what data should the task monitor.

 3. The task will then check whether a stream definition exists for the task
 using the task name and its version.

 4. If a definition does not exist, then the task will create a definition
 (By looking at the types of data to be monitored).

 5. Then (or if a stream definition exists) the task will write the data to
 Cassandra

 6. The task will repeat the steps 3 to 5 forever (even after server
 restart).


 Please note that there can be multiple tasks like this monitoring
 different sets of data.


 The problem occurs when the server has been used for a few days and several
 (around 100) stream definitions have been created. I have observed that after
 the server is restarted, a "stream definition does not exist" exception is
 thrown in step 3. I manually checked, and the stream definition actually exists.

 When a new server is used (with a clean Cassandra server), everything works
 fine for a few days. But most of the time, after a few days the same issue
 arises.

 Has anyone experienced this..?



 --
 Thank you..!
 -
 071-6372089

 Ishan's info: www.ishans.info
  My notes: www.siblog.ishans.info
 Ishan's way: www.blog.ishans.info
 -




HTimedOutException and cluster not working

2012-09-17 Thread Jason Wee
Hello,

For some context on our environment: we have a cluster of 9 nodes with a few
keyspaces. The client writes to the cluster with a consistency level of ONE to a
keyspace that has a replication factor of 3. The Hector client is configured with
all the nodes in the cluster specified, the intention being that for any write
request two nodes can fail and the write still succeeds on one node of the
cluster (a rough sketch of our client-side consistency setup is at the end of
this mail).

However, under certain situations we see in the log that an HTimedOutException is
logged while writing to the cluster. The Hector client then fails over to the
next node in the cluster, but what we noticed is that the same exception,
HTimedOutException, is logged for all the nodes. The result is that the
cluster is not working as a whole. Naturally, we checked all the nodes in
the cluster for load. Only node-3 seems to have a high pending MutationStage
count when nodetool tpstats is run. The other nodes are fine, with 0 active and 0
pending for all stages.

/nodetool -h localhost tpstats
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 6983 0 0
RequestResponseStage 0 0 1252368951 0 0
MutationStage 16 2177067 879092633 0 0
ReadRepairStage 0 0 3648106 0 0
ReplicateOnWriteStage 0 0 33722610 0 0
GossipStage 0 0 20504608 0 0
AntiEntropyStage 0 0 1197 0 0
MigrationStage 0 0 89 0 0
MemtablePostFlusher 0 0 5659 0 0
StreamStage 0 0 296 0 0
FlushWriter 0 0 5616 0 1321
MiscStage 0 0 5964 0 0
AntiEntropySessions 0 0 88 0 0
InternalResponseStage 0 0 27 0 0
HintedHandoff 1 2 5976 0 0

Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
BINARY 0
READ 178
MUTATION 17467
REQUEST_RESPONSE 0

We proceeded to check whether any compaction was running on node-3 and found
the following:

./nodetool -hlocalhost compactionstats
pending tasks: 196
compaction type keyspace column family bytes compacted bytes total progress
Cleanup MyKeyspace MyCF 6946398685 10230720119 67.90%


Questions:
* With a replication factor of 3 on the keyspace, a client write consistency
  level of ONE, and the current Hector client and cluster settings, shouldn't
  it be possible in the situation above for the write to succeed on one of the
  nodes even though node-3 is too busy or failing for whatever reason?

* When the Hector client fails over to the other nodes, basically all the
  nodes fail. Why is this so?

* What factors increase the MutationStage active and pending values?
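
For reference, this is roughly how we set up the connection and consistency
level on the Hector side (a simplified sketch, not our exact code; the cluster
and keyspace names are made up):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

// All 9 nodes are listed so Hector can fail over between them.
Cluster cluster = HFactory.getOrCreateCluster("MyCluster",
        "node1:9160,node2:9160,node3:9160,node4:9160,node5:9160,"
      + "node6:9160,node7:9160,node8:9160,node9:9160");

// Read and write at consistency level ONE.
ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);

Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, ccl);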

Thank you for any comments and insight

Regards,
Jason