Re: Materialized views and composite partition keys
Hello, I've just changed my materialized view to have one partition key. The view gets generated now. After some refactoring I found that I didn't need a composite primary key at all. However if I later need one then I'll use a UDT. If it works... On Wed, 10 Feb 2016 at 13:04 DuyHai Doan <doanduy...@gmail.com> wrote: > You can't have more than 1 non-pk column from the base table as primary > key column of the view. All is explained here: > http://www.doanduyhai.com/blog/?p=1930 > > On Wed, Feb 10, 2016 at 10:43 AM, Abdul Jabbar Azam <aja...@gmail.com> > wrote: > >> Hello, >> >> I tried creating a material view using a composite partition key but I >> got an error. I can't remember the error but it was complaining about the >> presence of the second field in the partition key. >> >> Has anybody experienced this or have a workaround. I haven't tried UDT's >> yet. >> >> >> -- >> Regards >> >> Abdul Jabbar Azam >> twitter: @ajazam >> > > -- Regards Abdul Jabbar Azam twitter: @ajazam
Materialized views and composite partition keys
Hello, I tried creating a materialized view using a composite partition key but I got an error. I can't remember the exact error, but it complained about the presence of the second field in the partition key. Has anybody experienced this, or found a workaround? I haven't tried UDTs yet. -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: Materialized views and composite partition keys
Ah. I think that's where I'm going wrong. I'll have a look when I get home. On Wed, 10 Feb 2016 at 13:04 DuyHai Doan <doanduy...@gmail.com> wrote: > You can't have more than 1 non-pk column from the base table as primary > key column of the view. All is explained here: > http://www.doanduyhai.com/blog/?p=1930 > > On Wed, Feb 10, 2016 at 10:43 AM, Abdul Jabbar Azam <aja...@gmail.com> > wrote: > >> Hello, >> >> I tried creating a material view using a composite partition key but I >> got an error. I can't remember the error but it was complaining about the >> presence of the second field in the partition key. >> >> Has anybody experienced this or have a workaround. I haven't tried UDT's >> yet. >> >> >> -- >> Regards >> >> Abdul Jabbar Azam >> twitter: @ajazam >> > > -- Regards Abdul Jabbar Azam twitter: @ajazam
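To make DuyHai's rule concrete, here is a hedged CQL sketch (the table and column names are invented, not from the thread): in Cassandra 3.x a materialized view's primary key must contain every primary key column of the base table plus at most one other column, but that primary key can still be composite.

```sql
-- Hypothetical base table
CREATE TABLE readings (
    sensor_id uuid,
    day       text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
);

-- Valid: the view's key is composite, reuses all base PK columns,
-- and adds only ONE non-PK column of the base table (value)
CREATE MATERIALIZED VIEW readings_by_value AS
    SELECT * FROM readings
    WHERE value IS NOT NULL AND sensor_id IS NOT NULL
      AND day IS NOT NULL AND ts IS NOT NULL
    PRIMARY KEY ((value, day), sensor_id, ts);

-- A view key with two non-PK base columns would be rejected with the
-- error discussed in this thread
```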
cassandra client testing
Hello, What do people do to test their Cassandra client code? Do you a) mock out the Cassandra code, b) use a framework which simulates Cassandra, or c) actually use Cassandra, perhaps inside Docker? -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: cassandra client testing
This looks really good. I can see in master that Java driver 3.0 support has been added. I can't work out how to generate exceptions, though. I'd like to test my Akka supervisor hierarchy as well. On Tue, 9 Feb 2016 at 22:48 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > http://www.scassandra.org/ > > From: Abdul Jabbar Azam > Reply-To: "user@cassandra.apache.org" > Date: Tuesday, February 9, 2016 at 2:23 PM > To: "user@cassandra.apache.org" > Subject: cassandra client testing > > Hello, > > What do people do to test their cassandra client code? Do you > > a) mock out the cassandra code > b) use a framework which simulates cassandra > c) actually use cassandra, perhaps inside docker > > > -- > Regards > > Abdul Jabbar Azam > twitter: @ajazam > -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: cassandra client testing
Hello Will, I'll give scassandra a try first, otherwise use a test keyspace. On Tue, 9 Feb 2016 at 22:52 Will Hayworth <whaywo...@atlassian.com> wrote: > I've never seen Scassandra before--neat! > > For what it's worth, we just use a test keyspace with a lower RF (that is > to say, 1). The tables are identical to our prod keyspace, but the > permissions are different for the user on our Bamboo instances so that we > can test things like table creation etc. > > ___ > Will Hayworth > Developer, Engagement Engine > Atlassian > > My pronoun is "they". <http://pronoun.is/they> > > > > On Tue, Feb 9, 2016 at 2:47 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> > wrote: > >> http://www.scassandra.org/ >> >> From: Abdul Jabbar Azam >> Reply-To: "user@cassandra.apache.org" >> Date: Tuesday, February 9, 2016 at 2:23 PM >> To: "user@cassandra.apache.org" >> Subject: cassandra client testing >> >> Hello, >> >> What do people do to test their cassandra client code? Do you >> >> a) mock out the cassandra code >> b) use a framework which simulates cassandra >> c) actually use cassandra, perhaps inside docker >> >> >> -- >> Regards >> >> Abdul Jabbar Azam >> twitter: @ajazam >> > > -- Regards Abdul Jabbar Azam twitter: @ajazam
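For option (a), one common pattern is to hide the driver session behind a thin repository and substitute a mock in tests, which also makes exception injection trivial (the thing Scassandra priming is meant to cover). A minimal Python sketch with made-up names (`UserRepository`, the `users` table are illustrative, not a real API):

```python
from unittest.mock import MagicMock

# Hypothetical repository wrapping a driver session (names invented)
class UserRepository:
    def __init__(self, session):
        self.session = session

    def find_email(self, user_id):
        row = self.session.execute(
            "SELECT email FROM users WHERE id = %s", (user_id,)).one()
        return row.email if row else None

# Replace the session with a mock -- no running cluster needed
session = MagicMock()
session.execute.return_value.one.return_value = None
repo = UserRepository(session)
print(repo.find_email(42))  # no such user -> None

# Inject a driver failure to exercise error-handling paths
session.execute.side_effect = RuntimeError("simulated timeout")
try:
    repo.find_email(42)
except RuntimeError as exc:
    print("caught:", exc)
```

The same `side_effect` trick is one way to drive an Akka-style supervisor test: make the session raise, then assert the supervisor reacts.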
Re: What are the best ways to learn Apache Cassandra
The documentation at www.datastax.com is very good. So I would recommend that. On Sat, 19 Dec 2015, 09:21 Akhil Mehra <akhilme...@gmail.com> wrote: > What are some things you wish you knew when you started learning Apache > Cassandra. > > What are some of the best resources you have come across to learn Apache > Cassandra. Books, blogs etc. I am looking for tips on key concepts, > principles that you wish you were exposed to when you started learning > Apache Cassandra. > > What were the main pain points when trying to get to grips with Cassandra. > > Essentially I am looking for all tips that will help shorten the learning > curve. > > Thanks > Regards, > Akhil Mehra > -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: Using Cassandra for geospacial search
Hello, You'll find this useful http://www.slideshare.net/mobile/mmalone/working-with-dimensional-data-in-distributed-hash-tables It's how SimpleGeo used geohashing and Cassandra for geolocation. On Mon, 26 Jan 2015 15:48 SEGALIS Morgan msega...@gmail.com wrote: Hi everyone, I wanted to know if someone has feedback on using the geohash algorithm with Cassandra? I will have to create a nearby functionality soon, and I really would like to do it with Cassandra for its scalability; otherwise the smart choice would apparently be MongoDB. Can Cassandra be used to do geospatial search (with some kind of radius) while being fast and scalable? Thanks. -- Morgan SEGALIS
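For anyone who wants to experiment before reading the slides: geohashing interleaves longitude and latitude bits and base32-encodes them, so nearby points share a common prefix, which is exactly what makes it usable as a Cassandra partition key. A self-contained sketch (precision 11 is roughly centimetre scale):

```python
# Self-contained geohash encoder: interleave lon/lat bits, base32-encode.
# Nearby points share a prefix, so the prefix works as a partition key.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, ch, bit, even = [], 0, 0, True  # even bits refine longitude
    while len(chars) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:           # every 5 bits -> one base32 character
            chars.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))  # -> u4pruydqqvj
```

Storing the first N characters as the partition key buckets nearby points together; a radius query then becomes a small set of prefix lookups plus client-side distance filtering.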
Re: Best approach in Cassandra (+ Spark?) for Continuous Queries?
Hello, Or you can have a look at Akka http://www.akka.io for event processing and use Cassandra for persistence (Peter's suggestion). On Sat Jan 03 2015 at 11:59:45 AM Peter Lin wool...@gmail.com wrote: It looks like you're using the wrong tool and architecture. If the use case really needs continuous query like event processing, use an ESP product to do that. You can still store data in Cassandra for persistence. The design you want is to have two paths: event stream and persistence. At the entry point, the system makes parallel calls. One goes to a messaging system that feeds the ESP and a second that calls Cassandra Sent from my iPhone On Jan 3, 2015, at 5:46 AM, Hugo José Pinto hugo.pi...@inovaworks.com wrote: Hello. We're currently using Hazelcast (http://hazelcast.org/) as a distributed in-memory data grid. That's been working sort-of-well for us, but going solely in-memory has exhausted its path in our use case, and we're considering porting our application to a NoSQL persistent store. After the usual comparisons and evaluations, we're borderline close to picking Cassandra, plus eventually Spark for analytics. Nonetheless, there is a gap in our architectural needs that we're still not grasping how to solve in Cassandra (with or without Spark): Hazelcast allows us to create a Continuous Query in that, whenever a row is added/removed/modified from the clause's resultset, Hazelcast calls us back with the corresponding notification. We use this to continuously update the clients via AJAX streaming with the new/changed rows. This is probably a conceptual mismatch we're making, so - how to best address this use case in Cassandra (with or without Spark's help)? Is there something in the API that allows for Continuous Queries on key/clause changes (haven't found it)? Is there some other way to get a stream of key/clause updates? Events of some sort?
I'm aware that we could, eventually, periodically poll Cassandra, but in our use case, the client is potentially interested in a large number of table clause notifications (think all changes to Ship positions on California's coastline), and iterating out of the store would kill the streamer's scalability. Hence, the magic question: what are we missing? Is Cassandra the wrong tool for the job? Are we not aware of a particular part of the API or external library in/outside the apache realm that would allow for this? Many thanks for any assistance! Hugo
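Peter's two-path design boils down to a single entry point fanning out to a durable store and an event stream. A toy Python sketch (all names are illustrative; in production the bus would be an ESP/Akka/message queue and the store a Cassandra session):

```python
# Toy fan-out: one write goes to both a durable store and an event
# stream. All names are illustrative, not a real API.
class EventBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)

class ShipPositionService:
    def __init__(self, store, bus):
        self.store = store  # persistence path (stand-in for Cassandra)
        self.bus = bus      # streaming path (stand-in for ESP/Akka)

    def update_position(self, ship_id, lat, lon):
        self.store[ship_id] = (lat, lon)       # durable write
        self.bus.publish((ship_id, lat, lon))  # push notification

received = []
bus = EventBus()
bus.subscribe(received.append)  # stand-in for the AJAX streamer
service = ShipPositionService({}, bus)
service.update_position("ship-1", 36.6, -121.9)
print(received)  # -> [('ship-1', 36.6, -121.9)]
```

The point is that the streamer never polls the store: notifications ride the event path, and Cassandra only serves reads and recovery.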
Re: Wide rows best practices and GC impact
Hello, I saw this earlier yesterday but didn't want to reply because I didn't know what the cause was. Basically I was using wide rows with Cassandra 1.x and was inserting data constantly. After about 18 hours the JVM would crash with a dump file. For some reason I removed the compaction throttling and the problem disappeared. I've never really found out what the root cause was. On Thu Dec 04 2014 at 2:49:57 AM Gianluca Borello gianl...@draios.com wrote: Thanks Robert, I really appreciate your help! I'm still unsure why Cassandra 2.1 seems to perform much better in that same scenario (even setting the same values of compaction threshold and number of compactors), but I guess we'll revise when we decide to upgrade to 2.1 in production. On Dec 3, 2014 6:33 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello gianl...@draios.com wrote: We mainly store time series-like data, where each data point is a binary blob of 5-20KB. We use wide rows, and try to put in the same row all the data that we usually need in a single query (but not more than that). As a result, our application logic is very simple (since we have to do just one query to read the data on average) and read/write response times are very satisfactory. This is a cfhistograms and a cfstats of our heaviest CF: 100mb is not HYOOOGE but is around the size where large rows can cause heap pressure. You seem to be unclear on the implications of pending compactions, however. Briefly, pending compactions indicate that you have more SSTables than you should. As compaction both merges row versions and reduces the number of SSTables, a high number of pending compactions causes problems associated with both having too many row versions (fragmentation) and a large number of SSTables (per-SSTable heap/memory (depending on version) overhead like bloom filters and index samples). In your case, it seems the problem is probably just the compaction throttle being too low.
My conjecture is that, given your normal data size and read/write workload, you are relatively close to GC pre-fail when compaction is working. When it stops working, you relatively quickly get into a state where you exhaust heap because you have too many SSTables. =Rob http://twitter.com/rcolidba PS - Given 30GB of RAM on the machine, you could consider investigating large-heap configurations, rbranson from Instagram has some slides out there on the topic. What you pay is longer stop the world GCs, IOW latency if you happen to be talking to a replica node when it pauses.
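For anyone hitting the same wall: the throttle Robert refers to is `compaction_throughput_mb_per_sec` in cassandra.yaml, and it can also be changed at runtime on a live node without a restart (a sketch; 0 disables throttling entirely, which is effectively what removing the throttle did above):

```
# cassandra.yaml (permanent setting, MB/s):
#   compaction_throughput_mb_per_sec: 64

# Runtime change on a live node (MB/s; 0 = unthrottled):
nodetool setcompactionthroughput 0

# Then watch pending compactions drain:
nodetool compactionstats
```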
Re: Storing time-series and geospatial data in C*
Spico, Here's a link for the time series data http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra You'll also need to understand the composite key format http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refCompositePk.html Mike Malone has done videos and slides on how they used an older version of Cassandra for storing geo information http://readwrite.com/2011/02/17/video-simplegeo-cassandra Or you can use Elasticsearch for working with geospatial information. http://blog.florian-hopf.de/2014/08/use-cases-for-elasticsearch-geospatial.html A word of warning though with Elasticsearch: it does not provide simple linear scalability like Cassandra, nor is it easy to set up for cross-datacentre operation. DataStax Enterprise has Solr integrated so you could use that http://digbigdata.com/geospatial-search-cassandra-datastax-enterprise/ Jabbar Azam On Thu Nov 27 2014 at 12:39:59 PM Spico Florin spicoflo...@gmail.com wrote: Hello! Can you please recommend me some new articles and case studies where Cassandra was used to store time-series and geo-spatial data? I'm particularly interested in best practices, data models and retrieval techniques. Thanks. Regards, Florin
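The two DataStax links above reduce to one core idea: bucket each series with a composite partition key so no partition grows without bound. A hedged CQL sketch (table and column names invented for illustration):

```sql
-- Hypothetical time-series layout: the composite partition key
-- buckets each series by day, bounding partition width
CREATE TABLE readings (
    sensor_id uuid,
    day       text,       -- e.g. '2014-11-27', part of the partition key
    ts        timestamp,
    value     blob,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- One partition = one sensor-day; a time-range read stays inside it
SELECT value FROM readings
 WHERE sensor_id = 62c36092-82a1-3a00-93d1-46196ee77204
   AND day = '2014-11-27'
   AND ts >= '2014-11-27 09:00:00' AND ts < '2014-11-27 10:00:00';
```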
Re: Redundancy inside a cassandra node
Hello Alexey, The node count is 20 per site and there will be two sites. RF=3. But since the software isn't complete and the database code is going through a rewrite we aren't sure about space requirements. The node count is only a guess, based on the number of dev nodes in use. We will have better information when the rewrite is done and testing resumes. The data will be time series data. It was binary blobs originally but we have found that the new DataStax C# drivers have improved a lot in terms of read performance. I'm curious: what is your definition of commodity? My IT people seem to think that the servers must be super robust. Personally I'm not sure if that should be the case. Thanks Jabbar Azam On 8 November 2014 02:56, Plotnik, Alexey aplot...@rhonda.ru wrote: Cassandra is a cluster itself; it's not necessary to make each node redundant. Cassandra has replication for that. Cassandra is also designed to run in multiple data centers - I think that redundancy policy is applicable for you. The only thing from what you said that you could deploy is RAID 10; the rest doesn't make any sense. As you are at the stage of designing your cluster, please provide some numbers: how much data will be stored on each node, and how many nodes would you have? What type of data will be stored in the cluster: binary objects or something time series? Cassandra is designed to run on commodity hardware. Sent from my iPad On 8 Nov 2014, at 6:26, Jabbar Azam aja...@gmail.com wrote: Hello all, My work will be deploying a cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs. The problem is the IT team are saying the nodes need to have multiple points of redundancy e.g. dual power supplies, dual nics, SSD's configured in raid 10.
The software team is saying that due to Cassandra's resilient nature, the way data is distributed, and its scalability, lots of cheap boxes should be used. So they have been talking about self-build consumer grade boxes with single NICs, single PSUs, single SSDs etc. Obviously the self-build boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question. My question is: what do people use in the real world in terms of node resiliency when running a Cassandra cluster? Right now the team is only thinking of hosting Cassandra on the nodes. I'll see if I can twist their arms and get them to see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running Cassandra. Thanks Jabbar Azam
Re: Redundancy inside a cassandra node
Hello Jack, Some really good points. I never thought of JVM or OOM issues. Thanks Jabbar Azam On 8 November 2014 16:52, Jack Krupansky j...@basetechnology.com wrote: About the only thing you can say is two specific points: 1. A more resilient node is great, but it in no way reduces or eliminates the need for total nodes. Sometimes nodes become inaccessible due to network outages or system maintenance (e.g., software upgrades), or the vagaries of Java JVM and OOM issues. 2. Replication redundancy is also for supporting higher load, not just availability on node outage. -- Jack Krupansky *From:* Jabbar Azam aja...@gmail.com *Sent:* Friday, November 7, 2014 3:24 PM *To:* user@cassandra.apache.org *Subject:* Redundancy inside a cassandra node Hello all, My work will be deploying a cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs. The problem is the IT team are saying the nodes need to have multiple points of redundancy e.g. dual power supplies, dual nics, SSD's configured in raid 10. The software team is saying that due to cassandras resilient nature, due to the way data is distributed and scalability that lots of cheap boxes should be used. So they have been talking about self-build consumer grade boxes with single nics, PSU's, single SSDs etc. Obviously the self-build boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question. My question is what do people use in the real world in terms of node resiliency when running a cassandra cluster? Right now the team is only thinking of hosting cassandra on the nodes. I'll see if I can twist their arms and see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running cassandra. Thanks Jabbar Azam
Re: Re[2]: Redundancy inside a cassandra node
With regards to money, I think it's always a good idea to find a cost-effective solution. The problem is different people have different interpretations of what cost effectiveness means. I'm referring to my organisation ;) and I'm sure it happens in other organisations. Biases, politics, experience and how stuff is currently done dictate how new solutions are created. I think the idea of not using redundancy goes against current thinking, unfortunately. Especially not using RAID 10. I think the problem may be due to lack of know-how of devops and tools like Cobbler, Ansible, Chef and Puppet. I'm working on this, but it's hard work doing this in my spare time. Do you build your own nodes, or use a well-known brand like Dell or HP? Dell recommended R720 or R320 nodes for the Cassandra nodes. We have built our own dev nodes from consumer grade kit but because they have no redundancy they are not taken seriously for production nodes. They're not rack mount, which is a big no with respect to the IT department. Thanks Jabbar Azam On 8 November 2014 12:31, Plotnik, Alexey aplot...@rhonda.ru wrote: Let me speak from my heart. I maintain a 200+TB Cassandra cluster. The problem is money. If your IT people have the $$$ they can of course deploy Cassandra on super-robust hardware with triple power supplies. But why then do you need Cassandra? Only for scalability? The idea of highly available clusters is to get robustness from availability (not from hardware reliability). The more availability (more nodes) you have, the more money you need to buy hardware. Cassandra is the most highly available system on the planet - it scales horizontally to any number of nodes. You have time series data; you can set replication factor 3 if needed.
There is a concept of network topology in Cassandra - you can specify which *failure domain* (racks or independent power lines) your nodes are installed on, and then replication will be computed correspondingly to store replicas of a given piece of data on different failure domains. The same is true for DCs - there is a concept of a data center in Cassandra topology; it knows about your data centers. You should think not about hardware but about your data model - is Cassandra applicable for your domain? Think about the queries to your data. Cassandra is actually a key-value storage (the documentation says it's a column-based storage, but it's just a CQL abstraction over key and binary value, nothing special except counters), so be very careful in designing your data model. Anyway, let me answer your original question: what do people use in the real world in terms of node resiliency when running a cassandra cluster? Nothing, because Cassandra is a highly available system. They use SSDs if they need speed. They do not use RAID 10 on the node, and they don't use dual power either, because it's not cheap in a cluster of many nodes and makes no sense because reliability is ensured by replication in large clusters. Not sure about dual NICs; network reliability is ensured by distributing your cluster across multiple data centers. We're using a single SSD and a single HDD on each node (we symlink some CF folders to the other disk). SSD for CFs where we need low latency, HDD for binary data. If one of them fails, replication saves us and we have time to deploy a new node and load data from replicas with Cassandra's repair feature back to the original node. And we have no problem with it; nodes fail sometimes, but it doesn't affect customers. That's it. -- Original Message -- From: Jabbar Azam aja...@gmail.com To: user@cassandra.apache.org user@cassandra.apache.org Sent: 08.11.2014 19:43:18 Subject: Re: Redundancy inside a cassandra node Hello Alexey, The node count is 20 per site and there will be two sites. RF=3.
But since the software isn't complete and the database code is going through a rewrite we aren't sure about space requirements. The node count is only a guess, bases on the number of dev nodes in use. We will have better information when the rewrite is done and testing resumes. The data will be time series data. It was binary blobs originally but we have found that the new datastax c# drivers have improved alot in terms of read performance. I'm curious. What is your definition of commodity. My IT people seem to think that the servers must be super robust. Personally I'm not sure if that should be the case. The node Thanks Jabbar Azam On 8 November 2014 02:56, Plotnik, Alexey aplot...@rhonda.ru wrote: Cassandra is a cluster itself, it's not necessary to have redundant each node. Cassandra has replication for that. And also Cassandra is designed to run in multiple data center - am think that redundant policy is applicable for you. Only thing from your saying you can deploy is raid10, other don't make any sense. As you are in stage of designing you cluster, please provide some numbers: how many data will be stored on each node, how many nodes
Re: Re[2]: Redundancy inside a cassandra node
Hello Eric, You make a good point about resiliency being applied at a higher level in the stack. Thanks Jabbar Azam On 8 November 2014 14:24, Eric Stevens migh...@gmail.com wrote: They do not use Raid10 on the node, they don't use dual power as well, because it's not cheap in cluster of many nodes I think the point here is that money spent on traditional failure avoidance models is better spent in a Cassandra cluster by instead having more nodes of less expensive hardware. Rather than redundant disks network ports and power supplies, spend that money on another set of nodes in a different topological (and probably physical) rack. The parallel to having redundant disk arrays is to increase replication factor (RF=3 is already one replica better than Raid 10, and with fewer SPOFs). The only reason I can think you'd want to double down on hardware failover like the traditional model is if you are constrained in your data center (eg, space or cooling) and you'd rather run machines which are individually physically more resilient in exchange for running a lower RF. On Sat Nov 08 2014 at 5:32:22 AM Plotnik, Alexey aplot...@rhonda.ru wrote: Let me speak from my heart. I maintenance 200+TB Cassandra cluster. The problem is money. If your IT people have a $$$ they can deploy Cassandra on super robust hardware with triple power supply of course. But why then you need Cassandra? Only for scalability? The idea of high available clusters is to get robustness from availability (not from hardware reliability). More availability (more nodes) you have - more money you need to buy hardware. Cassandra is the most high available system on the planet - it scaled horizontally to any number of nodes. You have time series data, you can set replication factor 3 if needed. 
There is a concept of network topology in Cassandra - you can specify on which *failure domain* (racks or independent power lines) your nodes installed on, and then replication will be computed correspondingly to store replicas of a specified data on a different failure domains. The same is for DC - there is a concept of data center in Cassandra topology, it knows about your data centers. You should think not about hardware but about your data model - is Cassandra applicable for you domain? Thinks about queries to your data. Cassandra is actually a key value storage (documentation says it's a column based storage, but it's just an CQL-abstraction over key and binary value, nothing special except counters) so be very careful in designing your data model. Anyway, let me answer your original question: what do people use in the real world in terms of node resiliancy when running a cassandra cluster? Nothing because Cassandra is high available system. They use SSDs if they need speed. They do not use Raid10 on the node, they don't use dual power as well, because it's not cheap in cluster of many nodes and have no sense because reliability is ensured by replication in large clusters. Not sure about dual NICs, network reliability is ensured by distributing your cluster across multiple data centers. We're using single SSD and single HDD on each node (we symlink some CF folders to other disk). SSD for CFs where we need low latency, HDD for binary data. If one of them fails, replication save us and we have time to deploy new node and load data from replicas with Cassandra repair feature back to original node. And we have no problem with it, node fail sometimes, but it doesn't affect customers. That is. -- Original Message -- From: Jabbar Azam aja...@gmail.com To: user@cassandra.apache.org user@cassandra.apache.org Sent: 08.11.2014 19:43:18 Subject: Re: Redundancy inside a cassandra node Hello Alexey, The node count is 20 per site and there will be two sites. RF=3. 
But since the software isn't complete and the database code is going through a rewrite we aren't sure about space requirements. The node count is only a guess, bases on the number of dev nodes in use. We will have better information when the rewrite is done and testing resumes. The data will be time series data. It was binary blobs originally but we have found that the new datastax c# drivers have improved alot in terms of read performance. I'm curious. What is your definition of commodity. My IT people seem to think that the servers must be super robust. Personally I'm not sure if that should be the case. The node Thanks Jabbar Azam On 8 November 2014 02:56, Plotnik, Alexey aplot...@rhonda.ru wrote: Cassandra is a cluster itself, it's not necessary to have redundant each node. Cassandra has replication for that. And also Cassandra is designed to run in multiple data center - am think that redundant policy is applicable for you. Only thing from your saying you can deploy is raid10, other don't make any sense. As you are in stage of designing you cluster, please provide some numbers: how many
Redundancy inside a cassandra node
Hello all, My work will be deploying a cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs. The problem is the IT team are saying the nodes need to have multiple points of redundancy e.g. dual power supplies, dual NICs, SSDs configured in RAID 10. The software team is saying that due to Cassandra's resilient nature, the way data is distributed, and its scalability, lots of cheap boxes should be used. So they have been talking about self-build consumer grade boxes with single NICs, single PSUs, single SSDs etc. Obviously the self-build boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question. My question is: what do people use in the real world in terms of node resiliency when running a Cassandra cluster? Right now the team is only thinking of hosting Cassandra on the nodes. I'll see if I can twist their arms and get them to see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running Cassandra. Thanks Jabbar Azam
Re: Scala driver
Hello, I'm also using the Java driver. It's evolving the fastest and is simple to use. Thanks Jabbar Azam On 2 Sep 2014 06:15, Gary Zhao garyz...@gmail.com wrote: Thanks Jan. I decided to use the Java driver directly. It's not hard to use. On Sun, Aug 31, 2014 at 1:08 AM, Jan Algermissen jan.algermis...@nordsc.com wrote: Hi Gary, On 31 Aug 2014, at 07:19, Gary Zhao garyz...@gmail.com wrote: Hi Could you recommend a Scala driver and share your experiences of using it? I'm thinking of using the Java driver in Scala directly. I am using Martin’s approach without any problems: https://github.com/magro/play2-scala-cassandra-sample The actual mapping from Java to Scala futures for the async case is in https://github.com/magro/play2-scala-cassandra-sample/blob/master/app/models/Utils.scala HTH, Jan Thanks
Re: Backup Cassandra to
Yes, I never thought of that. Thanks Jabbar Azam On 12 June 2014 19:45, Jeremy Jongsma jer...@barchart.com wrote: That will not necessarily scale, and I wouldn't recommend it - your backup node will need as much disk space as an entire replica of the cluster data. For a cluster with a couple of nodes that may be OK, for dozens of nodes, probably not. You also lose the ability to restore individual nodes - the only way to replace a dead node is with a full repair. On Thu, Jun 12, 2014 at 1:38 PM, Jabbar Azam aja...@gmail.com wrote: There is another way. You create a cassandra node in its own datacentre; then any changes going to the main cluster will be replicated to this node. You can back up from this node. In the event of a disaster, the data in the main cluster is wiped and the backup is replayed to the individual node. The data will then be replicated to the main cluster. This will also work for the case when the main cluster increases or decreases in size. Thanks Jabbar Azam On 12 June 2014 18:27, Andrew redmu...@gmail.com wrote: There isn’t a lot of “actual documentation” on the act of backing up, but I did research for my own company into the act of backing up and unfortunately, you’re not going to have a similar setup as Oracle. There are reasons for this, however. If you have more than one replica of the data, that means each node in the cluster will likely be holding its own unique set of data. So you would need to back up the ENTIRE set of nodes in order to get an accurate snapshot. Likewise, you would need to restore it to a cluster of the same size in order to restore it (and then run refresh to tell Cassandra to reload the tables from disk). Copying the snapshots is easy—it’s just a bunch of files in your data directory. It’s even smaller if you use incremental snapshots. I’ll admit, I’m no expert on tape drives, but I’d imagine it’s as easy as copy/pasting the snapshots to the drive (or whatever the equivalent tape drive operation is).
What you (and I, admittedly) would really like to see is a way to back up all the logical *data*, and then simply replay it. This is possible on Oracle because it’s typically restricted to a single instance (plus maybe one or two standbys) that don’t “share” any data. What you could do, in theory, is literally select all the data in the entire cluster and simply dump it to a file—but this could take hours, days, or even weeks to complete, depending on the size of your data—and then simply re-load it. This is probably not a great solution, but hey—maybe it will work for you. Netflix (thankfully) has posted a lot of their operational observations and whatnot, including their utility Priam. In their documentation, they include some overviews of what they use: https://github.com/Netflix/Priam/wiki/Backups Hope this helps! Andrew On June 12, 2014 at 6:18:57 AM, Jack Krupansky (j...@basetechnology.com) wrote: The doc for backing up – and restoring – Cassandra is here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html That doesn’t tell you how to move the “snapshot” to or from tape, but a snapshot is the starting point for backing up Cassandra. -- Jack Krupansky *From:* Camacho, Maria (NSN - FI/Espoo) maria.cama...@nsn.com *Sent:* Thursday, June 12, 2014 4:57 AM *To:* user@cassandra.apache.org *Subject:* Backup Cassandra to Hi there, I'm trying to find information/instructions about backing up and restoring a Cassandra DB to and from a tape unit. I was hopping someone in this forum could help me with this since I could not find anything useful in Google :( Thanks in advance, Maria
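Condensing the thread into commands, a per-node snapshot-to-tape run might look like the sketch below (the keyspace, tag, data path and tape device are illustrative; on 2.0 the snapshot directory layout is `<data_dir>/<keyspace>/<table>/snapshots/<tag>`):

```
# Take a named, hard-linked snapshot of one keyspace on this node
nodetool snapshot -t nightly_20140612 my_keyspace

# Archive the snapshot directories to the tape device
tar -cf /dev/st0 /var/lib/cassandra/data/my_keyspace/*/snapshots/nightly_20140612

# Reclaim disk space once the tape write is verified
nodetool clearsnapshot -t nightly_20140612 my_keyspace
```

As Andrew notes, this has to be run on every node, and a restore needs a cluster of the same topology (copy the files back and run `nodetool refresh`).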
Re: CQL query regarding indexes
In this use case you don't need the secondary index. Instead use PRIMARY KEY (partition_id, senttime) Thanks Jabbar Azam On 12 Jun 2014 23:44, Roshan codeva...@gmail.com wrote: Hi Cassandra - 2.0.8 DataStax driver - 2.0.2 I have created a keyspace and a table with an index like below. CREATE TABLE services.messagepayload ( partition_id uuid, messageid bigint, senttime timestamp, PRIMARY KEY (partition_id) ) WITH compression = { 'sstable_compression' : 'LZ4Compressor', 'chunk_length_kb' : 64 }; CREATE INDEX idx_messagepayload_senttime ON services.messagepayload (senttime); While I am running the below query I am getting an exception. SELECT * FROM b_bank_services.messagepayload WHERE senttime >= 140154480 AND senttime <= 140171760 ALLOW FILTERING; com.datastax.driver.core.exceptions.InvalidQueryException: No indexed columns present in by-columns clause with Equal operator Could someone explain what's going on? I have created an index on the search column, but it seems not to be working. Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CQL-query-regarding-indexes-tp7595122.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
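The suggested schema change can be sketched in CQL as follows. This is a sketch based on Roshan's original table; the partition key value and timestamp literals are illustrative only:

```cql
-- Make senttime a clustering column so that range predicates on it are
-- served directly from the partition, with no secondary index needed.
CREATE TABLE services.messagepayload (
    partition_id uuid,
    senttime timestamp,
    messageid bigint,
    PRIMARY KEY (partition_id, senttime)
) WITH compression = { 'sstable_compression' : 'LZ4Compressor',
                       'chunk_length_kb' : 64 };

-- A range query on the clustering column works once the partition key
-- is restricted, and needs no ALLOW FILTERING:
SELECT * FROM services.messagepayload
WHERE partition_id = 550e8400-e29b-41d4-a716-446655440000
  AND senttime >= '2014-06-01' AND senttime <= '2014-06-12';
```

The original error arises because a query that uses a secondary index must include at least one equality (=) predicate on an indexed column; range-only predicates on the indexed column are rejected with exactly the "No indexed columns present in by-columns clause with Equal operator" message seen above.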
Re: autoscaling cassandra cluster
Netflix uses Scryer http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html for predictive and reactive autoscaling, but they only refer to EC2 instances. They don't mention anything about Cassandra scaling or adding and removing nodes. I've just looked at the Priam wiki and it also doesn't mention scaling. It also mentions that vnodes aren't fully supported. That's no use for me as I'm using 2.x. The other issue, or rather feature, of Cassandra is that adding a new node increases the load on the system, so this surge would need to be taken into account. I think I'll leave this problem for more intelligent people than me and concentrate on the application logic, which can scale by adding or removing application and front end servers. Thanks for all your comments. Thanks Jabbar Azam On 22 May 2014 19:55, Robert Coli rc...@eventbrite.com wrote: On Wed, May 21, 2014 at 4:35 AM, Jabbar Azam aja...@gmail.com wrote: Has anybody got a cassandra cluster which autoscales depending on load or times of the day? Netflix probably does, managed with Priam. In general I personally do not consider Cassandra's mechanisms for joining and parting nodes to currently work well enough to consider designing a production system which would do so as part of regular operation. =Rob
autoscaling cassandra cluster
Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster. Scaling down has to be manual, where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy sent this promising project called Mesos to this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try. Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster. Scaling down has to be manual, where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster.
I'd be interested to know whether people out there are autoscaling cassandra on demand. Thanks Jabbar Azam
Re: How to enable a Cassandra node to participate in multiple cluster
Hello Salih, As far as I'm aware a node can't be in two clusters. In the cassandra.yaml file you can only specify one cluster. The storage system and all the protocols would have to be modified so that information about multiple clusters is passed around. I'm sure somebody else could give you more accurate detail. If you're saving on hardware then you could think about using Docker or virtualisation, but you'll have problems with performance. A bit like the problems you get when you have small instances at Amazon. Thanks Jabbar Azam On 21 May 2014 19:07, Salih Kardan karda...@gmail.com wrote: Hello everyone, I want to use a Cassandra cluster for some specific purpose across data centers. What I want to figure out is how I can enable a single Cassandra node to participate in multiple clusters at the same time? I googled it, however I could not find any use case of Cassandra as I mentioned above. Is this possible with the current architecture of Cassandra? Salih
Re: autoscaling cassandra cluster
Hello James, How do you alter your cassandra.yaml file with each node's IP address? I want to use the scaling software (which I've not got yet) to create and destroy the GCE instances. I want to use fleet to deploy and undeploy the Cassandra nodes inside the Docker instances. I do realise I will have to run nodetool to add and remove the nodes from the cluster and also do the node cleanup. Disclaimer: this is not a production system but something I'm experimenting with in my own time. Thanks Jabbar Azam On 21 May 2014 15:51, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra docker images let me know and I'll shoot you a link. James Sent from my iPhone On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy sent this promising project called Mesos to this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try. Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster.
Scaling down has to be manual where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentioned adding and removing nodes, unless I've missed something. I want to know how to do this for the google compute engine. This isn't for a production system but a test system(multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling cassandra on demand. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
Hello Ben, I'm looking forward to reading the Netflix links. Thanks :) Thanks Jabbar Azam On 21 May 2014 18:08, Ben Bromhead b...@instaclustr.com wrote: The mechanics of it are simple compared to figuring out when to scale, especially when you want to be scaling before peak load on your cluster (adding and removing nodes puts additional load on your cluster). We are currently building our own in-house solution for this for our customers. If you want to have a go at it yourself, this is a good starting point: http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html Most of this is fairly specific to Netflix, but an interesting read nonetheless. Datastax OpsCenter also provides capacity planning and forecasting and can provide an easy set of metrics you can make your scaling decisions on. http://www.datastax.com/what-we-offer/products-services/datastax-opscenter Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359 On 21/05/2014, at 7:51 AM, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra docker images let me know and I'll shoot you a link. James Sent from my iPhone On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy sent this promising project called Mesos to this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try.
Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platformand how instances can be created and destroyed. I Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add a machines with the cluster/seed/datacenter conf and it should join the cluster. Scaling down has to be manual where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentioned adding and removing nodes, unless I've missed something. I want to know how to do this for the google compute engine. This isn't for a production system but a test system(multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling cassandra on demand. Thanks Jabbar Azam
Re: idempotent counters
Thanks Aaron. I've mitigated this by removing the dependency on idempotent counters. But it's good to know the limitations of counters. Thanks Jabbar Azam On 19 May 2014 08:36, Aaron Morton aa...@thelastpickle.com wrote: Does anybody else use another technique for achieving this idempotency with counters? The idempotency problem with counters has to do with what will happen when you get a timeout. If you retry the write there is a chance of the increment being applied twice. This is inherent in the current design. Cheers Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 9/05/2014, at 1:07 am, Jabbar Azam aja...@gmail.com wrote: Hello, Do people use counters when they want to have idempotent operations in cassandra? I have a use case for using a counter to check the count of objects in a partition. If the counter is more than some value then the data in the partition is moved into two different partitions. I can't work out how to do this splitting and recover if a problem happens during modification of the counter. http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2 explains that counters shouldn't be used if you want idempotency. I would agree, but the alternative is not very elegant. I would have to manually count the objects in a partition and then move the data and repeat the operation if something went wrong. It is less resource intensive to read a counter value to see if a partition needs splitting than to read all the objects in a partition. The counter value can be stored in its own table sorted in descending order of the counter value. Does anybody else use another technique for achieving this idempotency with counters? I'm using cassandra 2.0.7. Thanks Jabbar Azam
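For reference, a counter table and increment look like the following. This is a minimal sketch; the table and names are illustrative, not from the thread. The UPDATE is the non-idempotent step Aaron describes: if it times out and the client retries, the count may be applied twice:

```cql
CREATE TABLE object_counts (
    partition_key text PRIMARY KEY,
    object_count counter
);

-- Not idempotent: replaying this after a timeout can apply it twice.
UPDATE object_counts
SET object_count = object_count + 1
WHERE partition_key = 'partition-42';
```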
Re: CQL Datatype in Cassandra
Hello Techy Teck, I couldn't find any evidence on the datastax website but found this http://wiki.apache.org/cassandra/CassandraLimitations which I believe is correct. Thanks Jabbar Azam On 6 November 2013 20:19, Techy Teck comptechge...@gmail.com wrote: We are using a CQL table like this - CREATE TABLE testing ( description text, last_modified_date timeuuid, employee_id text, value text, PRIMARY KEY (employee_name, last_modified_date) ) We have made description text in the above table. I am wondering whether there are any limitations on the text data type in CQL, such as it can only hold a certain number of bytes and after that it will truncate? Any other limitations that I should know about? Should I use blob there?
Re: CQL Datatype in Cassandra
Forgot. The text value can be up to 2GB in size, but in practice it will be less. Thanks Jabbar Azam On 6 November 2013 21:12, Jabbar Azam aja...@gmail.com wrote: Hello Techy Teck, I couldn't find any evidence on the datastax website but found this http://wiki.apache.org/cassandra/CassandraLimitations which I believe is correct. Thanks Jabbar Azam On 6 November 2013 20:19, Techy Teck comptechge...@gmail.com wrote: We are using a CQL table like this - CREATE TABLE testing ( description text, last_modified_date timeuuid, employee_id text, value text, PRIMARY KEY (employee_name, last_modified_date) ) We have made description text in the above table. I am wondering whether there are any limitations on the text data type in CQL, such as it can only hold a certain number of bytes and after that it will truncate? Any other limitations that I should know about? Should I use blob there?
Re: videos of 2013 summit
http://www.youtube.com/playlist?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU Thanks Jabbar Azam On 4 Jul 2013 18:17, S Ahmed sahmed1...@gmail.com wrote: Hi, Are the videos online anywhere for the 2013 summit?
Re: Cassandra driver performance question...
Hello Tony, I couldn't reply earlier because I've been decorating over the weekend so have been a bit busy. Let me know what happens. Out of curiosity, why are you using JDBC and not a CQL3 native driver? Thanks Jabbar Azam On 24 Jun 2013 00:32, Tony Anecito adanec...@yahoo.com wrote: Hi Jabbar, I was able to get the performance issue resolved by reusing the connection object. It will be interesting to see what happens when I use a connection pool from an app server. I still think it would be a good idea to have a minimal mode for metadata. It is rare that I use metadata. Regards, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 9:33 PM *Subject:* Re: Cassandra driver performance question... Hi Jabbar, I think I know what is going on. I happened across a change mentioned by the jdbc driver developers regarding metadata caching. Seems the metadata caching was moved from the connection object to the preparedStatement object. So I am wondering if the time difference I am seeing on the second preparedStatement object is because the metadata is cached then. So my question is how to test this theory? Is there a way to stop the metadata from coming across from Cassandra? A 20x performance improvement would be nice to have. Thanks, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org *Sent:* Friday, June 21, 2013 8:56 PM *Subject:* Re: Cassandra driver performance question... Thanks Jabbar, I ran nodetool as suggested and it showed 0 latency for the row count I have. I also ran the cli list command for the table hit by my JDBC preparedStatement and it was slow, like 121 msecs the first time I ran it and 40 msecs the second time, versus the jdbc call of 38 msecs to start with, unless I run it twice also and get 1.5-2.5 msecs for executeQuery the second time the preparedStatement is called.
I ran describe from the cli for the table and it said caching is ALL, which is correct. A real mystery and I need to understand better what is going on. Regards, -Tony *From:* Jabbar Azam aja...@gmail.com *To:* user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 3:32 PM *Subject:* Re: Cassandra driver performance question... Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache, so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The following advice from Aaron Morton will also help: You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times. If the proxyhistograms is showing most requests running faster than your app says, it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using the jdbc driver and noticed that if I run the same query twice the second time it is much faster. I set up the row cache and column family cache and it did not seem to make a difference. I am wondering how to set up cassandra such that the first query is always as fast as the second one. The second one was 1.8 msec and the first 28 msec for the same exact parameters. I am using a preparedStatement. Thanks!
Re: Cassandra driver performance question...
Hello Tony, This came out recently http://www.datastax.com/doc-source/developer/java-driver/index.html I can't vouch for the performance but the documentation is ok and it works. I'm using it on a side project myself. There is also Astyanax by Netflix and it also supports CQL 3 https://github.com/Netflix/astyanax/wiki/Getting-Started Thanks Jabbar Azam On 24 June 2013 15:34, Tony Anecito adanec...@yahoo.com wrote: Hi Jabbar, I am using the JDBC driver because almost no examples exist about what you mention. Even most of the JDBC examples I find do not work because they are incomplete or out of date. If you have a good reference about what you mentioned I can try it. As I mentioned, I got selects to work and now I am trying to get inserts to work via JDBC. I am running into issues there also but I will work at it till I get them to work. Regards, -Tony *From:* Jabbar Azam aja...@gmail.com *To:* user@cassandra.apache.org *Cc:* Tony Anecito adanec...@yahoo.com *Sent:* Monday, June 24, 2013 3:26 AM *Subject:* Re: Cassandra driver performance question... Hello Tony, I couldn't reply earlier because I've been decorating over the weekend so have been a bit busy. Let me know what happens. Out of curiosity, why are you using JDBC and not a CQL3 native driver? Thanks Jabbar Azam On 24 Jun 2013 00:32, Tony Anecito adanec...@yahoo.com wrote: Hi Jabbar, I was able to get the performance issue resolved by reusing the connection object. It will be interesting to see what happens when I use a connection pool from an app server. I still think it would be a good idea to have a minimal mode for metadata. It is rare that I use metadata. Regards, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 9:33 PM *Subject:* Re: Cassandra driver performance question... Hi Jabbar, I think I know what is going on. I happened across a change mentioned by the jdbc driver developers regarding metadata caching.
Seems the metadata caching was moved from the connection object to the preparedStatement object. So I am wondering if the time difference I am seeing on the second preparedStatement object is because of the Metadata is cached then. So my question is how to test this theory? Is there a way to stop the metadata from coming accross from Cassandra? A 20x performance improvement would be nice to have. Thanks, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org *Sent:* Friday, June 21, 2013 8:56 PM *Subject:* Re: Cassandra driver performance question... Thanks Jabbar, I ran nodetool as suggested and it 0 latency for the row count I have. I also ran cli list command for the table hit by my JDBC perparedStatement and it was slow like 121msecs the first time I ran it and second time I ran it it was 40msecs versus jdbc call of 38msecs to start with unless I run it twice also and get 1.5-2.5msecs for executeQuery the second time the preparedStatement is called. I ran describe from cli for the table and it said caching is ALL which is correct. A real mystery and I need to understand better what is going on. Regards, -Tony *From:* Jabbar Azam aja...@gmail.com *To:* user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 3:32 PM *Subject:* Re: Cassandra driver performance question... Hello Tony, I would guess that the first queries data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and or the filesystem cache so it'll be faster. If you want to make it consistently faster having a key cache will definitely help. The following advice from Aaron Morton will also help You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. 
nodetool cfhistograms will show you the local read latency, this is just the time it takes to read data on a replica and does not include network or wait times. If the proxyhistograms is showing most requests running faster than your app says it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using jdbc driver and noticed that if I run the same query twice the second time it is much faster. I setup the row cache and column family cache and it not seem to make a difference. I am wondering how to setup cassandra such that the first query is always as fast as the second one. The second one was 1.8msec and the first 28msec for the same exact paremeters. I am using preparestatement. Thanks!
Re: Cassandra terminates with OutOfMemory (OOM) error
Hello Mohammed, You should increase the heap space. You should also tune the garbage collection so young generation objects are collected faster, relieving pressure on the heap. We have been using JDK 7 with the G1 collector. It does a better job than me trying to optimise the JDK 6 GC collectors. Bear in mind though that the OS will need memory, as will the row cache and the file system. Memory usage will also depend on the workload of your system. I'm sure you'll also get good advice from other members of the mailing list. Thanks Jabbar Azam On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote: We have a 3-node Cassandra cluster on AWS. These nodes are running Cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings, so each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with an OOM error.
Here is the error message: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60) at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126) at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132) at org.apache.cassandra.db.Table.getRow(Table.java:355) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space at java.lang.Long.toString(Long.java:269) at java.lang.Long.toString(Long.java:764) at
org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171) at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068) at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192) at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766) at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The data in each column is less than 50 bytes. After adding all the column overheads (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from 10 rows each means that we are reading 80,000 * 10 * 100 = 80 MB of data. It is large, but not large enough to fill up the 1.8 GB heap. So I wonder why the heap is getting full. If the data request is too big to fill
Re: Cassandra driver performance question...
Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache, so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The following advice from Aaron Morton will also help: You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times. If the proxyhistograms is showing most requests running faster than your app says, it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using the jdbc driver and noticed that if I run the same query twice the second time it is much faster. I set up the row cache and column family cache and it did not seem to make a difference. I am wondering how to set up cassandra such that the first query is always as fast as the second one. The second one was 1.8 msec and the first 28 msec for the same exact parameters. I am using a preparedStatement. Thanks!
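For completeness, per-table caching in the Cassandra 1.2 era is controlled through the table's caching property. A sketch only; my_keyspace.my_table is a placeholder, not Tony's actual schema:

```cql
-- 'KEYS_ONLY' enables just the key cache; 'ALL' adds the row cache too.
ALTER TABLE my_keyspace.my_table WITH caching = 'ALL';
```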
Re: CQL3 Driver, DESCRIBE
Hello Joe, I would use cqlsh and run DESCRIBE in there; I'm not sure why you want to run that from the driver! Thanks Jabbar Azam On 9 June 2013 23:49, Joe Greenawalt joe.greenaw...@gmail.com wrote: Hi, I was playing around with the datastax driver today, and I wanted to call DESCRIBE TABLE <table>;. But got a syntax error: line 1:0 no viable alternative at input 'describe'. Is that functionality just not implemented in the 1.0 driver? If that's true: Does anyone know if it's planned? Is there another way to get a description of the table? If it's not true, does anyone know where I could be doing something wrong? I have a good connection, and I'm simply running session.execute(DESCRIBE TABLE {keyspaceName}.{tableName};); Thanks, Joe
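For the Cassandra versions current at the time, one driver-side alternative worth noting: DESCRIBE is a cqlsh feature rather than a CQL statement, but the schema metadata it reads lives in the system keyspace and can be queried directly. A sketch, with placeholder keyspace and table names:

```cql
-- Table-level metadata for one table (keyspace/table names are placeholders).
SELECT * FROM system.schema_columnfamilies
WHERE keyspace_name = 'mykeyspace' AND columnfamily_name = 'mytable';

-- Column-level metadata for the same table.
SELECT * FROM system.schema_columns
WHERE keyspace_name = 'mykeyspace' AND columnfamily_name = 'mytable';
```

These are ordinary SELECT statements, so they can be run through session.execute() from the driver.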
Re: CQL3 Driver, DESCRIBE
Oops, I meant DESCRIBE TABLE ... Thanks Jabbar Azam On 10 June 2013 00:16, Jabbar Azam aja...@gmail.com wrote: Hello Joe, I would use cqlsh and run DESCRIBE in there; I'm not sure why you want to run that from the driver! Thanks Jabbar Azam On 9 June 2013 23:49, Joe Greenawalt joe.greenaw...@gmail.com wrote: Hi, I was playing around with the datastax driver today, and I wanted to call DESCRIBE TABLE <table>;. But got a syntax error: line 1:0 no viable alternative at input 'describe'. Is that functionality just not implemented in the 1.0 driver? If that's true: Does anyone know if it's planned? Is there another way to get a description of the table? If it's not true, does anyone know where I could be doing something wrong? I have a good connection, and I'm simply running session.execute(DESCRIBE TABLE {keyspaceName}.{tableName};); Thanks, Joe
Re: Is there anyone who implemented time range partitions with column families?
Hello Cem, You can get a similar effect by specifying a TTL value for data you save to a table. If the data becomes older than the TTL value then it will automatically be deleted by C*. Thanks Jabbar Azam On 29 May 2013 17:01, cem cayiro...@gmail.com wrote: Thank you very much for the fast answer. Does playORM use different column families for each partition in Cassandra? Cem On Wed, May 29, 2013 at 5:30 PM, Jeremy Powell jeremym.pow...@gmail.com wrote: Cem, yes, you can do this with C*, though you have to handle the logic yourself (other libraries might do this for you; I've seen the dev of playORM discuss some things which might be similar). We use Astyanax and programmatically create CFs based on a time period of our choosing that makes sense for our system, programmatically drop CFs if/when they are outside a certain time period (rather than using C*'s TTL), and write data to the different CFs as needed. ~Jeremy On Wed, May 29, 2013 at 8:36 AM, cem cayiro...@gmail.com wrote: Hi All, I used time range partitions 5 years ago with MySQL to clean up data much faster. I had a big FACT table with time range partitions and it was very easy to drop old partitions (with archiving) and save some space on disk. Has anyone implemented such a thing in Cassandra? It would be great if we have that in Cassandra. Best Regards, Cem.
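A minimal sketch of the TTL approach described above; the table and values are illustrative, not from the original thread:

```cql
-- Hypothetical fact table keyed by sensor and time.
CREATE TABLE fact_events (
    sensor_id text,
    event_time timestamp,
    payload text,
    PRIMARY KEY (sensor_id, event_time)
);

-- The inserted data expires automatically 90 days (7776000 seconds) later;
-- the expired data is physically removed during compaction.
INSERT INTO fact_events (sensor_id, event_time, payload)
VALUES ('sensor-1', '2013-05-29 17:00:00', 'reading')
USING TTL 7776000;
```

Unlike dropping a whole partition or column family, TTL expiry happens row by row, and the disk space is only reclaimed when the relevant SSTables are compacted.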
Re: Compaction causing OutOfHeap
Hello, I've noticed in an earlier 1.2.x release that if I had a compaction throughput throttle, some of the nodes would give an out of memory error, but only if I was inserting data for more than 10 hours continuously. The workaround was to switch off compaction throttling. This was in a test environment doing lots of inserts, so switching off compaction throttling was OK. Thanks Jabbar Azam On 27 May 2013 04:29, John Watson j...@disqus.com wrote: Having (2) 1.2.5 nodes constantly crashing due to OutOfHeap errors. It always happens when the same large compaction is about to finish (they re-run the same compaction after restarting.) An indicator is CMS GC time of 3-5s (and the many related problems felt throughout the rest of the cluster)
Lookup table structuring advice
Hello, I want to create a simple table holding user roles, e.g. create table roles ( name text, primary key(name) ); If I want to get a list of roles for some admin tool I can use the following CQL3: select * from roles; When a new name is added it will be stored on a different host, and doing a select * is going to be inefficient because the table will be stored across the cluster and each node will respond. The number of roles may be less than or just greater than a dozen. I'm not sure if I'm storing the roles correctly. The other thing I'm thinking about is that once I've read the roles I can cache them. Thanks Jabbar Azam
Re: Lookup table structuring advice
I never thought about using a synthetic key, but in this instance with about a dozen rows it's probably ok. Thanks for your great idea. Where did you read about the synthetic key idea? I've not come across it before. Thanks Jabbar Azam On 4 May 2013 19:30, Dave Brosius dbros...@mebigfatguy.com wrote: if you want to store all the roles in one row, you can do create table roles (synthetic_key int, name text, primary key(synthetic_key, name)) with compact storage when inserting roles, just use the same key insert into roles (synthetic_key, name) values (0, 'Programmer'); insert into roles (synthetic_key, name) values (0, 'Tester'); and use select * from roles where synthetic_key = 0; (or some arbitrary key value you decide to use) that way the data is stored on one node (and its replicas) of course if the number of roles grows to be large, you lose most of the value in having a cluster. On 05/04/2013 12:09 PM, Jabbar Azam wrote: Hello, I want to create a simple table holding user roles e.g. create table roles ( name text, primary key(name) ); If I want to get a list of roles for some admin tool I can use the following CQL3 select * from roles; When a new name is added it will be stored on a different host and doing a select * is going to be inefficient because the table will be stored across the cluster and each node will respond. The number of roles may be less than or just greater than a dozen. I'm not sure if I'm storing the roles correctly. The other thing I'm thinking about is that when I've read the roles once then I can cache them. Thanks Jabbar Azam
Re: cql query
Hello Sri, As far as I know you can, if name and age are part of your partition key and timestamp is the clustering key, e.g. create table columnfamily ( name varchar, age varchar, tstamp timestamp, primary key((name, age), tstamp) ); Thanks Jabbar Azam On 2 May 2013 11:45, Sri Ramya ramya.1...@gmail.com wrote: hi Can somebody tell me whether it is possible to do a multi-column query on cassandra like Select * from columnfamily where name='foo' and age ='21' and timestamp = 'unixtimestamp' ; Please give me some guidance for these kinds of queries Thank you
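A sketch of the full pattern suggested above; the table name is hypothetical, and note that both partition-key columns must be constrained with equality before the clustering column can be filtered:

```cql
CREATE TABLE user_events (
    name varchar,
    age varchar,
    tstamp timestamp,
    PRIMARY KEY ((name, age), tstamp)
);

-- name and age together form the partition key; tstamp is the
-- clustering key, so it accepts equality or range predicates.
SELECT * FROM user_events
WHERE name = 'foo' AND age = '21'
  AND tstamp >= '2013-05-01' AND tstamp < '2013-05-02';
```

With this layout the original three-column query works without secondary indexes, because the WHERE clause maps directly onto the primary key.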
Re: Cassandra multi-datacenter
I'm not sure why you want to use public IPs in the other datacentre. Your Cassandra nodes in the other datacentre will be accessible from the internet. Personally I would use private IP addresses in the second datacentre, on a different IP subnet. A VPN is your only solution if you want to keep your data private and unhackable, as it tunnels its way through the internet. A slow network connection will mean your data is not in sync in both datacentres unless you explicitly specify quorum as your consistency level in your mutation requests, but your database throughput will be affected by this. Your bandwidth to the second datacentre and the quantity of your mutation requests will dictate how long it will take the second datacentre to get in sync with the primary datacentre. I've probably missed something but there are plenty of intelligent people in this mailing list to fill in the blanks :) Thanks Jabbar Azam On 2 May 2013 20:28, Daning Wang dan...@netseer.com wrote: Hi all, We are deploying Cassandra on two data centers. There is a slower network connection between the data centers. It looks like Cassandra should use the internal IP to communicate with nodes in the same data center, and the public IP to talk to nodes in the other data center. We know VPN is a solution, but want to know if there is another idea. Thanks in advance, Daning
Re: Any experience of 20 node mini-itx cassandra cluster
I already have, thanks. I'll do the tests when the hardware arrives. Thanks Jabbar Azam On 16 April 2013 22:27, aaron morton aa...@thelastpickle.com wrote: Can't we use LCS? Do some reading and some tests… http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra http://www.datastax.com/dev/blog/when-to-use-leveled-compaction Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 15/04/2013, at 10:44 PM, Jabbar Azam aja...@gmail.com wrote: I know the SSDs are a bit small but they should be enough for our application. Our test data is 1.6 TB (including replication with RF=3). Can't we use LCS? This will give us more space at the expense of more I/O, but SSDs have loads of I/O. Thanks Jabbar Azam On 14 April 2013 20:20, Jabbar Azam aja...@gmail.com wrote: Thanks Aaron. Thanks Jabbar Azam On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote: That's better. The SSD size is a bit small, and be warned that you will want to leave 50GB to 100GB free to allow room for compaction (using the default size tiered). On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; the rest can be off-heap Cassandra structures. This may not leave too much free space for the OS page cache, but SSD may help there. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote: What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB SSDs? I know it will depend on our workload but it will be better than a dual core CPU, I think. Jabbar Azam On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote: Dual core is not the greatest; you might run into GC issues before you run out of IO from your ssd devices. Also cassandra has other concurrency settings that are tuned roughly around the number of processors/cores. 
It is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing lots of sockets, and so on. On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote: That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: MySQL Cluster performing faster than Cassandra cluster on single table
MySQL Cluster also keeps the index in RAM, so with lots of rows the RAM becomes a limiting factor. That's what my colleague found, and hence why we're sticking with Cassandra. On 16 Apr 2013 21:05, horschi hors...@gmail.com wrote: Ah, I see, that makes sense. Have you got a source for the storing of hundreds of gigabytes? And does Cassandra not store anything in memory? It stores bloom filters and index samples in memory. But they are much smaller than the actual data and they can be configured. Yeah, my dataset is small at the moment - perhaps I should have chosen something larger for the work I'm doing (University dissertation), however, it is far too late to change now! On paper mysql-cluster looks great. But in daily use it's not as nice as Cassandra (where you have machines dying, networks splitting, etc.). cheers, Christian
Re: Any experience of 20 node mini-itx cassandra cluster
I know the SSDs are a bit small but they should be enough for our application. Our test data is 1.6 TB (including replication with RF=3). Can't we use LCS? This will give us more space at the expense of more I/O, but SSDs have loads of I/O. Thanks Jabbar Azam On 14 April 2013 20:20, Jabbar Azam aja...@gmail.com wrote: Thanks Aaron. Thanks Jabbar Azam On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote: That's better. The SSD size is a bit small, and be warned that you will want to leave 50GB to 100GB free to allow room for compaction (using the default size tiered). On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; the rest can be off-heap Cassandra structures. This may not leave too much free space for the OS page cache, but SSD may help there. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote: What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB SSDs? I know it will depend on our workload but it will be better than a dual core CPU, I think. Jabbar Azam On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote: Dual core is not the greatest; you might run into GC issues before you run out of IO from your ssd devices. Also cassandra has other concurrency settings that are tuned roughly around the number of processors/cores. It is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing lots of sockets, and so on. On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote: That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. 
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: Any experience of 20 node mini-itx cassandra cluster
Thanks Aaron. Thanks Jabbar Azam On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote: That's better. The SSD size is a bit small, and be warned that you will want to leave 50GB to 100GB free to allow room for compaction (using the default size tiered). On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; the rest can be off-heap Cassandra structures. This may not leave too much free space for the OS page cache, but SSD may help there. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote: What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB SSDs? I know it will depend on our workload but it will be better than a dual core CPU, I think. Jabbar Azam On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote: Dual core is not the greatest; you might run into GC issues before you run out of IO from your ssd devices. Also cassandra has other concurrency settings that are tuned roughly around the number of processors/cores. It is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing lots of sockets, and so on. On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote: That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. 
Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: Anyway To Query Just The Partition Key?
With your example you can do an equality search on surname and city and then use IN with country, e.g. SELECT * FROM yourtable WHERE surname = 'blah' AND city = 'blah blah' AND country IN ('country1', 'country2'); Hope that helps. Jabbar Azam On 13 Apr 2013 07:06, Gareth Collins gareth.o.coll...@gmail.com wrote: Hello, If I have a cql3 table like this (I don't have a table with this data - this is just for example): create table ( surname text, city text, country text, event_id timeuuid, data text, PRIMARY KEY ((surname, city, country), event_id)); there is no way of (easily) getting the set (or a subset) of partition keys, is there (i.e. surname/city/country)? If I want easy access to do queries to get a subset of the partition keys, do I have to create another table? I am assuming yes but just making sure I am not missing something obvious here. thanks in advance, Gareth
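A sketch using the table from the question (the values are illustrative); note that CQL allows IN only on the last column of a composite partition key, with equality required on the earlier ones:

```cql
-- Table name added for illustration; the original post omitted it.
CREATE TABLE events (
    surname text,
    city text,
    country text,
    event_id timeuuid,
    data text,
    PRIMARY KEY ((surname, city, country), event_id)
);

-- surname and city take equality; country, as the last partition-key
-- component, may use IN to fan out over a small set of partitions.
SELECT * FROM events
WHERE surname = 'Smith'
  AND city = 'Leeds'
  AND country IN ('UK', 'Ireland');
```

This does not enumerate the partition keys themselves; for that, a separate index table is still the usual answer.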
Any experience of 20 node mini-itx cassandra cluster
Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam
Re: Any experience of 20 node mini-itx cassandra cluster
That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: multiple Datacenter values in PropertyFileSnitch
Hello, I'm not an expert but I don't think you can do what you want. The way to separate data for applications on the same cluster is to use different tables for different applications or use multiple keyspaces, a keyspace per application. The replication factor you specify for each keyspace specifies how many copies of the data are stored in each datacenter. You can't specify that data for a particular application is stored on a specific node, unless that node is in its own cluster. I think of a cassandra cluster as a shared resource where all the applications have access to all the nodes in the cluster. Thanks Jabbar Azam On 11 April 2013 14:13, Matthias Zeilinger matthias.zeilin...@bwinparty.com wrote: Hi, I would like to create a big cluster for many applications. Within this cluster I would like to separate the data for each application, which can be easily done via different virtual datacenters and the correct replication strategy. What I would like to know is whether I can specify multiple values for 1 node in the PropertyFileSnitch configuration, so that I can use 1 node for more applications? For example: 6 nodes: 3 for App A, 3 for App B, 4 for App C. I want to have such a configuration: Node 1 – DC-A DC-C Node 2 – DC-B DC-C Node 3 – DC-A DC-C Node 4 – DC-B DC-C Node 5 – DC-A Node 6 – DC-B Is this possible or does anyone have another solution for this? Thx br matthias
Re: Two Cluster each with 12 nodes- Cassandra database
Hello, I don't know what Pelops is. I'm not sure why you want two clusters. I would have two clusters if I wanted data stored on totally separate servers, for perhaps security reasons. If you are going to have the servers in one location then you might as well have one cluster. You'll have the maximum aggregate IO of all the servers. If you're thinking of doing analytics as well then you can create two virtual datacentres: one for realtime inserts and reads and the second for analytics. You could have a 16/8 server split. Obviously you'll have to work out what the optimum split is for your workload. Not sure if I've answered your question... On 11 Apr 2013 18:51, Raihan Jamal jamalrai...@gmail.com wrote: Folks, Any thoughts on this? I am still in the learning process. So any guidance will be of great help. Raihan Jamal On Wed, Apr 10, 2013 at 10:39 PM, Raihan Jamal jamalrai...@gmail.com wrote: I have started working on a project in which I am using Cassandra database. Our production DBAs have set up two clusters and each cluster will have 12 nodes. I will be using the Pelops client to read the data from the Cassandra database. Now I am thinking what's the best way to create a Cluster using the Pelops client, like how many nodes I should add while creating the cluster? My understanding was to create the cluster with all the 24 nodes as I will be having two clusters each with 12 nodes. Is this the right approach? If not, then how do we decide which nodes (from each cluster) I should add while creating the cluster using the Pelops client? String[] nodes = cfg.getStringArray("cassandra.servers"); int port = cfg.getInt("cassandra.port"); boolean dynamicND = true; // dynamic node discovery Config casconf = new Config(port, true, 0); Cluster cluster = new Cluster(nodes, casconf, dynamicND); Pelops.addPool(Const.CASSANDRA_POOL, cluster, Const.CASSANDRA_KS); Can anyone help me out with this? Any help will be appreciated.
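A sketch of the virtual-datacentre idea above; the keyspace and datacentre names are illustrative and must match the DC names your snitch reports:

```cql
-- 'realtime' serves live inserts/reads; 'analytics' serves analytics jobs.
-- Replica counts per DC are examples only.
CREATE KEYSPACE myapp WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'realtime': 3,
    'analytics': 1
};
```

Clients for the realtime workload then connect only to the realtime DC's nodes (with a DC-aware load-balancing policy), while analytics jobs point at the analytics DC, so the two workloads don't compete for the same I/O.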
Re: Backup strategies in a multi DC cluster
Thank you for your feedback. I'll speak to the dev guys and come up with something appropriate. On 26 Mar 2013 17:51, aaron morton aa...@thelastpickle.com wrote: Assume you have four nodes and a snapshot is taken. The following day if a node goes down and data is corrupt through user error then how do you use the previous night's snapshots? Not sure what is corrupt: the snapshot/backup, or the data being incorrect through application error. Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. You would need to stop the entire cluster, and restore the snapshots on all nodes. If you restored the snapshot on just one node, new or old HW, it would have some data with an older timestamp than the other nodes. Cassandra would see this as an inconsistency, that the restored node missed some writes, and resolve the inconsistency with the most recent values. However if a virtual datacenter consisting of a backup node is used then the backup node could be used regardless of the number of nodes in the datacentre. It depends on the failure scenario and what you are trying to protect against. If you have 4 nodes and one node fails the best thing to do is start a new node and let cassandra stream the data from the other nodes. The new node could have the same token as the previous failed node. So long as the /var/lib/cassandra/data/system dir is empty (and the node is not a seed) it will join the cluster and ask the others for data. If you want to ensure availability then consider bigger clusters, e.g. 6 nodes with RF 3 allows you to lose up to 2 nodes and stay up. Or a higher RF. (see http://thelastpickle.com/2011/06/13/Down-For-Me/) It's tricky to protect against application error creating bad data using just backups. 
You may need to look at how you can replay events in your system and consider which parts of your data model should be directly mutated and which should be indirectly mutated by recording changes in another part of the model. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 25/03/2013, at 8:19 AM, Jabbar Azam aja...@gmail.com wrote: Thanks Aaron. I have a hypothetical question. Assume you have four nodes and a snapshot is taken. The following day if a node goes down and data is corrupt through user error then how do you use the previous night's snapshots? Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. However if a virtual datacenter consisting of a backup node is used then the backup node could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages to this approach? Sorry for the questions; I want to understand all the options. On 24 Mar 2013 17:45, aaron morton aa...@thelastpickle.com wrote: There are advantages and disadvantages in both approaches. What are people doing in their production systems? Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things off node. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 23/03/2013, at 4:37 AM, Jabbar Azam aja...@gmail.com wrote: Hello, I've been experimenting with cassandra for quite a while now. It's time for me to look at backups but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors. We will have two datacentres with 4 servers and RF=3. Each datacentre will have at most 1.6 TB (includes replication, LZ4 compression, using test data) of data. That is ten years of data, after which we will start purging. 
This amounts to about 400MB of data generation per day. I've read about users doing snapshots of individual nodes to S3(Netflix) and I've read about creating virtual datacentres ( http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages in both approaches. What are people doing in their production systems? -- Thanks Jabbar Azam
Re: Recovering from a faulty cassandra node
nodetool cleanup took about 23.5 hours on each node (did this in parallel). Started the nodetool cleanup at 20:53 on March 22 and it's still running (10:08, 25 March). The RF = 3. The load on each node is 490 GB, 491 GB, 323 GB, 476 GB. I think I read somewhere that removenode is faster the more nodes there are in the cluster. My next email will be the last in the thread. I thought the info might be useful to other people in the community. On 21 March 2013 21:59, Jabbar Azam aja...@gmail.com wrote: The nodetool cleanup command removes keys which can be deleted from the node the command is run on. So I'm assuming I can run nodetool cleanup on all the old nodes in parallel. Wouldn't do this on a live cluster as it's I/O intensive on each node. On 21 March 2013 17:26, Jabbar Azam aja...@gmail.com wrote: Can I do a multiple node nodetool cleanup on my test cluster? On 21 Mar 2013 17:12, Jabbar Azam aja...@gmail.com wrote: All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node (now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change the cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. 
I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change the cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use nodetool removenode if your server is broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. ( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation, for which I believe you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes effect. 
I would however test that all out in QA to make sure it works, and if you have QUORUM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed… you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org user
Re: cfhistograms
This also has a good description of how to interpret the results: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

On 25 March 2013 16:36, Brian Tarbox tar...@cabotresearch.com wrote:

I think we all go through this learning curve. Here is the answer I gave last time this question was asked:

The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output, should I read it as: my reads all took either 1 or 2 SSTables. And separately, I had write latencies of 3, 7, 19. And separately I had read latencies of 2, 8, 69, etc.? In other words... each row isn't really a row, i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0

On Mon, Mar 25, 2013 at 11:52 AM, Kanwar Sangha kan...@mavenir.com wrote:

Can someone explain how to read the cfhistograms o/p?

[root@db4 ~]# nodetool cfhistograms usertable data
usertable/data histograms
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1        2857444           4051             0         0        342711
2        6355104          27021             0         0        201313
3        2579941          61600             0         0        130489
4         374067         119286             0         0         91378
5          91752          10934             0         0         68548
6              0         321098             0         0         54479
7              0         476677             0         0         45427
8              0         734846             0         0         38814
10             0        2867967             4         0         65512
12             0        5366844            22         0         59967
14             0        6911431            36         0         63980
17             0       10155740           127         0        115714
20             0        7432318           302         0        138759
24             0        5231047           969         0        193477
29             0        2368553          2790         0        209998
35             0         859591          4385         0        204751
42             0         456978          3790         0        214658
50             0         306084          2465         0        151838
60             0         223202          2158         0         40277
72             0         122906          2896         0          1735

Thanks Kanwar

-- Thanks Jabbar Azam
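[Editor's note] Brian's point, that each column is its own histogram of (offset, count) pairs, also means you can compute approximate percentiles per column. A minimal sketch, assuming awk is available; the helper name and the inline data (the Read Latency column from the first table, in microseconds) are just for illustration:

```shell
# Approximate percentile of one cfhistograms column, treated as a
# standalone histogram of "offset:count" pairs.
percentile() {  # usage: percentile PCT "off:cnt off:cnt ..."
  pct=$1; shift
  echo "$1" | tr ' ' '\n' | awk -F: -v p="$pct" '
    { off[NR] = $1; cnt[NR] = $2; total += $2 }
    END {
      target = total * p / 100.0   # rank of the requested percentile
      run = 0
      for (i = 1; i <= NR; i++) {
        run += cnt[i]
        if (run >= target) { print off[i]; exit }
      }
    }'
}

# Read Latency column from the first table above:
percentile 50 "8:2 12:2 14:8 17:69 20:163 24:1369"   # prints 24
```

So the median read fell in the 24-microsecond bucket, even though most rows in the raw output are zeros: the zeros in other columns on the same row are unrelated, exactly as Brian says.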
Re: Backup strategies in a multi DC cluster
Thanks Aaron. I have a hypothetical question. Assume you have four nodes and a snapshot is taken. The following day, if a node goes down and data is corrupted through user error, how do you use the previous night's snapshots? Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. However, if a virtual datacentre consisting of a backup node is used, then the backup node could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages to this approach? Sorry for the questions, I want to understand all the options. On 24 Mar 2013 17:45, aaron morton aa...@thelastpickle.com wrote: There are advantages and disadvantages in both approaches. What are people doing in their production systems? Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things off node. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 23/03/2013, at 4:37 AM, Jabbar Azam aja...@gmail.com wrote: Hello, I've been experimenting with cassandra for quite a while now. It's time for me to look at backups but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors. We will have two datacentres with 4 servers and RF=3. Each datacentre will have at most 1.6 TB (includes replication, LZ4 compression, using test data) of data. That is ten years of data after which we will start purging. This amounts to about 400MB of data generation per day. I've read about users doing snapshots of individual nodes to S3 (Netflix) and I've read about creating virtual datacentres ( http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages in both approaches. What are people doing in their production systems? 
-- Thanks Jabbar Azam
Backup strategies in a multi DC cluster
Hello, I've been experimenting with cassandra for quite a while now. It's time for me to look at backups but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors. We will have two datacentres with 4 servers and RF=3. Each datacentre will have at most 1.6 TB(includes replication, LZ4 compression, using test data) of data. That is ten years of data after which we will start purging. This amounts to about 400MB of data generation per day. I've read about users doing snapshots of individual nodes to S3(Netflix) and I've read about creating virtual datacentres ( http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages in both approaches. What are people doing in their production systems? -- Thanks Jabbar Azam
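[Editor's note] For the snapshots+rsync route Aaron mentions, the nightly cycle on one node can be sketched roughly as below. The keyspace name, data path, and backup host are placeholders, not details from the thread; the nodetool calls are commented out because they need a running node:

```shell
# Nightly snapshot + offsite copy for one node (names are placeholders).
KEYSPACE="mykeyspace"
TAG="nightly_$(date +%Y%m%d)"
DATA_DIR="/var/lib/cassandra/data"
BACKUP_HOST="backup01"

# 1) Flush memtables and take a hard-link snapshot (cheap, no extra disk):
#      nodetool snapshot -t "$TAG" "$KEYSPACE"
# 2) Ship the snapshot directories off the node:
#      rsync -a "$DATA_DIR/$KEYSPACE"/*/snapshots/"$TAG"/ \
#            "$BACKUP_HOST:/backups/$(hostname)/$TAG/"
# 3) Drop the local snapshot once the copy is verified:
#      nodetool clearsnapshot -t "$TAG" "$KEYSPACE"
echo "snapshot tag: $TAG"
```

Point-in-time recovery before a user error then means restoring the last good night's SSTables rather than repairing from the live replicas, which is exactly why snapshots complement replication instead of replacing it.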
Re: cannot start Cassandra on Windows7
Hello Marina, I've downloaded a fresh copy of v1.2.3 and it's running fine on my Windows 7 64 bit PC. I am using jdk 1.6.0 u29 64 bit. I have local admin permissions to my PC. On 22 March 2013 15:36, Marina ppi...@yahoo.com wrote: Hi, I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my Windows7 machine (I did not find a Windows-specific distributable...). Then, I tried to start Cassandra as following and got an error: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit (CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. C:\Marina\Tools\apache-cassandra-1.2.3\bin It looks similar to the Cassandra issue that was already fixed: https://issues.apache.org/jira/browse/CASSANDRA-2383 however I am still getting this error I am an Administrator on my machine, and have access to all files in the apache- cassandra-1.2.3\conf dir, including the log4j ones. Do I need to configure anything else on Winows ? I did not find any Windows- specific installation/setup/startup instructions - if there are such documents somewhere, please let me know! 
Thanks, Marina In case it helps, I have added echo of CASSANDRA_CLASSPATH: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server CASSANDRA_CLASSPATH=C:\Marina\Tools\DataStax Community\apache-cassandra\conf; C:\Marina\Tools\DataStax Community\apache-cassandra\lib\antlr-3.2.jar;C:\Marin a\Tools\DataStax Community\apache-cassandra\lib\apache-cassandra-1.2.2.jar;C:\ Marina\Tools\DataStax Community\apache-cassandra\lib\apache-cassandra-clientutil -1.2.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\apache-cass andra-thrift-1.2.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib \avro-1.4.0-fixes.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ avro-1.4.0-sources-fixes.jar;C:\Marina\Tools\DataStax Community\apache-cassand ra\lib\commons-cli-1.1.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\commons-codec-1.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\commons-lang-2.6.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\ lib\compress-lzf-0.8.4.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\concurrentlinkedhashmap-lru-1.3.jar;C:\Marina\Tools\DataStax Community\ap ache-cassandra\lib\guava-13.0.1.jar;C:\Marina\Tools\DataStax Community\apache- cassandra\lib\high-scale-lib-1.1.2.jar;C:\Marina\Tools\DataStax Community\apac he-cassandra\lib\jackson-core-asl-1.9.2.jar;C:\Marina\Tools\DataStax Community \apache-cassandra\lib\jackson-mapper-asl-1.9.2.jar;C:\Marina\Tools\DataStax Co mmunity\apache-cassandra\lib\jamm-0.2.5.jar;C:\Marina\Tools\DataStax Community \apache-cassandra\lib\jbcrypt-0.3m.jar;C:\Marina\Tools\DataStax Community\apac he-cassandra\lib\jline-1.0.jar;C:\Marina\Tools\DataStax Community\apache-cassa ndra\lib\json-simple-1.1.jar;C:\Marina\Tools\DataStax Community\apache-cassand ra\lib\libthrift-0.7.0.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\log4j-1.2.16.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ 
lz4-1.1.0.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\metrics- core-2.0.3.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\netty-3 .5.9.Final.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\servlet -api-2.5-20081211.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ slf4j-api-1.7.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\sl f4j-log4j12-1.7.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ snakeyaml-1.6.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\snap py-java-1.0.4.1.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\sn aptree-0.1.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\build\class es\main;C:\Marina\Tools\DataStax Community\apache-cassandra\build\classes\thri ft Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit(CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. -- Thanks Jabbar Azam
Re: cannot start Cassandra on Windows7
Viktor, you're right. I didn't get any errors on my windows console but cassandra.yaml and log4j-server.properties need modifying. On 22 March 2013 15:44, Viktor Jevdokimov viktor.jevdoki...@adform.comwrote: You NEED to edit cassandra.yaml and log4j-server.properties paths before starting on Windows. There're a LOT of things to learn for starters. Google for Cassandra on Windows. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. -Original Message- From: Marina [mailto:ppi...@yahoo.com] Sent: Friday, March 22, 2013 17:21 To: user@cassandra.apache.org Subject: cannot start Cassandra on Windows7 Hi, I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my Windows7 machine (I did not find a Windows-specific distributable...). Then, I tried to start Cassandra as following and got an error: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit(CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. 
C:\Marina\Tools\apache-cassandra-1.2.3\bin It looks similar to the Cassandra issue that was already fixed: https://issues.apache.org/jira/browse/CASSANDRA-2383 however I am still getting this error I am an Administrator on my machine, and have access to all files in the apache- cassandra-1.2.3\conf dir, including the log4j ones. Do I need to configure anything else on Winows ? I did not find any Windows- specific installation/setup/startup instructions - if there are such documents somewhere, please let me know! Thanks, Marina -- Thanks Jabbar Azam
Re: cannot start Cassandra on Windows7
Oops, I also had opscenter installed on my PC. My changes:

log4j-server.properties file:

log4j.appender.R.File=c:/var/log/cassandra/system.log

cassandra.yaml file:

# directories where Cassandra should store data on disk.
data_file_directories:
    - c:/var/lib/cassandra/data
# commit log
commitlog_directory: c:/var/lib/cassandra/commitlog
# saved caches
saved_caches_directory: c:/var/lib/cassandra/saved_caches

I also added an environment variable for Windows called CASSANDRA_HOME. I needed to do this for one of my colleagues and now it's documented ;)

On 22 March 2013 15:47, Jabbar Azam aja...@gmail.com wrote: Viktor, you're right. I didn't get any errors on my windows console but cassandra.yaml and log4j-server.properties need modifying. On 22 March 2013 15:44, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: You NEED to edit cassandra.yaml and log4j-server.properties paths before starting on Windows. There're a LOT of things to learn for starters. Google for Cassandra on Windows. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. -Original Message- From: Marina [mailto:ppi...@yahoo.com] Sent: Friday, March 22, 2013 17:21 To: user@cassandra.apache.org Subject: cannot start Cassandra on Windows7 Hi, I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my Windows7 machine (I did not find a Windows-specific distributable...). 
Then, I tried to start Cassandra as following and got an error: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit(CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. C:\Marina\Tools\apache-cassandra-1.2.3\bin It looks similar to the Cassandra issue that was already fixed: https://issues.apache.org/jira/browse/CASSANDRA-2383 however I am still getting this error I am an Administrator on my machine, and have access to all files in the apache- cassandra-1.2.3\conf dir, including the log4j ones. Do I need to configure anything else on Winows ? I did not find any Windows- specific installation/setup/startup instructions - if there are such documents somewhere, please let me know! Thanks, Marina -- Thanks Jabbar Azam -- Thanks Jabbar Azam
Re: Recovering from a faulty cassandra node
All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node(now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use the nodetool removenode if your server i broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. ( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this serveur, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 
2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.commailto:aja...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.govmailto: dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation which I belive you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes affect. I would however test that all out in QA to make sure it works and if you have QUOROM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed…..you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. 
Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org Subject: Recovering from a faulty cassandra node Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the file system on that node is corrupt so I'll have to reinstall the OS and cassandra. I can think of two ways of reintegrating the host into the cluster 1) shrink the cluster to three nodes and add the node into the cluster 2) Add the node into the cluster without shrinking I'm not sure of the best approach to take and I'm not sure how to achieve each step. Can anybody help? -- Thanks Jabbar Azam -- Thanks Jabbar Azam -- Thanks Jabbar Azam -- Thanks Jabbar Azam -- Thanks Jabbar Azam
Re: Recovering from a faulty cassandra node
Can I do a multiple node nodetool cleanup on my test cluster? On 21 Mar 2013 17:12, Jabbar Azam aja...@gmail.com wrote: All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node(now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use the nodetool removenode if your server i broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. 
( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this serveur, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.commailto:aja...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.govmailto: dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation which I belive you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes affect. I would however test that all out in QA to make sure it works and if you have QUOROM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed…..you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. 
Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org Subject: Recovering from a faulty cassandra node Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the file system on that node is corrupt so I'll have to reinstall the OS and cassandra. I can think of two ways of reintegrating the host into the cluster 1) shrink the cluster to three nodes and add the node into the cluster 2) Add the node into the cluster without shrinking I'm not sure of the best approach to take and I'm not sure how to achieve each step. Can anybody help? -- Thanks
Re: Recovering from a faulty cassandra node
nodetool cleanup command removes keys which can be deleted from the node the command is run. So I'm assuming I can run nodetool cleanup on all the old nodes in parallel. Wouldn't do this on a live cluster as it's I/O intensive on each node. On 21 March 2013 17:26, Jabbar Azam aja...@gmail.com wrote: Can I do a multiple node nodetool cleanup on my test cluster? On 21 Mar 2013 17:12, Jabbar Azam aja...@gmail.com wrote: All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node(now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? 
On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use the nodetool removenode if your server i broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. ( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this serveur, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.commailto:aja...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.govmailto: dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation which I belive you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes affect. I would however test that all out in QA to make sure it works and if you have QUOROM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed…..you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. 
Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org Subject: Recovering from a faulty cassandra node Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the file system on that node is corrupt so I'll have to reinstall the OS and cassandra. I can
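[Editor's note] On a live cluster the safer pattern for the cleanup step discussed above is to run it serially rather than in parallel, since cleanup rewrites SSTables and is I/O intensive per node. A rough sketch; the host names are placeholders and the actual nodetool invocation is commented out because it needs a reachable cluster:

```shell
# Run cleanup one node at a time to bound the extra I/O load
# (host names are placeholders).
NODES="node1 node2 node3"
for n in $NODES; do
  echo "cleaning $n"
  # ssh "$n" nodetool cleanup   # blocks until this node's cleanup finishes
done
```

On a test cluster with no traffic, running them in parallel simply finishes sooner at the cost of simultaneous disk load everywhere.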
Re: Recovering from a faulty cassandra node
I've added the node with a different IP address, and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by a removenode, once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. I'm not sure whether I needed to change the cassandra-topology.properties file on the existing nodes.

On 19 March 2013 15:49, Jabbar Azam <aja...@gmail.com> wrote:
> Do I use removenode before adding the reinstalled node, or after?
>
> On 19 March 2013 15:45, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>> In 1.2 you may want to use nodetool removenode if your server is broken or unreachable; otherwise I guess nodetool decommission remains the right way to remove a node (http://www.datastax.com/docs/1.2/references/nodetool). When this node is out, rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (I'm not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck.
>>
>> 2013/3/19 Hiller, Dean <dean.hil...@nrel.gov>
>>> Since you cleared out that node, it IS the replacement node.
>>> Dean

--
Thanks

Jabbar Azam
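The sequence described above (rejoin with a new IP, wait for streaming, then clean up and remove the old identity) can be sketched as a runbook. This is a hedged sketch only: the data path, service name, and host ID are illustrative assumptions, not values from the thread.

```shell
# On the wiped node: clear any stale state, then start Cassandra so it
# bootstraps under its new IP and streams data from the existing nodes.
# (/var/lib/cassandra is the default data path; yours may differ.)
sudo rm -rf /var/lib/cassandra/*
sudo service cassandra start

# On each existing node, after adding the new node to
# cassandra-topology.properties and restarting:
nodetool status            # wait until the new node shows as UN (Up/Normal)

# Once streaming completes, remove the dead node's old identity, then
# clean up data the surviving nodes no longer own.
nodetool removenode <old-host-id>   # placeholder; take the Host ID from 'nodetool status'
nodetool cleanup
```

Running cleanup only after the new node has fully joined avoids discarding ranges that are still being streamed.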
RE: java.lang.OutOfMemoryError: unable to create new native thread
Hello,

Also have a look at http://www.datastax.com/docs/1.2/install/recommended_settings

On 21 Mar 2013 00:06, S C <as...@outlook.com> wrote:
> Apparently max user processes was set very low on the machine.
> How to check? ulimit -u
> Set it to unlimited in /etc/security/limits.conf:
> * soft nproc unlimited
> * hard nproc unlimited

From: as...@outlook.com
To: user@cassandra.apache.org
Subject: RE: java.lang.OutOfMemoryError: unable to create new native thread
Date: Fri, 15 Mar 2013 18:57:05 -0500
> I think I figured out where the issue is. I will keep you posted soon.

From: as...@outlook.com
To: user@cassandra.apache.org
Subject: java.lang.OutOfMemoryError: unable to create new native thread
Date: Fri, 15 Mar 2013 17:54:25 -0500
> I have a Cassandra node that is going down frequently with "java.lang.OutOfMemoryError: unable to create new native thread". It's a 16GB VM, of which 4GB is set as Xmx, and there are no other processes running on the VM. I have about 300 clients connecting to this node on average. I have no indication from vmstat/SAR that my VM has used more memory or is memory hungry, so this doesn't look like a memory issue to me. I'd appreciate any pointers. System specs: 2 CPU, 16GB, RHEL 6.2. Thank you.
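The fix above hinges on the per-user process limit: every JVM thread counts against `nproc`, so a low value produces "unable to create new native thread" even when the heap is fine. A quick check, with an illustrative (assumed) user name and value in the comments:

```shell
# Show the current per-user process/thread limit for this shell.
# A value in the low hundreds is too small for a busy Cassandra node.
ulimit -u

# To raise it persistently, add lines like these to
# /etc/security/limits.conf (the key is "nproc"; a generous fixed
# value is a common alternative to "unlimited"):
#   cassandra soft nproc 32768
#   cassandra hard nproc 32768
```

The new limit applies to sessions started after the change (via PAM), so Cassandra must be restarted from a fresh login for it to take effect.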
Recovering from a faulty cassandra node
Hello,

I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I spent over a week inserting lots of data into the cluster. Towards the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault, but the file system on that node is corrupt, so I'll have to reinstall the OS and Cassandra.

I can think of two ways of reintegrating the host into the cluster:
1) Shrink the cluster to three nodes, then add the node back into the cluster.
2) Add the node into the cluster without shrinking.

I'm not sure of the best approach to take, and I'm not sure how to achieve each step. Can anybody help?

--
Thanks

Jabbar Azam
Re: Recovering from a faulty cassandra node
Hello Dean. I'm using vnodes so I can't specify a token. In addition, I can't follow the replace-node docs because I don't have a replacement node.

On 19 March 2013 15:25, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> I have not done this as of yet, but from all that I have read your best option is to follow the replace-node documentation, for which I believe you need to:
> 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer.
> 2. Have the bootstrap option set so streaming takes effect.
> I would, however, test that all out in QA to make sure it works. If you have QUORUM reads/writes, a good part of that test would be to take node X down after your node Y is back in the cluster, to make sure reads/writes are working on the node you fixed... you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range.
> Dean

--
Thanks

Jabbar Azam
Re: Recovering from a faulty cassandra node
Yes, you're probably right. I don't really understand token generation, so I was reluctant to do that. I'll install Linux on the faulty node now and let you know what happens.

On 19 March 2013 15:38, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> Since you cleared out that node, it IS the replacement node.
> Dean
>
> On Tuesday, March 19, 2013 9:29 AM, Jabbar Azam <aja...@gmail.com> wrote:
>> Hello Dean. I'm using vnodes so I can't specify a token. In addition, I can't follow the replace-node docs because I don't have a replacement node.

--
Thanks

Jabbar Azam
Re: Recovering from a faulty cassandra node
Do I use removenode before adding the reinstalled node, or after?

On 19 March 2013 15:45, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> In 1.2 you may want to use nodetool removenode if your server is broken or unreachable; otherwise I guess nodetool decommission remains the right way to remove a node (http://www.datastax.com/docs/1.2/references/nodetool). When this node is out, rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (I'm not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck.
>
> 2013/3/19 Hiller, Dean <dean.hil...@nrel.gov>
>> Since you cleared out that node, it IS the replacement node.
>> Dean

--
Thanks

Jabbar Azam
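Alain's advice above distinguishes two removal paths depending on the node's state. As a sketch (the host ID shown is a placeholder, not a value from the thread):

```shell
# Graceful path: the node is still up and healthy. Run ON the node
# being retired; it streams its data to the remaining nodes first.
nodetool decommission

# Broken/unreachable path: remove the dead node from ANY live node.
nodetool status     # note the Host ID of the node shown as DN (Down)
nodetool removenode 11111111-2222-3333-4444-555555555555  # placeholder ID
```

Either way, once the old identity is gone the wiped, reinstalled machine can start with empty data directories and rejoin the ring as a brand-new node.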
Re: Backup solution
Hello,

If the live data centre disappears, restoring the data from the backup is going to take ages, especially if the data has to travel from one data centre to another, unless you have a high-bandwidth connection between data centres or only a small amount of data.

Jabbar Azam

On 14 Mar 2013 14:31, Rene Kochen <rene.koc...@schange.com> wrote:
> Hi all,
>
> Is the following a good backup solution? Create two data-centers:
> - A live data-center with multiple nodes (commodity hardware). Clients connect to this cluster with LOCAL_QUORUM.
> - A backup data-center with 1 node (with fast SSDs). Clients do not connect to this cluster; it is only used for creating and storing snapshots.
>
> Advantages:
> - No snapshots and bulk network I/O (transferring snapshots) needed on the live cluster.
> - Clients are not slowed down, because writes to the backup data-center are async.
> - On the backup cluster, snapshots are made on a regular basis. This again does not affect the live cluster.
> - The backup cluster does not need to process client requests/reads, so we need fewer machines for the backup cluster than for the live cluster.
>
> Are there any disadvantages with this approach? Thanks!
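The two-data-centre layout Rene describes is driven by the keyspace's replication settings: with NetworkTopologyStrategy, LOCAL_QUORUM clients are acknowledged by the live DC only, while the backup DC receives its replica asynchronously. A sketch, where the keyspace name and DC names are assumptions, not from the thread:

```shell
# Hypothetical keyspace spanning a "live" DC (3 replicas) and a
# "backup" DC (1 replica on the snapshot node). DC names must match
# those reported by the snitch.
cqlsh -e "CREATE KEYSPACE app_data WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'live': 3,
  'backup': 1
};"
```

With this in place, `nodetool snapshot` can be run on the backup node alone, keeping snapshot I/O off the live cluster.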
Re: Compaction statistics information
Thanks Tyler

On 3 Mar 2013 18:55, Tyler Hobbs <ty...@datastax.com> wrote:
> It's a description of how many of the compacted SSTables the rows were spread across prior to compaction. In your case, 15 rows were spread across two of the four sstables, 68757 rows were spread across three of the four sstables, and 6865 were spread across all four.
>
> On Fri, Mar 1, 2013 at 11:07 AM, Jabbar Azam <aja...@gmail.com> wrote:
>> Hello, I'm seeing compaction statistics which look like the following:
>> INFO 17:03:09,216 Compacted 4 sstables to [/var/lib/cassandra/data/studata/datapoints/studata-datapoints-ib-629,]. 420,807,293 bytes to 415,287,150 (~98% of original) in 341,690ms = 1.159088MB/s. 233,761 total rows, 75,637 unique. Row merge counts were {1:0, 2:15, 3:68757, 4:6865, }
>> Does anybody know what "Row merge counts were {1:0, 2:15, 3:68757, 4:6865, }" means?
>>
>> --
>> Thanks
>>
>> A Jabbar Azam
>
> --
> Tyler Hobbs
> DataStax http://datastax.com/
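Tyler's reading of the merge histogram can be cross-checked against the other numbers in that log line: summing the buckets gives the unique (output) row count, and weighting each bucket by its sstable count gives the total row versions read.

```shell
# Each bucket k:n in {1:0, 2:15, 3:68757, 4:6865} means n output rows
# were merged from copies found in k of the 4 input sstables.
unique=$((0 + 15 + 68757 + 6865))          # rows written to the output sstable
total=$((1*0 + 2*15 + 3*68757 + 4*6865))   # row versions read from the inputs
echo "$unique unique, $total total"        # → 75637 unique, 233761 total
```

Both figures match the log line's "233,761 total rows, 75,637 unique", confirming the interpretation.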