Re: Query

2016-12-29 Thread Edward Capriolo
You should start by understanding your needs. Once you understand them, you
can pick the software that fits. Starting with a software stack is backwards.


Re: Query

2016-12-29 Thread Ben Slater
I wasn’t familiar with Gizzard either, so I thought I’d take a look. The
first thing in their GitHub README is:
*NB: This project is currently not recommended as a base for new consumers.*
(And no commits since 2013)

So, Cassandra definitely looks like a better choice as your datastore for a
new project.

Cheers
Ben


Re: Query

2016-12-29 Thread Manoj Khangaonkar
I am not that familiar with Gizzard, but with Gizzard + MySQL you have
multiple moving parts in the system that need to be managed separately. You'll
need a MySQL expert for MySQL and a Gizzard expert to manage the
distributed part. It can be argued that, long term, this will have a higher
administration cost.

Cassandra's value add is its simple peer-to-peer architecture, which is easy
to manage: a single database solution that is distributed, scalable,
highly available, etc. In other words, once you gain expertise in Cassandra,
you get everything in one package.
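To make "everything in one package" concrete: in Cassandra each node owns token ranges on a hash ring, and the database itself decides which nodes hold each row. That is the partitioning and replication work the question would otherwise hand to Gizzard in front of MySQL. The sketch below is a simplified, hypothetical model (one token per node, MD5 hashing, SimpleStrategy-style clockwise placement, invented names like `TokenRing`); real Cassandra 2.x defaults to the Murmur3 partitioner and typically uses virtual nodes, so treat it as an illustration only.

```python
from __future__ import annotations

import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key onto the ring (MD5 for simplicity; Cassandra's default is Murmur3)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy ring: one token per node; replicas are the next `rf` nodes clockwise."""

    def __init__(self, nodes: list[str]) -> None:
        # Hypothetical placement: each node sits at the hash of its own name.
        self._ring = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int) -> list[str]:
        """Return the distinct nodes that store `key` (SimpleStrategy-style walk)."""
        rf = min(rf, len(self._ring))
        # The first node whose token is >= the key's token owns it; wrap past the end.
        start = bisect.bisect_left(self._tokens, token_for(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = TokenRing([f"node{i}" for i in range(1, 7)])
    # Any node can compute this itself; no separate sharding tier is consulted.
    print(ring.replicas("user:42", rf=3))
```

Because placement is a pure function of the key and the shared ring state, any node can act as coordinator, which is roughly why no external routing layer is needed.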

regards

-- 
http://khangaonkar.blogspot.com/


Re: How to change Replication Strategy and RF

2016-12-29 Thread kurt Greaves
If you're already using the cluster in production and require no downtime,
you should perform a datacenter migration to change the RF to 3.
The rough process would be as follows:

   1. Change the keyspace to NetworkTopologyStrategy with RF=1. You shouldn't
   increase the RF here: not all nodes would have the data they now own, so
   you would get read failures until a repair completes.
   2. Configure your clients to use a LOCAL_* consistency level and
   DCAwareRoundRobinPolicy for load balancing (with the current DC configured).
   3. Add a new datacenter and configure its replication factor to be 3.
   4. Rebuild the new datacenter by running nodetool rebuild  on
   each node in the new DC.
   5. Migrate your clients to the new datacenter by switching the contact
   points to nodes in the new DC and setting the load balancing policy's DC to
   the new DC.
   6. At this point you could increase the replication factor on the old DC
   to 3 and then run a repair. Once the repair successfully completes, you
   should have 2 DCs that you can use. If you need the DCs in separate
   locations, you could change this step to adding another DC in the desired
   location and running rebuilds as per steps 2-4.
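The keyspace changes behind these steps can be sketched as below. This is a minimal sketch, not a tested runbook: the keyspace name `my_ks` and the DC names `DC1` (existing) and `DC2` (new) are hypothetical, it assumes the cluster already uses a DC-aware snitch (for example GossipingPropertyFileSnitch) whose DC names match these, and it only builds the CQL and nodetool commands in order rather than connecting to a cluster.

```python
def alter_keyspace(keyspace: str, dc_rf: dict) -> str:
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_rf.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")


def migration_plan(keyspace: str, old_dc: str, new_dc: str) -> list:
    """Return the server-side steps as commands (all names are hypothetical)."""
    return [
        # Step 1: switch strategy but keep RF=1, so no replica lacks its data.
        alter_keyspace(keyspace, {old_dc: 1}),
        # Step 2 is client-side: LOCAL_* consistency + DCAwareRoundRobinPolicy(old_dc).
        # Step 3: once the new DC's nodes have joined, give that DC RF=3.
        alter_keyspace(keyspace, {old_dc: 1, new_dc: 3}),
        # Step 4: run on each node in the new DC to stream data from the old DC.
        f"nodetool rebuild {old_dc}",
        # Step 5 is client-side: point contact points and the local DC at new_dc.
        # Step 6: raise RF in the old DC, then repair (on every node) to fill new replicas.
        alter_keyspace(keyspace, {old_dc: 3, new_dc: 3}),
        f"nodetool repair {keyspace}",
    ]


if __name__ == "__main__":
    for cmd in migration_plan("my_ks", "DC1", "DC2"):
        print(cmd)
```

Keeping each ALTER as a separate step mirrors the point of the thread: the RF is only raised in a DC once there is a plan (rebuild or repair) to give the new replicas their data.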

- Kurt


Query

2016-12-29 Thread Sikander Rafiq
Hi,

I'm exploring Cassandra for handling large data sets for a mobile app, but I'm
not clear where it stands.


If we use MySQL as the underlying database, Gizzard for building custom
distributed databases (with arbitrary storage technology), and Memcached for
highly queried data, then where does Cassandra fit?


I have read that Twitter uses both Cassandra and Gizzard. Please explain
where Cassandra would fit in.


Thanks in advance.


Regards,

Sikander



How to change Replication Strategy and RF

2016-12-29 Thread techpyaasa .
Hi all,

We mistakenly set up a C* 2.0.17 cluster (1 DC, 3 racks, 2 nodes in each
rack) with SimpleStrategy and *RF=1*.
Each node now holds roughly 1.4 GB of data.

Now we would like to change the replication strategy to NetworkTopologyStrategy
with RF=3, and also add a new data center to this cluster.

Can someone please suggest the *safest* way to do so?
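A rough capacity check for this change, as a sketch: it assumes data is spread evenly across the 6 nodes and treats "roughly 1.4 GB" as exactly 1.4 GB. Tripling the RF on the same nodes roughly triples the data each node stores.

```python
def per_node_gb(nodes: int, gb_per_node: float, old_rf: int, new_rf: int) -> float:
    """Estimate per-node live data after an RF change, assuming even distribution."""
    unique_gb = nodes * gb_per_node / old_rf  # data before replication
    return unique_gb * new_rf / nodes         # each partition now stored new_rf times


if __name__ == "__main__":
    # Figures from the question: 6 nodes, ~1.4 GB each, RF 1 -> 3 in the existing DC.
    print(round(per_node_gb(6, 1.4, 1, 3), 2))  # 4.2
```

The new DC holds its own three copies of the same ~8.4 GB of unique data, so with 6 nodes there it would also land near 4.2 GB per node; the estimate ignores uneven token ownership and the free space compaction needs.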

Thanks in advance,
Techpyaasa


Re: Comment on query performance

2016-12-29 Thread Ashutosh Dhundhara
Thanks once again, DuyHai :-)

-- 
Ashutosh Dhundhara


Re: Comment on query performance

2016-12-29 Thread DuyHai Doan
No full table scan because you specify all the partition key columns in
your WHERE clause.
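A toy model of why the full partition key avoids a cluster-wide scan: the coordinator hashes all partition key columns together into one token and only that token's replicas are candidates, whereas a query without the complete partition key has to read every token range. This is a hypothetical sketch (one token per node, MD5 over a "|"-joined key instead of Murmur3 over Cassandra's real key serialization, invented function names).

```python
from __future__ import annotations

import bisect
import hashlib


def token(partition_key: tuple) -> int:
    """Hash all partition key columns together (toy stand-in for Murmur3)."""
    raw = "|".join(str(part) for part in partition_key)
    return int(hashlib.md5(raw.encode("utf-8")).hexdigest(), 16)


def candidate_nodes(ring: list[tuple[int, str]], partition_key: tuple | None,
                    rf: int) -> set[str]:
    """Nodes that may serve a query; `ring` must be sorted by token."""
    if partition_key is None:
        # No complete partition key: every token range (hence every node) is read.
        return {node for _, node in ring}
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    # The coordinator then contacts as many of these replicas as the consistency level needs.
    return {ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))}


if __name__ == "__main__":
    ring = sorted((token((f"node{i}",)), f"node{i}") for i in range(1, 7))
    # idObject = 1, objectType = 'COURSE', idParent = 0, as in the question.
    print(len(candidate_nodes(ring, (1, "COURSE", 0), rf=3)))  # 3
    print(len(candidate_nodes(ring, None, rf=3)))              # 6
```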


Re: Comment on query performance

2016-12-29 Thread Ashutosh Dhundhara
Thanks, DuyHai.

One more thing: is it going to be a full table scan across all the nodes in
the cluster?

-- 
Ashutosh Dhundhara


Re: Comment on query performance

2016-12-29 Thread DuyHai Doan
In your case, ALLOW FILTERING will require Cassandra to scan linearly on
disk and fetch all the partition data into memory, so the performance
depends on how "large" your partition is. For small partitions it should be
fine.
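A toy model of this behaviour, assuming an in-memory dict stands in for one node's storage (the `Posts` columns come from the question; `PartitionStore` and its methods are hypothetical): the full partition key locates the partition directly, but the `idResolution` filter is still a linear pass over every row in it, so the cost grows with partition size rather than table size.

```python
from __future__ import annotations

from collections import defaultdict


class PartitionStore:
    """Toy single-node store for the Posts table from the question (hypothetical)."""

    def __init__(self) -> None:
        # (idObject, objectType, idParent) -> rows kept in clustering order by `id`
        self._partitions: dict[tuple, list[dict]] = defaultdict(list)

    def insert(self, row: dict) -> None:
        pk = (row["idObject"], row["objectType"], row["idParent"])
        self._partitions[pk].append(row)
        self._partitions[pk].sort(key=lambda r: r["id"])

    def select_allow_filtering(self, pk: tuple, id_resolution: int) -> tuple[list[dict], int]:
        """Filter one partition on a non-key column; return (matches, rows examined)."""
        partition = self._partitions.get(pk, [])  # direct lookup, no table scan
        matches = [r for r in partition if r["idResolution"] == id_resolution]
        return matches, len(partition)  # every row of the partition was read


if __name__ == "__main__":
    store = PartitionStore()
    for i in range(1000):
        store.insert({"idObject": 1, "objectType": "COURSE", "idParent": 0,
                      "id": i, "idResolution": i % 4})
    # A row in another partition is never touched by the query below.
    store.insert({"idObject": 2, "objectType": "COURSE", "idParent": 0,
                  "id": 0, "idResolution": 1})
    rows, scanned = store.select_allow_filtering((1, "COURSE", 0), 1)
    print(len(rows), scanned)  # 250 1000
```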


On Thu, Dec 29, 2016 at 10:00 AM, Ashutosh Dhundhara <
ashutoshdhundh...@yahoo.com> wrote:

> Hi All,
>
> I have a table like this:
>
> CREATE TABLE IF NOT EXISTS Posts (
> idObject int,
> objectType text,
> idParent int,
> id int,
> idResolution int,
> PRIMARY KEY ((idObject, objectType, idParent), id)
> );
>
> Now have a look at the following query:
>
> SELECT * FROM POSTS WHERE idobject = 1 AND objectType = 'COURSE' AND idParent 
> = 0 AND idResolution = 1 ALLOW FILTERING
>
> Now the Partition Key is completely known, so if I use ALLOW FILTERING is
> there going to be any performance issue because the filtering is going to
> be done in a known single partition?
>
>
> --
> Ashutosh Dhundhara
>


Comment on query performance

2016-12-29 Thread Ashutosh Dhundhara
Hi All,

I have a table like this:

CREATE TABLE IF NOT EXISTS Posts (
idObject int,
objectType text,
idParent int,
id int,
idResolution int,
PRIMARY KEY ((idObject, objectType, idParent), id)
);

Now have a look at the following query:

SELECT * FROM POSTS WHERE idobject = 1 AND objectType = 'COURSE' AND
idParent = 0 AND idResolution = 1 ALLOW FILTERING;

Since the partition key is completely known, is there going to be any
performance issue if I use ALLOW FILTERING, given that the filtering is
done within a single known partition?


-- 
Ashutosh Dhundhara