Re: Invalid metadata has been detected for role

2018-05-18 Thread Abdul Patel
Hey

Thanks for the response. I had accidentally decommissioned the seed node,
which was causing this.
I promoted another node to seed, restarted all nodes with the new seed list,
and followed up with a full nodetool repair and cleanup, which fixed the issue.

On Thursday, May 17, 2018, kurt greaves  wrote:

> Can you post the stack trace and your version of Cassandra?
>
> On Fri., 18 May 2018, 09:48 Abdul Patel,  wrote:
>
>> Hi
>>
>> I had to decommission one DC. Now, while adding the same nodes back (I
>> used nodetool decommission), they both get added fine and I also see them
>> in nodetool status, but I am unable to log in to them; it gives an
>> "invalid metadata" error. I ran repair and later cleanup as well.
>>
>> Any ideas?
>>
>>


Re: Interesting Results - Cassandra Benchmarks over Time Series Data for IoT Use Case I

2018-05-18 Thread onmstester onmstester
I recommend reviewing the Newts data model, which is a time-series data model
built on Cassandra:

https://github.com/OpenNMS/newts/wiki/DataModel



First, the use case: we have time series of data from devices on several
sites, where each device (with a unique dev_id) can have several sensors
attached to it. Most queries, however, are limited both in time and to a range
of dev_ids, even for a single sensor (multi-sensor joins are a whole different
beast for another day!). We want a schema where a query completes in time
proportional to the queried device and time ranges, largely independent of the
total data size.

So we explored several different primary key definitions, learning from the
best practices communicated on this mailing list and around the web. While
details about the setup (Spark over C*) and schema are in a companion
blog/site here [1], we mention just the primary keys and the key points here.



PRIMARY KEY (dev_id, day, rec_time)
PRIMARY KEY ((dev_id, rec_time))
PRIMARY KEY (day, dev_id, rec_time)
PRIMARY KEY ((day, dev_id), rec_time)
PRIMARY KEY ((dev_id, day), rec_time)

Combinations of the above, adding a year field to the schema.
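
To make these concrete, here is a minimal CQL sketch of two of the candidate
tables (table and column names are illustrative assumptions, not the exact
schema from [1]):

    -- Variant (dev_id, day, rec_time): one partition per device,
    -- rows clustered by day and then by record time.
    CREATE TABLE readings_by_dev (
        dev_id    text,
        day       date,
        rec_time  timestamp,
        sensor_id text,
        value     double,
        PRIMARY KEY (dev_id, day, rec_time)
    );

    -- Variant ((day, dev_id), rec_time): one partition per device per day,
    -- which bounds partition size but spreads one query across partitions.
    CREATE TABLE readings_by_day_dev (
        dev_id    text,
        day       date,
        rec_time  timestamp,
        sensor_id text,
        value     double,
        PRIMARY KEY ((day, dev_id), rec_time)
    );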




The main takeaway (again, please read through the details at [1]) is that we
really don't have a single schema that answers the use case above without some
drawback. While ((day, dev_id), rec_time) gives a constant response time, that
time is effectively proportional to the total data size (a full scan). On the
other hand, while (dev_id, day, rec_time) and its counterpart (day, dev_id,
rec_time) provide acceptable results, we have the issue of very large
partitions with the first and of write hotspots with the latter.



We also observed that a multi-field partition key allows fast querying only
if "=" is applied to its fields from left to right. Once an IN() (specifying,
e.g., a range of days or a list of devices) is used in that order, any further
use of IN() removes the benefit (i.e., it results in a near full table scan).
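
For instance, against the readings_by_day_dev sketch above (illustrative
queries, not the exact benchmark workload):

    -- Fast: every partition key field is bound with "=", so exactly
    -- one partition is read.
    SELECT * FROM readings_by_day_dev
    WHERE day = '2018-05-18' AND dev_id = 'dev-001';

    -- Still reasonable: one IN() after the "=" fields; the coordinator
    -- fans out to a known, small list of partitions.
    SELECT * FROM readings_by_day_dev
    WHERE day = '2018-05-18' AND dev_id IN ('dev-001', 'dev-002');

    -- Degrades: IN() on several partition key fields multiplies the
    -- partitions touched, approaching a full scan.
    SELECT * FROM readings_by_day_dev
    WHERE day IN ('2018-05-17', '2018-05-18')
      AND dev_id IN ('dev-001', 'dev-002');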



Another useful lesson was that using IN() to query for days is less efficient
than using a range query.
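
Concretely, with day as a clustering column as in the readings_by_dev sketch
above (again illustrative):

    -- Preferred: a contiguous slice over the day clustering column.
    SELECT * FROM readings_by_dev
    WHERE dev_id = 'dev-001'
      AND day >= '2018-05-01' AND day <= '2018-05-07';

    -- Slower in our runs: enumerating the same days with IN().
    SELECT * FROM readings_by_dev
    WHERE dev_id = 'dev-001'
      AND day IN ('2018-05-01', '2018-05-02', '2018-05-03');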



Currently, it seems we are in a bind: should we use a different data store
for our use case (which seems quite typical for IoT)? Something like HDFS or
Parquet? We would love to get feedback on the benchmarking results and on how
we can possibly improve this and share it widely.


[1] Cassandra Benchmarks over Time Series Data for IoT Use Case

   https://sites.google.com/an10.io/timeseries-results






-- 

Regards,

Arbab Khalil

Software Design Engineer










Re: Using K8s to Manage Cassandra in Production

2018-05-18 Thread Pradeep Chhetri
Hello Hassaan,

We use the Cassandra Helm chart [0] for deploying Cassandra over Kubernetes in
production. We have around 200 GB of Cassandra data. It works really well. You
can scale up nodes easily (I haven't tested scaling down).

I would say that if you are worried about running Cassandra over K8s in
production, maybe you should first try setting it up for your
staging/pre-production environment and gain confidence over time.

I have tested situations where I killed the host running a Cassandra
container and saw that the container moved to a different node and rejoined
the cluster properly. So from my experience it's pretty good. No issues yet.

[0]: https://github.com/kubernetes/charts/tree/master/incubator/cassandra


Regards,
Pradeep

On Fri, May 18, 2018 at 1:01 PM, Павел Сапежко 
wrote:

> Hi, Hassaan! For example, we are using C* in K8s in production for our
> video surveillance system. Moreover, we are using Ceph RBD as the storage
> for Cassandra. Today we have 8 C* nodes, each managing 2 TB of data.
>
> On Fri, May 18, 2018 at 9:27 AM Hassaan Pasha  wrote:
>
>> Hi,
>>
>> I am trying to craft a strategy for deploying and maintaining a C*
>> cluster. I was wondering if there are actual production deployments of C*
>> using K8s as the orchestration layer.
>>
>> I have been given the impression that K8s managing a C* cluster can be a
>> recipe for disaster, especially if you aren't well versed in the
>> intricacies of a scale-up/down event. I know of use cases where people are
>> using Mesos or custom tools built with Terraform/Chef etc. to run their
>> production clusters but have yet to find a real K8s use case.
>>
>> *Questions:*
>> Is K8s a reasonable choice for managing a production C* cluster?
>> Are there documented use cases for this?
>>
>> Any help would be greatly appreciated.
>>
>> --
>> Regards,
>>
>>
>> *Hassaan Pasha*
>>
> --
>
> Regards,
>
> Pavel Sapezhko
>
>


Re: Using K8s to Manage Cassandra in Production

2018-05-18 Thread Павел Сапежко
Hi, Hassaan! For example, we are using C* in K8s in production for our video
surveillance system. Moreover, we are using Ceph RBD as the storage for
Cassandra. Today we have 8 C* nodes, each managing 2 TB of data.

On Fri, May 18, 2018 at 9:27 AM Hassaan Pasha  wrote:

> Hi,
>
> I am trying to craft a strategy for deploying and maintaining a C*
> cluster. I was wondering if there are actual production deployments of C*
> using K8s as the orchestration layer.
>
> I have been given the impression that K8s managing a C* cluster can be a
> recipe for disaster, especially if you aren't well versed in the
> intricacies of a scale-up/down event. I know of use cases where people are
> using Mesos or custom tools built with Terraform/Chef etc. to run their
> production clusters but have yet to find a real K8s use case.
>
> *Questions:*
> Is K8s a reasonable choice for managing a production C* cluster?
> Are there documented use cases for this?
>
> Any help would be greatly appreciated.
>
> --
> Regards,
>
>
> *Hassaan Pasha*
>
-- 

Regards,

Pavel Sapezhko