Re: Insert-only application repair

2018-05-12 Thread onmstester onmstester
needs to be run to make sure data is in sync. Sent from my iPhone On May 12, 2018, at 3:54 AM, onmstester onmstester onmstes...@zoho.com wrote: In an insert-only use case with TTL (6 months), should i run this command, every 5-7 days on all the nodes of production cluster (according

Insert-only application repair

2018-05-12 Thread onmstester onmstester
In an insert-only use case with TTL (6 months), should I run this command every 5-7 days on all the nodes of the production cluster (according to this: http://cassandra.apache.org/doc/latest/operating/repair.html )? nodetool repair -pr --full When none of the nodes was down in 4 months (ever
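The constraint behind the question can be sketched as a simple check (an illustrative assumption: tombstoned/expired data must be repaired before `gc_grace_seconds` elapses, which defaults to 10 days, so the repair interval has to stay below it):

```python
# Illustrative sketch: a repair cycle must complete within gc_grace_seconds
# (Cassandra default: 10 days = 864000 seconds), which is why a 5-7 day
# interval is commonly proposed. Numbers here are assumptions for the example.
GC_GRACE_DAYS = 10  # Cassandra default

def repair_interval_ok(interval_days: int, gc_grace_days: int = GC_GRACE_DAYS) -> bool:
    """True if a full repair cycle finishes before gc_grace expires."""
    return interval_days < gc_grace_days

print(repair_interval_ok(7))   # True: 7 < 10
print(repair_interval_ok(10))  # False: not strictly inside the grace window
```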

Solve Busy pool at Cassandra side

2018-05-13 Thread onmstester onmstester
Hi, I'm getting "Pool is Busy (limit is 256)" while connecting to a single-node Cassandra cluster. The whole client-side application is a 3rd-party lib whose source I can't change, and its session builder is not using any PoolingOptions. Is there any config on the Cassandra side that could

Re: Interesting Results - Cassandra Benchmarks over Time Series Data for IoT Use Case I

2018-05-18 Thread onmstester onmstester
I recommend reviewing the newts data model, which is a time-series data model on top of Cassandra: https://github.com/OpenNMS/newts/wiki/DataModel Sent using Zoho Mail First the use-case: We have time-series of data from devices on several sites, where each device (with a unique dev_id)

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
I've increased column_index_size_in_kb to 512 and then 4096: no change in response time; it even got worse. Even increasing the key cache size and row cache size did not help. On Sun, 20 May 2018 08:52:03 +0430 Jeff Jirsa jji...@gmail.com wrote Column

IN clause of prepared statement

2018-05-20 Thread onmstester onmstester
The table is something like Samples ... primary key ((partition, resource), timestamp, metric_name). Creating the prepared statement: session.prepare("select * from samples where partition=:partition and resource=:resource and timestamp >= :start and timestamp <= :end and metric_name in
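A common workaround for large IN lists on a clustering column (an assumption on my part, not something stated in the thread) is to issue one query per metric and merge client-side, which parallelizes well with async execution. The query template and names below are illustrative:

```python
# Illustrative sketch: expand an IN clause into one statement per metric,
# suitable for async execution against the driver. The template mirrors the
# query shape quoted above; all names are assumptions.
TEMPLATE = ("select * from samples where partition = :partition "
            "and resource = :resource and timestamp >= :start "
            "and timestamp <= :end and metric_name = :metric")

def per_metric_queries(metrics):
    """One (template, bound-values) pair per metric, instead of one big IN."""
    return [(TEMPLATE, {"metric": m}) for m in metrics]

queries = per_metric_queries(["cpu", "mem"])
print(len(queries))  # 2
```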

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
Should I run compaction after changing column_index_size_in_kb? On Sun, 20 May 2018 15:06:57 +0430 onmstester onmstester onmstes...@zoho.com wrote I've increased column_index_size_in_kb to 512 and then 4096: no change in response time; it even got

Reading from big partitions

2018-05-19 Thread onmstester onmstester
Hi, due to some unpredictable behavior in the input data, I ended up with a few hundred partitions larger than 300 MB. Reading any sequence of data from these partitions takes about 5 seconds, while reading from other partitions (smaller than 50 MB) takes less than 10 ms. Since I can't

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
Data is spread between an SSD and a 15K disk. The table has 26 SSTables in total. I haven't tried tracing, but I will and will let you know! On Sun, 20 May 2018 08:26:33 +0430 Jonathan Haddad j...@jonhaddad.com wrote What disks are you using? How many sstables

RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread onmstester onmstester
It seems there is no way to do this using Cassandra, and even something like Spark won't help, because I'm going to read from a big Cassandra partition (the bottleneck is reading from Cassandra). On Tue, 22 May 2018 09:08:55 +0430 onmstester onmstester onmstes

RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread onmstester onmstester
practice, you shouldn’t do select * (as a production query) against any database. You want to list the columns you actually want to select. That way a later “alter table add column” (or similar) doesn’t cause unpredictable results to the application. Sean Durity From: onmstester onmstester

cassandra concurrent read performance problem

2018-05-26 Thread onmstester onmstester
By reading 90 partitions concurrently (each about 200 MB), my single-node Apache Cassandra became unresponsive: no reads or writes work for almost 10 minutes. I'm using these configs: memtable_allocation_type: offheap_buffers gc: G1GC heap: 128GB concurrent_reads: 128 (having more

Re: data consistency without using nodetool repair

2018-06-09 Thread onmstester onmstester
:28 PM, onmstester onmstester onmstes...@zoho.com wrote: I'm using RF=2 (I know it should be at least 3, but I'm short of resources) and WCL=ONE and RCL=ONE in a cluster of 10 nodes in an insert-only scenario. The problem: I don't want to use nodetool repair because it would put a huge load on my

data consistency without using nodetool repair

2018-06-09 Thread onmstester onmstester
I'm using RF=2 (I know it should be at least 3, but I'm short of resources) and WCL=ONE and RCL=ONE in a cluster of 10 nodes in an insert-only scenario. The problem: I don't want to use nodetool repair because it would put a huge load on my cluster for a long time, but I also need data

saving distinct data in cassandra result in many tombstones

2018-06-12 Thread onmstester onmstester
Hi, I need to save a distinct value per key in each hour; the problem with saving everything and computing distincts in memory is that there is too much repeated data. Table schema: Table distinct( hourNumber int, key text, distinctValue bigint, primary key (hourNumber) ) I want
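The hour-bucket keying above can be sketched as follows (a hypothetical illustration, not the poster's code): derive hourNumber from the epoch timestamp and rely on upsert semantics, i.e. re-insert the identical row instead of delete-then-insert, since it is deletions that generate tombstones:

```python
# Illustrative sketch of the hour-bucket key described above. Names are
# assumptions, not from the original schema.
def hour_number(epoch_seconds: int) -> int:
    """Hours elapsed since the Unix epoch -> the partition key."""
    return epoch_seconds // 3600

# Simulate distinct-per-hour with a set: adding a duplicate is a no-op,
# mirroring how re-inserting an identical row in Cassandra simply overwrites
# the same cells (no tombstone needed).
seen = set()
for ts, key, value in [(3600, "a", 1), (3600, "a", 1), (7200, "a", 1)]:
    seen.add((hour_number(ts), key, value))

print(sorted(seen))  # [(1, 'a', 1), (2, 'a', 1)]
```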

Write performance degradation

2018-06-17 Thread onmstester onmstester
Hi, I was doing 500K inserts + 100K counter updates per second on my cluster of 12 nodes (20 cores/128 GB RAM/4 x 600 GB 10K HDDs) using batch statements with no problem. I saw a lot of warnings showing that most batches do not concern a single node, so they should not be in a batch; on the other

cassandra update vs insert + delete

2018-05-27 Thread onmstester onmstester
Hi, I want to load all rows from many partitions and change a column value in each row. Which of the following ways is better concerning disk space and performance? 1. create an update statement for every row and batch updates per partition 2. create an insert statement for every row and batch

Re: how to immediately delete tombstones

2018-06-01 Thread onmstester onmstester
Thanks for your replies, but my current situation is that I do not have enough free disk for my biggest sstable, so I cannot run a major compaction or nodetool garbagecollect. On Thu, 31 May 2018 22:32:32 +0430 Alain RODRIGUEZ arodr...@gmail.com wrote

how to immediately delete tombstones

2018-05-31 Thread onmstester onmstester
Hi, I've deleted 50% of my data row by row, and now the disk usage of the Cassandra data is more than 80%. The table's gc_grace was the default (10 days); I've now set it to 0, and although many compactions have finished, no space has been reclaimed so far. How can I force deletion of tombstones in sstables and reclaim
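The timing rule at play here can be sketched in one line (my assumption of the standard behavior: a compaction can drop a tombstone only after its deletion time plus gc_grace_seconds has passed, and only when the compaction actually includes the SSTables holding the shadowed data):

```python
# Illustrative sketch: the earliest moment a tombstone becomes purgeable.
# Setting gc_grace_seconds to 0 makes tombstones purgeable immediately, but
# no space is reclaimed until a compaction actually rewrites the SSTables
# that contain both the tombstone and the shadowed data.
def purgeable_at(deletion_time: int, gc_grace_seconds: int) -> int:
    """Epoch second after which a compaction may drop the tombstone."""
    return deletion_time + gc_grace_seconds

print(purgeable_at(1_527_000_000, 864_000))  # deletion time + 10 days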

Fwd: Re: cassandra update vs insert + delete

2018-05-28 Thread onmstester onmstester
ter to add / update or insert data and do a soft delete on old data and apply a TTL to remove it at a future time. -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 27, 2018, 5:36 AM -0400, onmstester onmstester onmstes...@zoho.com, wrote: Hi I want to load all rows f

Fwd: RE: Re: cassandra update vs insert + delete

2018-05-28 Thread onmstester onmstester
jia creative city, Luoyu Road, Wuhan, HuBei Mob: +86 13797007811 | Tel: +86 27 5024 2516 From: onmstester onmstester onmstes...@zoho.com Sent: May 28, 2018 14:33 To: user user@cassandra.apache.org Subject: Fwd: Re: cassandra update vs insert + delete How does update work underneath? Does it cr

copy sstables while cassandra is running

2018-06-23 Thread onmstester onmstester
Hi, I'm using two directories on different disks as Cassandra data storage; the small disk is 90% full and the bigger disk is 30% full (the bigger one was added later, when we found out we needed more storage!), so I want to move all data to the big disk. One way is to stop my application and copy

Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
cycle. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester onmstes...@zoho.com, wrote: But monitoring cassandra with jmx using jvisualVM shows no problem, less than 30% of heap size used Sent using Zoho Mail

Re: Cassandra cluster: could not reach linear scalability

2018-02-19 Thread onmstester onmstester
onmstester onmstester onmstes...@zoho.com: I've configured a simple cluster using two PCs with identical spec: CPU: Core i5 RAM: 8GB DDR3 Disk: 1TB 5400rpm Network: 1 G (I've tested it with iperf; it really is!) using the common configs described on many sites including DataStax itself

Re: Right sizing Cassandra data nodes

2018-02-23 Thread onmstester onmstester
Another question on node density, in this scenario: 1. we should keep some years of time-series data for a heavy-write system in Cassandra (10K ops per second) 2. the system is insert-only and inserted data would never be updated 3. in the partition key, we used the number of months since 1970, so

Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
I've configured a simple cluster using two PCs with identical spec: CPU: Core i5 RAM: 8GB DDR3 Disk: 1TB 5400rpm Network: 1 G (I've tested it with iperf; it really is!) using the common configs described on many sites, including DataStax itself: cluster_name: 'MyCassandraCluster' num_tokens: 256

Cassandra data model too many table

2018-02-18 Thread onmstester onmstester
I have a single structured row as input, arriving at a rate of 10K per second. Each row has 20 columns. Some queries should be answered on these inputs. Because most queries need a different where, group by, or order by, the final data model ended up like this: primary key for table of query1 :

Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester onmstes...@zoho.com, wrote: I've configured a simple cluster using two PCs with identical spec: CPU: Core i5 RAM: 8GB DDR3 Disk: 1TB 5400rpm Network: 1 G (I've tested it with iperf

hardware sizing for insert only scenarios

2018-02-26 Thread onmstester onmstester
Another question on node density, in this scenario: 1. we should keep some years of time-series data for a heavy-write system in Cassandra (10K ops per second) 2. the system is insert-only and inserted data would never be updated 3. in the partition key, we used the number of months since 1970, so
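The "months since 1970" bucket mentioned in point 3 can be sketched as (an assumption: whole calendar months computed from year and month):

```python
# Illustrative sketch of a months-since-1970 time bucket used as (part of) a
# partition key, so each month's data lands in its own set of partitions.
def month_bucket(year: int, month: int) -> int:
    """Number of whole months elapsed since January 1970 (month is 1-12)."""
    return (year - 1970) * 12 + (month - 1)

print(month_bucket(1970, 1))  # 0
print(month_bucket(2018, 2))  # 577
```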

Re: saving distinct data in cassandra result in many tombstones

2018-06-18 Thread onmstester onmstester
On Tue, 19 Jun 2018 08:16:28 +0430 onmstester onmstester onmstes...@zoho.com wrote Can I set gc_grace_seconds to 0 in this case? Because reappearing deleted data has no impact on my business logic; I'm just either creating a new row or replacing exactly the same row.

Re: Write performance degradation

2018-06-18 Thread onmstester onmstester
16:24:48 +0430 DuyHai Doan doanduy...@gmail.com wrote Maybe the disk I/O cannot keep up with the high mutation rate ? Check the number of pending compactions On Sun, Jun 17, 2018 at 9:24 AM, onmstester onmstester onmstes...@zoho.com wrote: Hi, I was doing 500K inserts

Re: saving distinct data in cassandra result in many tombstones

2018-06-18 Thread onmstester onmstester
data if the table hasn't been repaired during the grace interval. You can also just increase the tombstone thresholds, but the queries will be pretty expensive/wasteful. On Tue, Jun 12, 2018 at 2:02 AM, onmstester onmstester onmstes...@zoho.com wrote: Hi, I needed to save

Data model storage optimization

2018-07-28 Thread onmstester onmstester
The current data model is described as table name: ((partition_key), cluster_key), other_column1, other_column2, ... user_by_name: ((time_bucket, username)), ts, request, email user_by_mail: ((time_bucket, email)), ts, request, username The reason that both keys (username, email) are repeated in all tables is

Fwd: Re: Data model storage optimization

2018-07-29 Thread onmstester onmstester
How many rows on average per partition? Around 10K. Let me get this straight: you are bifurcating your partitions on either email or username, essentially potentially doubling the data, because you don't have a way to manage a central system of record of users? We are just analyzing output

full text search on some text columns

2018-07-31 Thread onmstester onmstester
I need to do a full-text search (like) on one of my clustering keys and one of my partition keys (both use text as the data type). The input rate is high, so only Cassandra could handle it. Is there any open-source project which helps use Cassandra + Solr or Cassandra + Elasticsearch? Any

Fwd: Re: [EXTERNAL] full text search on some text columns

2018-07-31 Thread onmstester onmstester
Subject : Re: [EXTERNAL] full text search on some text columns Forwarded message Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index On Tue, 31 Jul 2018 at 22:37, onmstester onmstester wrote:

Re: full text search on some text columns

2018-07-31 Thread onmstester onmstester
Thanks Jordan. There would be millions of rows per day; is SASI capable of sustaining such a rate? On Tue, 31 Jul 2018 19:47:55 +0430 Jordan West wrote On Tue, Jul 31, 2018 at 7:45 AM, onmstester onmstester wrote: I need to do a full text search (like) on one

RE: [EXTERNAL] full text search on some text columns

2018-07-31 Thread onmstester onmstester
  From: onmstester onmstester Sent: Tuesday, July 31, 2018 10:46 AM To: user Subject: [EXTERNAL] full text search on some text columns   I need to do a full text search (like) on one of my clustering keys and one of partition keys (it use text as data type). The input rate is high so only

updating old partitions in STCS

2018-08-04 Thread onmstester onmstester
I read in some best-practice documents on data modeling: do not update old partitions while using STCS. But I always use clustering keys in my queries, and cqlsh tracing reports that it only accesses sstables with data having the specified clustering key (not all sstables containing part of

data loss

2018-08-14 Thread onmstester onmstester
I am inserting into Cassandra with a simple insert query and a counter-update query for every input record; the input rate is very high. I've configured the update query with idempotent = true (no config for the insert query; the default is false IMHO). I've seen multiple records having rows in the counter table

bigger data density with Cassandra 4.0?

2018-08-25 Thread onmstester onmstester
I've noticed this new feature of 4.0: streaming optimizations (https://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html). Does this mean that we could have much higher data density with Cassandra 4.0 (fewer problems than 3.x)? I mean > 10 TB of data on each node without

Fwd: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread onmstester onmstester
Thanks Kurt. Actually my cluster has > 10 nodes, so there is only a tiny chance of streaming a complete SSTable. Logically, any columnar NoSQL DB like Cassandra always needs to re-sort grouped data for later fast reads, and having nodes with a big amount of data (> 2 TB) would be annoying for this

Re: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread onmstester onmstester
August 2018 at 19:43, onmstester onmstester wrote: Thanks Kurt, Actually my cluster has > 10 nodes, so there is a tiny chance to stream a complete SSTable. While logically any Columnar noSql db like Cassandra, needs always to re-sort grouped data for later-fast-reads and having nodes with

Re: Cassandra node RAM amount vs data-per-node/total data?

2018-07-17 Thread onmstester onmstester
I actually never set Xmx > 32 GB for any Java application, unless it necessarily needs more. Just because of the fact: "once you exceed this 32 GiB border JVM will stop using compressed object pointers, effectively reducing the available memory. That means increasing your JVM heap above 32 GiB

how to fix too many native-transport-blocked?

2018-07-18 Thread onmstester onmstester
Hi, on a cluster with 10 nodes, out of 20K native-transport requests per second, 200 per second are blocked. They are mostly small single writes. I'm also experiencing random read delays, which I suspect are caused by the filled native queue. On all nodes, CPU usage is less than 20 percent, and there is no problem in

Fwd: changing ip address of all nodes in cluster

2018-07-15 Thread onmstester onmstester
I tested the single node scenario on all nodes iteratively and it worked: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsChangeIp.html Sent using Zoho Mail Forwarded message From : onmstester onmstester To : "user" Date : S

changing ip address of all nodes in cluster

2018-07-15 Thread onmstester onmstester
I need to assign a new IP range to my cluster. What's the procedure? Thanks in advance

Re: JMX metric to report number failed WCL ALL

2018-07-23 Thread onmstester onmstester
AM, onmstester onmstester wrote: I'm using RF=2 and Write consistency = ONE, is there a counter in cassandra jmx to report number of writes that only acknowledged by one node (instead of both replica's)?  Although i don't care all replicas acknowledge the write, but i consider this as normal

JMX metric to report number failed WCL ALL

2018-07-22 Thread onmstester onmstester
I'm using RF=2 and write consistency = ONE; is there a counter in Cassandra JMX to report the number of writes that were only acknowledged by one node (instead of both replicas)? Although I don't require all replicas to acknowledge the write, I consider that the normal status of the cluster.

Fwd: Re: Cassandra crashed with no log

2018-07-22 Thread onmstester onmstester
: Sun, 22 Jul 2018 10:43:38 +0430 Subject : Re: Cassandra crashed with no log Forwarded message Anything in non-Cassandra logs? Dmesg? --  Jeff Jirsa On Jul 21, 2018, at 11:07 PM, onmstester onmstester wrote: Cassandra in one of my nodes, crashed without any error/warning

Cassandra crashed with no log

2018-07-22 Thread onmstester onmstester
Cassandra on one of my nodes crashed without any error/warning in the system/gc/debug logs. All JMX metrics are being monitored; the last fetched values were 50% heap usage and 20% CPU usage. How can I find the cause of the crash?

New cluster vs Increasing nodes to already existed cluster

2018-07-16 Thread onmstester onmstester
Currently I have a cluster with 10 nodes dedicated to one keyspace (hardware sizing was done according to input rate and TTL, just for current application requirements). I need to launch a new application with a new keyspace on another set of servers (8 nodes); there is no relation between the

node replacement failed

2018-09-09 Thread onmstester onmstester
Hi, Cluster spec: 30 nodes, RF = 2, NetworkTopologyStrategy, GossipingPropertyFileSnitch + rack aware. Suddenly I lost all the cassandra-data disks on one of my racks; after replacing the disks, I tried to replace the nodes with the same IPs using this:

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread onmstester onmstester
Thanks Jeff. You mean that with RF=2, num_tokens = 256 and fewer than 256 nodes, I should not worry about data distribution? On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa wrote Virtual nodes accomplish two primary goals 1) it makes it easier to gradually

Cluster CPU usage limit

2018-09-06 Thread onmstester onmstester
IMHO, a Cassandra write is more of a CPU-bound task, so when determining cluster write throughput, what CPU usage percentage (averaged over all cluster nodes) should be treated as the limit? Rephrased: what's the normal CPU usage in a Cassandra cluster (with no compaction, streaming or heavy-read

Re: node replacement failed

2018-09-10 Thread onmstester onmstester
Any idea? On Sun, 09 Sep 2018 11:23:17 +0430 onmstester onmstester wrote Hi, Cluster spec: 30 nodes, RF = 2, NetworkTopologyStrategy, GossipingPropertyFileSnitch + rack aware. Suddenly I lost all the cassandra-data disks on one of my racks; after replacing the disks

Re: node replacement failed

2018-09-10 Thread onmstester onmstester
Cheers, --- Alain Rodriguez - @arodream - al...@thelastpickle.com France / Spain The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com Le lun. 10 sept. 2018 à 09:09, onmstester onmstester a écrit : Any idea? On Sun, 09 Sep 2018 11:23:17

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-07 Thread onmstester onmstester
Why not set the default vnode count to that recommendation in the Cassandra installation files? On Tue, 04 Sep 2018 17:35:54 +0430 Durity, Sean R wrote Longer term, I agree with Oleksandr, the recommendation for number of vnodes is now much smaller than 256. I

Fwd: Re: Cluster CPU usage limit

2018-09-07 Thread onmstester onmstester
not enough), but every advice I've seen is for a lower write thread count being optimal for most cases. On Thu, Sep 6, 2018 at 5:51 AM, onmstester onmstester wrote: IMHO, Cassandra write is more of a CPU bound task, so while determining cluster write throughput, what CPU usage percent (avg among a

adding a non-used column just to debug ttl

2018-07-07 Thread onmstester onmstester
Hi, because of "Cannot use selection function ttl on PRIMARY KEY part type", I'm adding a boolean column to tables that have no non-primary-key columns; I'm just worried that someday I would need to debug TTL! Is this the right approach? Is anyone else doing this?

Compaction out of memory

2018-07-12 Thread onmstester onmstester
Cassandra crashed on two out of 10 nodes in my cluster within 1 day; the error is: ERROR [CompactionExecutor:3389] 2018-07-10 11:27:58,857 CassandraDaemon.java:228 - Exception in thread Thread[CompactionExecutor:3389,1,main] org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed

backup/restore cassandra data

2018-03-07 Thread onmstester onmstester
Would it be possible to copy/paste the Cassandra data directory from one of the nodes (whose OS partition is corrupted) and use it in a fresh Cassandra node? I've used RF=1, so that's my only chance!

Re: backup/restore cassandra data

2018-03-08 Thread onmstester onmstester
#ops_backup_snapshot_restore_t Cheers Ben On Thu, 8 Mar 2018 at 17:07 onmstester onmstester onmstes...@zoho.com wrote: -- Ben Slater Chief Product Officer Read our latest technical blog posts here. This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA

yet another benchmark bottleneck

2018-03-11 Thread onmstester onmstester
I'm going to benchmark Cassandra's write throughput on a node with the following spec: CPU: 20 cores Memory: 128 GB (32 GB as Cassandra heap) Disk: 3 separate disks for OS, data and commitlog Network: 10 Gb (tested with iperf) OS: Ubuntu 16 Running cassandra-stress: cassandra-stress write

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
? On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester onmstes...@zoho.com wrote: I'm going to benchmark Cassandra's write throughput on a node with the following spec: CPU: 20 cores Memory: 128 GB (32 GB as Cassandra heap) Disk: 3 separate disks for OS, data and commitlog Network: 10 Gb

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester onmstes...@zoho.com wrote Apache-cassandra-3.11.1 Yes, I'm doing a single-host test On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa jji...@gmail.com wrote Would help

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
writes: 32 concurrent_counter_writes: 32 Jumping directly to 160 would be a bit high with spinning disks, maybe start with 64 just to see if it gets better. -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 12:08 PM To:

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
? -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 12:50 PM To: user user@cassandra.apache.org Subject: RE: yet another benchmark bottleneck no luck even with 320 threads for write Sent using Zoho Mail

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
2018 14:25:12 +0330 Jacques-Henri Berthemet jacques-henri.berthe...@genesys.com wrote Any errors/warning in Cassandra logs? What’s your RF? Using 300MB/s of network bandwidth for only 130 op/s looks very high. -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 10:48 AM To: user user@cassandra.apache.org Subject: Re: yet another benchmark bottleneck Running two instance of Apache Cassandra on same server, each having their own commit log disk

Re: yet another benchmark bottleneck

2018-03-13 Thread onmstester onmstester
-path at least that prevent scaling with high CPU core count. - Micke On 03/12/2018 03:14 PM, onmstester onmstester wrote: I mentioned that already tested increasing client threads + many stress-client instances in one node + two stress-client in two separate nodes, in all

Re: data types storage saving

2018-03-10 Thread onmstester onmstester
Could I calculate disk usage approximately (without inserting actual data)? On Sat, 10 Mar 2018 11:21:44 +0330 onmstester onmstester onmstes...@zoho.com wrote I've found out that blobs give no gain in storage saving! I had some 16-digit numbers which were saved

Re: data types storage saving

2018-03-09 Thread onmstester onmstester
...@smartthings.com wrote If you're willing to do the data type conversion in insert and retrieval, the you could use blobs as a sort of "adaptive length int" AFAIK On Tue, Mar 6, 2018 at 6:02 AM, onmstester onmstester onmstes...@zoho.com wrote: I'm using int data type for

cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread onmstester onmstester
Running this command: nodetool cfhistograms keyspace1 table1 throws this exception on the production server: javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Table,keyspace=keyspace1,scope=table1,name=EstimatePartitionSizeHistogram But I have no problem on a test

Re: cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread onmstester onmstester
On Mar 6, 2018, at 3:29 AM, onmstester onmstester onmstes...@zoho.com wrote: Running this command: nodetool cfhistograms keyspace1 table1 throws this exception on the production server: javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Table,keyspace=keyspace1

Re: write latency on single partition table

2018-04-06 Thread onmstester onmstester
, 9:45 AM onmstester onmstester onmstes...@zoho.com wrote: I've defined a table like this create table test ( hours int, key1 int, value1 varchar, primary key (hours,key1) ) For one hour every input would be written into a single partition, because I need to group some 500K

write latency on single partition table

2018-04-06 Thread onmstester onmstester
I've defined a table like this: create table test ( hours int, key1 int, value1 varchar, primary key (hours, key1) ) For one hour every input would be written into a single partition, because I need to group some 500K records in the partition for a report with an expected response time in

Re: Can I sort it as a result of group by?

2018-04-10 Thread onmstester onmstester
I'm using apache spark on top of cassandra for such cases Sent using Zoho Mail On Mon, 09 Apr 2018 18:00:33 +0430 DuyHai Doan doanduy...@gmail.com wrote No, sorting by column other than clustering column is not possible On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim

copy from one table to another

2018-04-08 Thread onmstester onmstester
Is there any way to copy part of a table to another table in Cassandra? A large amount of data should be copied, so I don't want to fetch the data to the client and stream it back to Cassandra using CQL.

Re: copy from one table to another

2018-04-08 Thread onmstester onmstester
art target node or run nodetool refresh Sent from my iPhone On Apr 8, 2018, at 4:15 AM, onmstester onmstester onmstes...@zoho.com wrote: Is there any way to copy some part of a table to another table in cassandra? A large amount of data should be copied so i don't want to fetch data

A Cassandra Storage Estimation Mechanism

2018-04-17 Thread onmstester onmstester
I was going to estimate hardware requirements for a project which mainly uses Apache Cassandra. Because of the rule "Cassandra node size had better be < 2 TB", the total disk usage determines the number of nodes, and in most cases the result of this calculation would be OK for satisfying the required
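The sizing rule above reduces to back-of-the-envelope arithmetic, sketched here with illustrative inputs (the replication factor, compression ratio, and per-node target are all assumptions):

```python
import math

# Illustrative sizing sketch based on the ~2 TB-per-node rule of thumb cited
# above: on-disk footprint = raw data x replication factor x compression
# ratio, divided by the per-node target, rounded up.
def nodes_needed(raw_tb: float, replication_factor: int = 3,
                 target_tb_per_node: float = 2.0,
                 compression_ratio: float = 1.0) -> int:
    """Minimum node count to keep each node under the per-node target."""
    on_disk_tb = raw_tb * replication_factor * compression_ratio
    return math.ceil(on_disk_tb / target_tb_per_node)

print(nodes_needed(raw_tb=10, replication_factor=3))  # 15
```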

Re: Cassandra client tuning

2018-03-18 Thread onmstester onmstester
:23 onmstester onmstester onmstes...@zoho.com wrote: -- Ben Slater Chief Product Officer Read our latest technical blog posts here. This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain

Re: Cassandra client tuning

2018-03-18 Thread onmstester onmstester
and your Cassandra cluster doesn’t sound terribly stressed then there is room to increase threads on the client to up throughput (unless your bottlenecked on IO or something)? On Sun, 18 Mar 2018 at 20:27 onmstester onmstester onmstes...@zoho.com wrote: -- Ben Slater Chief Product

Cassandra client tuning

2018-03-18 Thread onmstester onmstester
I need to insert some millions of records in seconds into Cassandra. Using one client with executeAsync and the following configs: maxConnectionsPerHost = 5 maxRequestsPerHost = 32K maxAsyncQueue at client side = 100K I could achieve 25% of the throughput I needed; client CPU is more than 80% and
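A quick way to reason about async client concurrency is Little's Law: in-flight requests ≈ target rate x mean latency. The sketch below uses made-up numbers to show how the needed concurrency compares against limits like maxRequestsPerHost:

```python
import math

# Little's Law sketch for sizing client-side concurrency. All numbers are
# illustrative assumptions, not from the thread.
def required_in_flight(target_ops_per_sec: float, mean_latency_ms: float) -> int:
    """Concurrent requests needed to sustain the target rate at that latency."""
    return math.ceil(target_ops_per_sec * mean_latency_ms / 1000.0)

# e.g. 1,000,000 inserts/s at 2 ms mean latency needs ~2000 requests in flight,
# well under a 32K maxRequestsPerHost cap but far above a default queue size.
print(required_in_flight(1_000_000, 2.0))  # 2000
```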

cassandra spark-connector-sqlcontext too many tasks

2018-03-17 Thread onmstester onmstester
I'm querying a single Cassandra partition using sqlContext and its tempView, which creates more than 2000 tasks on Spark and takes about 360 seconds: sqlContext.read().format("org.apache.spark.sql.cassandra").options(ops).load.createOrReplaceTempView("tableName") But using

Re: Cassandra on high performance machine: virtualization vs Docker

2018-02-28 Thread onmstester onmstester
8144 9872 On Tue, Feb 27, 2018 at 8:26 PM, onmstester onmstester onmstes...@zoho.com wrote: What i've got to set up my Apache Cassandra cluster are some Servers with 20 Core cpu * 2 Threads and 128 GB ram and 8 * 2TB disk. Just read all over the web: Do not use big nodes for yo

Cassandra on high performance machine: virtualization vs Docker

2018-02-27 Thread onmstester onmstester
What I've got to set up my Apache Cassandra cluster are some servers with 20-core CPUs * 2 threads, 128 GB RAM and 8 * 2TB disks. Having read all over the web "do not use big nodes for your cluster", I'm convinced I should run multiple nodes on a single physical server. So the question is which

data types storage saving

2018-03-06 Thread onmstester onmstester
I'm using the int data type for one of my columns, but for 99.99...% of rows its data would never reach 65K. Should I change it to smallint (it would save some gigabytes of disk in a few months), or would Cassandra compression take care of it in storage? What about the blob data type? Isn't it better to use it in

How to validate if network infrastructure is efficient for Cassandra cluster?

2018-10-21 Thread onmstester onmstester
Currently, before launching the production cluster, i run 'iperf -s' on half of the cluster's nodes and then run 'iperf -c $nextIP' on the other half using parallel ssh, so simultaneously all cluster nodes are connected together (paired); then, examining the result of the iperfs, doing the math that
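The half-versus-half pairing described above can be generated mechanically before handing the commands to parallel ssh. A small sketch (host names are placeholders; it only builds the command list, it does not run iperf):

```python
def iperf_pairs(hosts):
    """Split the node list in half and pair server[i] with client[i]."""
    half = len(hosts) // 2
    servers, clients = hosts[:half], hosts[half:half * 2]
    return list(zip(servers, clients))

def commands(hosts):
    """Emit (host, command) tuples: servers listen, clients connect."""
    cmds = []
    for server, client in iperf_pairs(hosts):
        cmds.append((server, "iperf -s"))            # run on the server half
        cmds.append((client, f"iperf -c {server}"))  # run on the client half
    return cmds

pairs = iperf_pairs(["n1", "n2", "n3", "n4"])
```

One caveat with this scheme: each run only exercises one fixed pairing, so rotating the halves across several runs gives better coverage of all inter-rack links.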

Re: Re: High CPU usage on some of the nodes due to message coalesce

2018-10-21 Thread onmstester onmstester
What takes the most CPU? System or User? Most of it is used by org.apache.cassandra.util.coalesceInternal and SepWorker.run. Did you try removing a problematic node and installing a brand new one (instead of re-adding)? I did not install a new node, but did remove the problematic node and CPU

Fwd: Re: High CPU usage on some of the nodes due to message coalesce

2018-10-21 Thread onmstester onmstester
or if the load your application is producing exceeds what your cluster can handle (needs more nodes). Chris On Oct 20, 2018, at 5:18 AM, onmstester onmstester wrote: 3 nodes in my cluster have 100% cpu usage and most of it is used by org.apache.cassandra.util.coalesceInternal and SepWorker.run

Fwd: Re: Re: High CPU usage on some of the nodes due to message coalesce

2018-10-21 Thread onmstester onmstester
Any cron or other scheduler running on those nodes? No. Lots of Java processes running simultaneously? No, just Apache Cassandra. Heavy repair continuously running? None. Lots of pending compactions? None; the CPU goes to 100% in the first seconds of insert (write load), so no memtable has been flushed yet. Is

Fwd: A quick question on unlogged batch

2018-11-01 Thread onmstester onmstester
Read this: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html Please use batch (any type of batch) only for statements that concern a single partition; otherwise it causes a lot of performance degradation on your cluster, and after a while throughput would be a lot less than
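The rule of thumb above — batch only within one partition — can be enforced by grouping statements on their partition key before any batches are built. A hedged sketch where `stmts` is a list of (partition_key, statement) pairs standing in for real driver statement objects:

```python
from collections import defaultdict

def group_by_partition(stmts):
    """Group (partition_key, statement) pairs so each batch targets one partition."""
    groups = defaultdict(list)
    for pk, stmt in stmts:
        groups[pk].append(stmt)
    return dict(groups)

# each value below would become one single-partition unlogged batch
batches = group_by_partition([
    ("p1", "insert-a"), ("p2", "insert-b"), ("p1", "insert-c"),
])
```

With this grouping, every batch lands on one replica set, so the coordinator never has to fan a single batch out across the cluster.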

Multiple cluster for a single application

2018-11-05 Thread onmstester onmstester
Hi, One of my applications requires a cluster with more than 100 nodes. I've read documents recommending clusters with fewer than 50 or 100 nodes (Netflix has hundreds of clusters with fewer than 100 nodes each). Is it a good idea to use multiple clusters for a single application,

Fwd: Re: How to set num tokens on live node

2018-11-02 Thread onmstester onmstester
IMHO, the best option with two datacenters is to configure the replication strategy to stream data from the dc with the wrong num_tokens to the correct one, and then a repair on each node would move your data to the other dc Sent using Zoho Mail Forwarded message From : Goutham reddy To

Fwd: Re: Re: How to set num tokens on live node

2018-11-02 Thread onmstester onmstester
I am facing. Any comments. Thanks and Regards, Goutham On Fri, Nov 2, 2018 at 1:08 AM onmstester onmstester wrote: -- Regards Goutham Reddy IMHO, the best option with two datacenters is to config replication strategy to stream data from dc with wrong num_token to correct one, and then a repair

Fwd: Re: A quick question on unlogged batch

2018-11-02 Thread onmstester onmstester
Unlogged batch meaningfully outperforms parallel execution of individual statements, especially at scale, and creates lower memory pressure on both the clients and cluster. They do outperform parallel individual statements, but at the cost of higher pressure on coordinators, which leads to more blocked

High CPU usage on some of the nodes due to message coalesce

2018-10-20 Thread onmstester onmstester
3 nodes in my cluster have 100% CPU usage and most of it is used by org.apache.cassandra.util.coalesceInternal and SepWorker.run. The most active threads are MessagingService-Incoming. The other nodes are normal. I have 30 nodes, using a rack-aware strategy with 10 racks, each having 3 nodes.

Fwd: Re: Multiple cluster for a single application

2018-11-08 Thread onmstester onmstester
Thank you all. Actually, "the documents" i mentioned in my question was a talk on YouTube seen a long time ago that i could not find again. Also, noticing that a lot of companies like Netflix built hundreds of clusters each having tens of nodes and saying that it's much more stable, i just concluded that big

how to configure the Token Allocation Algorithm

2018-09-30 Thread onmstester onmstester
Since i failed to find a document on how to configure and use the token allocation algorithm (to replace the random algorithm), i just wanted to be sure about the procedure i've done: 1. Using Apache Cassandra 3.11.2. 2. Configured one of the seed nodes with num_tokens=8 and started it. 3. Using cqlsh
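For reference, the relevant cassandra.yaml settings for the token allocation algorithm on 3.11 look roughly like this (the keyspace name is a placeholder; the keyspace must already exist with its final replication settings before further nodes bootstrap with this option):

```yaml
num_tokens: 8
# steer token allocation toward balanced ownership for this keyspace's replication factor,
# instead of the default random token assignment
allocate_tokens_for_keyspace: my_keyspace
```

Seed nodes started before the keyspace exists cannot use the option, which matches the procedure sketched in the steps above (first seed configured by hand, keyspace created, then remaining nodes bootstrapped).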
