Re: Mutation of bytes is too large for the maximum size of
Soumya, thanks for the suggestion. Yes, enabling debugging is an option, but it is very tedious and the chatter often clutters the log and makes it hard to debug. Naidu Saladi

On Tuesday, September 18, 2018 5:04 PM, Soumya Jena wrote: The client should notice this on their side. If you want to see it in the server log, one idea is to enable debug mode. You can set it specifically for org.apache.cassandra.transport, something like: nodetool setlogginglevel org.apache.cassandra.transport DEBUG. If you are lucky enough :) (i.e. not too much chatter around the same time), you should see the query just before that WARN message appears in the log. You can turn off the debugging once you get the info. Good luck!!

On Mon, Sep 17, 2018 at 9:06 PM Saladi Naidu wrote: Any clues on this topic? Naidu Saladi

On Thursday, September 6, 2018 9:41 AM, Saladi Naidu wrote: We are receiving the following error:

9140- at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.10.jar:3.0.10]
9141- at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
9142:WARN [SharedPool-Worker-1] 2018-09-06 14:29:46,071 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
9143-java.lang.IllegalArgumentException: Mutation of 16777251 bytes is too large for the maximum size of 16777216
9144- at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]

I found the following link that explained the cause: by design intent, the maximum allowed segment size is 50% of the configured commit_log_segment_size_in_mb. This is so Cassandra avoids writing segments with large amounts of empty space. To elaborate: up to two 32MB segments will fit into 64MB, however a 40MB segment will only fit once, leaving a larger amount of unused space.
I would like to find which table/column family this write/mutation is coming from so that I can reach out to the right application team. The log does not provide any details about the mutation at all; is there a way to find that out? Naidu Saladi
Re: Mutation of bytes is too large for the maximum size of
Any clues on this topic? Naidu Saladi

On Thursday, September 6, 2018 9:41 AM, Saladi Naidu wrote: We are receiving the following error:

9140- at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.10.jar:3.0.10]
9141- at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
9142:WARN [SharedPool-Worker-1] 2018-09-06 14:29:46,071 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
9143-java.lang.IllegalArgumentException: Mutation of 16777251 bytes is too large for the maximum size of 16777216
9144- at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]

I found the following link that explained the cause: by design intent, the maximum allowed segment size is 50% of the configured commit_log_segment_size_in_mb. This is so Cassandra avoids writing segments with large amounts of empty space. To elaborate: up to two 32MB segments will fit into 64MB, however a 40MB segment will only fit once, leaving a larger amount of unused space.

I would like to find which table/column family this write/mutation is coming from so that I can reach out to the right application team. The log does not provide any details about the mutation at all; is there a way to find that out? Naidu Saladi
Mutation of bytes is too large for the maximum size of
We are receiving the following error:

9140- at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.10.jar:3.0.10]
9141- at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
9142:WARN [SharedPool-Worker-1] 2018-09-06 14:29:46,071 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
9143-java.lang.IllegalArgumentException: Mutation of 16777251 bytes is too large for the maximum size of 16777216
9144- at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]

I found the following link that explained the cause: by design intent, the maximum allowed segment size is 50% of the configured commit_log_segment_size_in_mb. This is so Cassandra avoids writing segments with large amounts of empty space. To elaborate: up to two 32MB segments will fit into 64MB, however a 40MB segment will only fit once, leaving a larger amount of unused space.

I would like to find which table/column family this write/mutation is coming from so that I can reach out to the right application team. The log does not provide any details about the mutation at all; is there a way to find that out? Naidu Saladi
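The arithmetic in that explanation can be checked with a quick sketch. This is a hypothetical client-side guard, not an official Cassandra API; it assumes the default commit_log_segment_size_in_mb of 32, which matches the 16777216-byte cap in the log above:

```python
# Cassandra rejects any single mutation larger than half of
# commit_log_segment_size_in_mb (default 32 MB -> 16777216-byte cap).
def max_mutation_bytes(commitlog_segment_size_mb: int = 32) -> int:
    return commitlog_segment_size_mb * 1024 * 1024 // 2

def fits_in_commitlog(mutation_size_bytes: int,
                      commitlog_segment_size_mb: int = 32) -> bool:
    """Check a serialized mutation against the segment cap before writing."""
    return mutation_size_bytes <= max_mutation_bytes(commitlog_segment_size_mb)

# The failing mutation from the log: 16777251 bytes vs the 16777216-byte cap.
print(max_mutation_bytes())          # 16777216
print(fits_in_commitlog(16777251))   # False
```

Applying such a check in the client application (against the serialized payload size) is one way to catch the offending write before it ever reaches the server.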
Re: Write Time of a Row in Multi DC Cassandra Cluster
Simon, trace would be a significant burden on the cluster, and it would have to be on all the time. I am trying to find a way to know when a row was written, on an on-demand basis. Is there a way to determine that? Naidu Saladi

On Tuesday, July 10, 2018 2:24 AM, Simon Fontana Oscarsson wrote: Have you tried trace?

-- SIMON FONTANA OSCARSSON Software Developer Ericsson Ölandsgatan 1 37133 Karlskrona, Sweden simon.fontana.oscars...@ericsson.com www.ericsson.com

On mån, 2018-07-09 at 19:30 +0000, Saladi Naidu wrote: > Cassandra is an eventually consistent DB; how do I find when a row is actually > written in a multi-DC environment? Here is the problem I am trying to solve: > > - I have a multi-DC (3 DCs) Cassandra cluster/ring. One of the applications > wrote a row to DC1 (using LOCAL_QUORUM) and, within a span of 50 ms, it tried to > read the same row from DC2 and could not find the > row. Both DCs have sub-millisecond latency at the network level, usually <2 > ms. We promised 20 ms consistency. In this case the application could not find > the row in DC2 within 50 ms. > > I tried to use "select WRITETIME(authorizations_json) from > token_authorizations where " to find when the row was written in each DC, > but both DCs returned the same timestamp. After further research > I found that from client protocol v3 onwards the timestamp is supplied at the client level, so > WRITETIME does not help: > "https://docs.datastax.com/en/developer/java-driver/3.4/manual/query_timestamps/" > > So how do I determine when the row is actually written in each DC? > > > Naidu Saladi
Re: Write Time of a Row in Multi DC Cassandra Cluster
Alain, thanks for the response, and I completely agree with your approach, but there is a small caveat: we have another DC in Europe. Right now this keyspace is not replicating there, but eventually it will be added. The EU DC has a significant latency of 200 ms RTT, so going with EACH_QUORUM would not be feasible. We can reset the SLAs for consistency, but my question is how to determine when the row was written to the remote DC. Is there any way to determine that? Naidu Saladi

On Tuesday, July 10, 2018 8:56 AM, Alain RODRIGUEZ wrote: Hello,

I have multi DC (3 DC's) Cassandra cluster/ring - One of the application wrote a row to DC1(using Local Quorum) and within span of 50 ms, it tried to read same row from DC2 and could not find the row. [...] So how to determine when the row is actually written in each DC?

To me, the guarantee you are trying to achieve could be obtained using 'EACH_QUORUM' for writes (i.e. 'local_quorum' in each DC), and 'LOCAL_QUORUM' for reads, for example. You would then have strong consistency, as long as the same client application runs the write then the read, or sends a trigger for the second call sequentially, after validating the write, in some way.

Our both DC's have sub milli second latency at network level, usually <2 ms. We promised 20 ms consistency. In this case Application could not find the row in DC2 in 50 ms

In these conditions, using 'EACH_QUORUM' might not be too much of a burden for the coordinator and the client. The writes are already being processed; this would increase the latency at the coordinator level (and thus at the client level), but you would be sure that all the clusters have the row in a majority of the replicas before triggering the read. C*heers, --- Alain Rodriguez - @arodream - alain@thelastpickle.com France / Spain The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com

2018-07-10 8:24 GMT+01:00 Simon Fontana Oscarsson: Have you tried trace?
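Alain's tradeoff can be illustrated with a toy latency model (an assumption for illustration, not a benchmark): an EACH_QUORUM write completes only after a quorum acks in every DC, so coordinator latency is bounded below by the slowest DC's round trip. The RTT figures are the ones quoted in this thread:

```python
# Toy model of write latency under different consistency levels.
# EACH_QUORUM must wait for a quorum in *every* DC, so the slowest
# DC's round trip dominates; LOCAL_QUORUM only pays the local RTT.
def each_quorum_write_latency(rtts_ms):
    return max(rtts_ms)

def local_quorum_write_latency(rtts_ms, local_dc_index=0):
    return rtts_ms[local_dc_index]

rtts = [2, 2, 200]  # DC1, DC2, and the future EU DC (200 ms RTT)
print(local_quorum_write_latency(rtts))  # 2
print(each_quorum_write_latency(rtts))   # 200
```

This is why EACH_QUORUM is workable with two nearby DCs (<2 ms) but becomes a 200 ms floor on every write once the EU DC joins the replication.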
-- SIMON FONTANA OSCARSSON Software Developer Ericsson Ölandsgatan 1 37133 Karlskrona, Sweden simon.fontana.oscarsson@ericsson.com www.ericsson.com

On mån, 2018-07-09 at 19:30 +0000, Saladi Naidu wrote: > Cassandra is an eventual consistent DB, how to find when a row is actually > written in multi DC environment? Here is the problem I am trying to solve > > - I have multi DC (3 DC's) Cassandra cluster/ring - One of the application > wrote a row to DC1(using Local Quorum) and within span of 50 ms, it tried to > read same row from DC2 and could not find the > row. Our both DC's have sub milli second latency at network level, usually <2 > ms. We promised 20 ms consistency. In this case Application could not find > the row in DC2 in 50 ms > > I tried to use "select WRITETIME(authorizations_json) from > token_authorizations where " to find when the Row is written in each DC, > but both DC's returned same Timestamp. After further research > I found that Client V3 onwards Timestamp is supplied at Client level so > WRITETIME does not help "https://docs.datastax.com/en/developer/java-driver/3.4/manual/query_timestamps/" > > So how to determine when the row is actually written in each DC? > > > Naidu Saladi
Write Time of a Row in Multi DC Cassandra Cluster
Cassandra is an eventually consistent DB; how do I find when a row is actually written in a multi-DC environment? Here is the problem I am trying to solve:

- I have a multi-DC (3 DCs) Cassandra cluster/ring. One of the applications wrote a row to DC1 (using LOCAL_QUORUM) and, within a span of 50 ms, it tried to read the same row from DC2 and could not find the row. Both DCs have sub-millisecond latency at the network level, usually <2 ms. We promised 20 ms consistency. In this case the application could not find the row in DC2 within 50 ms.

I tried to use "select WRITETIME(authorizations_json) from token_authorizations where " to find when the row was written in each DC, but both DCs returned the same timestamp. After further research I found that from client protocol v3 onwards the timestamp is supplied at the client level, so WRITETIME does not help: "https://docs.datastax.com/en/developer/java-driver/3.4/manual/query_timestamps/"

So how do I determine when the row is actually written in each DC?

Naidu Saladi
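A sketch of why WRITETIME was identical in both DCs: protocol v3+ drivers stamp each mutation once, on the client, with a microsecond timestamp, and every replica in every DC stores that same value. The helper below is hypothetical; it just mimics what a driver-side timestamp generator does:

```python
import time

# Drivers (protocol v3+) attach one client-side microsecond timestamp
# to a mutation; all replicas, in all DCs, persist that same value.
# WRITETIME() therefore reflects when the client issued the write,
# not when each DC applied it.
def client_timestamp_micros() -> int:
    return int(time.time() * 1_000_000)

ts = client_timestamp_micros()
# Both DCs' copies of the row carry the identical timestamp:
replica_dc1 = {"authorizations_json": ("...", ts)}
replica_dc2 = {"authorizations_json": ("...", ts)}
print(replica_dc1["authorizations_json"][1] ==
      replica_dc2["authorizations_json"][1])  # True
```

Measuring per-DC apply time would need something outside the write path, e.g. a reader in each DC polling until the row appears, since Cassandra itself does not record local arrival time per replica.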
Re: Partition Key - Wide rows?
It depends on the partition/primary key design. In order to execute all 3 queries, the partition key is org id and the others are clustering keys. If there are many orgs it will be OK, but if there is one org then a single partition will hold all the data, and that's not good. Naidu Saladi

On Thursday, October 6, 2016 12:14 PM, Ali Akhtar wrote: Thanks, Phil. 1- In my use-case, it's probably okay to partition all the org data together. This is for a b2b enterprise SaaS application; the customers will be organizations. So it is probably okay to store each org's data next to each other, right? 2- I'm thinking of having the primary key be: (org_id, team_id, project_id, issue_id). In the above case, will there be a skinny row per issue, or a wide row per org / team / project? 3- Just to double check, with the above primary key, can I still query using just the org_id, org + team id, and org + team + project id? 4- If I wanted to refer to a particular issue, it looks like I'd need to send all 4 parameters. That may be problematic. Is there a better way of modeling this data?

On Thu, Oct 6, 2016 at 9:30 PM, Philip Persad wrote: 1) No. Your first 3 queries will work but not the last one (get issue by id). In Cassandra, when you query you must include every preceding portion of the primary key. 2) 64 bytes (16 * 4), or somewhat more if storing as strings? I don't think that's something I'd worry too much about. 3) Depends on how you build your partition key. If the partition key is (org id), then you get one partition per org (probably bad, depending on your dataset). If the partition key is (org id, team id, project id) then you will have one partition per project, which is probably fine (again, depending on your dataset). Cheers, -Phil

From: Ali Akhtar Sent: 2016-10-06 9:04 AM To: user@cassandra.apache.org Subject: Partition Key - Wide rows?
Heya, I'm designing some tables where data needs to be stored in the following hierarchy: Organization -> Team -> Project -> Issues

I need to be able to retrieve issues:
- For the whole org - using org id
- For a team (org id + team id)
- For a project (org id + team id + project id)
- If possible, by using just the issue id

I'm considering using all 4 ids as the primary key. The first 3 will use UUIDs, except issue id which will be an alphanumeric string, unique per project.

1) Will this setup allow using all 4 query scenarios?
2) Will this make the primary key really long, 3 UUIDs + a similar length'd issue id?
3) Will this store issues as skinny rows, or wide rows? If an org has a lot of teams, which have a lot of projects, which have a lot of issues, etc., could I have issues with running out of the column limit of wide rows?
4) Is there a better way of achieving this scenario?
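The prefix rule from Phil's reply above ("you must include every preceding portion of the primary key") can be sketched as a small validity check. This is an illustrative helper, not driver code; it assumes the key layout PRIMARY KEY ((org_id), team_id, project_id, issue_id) discussed in the thread:

```python
# For PRIMARY KEY ((org_id), team_id, project_id, issue_id), a query
# must restrict the full partition key, then any *prefix* of the
# clustering keys, in order. Skipping a preceding key is invalid.
PARTITION_KEY = ("org_id",)
CLUSTERING_KEYS = ("team_id", "project_id", "issue_id")

def query_is_valid(restricted_columns):
    cols = list(restricted_columns)
    if cols[:len(PARTITION_KEY)] != list(PARTITION_KEY):
        return False  # partition key not fully restricted
    rest = cols[len(PARTITION_KEY):]
    return tuple(rest) == CLUSTERING_KEYS[:len(rest)]

print(query_is_valid(["org_id"]))                           # True
print(query_is_valid(["org_id", "team_id"]))                # True
print(query_is_valid(["org_id", "team_id", "project_id"]))  # True
print(query_is_valid(["issue_id"]))                         # False
```

The last case is exactly Ali's scenario 4: looking an issue up by id alone skips the preceding key parts, so it needs either all 4 parameters or a separate lookup table keyed by issue id.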
Re: system_distributed.repair_history table
Thanks for the response. It makes sense to periodically truncate it, as it is only for debugging purposes. Naidu Saladi

On Wednesday, October 5, 2016 8:03 PM, Chris Lohfink <clohfin...@gmail.com> wrote: The only current solution is to truncate it periodically. I opened https://issues.apache.org/jira/browse/CASSANDRA-12701 about it, if you are interested in following.

On Wed, Oct 5, 2016 at 4:23 PM, Saladi Naidu <naidusp2...@yahoo.com> wrote: We are seeing the following warnings in system.log. As compaction_large_partition_warning_threshold_mb in the cassandra.yaml file is at the default value of 100, we are seeing these warnings:

110:WARN [CompactionExecutor:91798] 2016-10-05 00:54:05,554 BigTableWriter.java:184 - Writing large partition system_distributed/repair_history:gccatmer:mer_admin_job (115943239 bytes)
111:WARN [CompactionExecutor:91798] 2016-10-05 00:54:13,303 BigTableWriter.java:184 - Writing large partition system_distributed/repair_history:gcconfigsrvcks:user_activation (163926097 bytes)

When I looked at the table definition, it is partitioned by keyspace and column family; under this partition, repair history is maintained. When I looked at the count of rows in this partition, most of the partitions have >200,000 rows, and these will keep growing because of the partition strategy, right? There is no TTL on this, so any idea what the solution is for reducing partition size? I also looked at the size_estimates table for this column family and found that the mean partition size for each range is 50,610,179, which is very large compared to any other tables.
system_distributed.repair_history table
We are seeing the following warnings in system.log. As compaction_large_partition_warning_threshold_mb in the cassandra.yaml file is at the default value of 100, we are seeing these warnings:

110:WARN [CompactionExecutor:91798] 2016-10-05 00:54:05,554 BigTableWriter.java:184 - Writing large partition system_distributed/repair_history:gccatmer:mer_admin_job (115943239 bytes)
111:WARN [CompactionExecutor:91798] 2016-10-05 00:54:13,303 BigTableWriter.java:184 - Writing large partition system_distributed/repair_history:gcconfigsrvcks:user_activation (163926097 bytes)

When I looked at the table definition, it is partitioned by keyspace and column family; under this partition, repair history is maintained. When I looked at the count of rows in this partition, most of the partitions have >200,000 rows, and these will keep growing because of the partition strategy, right? There is no TTL on this, so any idea what the solution is for reducing partition size? I also looked at the size_estimates table for this column family and found that the mean partition size for each range is 50,610,179, which is very large compared to any other tables.
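Some rough arithmetic shows why these partitions grow without bound: repair_history is partitioned only by (keyspace, table) and has no TTL, so each repair appends rows for every repaired range into the same partition forever. The figures below are assumptions for illustration, not measurements from this cluster:

```python
# repair_history is partitioned by (keyspace, table) only, with no TTL,
# so rows accumulate indefinitely. Rough growth model (assumed figures):
def rows_per_partition(ranges_per_repair: int,
                       repairs_per_day: int,
                       days: int) -> int:
    return ranges_per_repair * repairs_per_day * days

# e.g. 256 vnode ranges repaired once a day for a year:
print(rows_per_partition(256, 1, 365))  # 93440
```

Under assumptions like these, crossing the >200,000-row counts observed in the thread takes only a couple of years of routine repairs, which is why periodic truncation (per CASSANDRA-12701) is the current remedy.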
Re: Many keyspaces pattern
I can think of the following features to solve this: 1. If you know the time period after which data should be removed, use the TTL feature. 2. Model the data as a time series and use an inverted index to query the data by time period. Naidu Saladi

On Tuesday, November 24, 2015 6:49 AM, Jack Krupansky wrote: How often is sometimes - closer to 20% of the batches or 2%? How are you querying batches, both current and older ones? As always, your queries should drive your data models. If deleting a batch is very infrequent, maybe it is best not to do it and simply have logic in the app to ignore deleted batches - if your queries would reference them at all. What reasons would you have to delete a batch? Depending on the nature of the reason, there may be an alternative. Make sure your cluster is adequately provisioned so that these expensive operations can occur in parallel to reduce their time and resources per node. Do all batches eventually get aged and deleted, or are you expecting that most batches will live for many years to come? Have you planned for how you will grow the cluster over time? Maybe bite the bullet and use a background process to delete a batch if deletion is competing too heavily with query access - if they really need to be deleted at all. The number of keyspaces - and/or tables - should be limited to the "low hundreds", and even then you are limited by the RAM and CPU of each node. If a keyspace has 14 tables, then 250/14 = 20 would be a recommended upper limit for the number of keyspaces. Even if your total number of tables was under 300 or even 200, you would need to do a proof-of-concept implementation to verify that your specific data works well on your specific hardware. -- Jack Krupansky

On Tue, Nov 24, 2015 at 5:05 AM, Jonathan Ballet wrote: Hi, we are running an application which produces every night a batch with several hundreds of gigabytes of data.
Once a batch has been computed, it is never modified (no updates, no deletes); we just keep producing new batches every day. Now, we are *sometimes* interested in removing a complete specific batch altogether. At the moment, we are accumulating all these data into only one keyspace, which has a batch ID column in all our tables that is also part of the primary key. A sample table looks similar to this:

CREATE TABLE computation_results (
    batch_id int,
    id1 int,
    id2 int,
    value double,
    PRIMARY KEY ((batch_id, id1), id2)
) WITH CLUSTERING ORDER BY (id2 ASC);

But we found out it is very difficult to remove a specific batch, as we need to know all the IDs to delete the entries, and it's both time and resource consuming (i.e. it takes a long time and I'm not sure it's going to scale at all). So, we are currently looking into having each of our batches in a keyspace of its own, so removing a batch is merely equivalent to deleting a keyspace. Potentially, it means we will end up having several hundreds of keyspaces in one cluster, although most of the time only the very last one will be used (we might still want to access the older ones, but that would be a very seldom use-case). At the moment, the keyspace has about 14 tables and is probably not going to evolve much. Are there any counter-indications to using lots of keyspaces (300+) in one Cassandra cluster? Are there any good practices that we should follow? After reading "Anti-patterns in Cassandra > Too many keyspaces or tables", does it mean we should plan ahead to split our keyspace among several clusters? Finally, would there be any other way to achieve what we want to do? Thanks for your help! Jonathan
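One alternative sometimes used for bulk-expiring data is a table per batch within a single keyspace, so removing a batch is a single DROP TABLE rather than per-row deletes; note the same low-hundreds table-count ceiling Jack mentions still applies. A sketch generating the DDL strings (the naming scheme is hypothetical):

```python
# Sketch: one table per nightly batch, so a whole batch can be removed
# with a single DROP TABLE instead of enumerating all row IDs.
# Table names derived from the batch id (hypothetical naming scheme).
def batch_table_name(batch_id: int) -> str:
    return f"computation_results_{batch_id}"

def create_batch_ddl(batch_id: int) -> str:
    return (
        f"CREATE TABLE {batch_table_name(batch_id)} (\n"
        "    id1 int,\n"
        "    id2 int,\n"
        "    value double,\n"
        "    PRIMARY KEY ((id1), id2)\n"
        ") WITH CLUSTERING ORDER BY (id2 ASC);"
    )

def drop_batch_ddl(batch_id: int) -> str:
    return f"DROP TABLE IF EXISTS {batch_table_name(batch_id)};"

print(drop_batch_ddl(20151124))
```

This trades schema churn (frequent CREATE/DROP, which should be issued from a single client and allowed to settle) for cheap whole-batch deletion, and avoids multiplying keyspaces by 14 tables each.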
Collections (MAP) data in Column Family
We are running Apache Cassandra 2.1.9. In one of our column families we have a MAP type column. We are seeing an unusual data size for the column family (SSTables) with only a few thousand rows. While debugging, I looked at one of the SSTables and I see some unusual data in it. Below is the JSON of one row key's data.

1. There is the usual column name, key-value pair and timestamp for the MAP column all_products.
2. After the key-value pairs, I see clustering-column-style data in the MAP with a "t" marker in between; this is literally repeated millions of cells: ["all_products:_","all_products:!",1442797965371999,"t",1442797965]

Any clues on what is happening here? I know the "d" marker is for marked-for-delete and the "e" marker is for TTL, but I don't know what the "t" marker is for.

[{"key": "55736100", "cells": [["","",1444101633184000], ["active","false",1444101633184000], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1443410022982999,"t",1443410022], ["all_products:_","all_products:!",1443410595224999,"t",1443410595], ["all_products:_","all_products:!",1443679978903999,"t",1443679978], ["all_products:_","all_products:!",1444011801906999,"t",1444011801], ["all_products:_","all_products:!",1444101633183999,"t",1444101633], ["all_products:3135393730323130305f63735f435a","313539373032313030",1444101633184000], ["all_products:3135393730323130305f64655f4154","313539373032313030",1444101633184000], ["all_products:3135393730323130305f64655f4348","313539373032313030",1444101633184000], ["all_products:3135393730323130305f64655f4445","313539373032313030",1444101633184000], ["all_products:3135393730323130305f656e5f4348","313539373032313030",1444101633184000], .["all_products:3233393238333430305f69745f4348","323339323833343030",1444101633184000], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965],
["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442797965371999,"t",1442797965], ["all_products:_","all_products:!",1442806687091999,"t",1442806687], Naidu Saladi
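For context on the dump above: in sstable2json-style output, "t" marks a range tombstone, and every full overwrite of a collection column writes one such tombstone covering the old entries before inserting the new ones; that is why the all_products:_ rows repeat. A small sketch (hypothetical helper name) counting those markers in a cells list:

```python
# In sstable2json output, a cell of the form
# [start, end, markedForDeleteAt, "t", localDeletionTime] is a range
# tombstone. Each non-append overwrite of a MAP column emits one over
# the old entries, so repeated full-map updates pile these up until
# compaction purges them past gc_grace_seconds.
def count_collection_tombstones(cells):
    return sum(1 for cell in cells if len(cell) >= 4 and cell[3] == "t")

sample = [
    ["active", "false", 1444101633184000],
    ["all_products:_", "all_products:!", 1442797965371999, "t", 1442797965],
    ["all_products:_", "all_products:!", 1442806687091999, "t", 1442806687],
]
print(count_collection_tombstones(sample))  # 2
```

If the application rewrites the whole map on every update, switching to per-key updates (UPDATE ... SET all_products[key] = value) avoids generating a new range tombstone each time.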
Re: LTCS Strategy Resulting in multiple SSTables
Nate, yes we are in the process of upgrading to 2.1.9. Meanwhile I am looking at correcting the problem; do you know of any recovery options to reduce the number of SSTables? As the SSTables keep increasing, read performance is deteriorating. Naidu Saladi

From: Nate McCall <n...@thelastpickle.com> To: Cassandra Users <user@cassandra.apache.org>; Saladi Naidu <naidusp2...@yahoo.com> Sent: Tuesday, September 15, 2015 4:53 PM Subject: Re: LTCS Strategy Resulting in multiple SSTables

That's an early 2.1/known buggy version. There have been several issues fixed since which could cause that behavior. Most likely https://issues.apache.org/jira/browse/CASSANDRA-9592 ? Upgrade to 2.1.9 and see if the problem persists.

On Tue, Sep 15, 2015 at 8:31 AM, Saladi Naidu <naidusp2...@yahoo.com> wrote: We are on 2.1.2 and planning to upgrade to 2.1.9 Naidu Saladi

From: Marcus Eriksson <krum...@gmail.com> To: user@cassandra.apache.org; Saladi Naidu <naidusp2...@yahoo.com> Sent: Tuesday, September 15, 2015 1:53 AM Subject: Re: LTCS Strategy Resulting in multiple SSTables

if you are on Cassandra 2.2, it is probably this: https://issues.apache.org/jira/browse/CASSANDRA-10270

On Tue, Sep 15, 2015 at 4:37 AM, Saladi Naidu <naidusp2...@yahoo.com> wrote: We are using the Leveled Compaction Strategy on a column family. Below are CFSTATS from two nodes in the same cluster: one node has 808 SSTables in L0 whereas the other node has just 1 SSTable in L0. On the node with multiple SSTables, all of them are small and have the same creation timestamp. We ran compaction; it did not result in much change, and the node remained with a huge number of SSTables. Due to this large number of SSTables, read performance is being impacted. In the same cluster, under the same keyspace, we are observing this discrepancy in other column families as well. What is going wrong?
What is the solution to fix this?

---NODE1---
Table: category_ranking_dedup
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 2012037
Space used (total): 2012037
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.07677216119569073
Memtable cell count: 990
Memtable data size: 32082
Memtable switch count: 11
Local read count: 2842
Local read latency: 3.215 ms
Local write count: 18309
Local write latency: 5.008 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 816
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 25109160
Compacted partition mean bytes: 22844
Average live cells per slice (last five minutes): 338.84588318085855
Maximum live cells per slice (last five minutes): 10002.0
Average tombstones per slice (last five minutes): 36.53307529908515
Maximum tombstones per slice (last five minutes): 36895.0

---NODE2---
Table: category_ranking_dedup
SSTable count: 808
SSTables in each level: [808/4, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 291641980
Space used (total): 291641980
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.1431106696818256
Memtable cell count: 4365293
Memtable data size: 3742375
Memtable switch count: 44
Local read count: 2061
Local read latency: 31.983 ms
Local write count: 30096
Local write latency: 27.449 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 54544
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 25109160
Compacted partition mean bytes: 634491
Averag
Re: LTCS Strategy Resulting in multiple SSTables
We are on 2.1.2 and planning to upgrade to 2.1.9 Naidu Saladi

From: Marcus Eriksson <krum...@gmail.com> To: user@cassandra.apache.org; Saladi Naidu <naidusp2...@yahoo.com> Sent: Tuesday, September 15, 2015 1:53 AM Subject: Re: LTCS Strategy Resulting in multiple SSTables

if you are on Cassandra 2.2, it is probably this: https://issues.apache.org/jira/browse/CASSANDRA-10270

On Tue, Sep 15, 2015 at 4:37 AM, Saladi Naidu <naidusp2...@yahoo.com> wrote: We are using the Leveled Compaction Strategy on a column family. Below are CFSTATS from two nodes in the same cluster: one node has 808 SSTables in L0 whereas the other node has just 1 SSTable in L0. On the node with multiple SSTables, all of them are small and have the same creation timestamp. We ran compaction; it did not result in much change, and the node remained with a huge number of SSTables. Due to this large number of SSTables, read performance is being impacted. In the same cluster, under the same keyspace, we are observing this discrepancy in other column families as well. What is going wrong?
What is the solution to fix this?

---NODE1---
Table: category_ranking_dedup
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 2012037
Space used (total): 2012037
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.07677216119569073
Memtable cell count: 990
Memtable data size: 32082
Memtable switch count: 11
Local read count: 2842
Local read latency: 3.215 ms
Local write count: 18309
Local write latency: 5.008 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 816
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 25109160
Compacted partition mean bytes: 22844
Average live cells per slice (last five minutes): 338.84588318085855
Maximum live cells per slice (last five minutes): 10002.0
Average tombstones per slice (last five minutes): 36.53307529908515
Maximum tombstones per slice (last five minutes): 36895.0

---NODE2---
Table: category_ranking_dedup
SSTable count: 808
SSTables in each level: [808/4, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 291641980
Space used (total): 291641980
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.1431106696818256
Memtable cell count: 4365293
Memtable data size: 3742375
Memtable switch count: 44
Local read count: 2061
Local read latency: 31.983 ms
Local write count: 30096
Local write latency: 27.449 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 54544
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 25109160
Compacted partition mean bytes: 634491
Average live cells per slice (last five minutes): 416.1780688985929
Maximum live cells per slice (last five minutes): 10002.0
Average tombstones per slice (last five minutes): 45.11547792333818
Maximum tombstones per slice (last five minutes): 36895.0

Naidu Saladi
LTCS Strategy Resulting in multiple SSTables
We are using the Leveled Compaction Strategy (LCS) on a column family. Below is cfstats output from two nodes in the same cluster: one node has 808 SSTables in L0, whereas the other has just 1 SSTable. On the node with many SSTables, all of them are small and share the same timestamp. We ran a compaction, but it did not change much; the node still has a huge number of SSTables, and read performance is suffering because of it. In the same cluster, under the same keyspace, we are observing this discrepancy in other column families as well. What is going wrong, and what is the fix?

---NODE1---
Table: category_ranking_dedup
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 2012037
Space used (total): 2012037
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.07677216119569073
Memtable cell count: 990
Memtable data size: 32082
Memtable switch count: 11
Local read count: 2842
Local read latency: 3.215 ms
Local write count: 18309
Local write latency: 5.008 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 816
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 25109160
Compacted partition mean bytes: 22844
Average live cells per slice (last five minutes): 338.84588318085855
Maximum live cells per slice (last five minutes): 10002.0
Average tombstones per slice (last five minutes): 36.53307529908515
Maximum tombstones per slice (last five minutes): 36895.0

---NODE2---
Table: category_ranking_dedup
SSTable count: 808
SSTables in each level: [808/4, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 291641980
Space used (total): 291641980
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.1431106696818256
Memtable cell count: 4365293
Memtable data size: 3742375
Memtable switch count: 44
Local read count: 2061
Local read latency: 31.983 ms
Local write count: 30096
Local write latency: 27.449 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 54544
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 25109160
Compacted partition mean bytes: 634491
Average live cells per slice (last five minutes): 416.1780688985929
Maximum live cells per slice (last five minutes): 10002.0
Average tombstones per slice (last five minutes): 45.11547792333818
Maximum tombstones per slice (last five minutes): 36895.0

Naidu Saladi
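As an aside, the level histogram in cfstats output like the above can be checked mechanically. Below is a minimal Python sketch (the field label is assumed to match the output pasted in this thread) that parses the "SSTables in each level" line; an entry like 808/4 means the level holds 808 SSTables against a soft limit of 4, i.e. an L0 backlog that forces reads to touch many files.

```python
import re

def level_histogram(cfstats_text):
    """Extract the 'SSTables in each level' histogram from nodetool
    cfstats-style output as (count, limit) pairs. An entry like
    '808/4' means 808 SSTables in a level whose soft limit is 4."""
    m = re.search(r"SSTables in each level: \[([^\]]+)\]", cfstats_text)
    if not m:
        return None
    levels = []
    for tok in m.group(1).split(","):
        tok = tok.strip()
        if "/" in tok:  # overflowing level, shown as count/limit
            count, limit = tok.split("/")
            levels.append((int(count), int(limit)))
        else:           # healthy level, shown as a bare count
            levels.append((int(tok), None))
    return levels

def l0_backlog(levels):
    """True when L0 holds more SSTables than LCS expects, which is
    the situation that degrades reads on the second node above."""
    count, limit = levels[0]
    return limit is not None and count > limit

stats = "SSTables in each level: [808/4, 0, 0, 0, 0, 0, 0, 0, 0]"
levels = level_histogram(stats)
print(levels[0])           # (808, 4)
print(l0_backlog(levels))  # True
```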
Data Distribution in Table/Column Family
Is there a way to find out how data is distributed within a column family on each node? Nodetool shows how data is distributed across nodes, but only at the whole-node level. We are seeing heavy load on one node, and I suspect the partition key is not distributing data evenly. To prove that to the development team, we need the stats for that table. Naidu Saladi
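Running nodetool cfstats (tablestats in later versions) on each node gives per-table, per-node numbers, and nodetool getendpoints shows which replicas own a given key. For partition-key skew specifically, a rough client-side check is to hash a sample of partition keys and count how they spread. The sketch below is hypothetical: it uses MD5 as a stand-in for Cassandra's Murmur3 partitioner and a naive modulo in place of real ring ownership, so it only illustrates the shape of the check, not actual token assignment.

```python
import hashlib
from collections import Counter

def token_bucket(partition_key, num_nodes):
    """Map a partition key to a node bucket. Real Cassandra uses
    Murmur3 tokens and the ring's ownership ranges; MD5 plus modulo
    here is just a stand-in to illustrate the skew check."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

def skew_report(keys, num_nodes=3):
    """Count how many sampled partition keys land on each bucket.
    A heavily unbalanced Counter suggests a hot partition key."""
    return Counter(token_bucket(k, num_nodes) for k in keys)

# Degenerate key choice: almost every row shares one partition key,
# so one bucket gets nearly all the load.
sample = ["us"] * 97 + ["uk", "de", "fr"]
print(skew_report(sample))
```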
Re: DROP Table
Sebastian, thank you so much for the detailed explanation. I still have some questions and need to provide some clarifications:

1. We do not have code that creates tables dynamically; all DDL operations are done through the DataStax DevCenter tool. When you say allow the schema to settle, do you mean we should use a proper consistency level? I don't think the tool provides for that. Or should I change the system keyspace's replication factor to equal the number of nodes?
2. In the steps described below for correcting this problem: when you say move data from the old directory to the new one, do you mean move the .db files? That will overwrite the current files, right?
3. Do we have to rename the directory to remove the CF ID, i.e. leave just the column family name without the CF ID? And update the system tables after that?

Naidu Saladi

From: Sebastian Estevez sebastian.este...@datastax.com
To: user@cassandra.apache.org; Saladi Naidu naidusp2...@yahoo.com
Sent: Friday, July 10, 2015 5:25 PM
Subject: Re: DROP Table

#1 The cause of this problem is a CREATE TABLE statement collision. Do not generate tables dynamically from multiple clients, even with IF NOT EXISTS. The first thing you need to do is fix your code so that this does not happen. Just create your tables manually from cqlsh, allowing time for the schema to settle.

#2 Here's the fix:

1) Change your code to not automatically re-create tables (even with IF NOT EXISTS).
2) Run a rolling restart to ensure the schema matches across nodes. Run nodetool describecluster around your cluster and check that there is only one schema version.

ON EACH NODE:
3) Check your filesystem and see if you have two directories for the table in question in the data directory.

IF THERE ARE TWO OR MORE DIRECTORIES:
4) Identify from schema_column_families which CF ID is the new one (currently in use):
   cqlsh -e "select * from system.schema_column_families" | grep <table name>
5) Move the data from the old directory to the new one and remove the old directory.
6) If there are multiple old directories, repeat step 5 for every old directory.
7) Run nodetool refresh.

IF THERE IS ONLY ONE DIRECTORY: No further action is needed.

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Jul 10, 2015 at 12:15 PM, Saladi Naidu naidusp2...@yahoo.com wrote:

My understanding is that the Cassandra file structure follows the naming convention /cassandra/data/<keyspace>/<table>. Our file structure, however, is as below: each table has multiple directory names, and when we drop and recreate tables these directories remain. Also, when we dropped the table one node was down; when it came back we tried to run nodetool repair, and the repair kept failing with the CFID error listed below.

drwxr-xr-x. 16 cass cass 4096 May 24 06:49 ../
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 application_by_user-e0eec95019a211e58b954ffc8e9bfaa6/
drwxr-xr-x.  2 cass cass 4096 Jun 25 10:15 application_info-4dba2bf0054f11e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 application_info-a0ee65d019a311e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 configproperties-228ea2e0c13811e4aa1d4ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 user_activation-95d005f019a311e58b954ffc8e9bfaa6/
drwxr-xr-x.  3 cass cass 4096 Jun 25 10:16 user_app_permission-9fddcd62ffbe11e4a25a45259f96ec68/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 user_credential-86cfff1019a311e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 user_info-2fa076221b1011e58b954ffc8e9bfaa6/
drwxr-xr-x.  2 cass cass 4096 Jun 25 10:15 user_info-36028c00054f11e58b954ffc8e9bfaa6/
drwxr-xr-x.  3 cass cass 4096 Jun 25 10:15 user_info-fe1d7b101a5711e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jun 25 10:16 user_role-9ed0ca30ffbe11e4b71d09335ad2d5a9/

WARN [Thread-2579] 2015-07-02 16:02:27,523 IncomingTcpConnection.java:91 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=218e3c90-1b0e-11e5-a34b-d7c17b3e318a
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302
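Step 3 of the fix (spotting tables that have more than one directory on disk) can be scripted. A minimal sketch, assuming the layout shown in the directory listing in this thread, where each table directory is named <table>-<32-hex-cfid>:

```python
import os
import re
import tempfile
from collections import defaultdict

def duplicate_table_dirs(keyspace_dir):
    """Group '<table>-<32-hex-cfid>' directory names by table name and
    return only the tables that have more than one CF ID on disk."""
    groups = defaultdict(list)
    pattern = re.compile(r"^(.+)-([0-9a-f]{32})$")
    for name in sorted(os.listdir(keyspace_dir)):
        m = pattern.match(name)
        if m:
            groups[m.group(1)].append(m.group(2))
    return {table: ids for table, ids in groups.items() if len(ids) > 1}

# Demo against a throwaway directory shaped like the listing above:
# user_info appears twice, user_role once.
demo = tempfile.mkdtemp()
for name in ["user_info-" + "a" * 32, "user_info-" + "b" * 32,
             "user_role-" + "c" * 32]:
    os.mkdir(os.path.join(demo, name))
print(duplicate_table_dirs(demo))  # -> {'user_info': [<two cf ids>]}
```

Cross-check the surviving CF ID against the schema tables before deleting anything.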
DROP Table
My understanding is that the Cassandra file structure follows the naming convention /cassandra/data/<keyspace>/<table>. Our file structure, however, is as below: each table has multiple directory names, and when we drop and recreate tables these directories remain. Also, when we dropped the table one node was down; when it came back we tried to run nodetool repair, and the repair kept failing with the CFID error listed below.

drwxr-xr-x. 16 cass cass 4096 May 24 06:49 ../
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 application_by_user-e0eec95019a211e58b954ffc8e9bfaa6/
drwxr-xr-x.  2 cass cass 4096 Jun 25 10:15 application_info-4dba2bf0054f11e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 application_info-a0ee65d019a311e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 configproperties-228ea2e0c13811e4aa1d4ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 user_activation-95d005f019a311e58b954ffc8e9bfaa6/
drwxr-xr-x.  3 cass cass 4096 Jun 25 10:16 user_app_permission-9fddcd62ffbe11e4a25a45259f96ec68/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 user_credential-86cfff1019a311e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jul  2 11:09 user_info-2fa076221b1011e58b954ffc8e9bfaa6/
drwxr-xr-x.  2 cass cass 4096 Jun 25 10:15 user_info-36028c00054f11e58b954ffc8e9bfaa6/
drwxr-xr-x.  3 cass cass 4096 Jun 25 10:15 user_info-fe1d7b101a5711e58b954ffc8e9bfaa6/
drwxr-xr-x.  4 cass cass 4096 Jun 25 10:16 user_role-9ed0ca30ffbe11e4b71d09335ad2d5a9/

WARN [Thread-2579] 2015-07-02 16:02:27,523 IncomingTcpConnection.java:91 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=218e3c90-1b0e-11e5-a34b-d7c17b3e318a
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.2.jar:2.1.2]

Naidu Saladi
Read Repair
Suppose I have an existing row whose attribute values I call State1, and I issue an update to some columns at QUORUM consistency. Say the write succeeds on one node, Node1, and fails on the remaining nodes. As there is no rollback, Node1's copy of the row will hold the new state, State2, while the rest of the nodes still hold the old state, State1. If I then do a read and Cassandra detects the state difference, it will issue a read repair, which will propagate the new state, State2, to the other nodes. But from the application's point of view the update never happened, because it received an exception. How should this kind of situation be handled? Naidu Saladi
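One common way to handle this (an option, not something stated in this thread): treat a write timeout as an unknown outcome rather than a failure, and retry the statement if it is idempotent. A timed-out QUORUM write may already have landed on some replicas, so retrying drives all replicas toward the state the application intended (State2) instead of leaving the outcome to read repair. A minimal sketch, with a stubbed execute function standing in for a real driver session:

```python
class WriteTimeout(Exception):
    """Stand-in for a driver's write-timeout exception."""

def execute_idempotent(session_execute, statement, retries=3):
    """Retry an idempotent write on timeout. The timeout means the
    outcome is unknown, not that the write failed; retrying converges
    the replicas on the intended state."""
    for attempt in range(retries + 1):
        try:
            return session_execute(statement)
        except WriteTimeout:
            if attempt == retries:
                raise  # give up; surface the unknown outcome to the caller

# Stub that fails twice and then succeeds, simulating flaky replicas.
calls = {"n": 0}
def flaky_execute(stmt):
    calls["n"] += 1
    if calls["n"] < 3:
        raise WriteTimeout()
    return "applied"

print(execute_idempotent(flaky_execute, "UPDATE ..."))  # applied
```

Non-idempotent writes (counter increments, list appends) cannot safely be retried this way; for those, the application has to reconcile state by reading it back.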
Re: Example Data Modelling
If you go with month as the partition key, then you need to duplicate the data. I don't think using the name as the partition key is good data-modeling practice, as it will create a hotspot. Also, I believe your queries will mostly be by employee, not by month. You can use the employee ID as the partition key with month as a clustering column, and keep the employee details as static columns so they won't be repeated.

Naidu Saladi

From: Srinivasa T N seen...@gmail.com
To: user@cassandra.apache.org
Sent: Tuesday, July 7, 2015 3:07 AM
Subject: Re: Example Data Modelling

Thanks for the inputs. Now my question is how the app should populate the duplicate data. That is, if I have an employee record (along with his FN, LN, ...) for the month of April, and later I am populating the same record for the month of May (with the salary changed), should my application first read/fetch the corresponding data for April and re-insert it, with modifications, for the month of May?

Regards,
Seenu.

On Tue, Jul 7, 2015 at 11:32 AM, Peer, Oded oded.p...@rsa.com wrote:

The data model suggested isn't optimal for the "end of month" query you want to run, since you are not querying by partition key. The query would look like "select EmpID, FN, LN, basic from salaries where month = 1", which requires filtering and has unpredictable performance.

For this type of query to be fast you can use the "month" column as the partition key and "EmpID" as the clustering column. This approach also has drawbacks:
1. It creates a wide row. Depending on the number of employees, this partition might be very large. You should limit partition sizes to 25 MB.
2. Distributing data by month means that only a small number of nodes will hold all of the salary data for a specific month, which might cause hotspots on those nodes.

Choose the approach that works best for you.

From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: Monday, July 06, 2015 7:04 PM
To: user@cassandra.apache.org
Subject: Re: Example Data Modelling

Hi Srinivasa,

I think you're right. In Cassandra you should favor denormalisation where in an RDBMS you would find a relationship like this. I'd suggest a cf like this:

CREATE TABLE salaries (
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  month int,
  basic int,
  flexible_allowance float,
  PRIMARY KEY (EmpID, month)
);

That way the salaries will be partitioned by EmpID and clustered by month, which I guess is the natural ordering you want.

Hope it helps. Cheers!

Carlos Alonso | Software Engineer | @calonso

On 6 July 2015 at 13:01, Srinivasa T N seen...@gmail.com wrote:

Hi,
I have a basic doubt. I have an RDBMS with the following two tables:
Emp - EmpID, FN, LN, Phone, Address
Sal - Month, EmpID, Basic, Flexible Allowance
My use case is to print the salary slip at the end of each month, and the slip contains the employee's name and other details. Now, if I want the same in Cassandra, I will have a single cf with the employee's personal details and his salary details. Is this the right approach? Should the employee's personal details be duplicated each month?

Regards,
Seenu.
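Oded's 25 MB partition guideline can be sanity-checked with back-of-the-envelope arithmetic. The numbers below are hypothetical; the point is only that a month-keyed partition grows with the total employee count, while an employee-keyed partition grows with the number of months, which stays small.

```python
def estimate_partition_bytes(rows_per_partition, avg_row_bytes):
    """Rough partition size: rows in the partition times average
    serialized row size. Real overheads (clustering prefixes, cell
    timestamps) are ignored, so treat this as a lower bound."""
    return rows_per_partition * avg_row_bytes

GUIDELINE = 25 * 1024 * 1024  # the ~25 MB limit mentioned above

# month as partition key: one partition holds every employee's row
# for that month (hypothetical numbers).
by_month = estimate_partition_bytes(200_000, 150)
print(by_month, by_month > GUIDELINE)   # 30000000 True -> too large

# EmpID as partition key: one partition holds one employee's history.
by_employee = estimate_partition_bytes(120, 150)  # 10 years of months
print(by_employee > GUIDELINE)          # False -> comfortably small
```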
Re: Catastrophe Recovery
Alain, great write-up on the recovery procedure. You covered both replication factor and consistency levels. As you mention, the two anti-entropy mechanisms, hinted handoff and read repair, work for temporary node outages and incremental recovery. In the case of disaster/catastrophic recovery, nodetool repair is the best way to recover. Would the procedure below have ensured the node was added properly to the cluster? "Adding nodes to an existing cluster" (DataStax Cassandra 2.0 documentation, docs.datastax.com).

Naidu Saladi

From: Jean Tremblay jean.tremb...@zen-innovations.com
To: user@cassandra.apache.org
Sent: Monday, June 15, 2015 10:58 AM
Subject: Re: Catastrophe Recovery

That is really wonderful. Thank you very much Alain. You gave me a lot of trails to investigate. Thanks again for your help.

On 15 Jun 2015, at 17:49, Alain RODRIGUEZ arodr...@gmail.com wrote:

Hi, it looks like you're starting to use Cassandra. Welcome. I invite you to read as much as you can, starting from http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html.

When a node loses some data you have various anti-entropy mechanisms:
- Hinted handoff: for writes that occurred while the node was down and was known as such by the other nodes (exclusively).
- Read repair: on each read, you can set a chance to check other nodes and auto-correct.
- Repair (called either manual / anti-entropy / full / ...): takes care of giving a node back its missing data, either only for the ranges the node owns (-pr) or for all its data (its range plus its replicas). This is something you generally want to perform on all nodes on a regular basis (at an interval lower than the lowest gc_grace_seconds set on any of your tables).

Also, you are seeing wrong values probably because your consistency level (CL) is too low.
If you want this to never happen, you have to choose read (R) and write (W) consistency levels such that R + W > RF (replication factor); if not, you can see what you are currently seeing. I advise you to set your consistency to LOCAL_QUORUM or QUORUM in a single-DC environment. Also, with 3 nodes you should set RF to 3; otherwise you won't be able to reach strong consistency, due to the formula I just gave you. There is a lot more to know, and you should read about all of it. Using Cassandra without knowing its internals will lead you to very poor and unexpected results.

To answer your questions:

"For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?" -- Not at all, unless it joins the ring for the first time, which is not your case. Though it will (by default) slowly recover while you read.

"After such a catastrophe, and after the joining phase is done, should the cluster not be ready to deliver always-consistent data if there were no inserts or deletes during the catastrophe?" -- No, we can't ensure that, except by dropping the node and bootstrapping a new one. What we can make sure of is that enough replicas remain to serve consistent data (search for RF and CL).

"After the bootstrap of a broken node is finished, i.e. after the joining phase, is there not simply a repair to be done on that node using nodetool repair?" -- This sentence is false: the bootstrap/joining phase is not the same thing as a broken node coming back. You are right about repair: if a broken node (or one down for too long - default 3 hours) comes back, you have to repair. But repair is slow; make sure you can afford it on a node - see my previous answer.

Testing is a really good idea, but you also have to read a lot, imho. Good luck,

C*heers,
Alain

2015-06-15 11:13 GMT+02:00 Jean Tremblay jean.tremb...@zen-innovations.com:

Hi, I have a cluster of 3 nodes, RF: 2. There are about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table. I use version 2.1.6 with the default cassandra.yaml; only the IP addresses for seeds and the throughput have been changed.

I have tested a scenario where one node crashes and loses all its data. I deleted all data on this node after having stopped Cassandra. At this point I noticed that the cluster was giving proper results - what I would expect from a clustered DB. I then restarted that node and observed that it was joining the cluster. After an hour or so the old "defect" node was up and normal. I noticed that its hard disk held much less data than its neighbours'. When I queried the DB, the cluster was giving me different results for successive identical queries. I guess the old "defect" node was giving me fewer rows than it should have.

1) For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its
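Alain's R + W > RF rule is easy to encode, and it shows directly why RF=2 with 3 nodes is awkward: QUORUM at RF=2 is effectively ALL. A small sketch, with QUORUM computed as floor(RF/2) + 1, which matches Cassandra's definition:

```python
def quorum(rf):
    """Replicas that must respond for a QUORUM operation."""
    return rf // 2 + 1

def is_strongly_consistent(read_replicas, write_replicas, rf):
    """Alain's rule: a read is guaranteed to overlap the latest write
    in at least one replica exactly when R + W > RF."""
    return read_replicas + write_replicas > rf

rf = 3
print(quorum(rf))                                          # 2
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True

# Jean's setup, RF=2: QUORUM is 2, i.e. every replica, so losing one
# node blocks quorum operations entirely; and ONE + ONE is not enough:
print(quorum(2))                         # 2
print(is_strongly_consistent(1, 1, 2))   # False
```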
Re: Is Table created in all the nodes if the default consistency level used
There are three different things being discussed here:
1. SimpleStrategy vs. NetworkTopologyStrategy matters when you have a single DC vs. multiple DCs.
2. In both cases you can specify a replication factor; obviously in the SimpleStrategy case you don't mention a DC, whereas with NetworkTopologyStrategy you can specify options per DC's replication requirements.
3. If your question refers to a single DC, then even if your system keyspace uses SimpleStrategy and your user table's keyspace uses NetworkTopologyStrategy, it should not matter, and Table_test will be created on all nodes.
4. If your system_auth keyspace's replication factor is set to less than the number of nodes, you will face auth issues.

Naidu Saladi

From: 鄢来琼 laiqiong@gtafe.com
To: user@cassandra.apache.org
Sent: Monday, March 16, 2015 2:13 AM
Subject: Re: Is Table created in all the nodes if the default consistency level used

Hi Daemeon,

Yes, I use the "NetworkTopologyStrategy" strategy for "Table_test", but the system keyspace is Cassandra's internal keyspace, and its strategy is LocalStrategy. So my question is: how do I guarantee that "Table_test" is created on all the nodes before any read/write operations? Thanks.

Peter

From: daemeon reiydelle [mailto:daeme...@gmail.com]
Sent: March 16, 2015, 14:35
To: user@cassandra.apache.org
Subject: Re: Is Table created in all the nodes if the default consistency level used

If you want to guarantee that the data is written to all nodes before the call returns, then yes, you have to use consistency ALL. Otherwise there is a small risk of outdated data being served if a node goes offline for longer than the hint timeout. Somewhat looser options that can still assure multiple copies are written, as you probably know, are QUORUM or a hard-coded value. This applies to a typical installation with a substantial number of nodes, of course, not a small 2-3 node cluster. I am curious why LocalStrategy, when you have such concerns about data consistency that you want to ensure all nodes get the data written. Can you elaborate on your use case?

...
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming 'Wow! What a Ride!'" - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Sun, Mar 15, 2015 at 8:11 PM, 鄢来琼 laiqiong@gtafe.com wrote:

Could you tell me whether the metadata of a new table is built on all the nodes after executing the following statement?

cassandra_session.execute_async(
    """CREATE TABLE Table_test (
        ID uuid,
        Time timestamp,
        Value double,
        Date timestamp,
        PRIMARY KEY ((ID, Date), Time)
    ) WITH COMPACT STORAGE;"""
)

As I know, the system keyspace is used to store the metadata, but its strategy is LocalStrategy, which only stores the metadata of the local node. So I want to know whether the table is created on all the nodes - should I add a consistency_level setting to the above statement to make sure "CREATE TABLE" is executed on all the nodes? Thanks.

Peter
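One point worth adding to this thread: schema changes (DDL) are not governed by consistency levels at all; they propagate via gossip. The usual way to guarantee the table exists everywhere before reads and writes is to wait for schema agreement, which drivers expose as a built-in check. The sketch below shows the idea generically, with fetch_versions as a caller-supplied stand-in for reading schema versions from system.local and system.peers:

```python
import time

def wait_for_schema_agreement(fetch_versions, timeout=10.0, interval=0.5):
    """Poll until every node reports the same schema version, or the
    timeout expires. fetch_versions() returns a mapping of
    node -> schema version; a real client would populate it from
    system.local/system.peers or use its driver's built-in check."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        versions = set(fetch_versions().values())
        if len(versions) == 1:
            return True   # all nodes agree; safe to use the new table
        time.sleep(interval)
    return False          # disagreement persisted; investigate before use

# Stub cluster that disagrees on the first poll and agrees on the second.
state = [{"10.0.0.1": "v1", "10.0.0.2": "v2"},
         {"10.0.0.1": "v2", "10.0.0.2": "v2"}]
print(wait_for_schema_agreement(lambda: state.pop(0), interval=0))  # True
```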