Re: good monitoring tool for cassandra

2019-03-14 Thread Jonathan Haddad
I've worked with several teams using DataDog, folks are pretty happy with
it.  We (The Last Pickle) did the dashboards for them:
http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html

Prometheus + Grafana is great if you want to host it yourself.

On Fri, Mar 15, 2019 at 12:45 PM Jeff Jirsa  wrote:

>
> -dev, +user
>
> Datadog worked pretty well last time I used it.
>
>
> --
> Jeff Jirsa
>
>
> > On Mar 14, 2019, at 11:38 PM, Sundaramoorthy, Natarajan <
> natarajan_sundaramoor...@optum.com> wrote:
> >
> > Can someone share knowledge on good monitoring tool for cassandra? Thanks
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: To Repair or Not to Repair

2019-03-14 Thread Jonathan Haddad
My coworker Alex (from The Last Pickle) wrote an in depth blog post on
TWCS.  We recommend not running repair on tables that use TWCS.

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

It's enough of a problem that we added a feature to Reaper to
auto-blacklist TWCS / DTCS tables from being repaired.  We wrote about it
here: http://thelastpickle.com/blog/2019/02/15/reaper-1_4-released.html

Hope this helps!
Jon

On Fri, Mar 15, 2019 at 9:48 AM Nick Hatfield 
wrote:

> It seems that running a repair works really well, quickly and efficiently
> when repairing a column family that does not use TWCS. Has anyone else had
> a similar experience? Wondering if repairing tables that use TWCS is doing
> more harm than good, as it chews up a lot of CPU for extended periods of
> time in comparison to CFs with a compaction strategy of STCS
>
>
>
>
>
> Thanks,
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: cassandra upgrades multi-DC in parallel

2019-03-12 Thread Jonathan Haddad
Nothing prevents it technically, but operationally you might not want to.
Personally I’d prefer to have the safety net of a DC to fall back on in
case there’s an issue with the upgrade.

On Wed, Mar 13, 2019 at 7:48 AM Carl Mueller
 wrote:

> If there are multiple DCs in a cluster, is it safe to upgrade them in
> parallel, with each DC doing a node-at-a-time?
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Maximum memory usage reached

2019-03-06 Thread Jonathan Haddad
That’s not an error. To the left of the log message is the severity, level
INFO.

Generally, I don’t recommend running Cassandra on only 2GB ram or for small
datasets that can easily fit in memory. Is there a reason why you’re
picking Cassandra for this dataset?
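
For what it's worth, the 512MiB in that log line is the off-heap buffer
pool (file cache) limit, which defaults to the smaller of 512MiB and 1/4
of the heap when file_cache_size_in_mb is left unset.  A sketch of the
relevant cassandra.yaml knobs if you want to experiment (values shown are
just the 3.x defaults, not a recommendation for a 2GB box):

    # off-heap file / chunk cache used for reads
    file_cache_size_in_mb: 512

    # if the pool is exhausted, fall back to allocating buffers on heap
    # (and log the message you're seeing)
    buffer_pool_use_heap_if_exhausted: true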

On Thu, Mar 7, 2019 at 8:04 AM Kyrylo Lebediev 
wrote:

> Hi All,
>
>
>
> We have a tiny 3-node cluster
>
> C* version 3.9 (I know 3.11 is better/stable, but can’t upgrade
> immediately)
>
> HEAP_SIZE is 2G
>
> JVM options are default
>
> All settings in cassandra.yaml are default (file_cache_size_in_mb not set)
>
>
>
> Data per node – just ~ 1Gbyte
>
>
>
> We’re getting following errors messages:
>
>
>
> DEBUG [CompactionExecutor:87412] 2019-03-06 11:00:13,545
> CompactionTask.java:150 - Compacting (ed4a4d90-4028-11e9-adc0-230e0d6622df)
> [/cassandra/data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-23248-big-Data.db:level=0,
> /cassandra/data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-23247-big-Data.db:level=0,
> /cassandra/data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-23246-big-Data.db:level=0,
> /cassandra/data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-23245-big-Data.db:level=0,
> ]
>
> DEBUG [CompactionExecutor:87412] 2019-03-06 11:00:13,582
> CompactionTask.java:230 - Compacted (ed4a4d90-4028-11e9-adc0-230e0d6622df)
> 4 sstables to
> [/cassandra/data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-23249-big,]
> to level=0.  6.264KiB to 1.485KiB (~23% of original) in 36ms.  Read
> Throughput = 170.754KiB/s, Write Throughput = 40.492KiB/s, Row Throughput =
> ~106/s.  194 total partitions merged to 44.  Partition merge counts were
> {1:18, 4:44, }
>
> INFO  [IndexSummaryManager:1] 2019-03-06 11:00:22,007
> IndexSummaryRedistribution.java:75 - Redistributing index summaries
>
> INFO  [pool-1-thread-1] 2019-03-06 11:11:24,903 NoSpamLogger.java:91 -
> Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
>
> INFO  [pool-1-thread-1] 2019-03-06 11:26:24,926 NoSpamLogger.java:91 -
> Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
>
> INFO  [pool-1-thread-1] 2019-03-06 11:41:25,010 NoSpamLogger.java:91 -
> Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
>
> INFO  [pool-1-thread-1] 2019-03-06 11:56:25,018 NoSpamLogger.java:91 -
> Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
>
>
>
> What’s interesting is that the “Maximum memory usage reached” messages appear
> every 15 minutes.
>
> A reboot temporarily solves the issue, but it then appears again after some time
>
>
>
> I checked, and there are no huge partitions (max partition size is ~2 MBytes)
>
>
>
> How can such a small amount of data cause this issue?
>
> How to debug this issue further?
>
>
>
>
>
> Regards,
>
> Kyrill
>
>
>
>
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Jonathan Haddad
If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
use DSE Search or Elassandra; they're two solutions designed to solve
problems that are Cassandra first, search second.

I'd go straight to Elasticsearch for workloads that are primarily search
driven, like you listed above.  The idea of having one DB doing both things
sounds great until it's an operational nightmare.

On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh 
wrote:

> +1 on Datastax and could consider looking at Elassandra.
>
> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R 
> wrote:
>
>> Kenneth is right. Trying to port/support a relational model to a CQL
>> model the way you are doing it is not going to go well. You won’t be able
>> to scale or get the search flexibility that you want. It will make
>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>> availability, low latency, scalability, etc. so you need to store the data
>> the way you want to retrieve it (query first modeling!). You could look at
>> defining the “right” partition and clustering keys, so that the searches
>> are within a single, reasonably sized partition. And you could have lookup
>> tables for other common search patterns (item_by_model_name, etc.)
>>
>>
>>
>> If that kind of modeling gets you to a situation where you have too many
>> lookup tables to keep consistent, you could consider something like
>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>> searchable fields. A SOLR query will typically be an order of magnitude
>> slower than a partition key lookup, though.
>>
>>
>>
>> It really boils down to the purpose of the data store. If you are looking
>> for primarily an “anything goes” search engine, Cassandra may not be a good
>> choice. If you need Cassandra-level availability, extremely low latency
>> queries (on known access patterns), high volume/low latency writes, easy
>> scalability, etc. then you are going to have to rethink how you model the
>> data.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Kenneth Brotman 
>> *Sent:* Thursday, February 07, 2019 7:01 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Peter,
>>
>>
>>
>> Sounds like you may need to use a different architecture.  Perhaps you
>> need something like Presto or Kafka as a part of the solution.  If the data
>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>> have to transform the data you want to use with Cassandra so that a proper
>> data model for Cassandra can be used.
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us ]
>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I have read the material. The problem is that the application has a
>> query facility available to the user where they can type in "(A = foo AND B
>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>> many of which are columns in the mytable below while others are from other
>> tables. This query facility was implemented and shipped years before we
>> decided to move to Cassandra
>>
>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> The problem is you’re not using a query first design.  I would recommend
>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>> Carpenter and Eben Hewitt.  It’s available free online at this link.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
>> inefficient data model. I haven't figured out how to do it correctly yet
>>
>> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> Exactly.  When you design your data model correctly you shouldn’t have to
>> use ALLOW FILTERING in the queries.  That is not recommended.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:09 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> You are completely right! My problem is that I am 

Reaper 1.4 released

2019-02-15 Thread Jonathan Haddad
Hey folks,

I'm happy to share we (The Last Pickle) have just released version 1.4 of
Reaper.  For those of you who aren't aware of the project, it's an open
source tool for managing sub-range repairs, originally created by Spotify,
which we picked up and adopted about two years ago.

There's a blog post covering all the changes here:
http://thelastpickle.com/blog/2019/02/15/reaper-1_4-released.html

We've continued to move past the original repair tool's features.  We now
support taking cluster wide snapshots, a UI for examining thread pools,
viewing active streaming sessions, and a bunch more.  The webui changes are
built on top of the original work done by Stefan Podkowinski.  The
highlights in this latest version are the ability to secure the Reaper
environment via Apache Shiro and auto-blacklisting of TWCS / DTCS tables.

The reaper website is here if you want to grab it:
http://cassandra-reaper.io/


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread Jonathan Haddad
Create the first node, setting the tokens manually.
Create the keyspace.
Add the rest of the nodes with allocate_tokens_for_keyspace uncommented.
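
As a rough sketch (keyspace name taken from your mail; the DC name, RF and
token count are placeholders):

    # node 1, cassandra.yaml: pick the tokens by hand and leave the
    # allocation flag commented out
    num_tokens: 16
    initial_token: <comma-separated list of 16 tokens>

    -- once node 1 is up, create the keyspace the allocator will target
    CREATE KEYSPACE raw_data
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    # nodes 2..N, cassandra.yaml: let the allocator do the work
    num_tokens: 16
    allocate_tokens_for_keyspace: raw_data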

On Thu, Feb 14, 2019 at 11:43 AM DuyHai Doan  wrote:

> Hello users
>
> By looking at the mailing list archive, there was already some questions
> about the flag "allocate_tokens_for_keyspace" from cassandra.yaml
>
> I'm starting a fresh new cluster (with 0 data).
>
> The keyspace used by the project is raw_data so I
> set allocate_tokens_for_keyspace = raw_data in the cassandra.yaml
>
> However the cluster fails to start, the keyspace does not exist yet (of
> course, it is not yet created).
>
> So to me it is like chicken and egg issue:
>
> 1. You create a fresh new cluster with the option
> "allocate_tokens_for_keyspace" commented out, in this case you cannot
> optimize the token allocations
> 2. You create a fresh new cluster with option
> "allocate_tokens_for_keyspace" pointing to a not-yet created keyspace, it
> fails (logically)
>
> The third option is:
>
>  a. create a new cluster with "allocate_tokens_for_keyspace" commented out
>  b. create the keyspace "raw_data"
>  c. set allocate_tokens_for_keyspace = raw_data
>
> My question is, since after step a. the token allocation is *already*
> done, what's the point of setting the flag in step c?
>
> Regards
>
> Duy Hai DOAN
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Max number of windows when using TWCS

2019-02-11 Thread Jonathan Haddad
Deleting SSTables manually can be useful if you don't know your TTL up
front.  For example, you have an ETL process that moves your raw Cassandra
data into S3 as parquet files, and you want to be sure that process is
completed before you delete the data.  You could also start out without
setting a TTL and later realize you need one.  This is a remarkably common
problem.

On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth  wrote:

> Jeff,
>
> It means we have to delete sstables manually?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa  wrote:
>
> There's a bit of headache around overlapping sstables being strictly safe
> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
> added to allow the "I know it's not technically safe, but just delete it
> anyway" use case. For a lot of people who started using TWCS before 13418,
> "stop cassandra, remove stuff we know is expired, start cassandra" is a
> not-uncommon pattern in very high-write, high-disk-space use cases.
>
>
>
> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
> wrote:
>
>> Hi,
>> In regards to comment “Purging data is also straightforward, just
>> dropping SSTables (by a script) where create date is older than a
>> threshold, we don't even need to rely on TTL”
>>
>> Don’t the old sstables drop by themselves? Once the TTL and gc_grace_seconds
>> have passed, the whole sstable will have only tombstones.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>
>> Purging data is also straightforward, just dropping SSTables (by a
>> script) where create date is older than a threshold, we don't even need to
>> rely on TTL
>>
>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: datamodelling

2019-02-05 Thread Jonathan Haddad
We (The Last Pickle) wrote a blog post on scaling time series:
http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Rather than an agent_type, you can use an application-determined bucket, so
that agents with more data use more buckets.  That'll keep your partition
sizes under control.  The blog post goes into a bit of detail, I won't
rehash it all here.  It's a pretty standard solution to this problem.
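
As a rough sketch against the table you posted (the bucket is computed by
your application, e.g. hash(row_id) modulo the bucket count you assign to
that agent):

    CREATE TABLE IF NOT EXISTS xxx (
        agent_id UUID,
        bucket INT,
        row_id BIGINT,
        path TEXT,
        security_attributes TEXT,
        actor TEXT,
        PRIMARY KEY ((agent_id, bucket), row_id)
    );

Busy agents get a larger bucket count, quiet agents can stick to a single
bucket, and reads for one agent fan out over its buckets.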

On Tue, Feb 5, 2019 at 11:38 AM Bobbie Haynes  wrote:

> even if I try to create an agent_type it will be the same issue again because
> agent_id and agent_type have the same values...
>
> On Tue, Feb 5, 2019 at 11:36 AM Bobbie Haynes 
> wrote:
>
>> unfortunately I do not have different types of agents (agent_type).. I only
>> have agent_id, which is a UUID type.
>>
>> On Tue, Feb 5, 2019 at 11:34 AM Nitan Kainth 
>> wrote:
>>
>>> You could consider a pseudo column like agent_type and make it a compound
>>> partition key. It will break your partition into smaller ones, but you
>>> will have to query with agent_id and agent_type in that case.
>>>
>>> On Tue, Feb 5, 2019 at 12:59 PM Bobbie Haynes 
>>> wrote:
>>>
 Hi Everyone,
   Could you please help me in modeling my table
 below.I'm stuck here. My Partition key is agent_id and clustering column is
 rowid. Each agent can have a minimum of 1000 rows to 10M depends on how
 busy the agent .I'm facing large partition issue for my busy agents.
 I'm using SizeTieredCompaction here..The table has Writes/Reads (70/30
 ratio) and have deletes also in the table by agentid.


 CREATE TABLE IF NOT EXISTS XXX (
  agent_id UUID,
  row_id BIGINT,
  path TEXT,
  security_attributes TEXT,
  actor TEXT,
  PRIMARY KEY (agent_id,row_id)
 )

>>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: High CPU usage on reading single row with Set column with short TTL

2019-01-28 Thread Jonathan Haddad
Your fastest route might be to run a profiler on Cassandra and get some
flame graphs.  I'm a fan of the async-profiler:

https://github.com/jvm-profiling-tools/async-profiler
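
A typical invocation looks something like this (attach to the Cassandra
PID, sample CPU for a minute, write out a flame graph):

    ./profiler.sh -d 60 -e cpu -f /tmp/cassandra-cpu.svg <cassandra pid>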

Joey Lynch did a nice write up in the documentation on a different process,
which I haven't used yet:

http://cassandra.apache.org/doc/latest/troubleshooting/use_tools.html#cpu-flamegraphs

If I have some time today I'll put together a tlp-stress (
https://github.com/thelastpickle/tlp-stress) workload to see if I can
reproduce it locally.

Jon



On Mon, Jan 28, 2019 at 7:23 AM Tom Wollert 
wrote:

>
> Hello,
>
> We have noticed CPU usage spike after several minutes of consistent load
> when querying:
> - a single column of set type (same partition key)
> - relatively frequently (couple hundred times per second, for comparison,
> we do an order of magnitude more reads already with much bigger payloads)
> - with the elements in the set having a very short TTL ( single digit
> seconds) and several inserts per second
> - gc_grace set to 0 (that should remove hints and should prevent
> tombstones)
> - reads and writes are using local quorum consistency
> - replication factor of 3 (on 4 node setup)
>
> I am struggling to figure out where the high CPU usage comes from (and
> thus how to resolve it) and hoping that some one sees what we are doing
> wrong. I'd expect the data to stay in memory on the cluster and have
> constant read time.
>
> The use case is rate limiting. We basically limit a user (for example) to
> 20 requests per 5 seconds and are using cassandra's TTL to implement it
> across all live servers. So when a request comes along we run the following
> query:
>
> SELECT tokens
> FROM recent_request_token_bucket
> WHERE  usagekey = 'some user id'
>
> If the tokens' count is less than 20 we execute
> UPDATE recent_request_token_bucket
> USING TTL 5
> SET tokens = tokens + {'Guid.NewGuid()'}
> WHERE usagekey =  'some user id'
>
> If the token's count is greater than 20 we reject the request
>
> The table definition is
> CREATE TABLE recent_request_token_bucket
> (
> usagekey text,
> tokens set,
> PRIMARY KEY (usagekey)
> )
> WITH
> compaction={'min_threshold': 2, 'class':
> 'SizeTieredCompactionStrategy', 'max_threshold': 32}
> AND
> compression={'sstable_compression': 'SnappyCompressor'}
> AND
> gc_grace_seconds=0;
>
> I have replicated it with the following:
> 200 reads per second
> 3 inserts per second
>
> This starts off with CPU load at ~10% and average response time (as
> reported by my console app) 1-2 ms
> After 5 minutes the CPU load creeps up to ~20% and average response time 2-4ms
> After 10 minutes the CPU load is over 50% and average response times start
> to hit 10ms
> After 15 minutes the CPU load is near 100% and response times over 100ms
> become normal.
>
> Interestingly, when aborting the application, waiting several minutes and
> then restarting, the response times and CPU load on the server remain
> terrible. It's like I poisoned that partition key permanently. This also
> survives flushes of the memtable.
>
> I'd expect a constant response time in our use case as there should be no
> more than 20 odd guids in the set. But it appears that cassandra maintains
> the tombstones in memory?
>
> We are running 2.1.20
>
> I'd appreciate any pointers!
>
> Cheers,
>
> Tom
>
> --
> Development Director
>
> | T: 0800 021 0888 | M: 0790 4489797 | www.codeweavers.net |
> | Codeweavers Limited | Barn 4 | Dunston Business Village | Dunston | ST18
> 9AB |
> | Registered in England and Wales No. 04092394 | VAT registration no. 974
> 9705 63 |
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Datastax Java Driver compatibility

2019-01-22 Thread Jonathan Haddad
The drivers are not maintained by the Cassandra project, it's up to each
driver maintainer to list their compatibility.

On Tue, Jan 22, 2019 at 10:48 AM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thanks for the response Amanda,
>
> Yes we can go with the latest version but we are trying one change at a
> time, so we want to make sure of the version compatibility. Btw, any plans to
> update the documentation for the latest versions of Apache Cassandra?
>
> On Tue, Jan 22, 2019 at 10:28 AM Amanda Moran 
> wrote:
>
>> Hi there-
>>
>> I checked with the team here (at DataStax) and this should work. Any
>> reason you need to stick with Java Driver 3.2, there is a 3.6 release.
>>
>> Thanks!
>>
>> Amanda
>>
>> On Tue, Jan 22, 2019 at 8:45 AM Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am looking for Datastax Driver compatibility vs apache cassandra
>>> 3.11.3 version.
>>> However the doc doesn't talk about the 3.11 version.
>>>
>>> https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html
>>>
>>> Can someone please confirm if the Datastax Java Driver 3.2.0 version
>>> work with 3.11.3 version of apache cassandra?
>>> Thanks
>>>
>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Released an ACID-compliant transaction library on top of Cassandra

2019-01-16 Thread Jonathan Haddad
Sounds a bit like RAMP: http://rustyrazorblade.com/post/2015/ramp-made-easy/

On Wed, Jan 16, 2019 at 12:51 PM Carl Mueller
 wrote:

> "2) Overview: In essence, the protocol calls for each data item to
> maintain the last committed and perhaps also the currently active version,
> for the data and relevant metadata. Each version is tagged with meta-data
> pertaining to the transaction that created it. This includes the
> transaction commit time and transaction identifier that created it,
> pointing to a globally visible transaction status record (TSR) using a
> Universal Resource Identifier (URI). The TSR is used by the client to
> determine which version of the data item to use when reading it, and so
> that transaction commit can happen just by updating (in one step) the TSR.
> The transaction identifier, stored in the form of a URI, allows any client
> regardless of its location to inspect the TSR in order to determine the
> transaction commitment state. Using the status of the TSR, any failure can
> be either rolled forward to the later version, or rolled back to the
> previous version. The test-and-set capability on each item is used to
> determine a consistent winner when multiple transactions attempt concurrent
> activity on a conflicting set of items. A global order is put on the
> records, through a consistent hash of the record identifiers, and used when
> updating in order to prevent deadlocks. This approach is optimized to
> permit parallel processing of the commit activity."
>
> It seems to be a sort of log-structure/append/change tracking storage
> where multiple versions of the data to be updated are tracked as
> transactions are applied to them, and therefore can be rolled back.
>
> Probably all active versions are read and then reduced to the final
> product once all transactions are accounted for.
>
> Of course you can't have perpetual transaction changes stored so they must
> be ... compacted ... at some point?
>
>  which is basically what cassandra does at the node level in the
> read/write path with LSM, bloom filters, and merging data across disparate
> sstables...?
>
> The devil is in the details of these things of course. Is that about right?
>
> On Tue, Nov 13, 2018 at 9:54 AM Ariel Weisberg  wrote:
>
>> Hi,
>>
>> Fanastic news!
>>
>> Ariel
>>
>> On Tue, Nov 13, 2018, at 10:36 AM, Hiroyuki Yamada wrote:
>> > Hi all,
>> >
>> > I am happy to release it under Apache 2 license now.
>> > https://github.com/scalar-labs/scalardb
>> >
>> > It passes not only jepsen but also our-built destructive testing.
>> > For jepsen tests, please check the following.
>> > https://github.com/scalar-labs/scalardb/tree/master/jepsen/scalardb
>> >
>> > Also, as Yuji mentioned the other day, we also fixed/updated jepsen
>> > tests for C* to make it work with the latest C* version properly and
>> > follow the new style.
>> > https://github.com/scalar-labs/jepsen/tree/cassandra
>> >
>> > In addition to that, we fixed/updated cassaforte used in the jepsen
>> > tests for C* to make it work with the latest java driver since
>> > cassaforte is not really maintained any more.
>> > https://github.com/scalar-labs/cassaforte/tree/driver-3.0-for-jepsen
>> >
>> > We are pleased to be able to contribute to the community by the above
>> updates.
>> > Please give us any feedbacks or questions.
>> >
>> > Thanks,
>> > Hiro
>> >
>> >
>> > On Wed, Oct 17, 2018 at 8:52 AM Hiroyuki Yamada 
>> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > Thank you for the comments and feedbacks.
>> > >
>> > > As Jonathan pointed out, it relies on LWT and uses the protocol
>> > > proposed in the paper.
>> > > Please read the design document for more detail.
>> > > https://github.com/scalar-labs/scalardb/blob/master/docs/design.md
>> > >
>> > > Regarding the licensing, we are thinking of releasing it with Apache 2
>> > > if lots of developers are interested in it.
>> > >
>> > > Best regards,
>> > > Hiroyuki
>> > > On Wed, Oct 17, 2018 at 3:13 AM Jonathan Ellis 
>> wrote:
>> > > >
>> > > > Which was followed up by
>> https://www.researchgate.net/profile/Akon_Dey/publication/282156834_Scalable_Distributed_Transactions_across_Heterogeneous_Stores/links/56058b9608ae5e8e3f32b98d.pdf
>> > > >
>> > > > On Tue, Oct 16, 2018 at 1:02 PM Jonathan Ellis 
>> wrote:
>> > > >>
>> > > >> It looks like it's based on this:
>> http://www.vldb.org/pvldb/vol6/p1434-dey.pdf
>> > > >>
>> > > >> On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg 
>> wrote:
>> > > >>>
>> > > >>> Hi,
>> > > >>>
>> > > >>> Yes this does sound great. Does this rely on Cassandra's internal
>> SERIAL consistency and CAS functionality or is that implemented at a higher
>> level?
>> > > >>>
>> > > >>> Regards,
>> > > >>> Ariel
>> > > >>>
>> > > >>> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote:
>> > > >>> > This is great!
>> > > >>> >
>> > > >>> > --
>> > > >>> > Jeff Jirsa
>> > > >>> >
>> > > >>> >
>> > > >>> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada <
>> mogwa...@gmail.com> 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Jonathan Haddad
> I’m still not sure if having tombstones vs. empty values / frozen UDTs
will have the same results.

When in doubt, benchmark.

Good luck,
Jon

On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos 
wrote:

> Losing atomic updates is a good point, but in my use case it's not a
> problem, since I always overwrite the whole record (no partial updates).
>
> I’m still not sure if having tombstones vs. empty values / frozen UDTs
> will have the same results.
> When I update one row with 10 null columns it will create 10 tombstones.
> We do OLAP processing of data stored in Cassandra with Spark.
>
> When Spark requests a range of data, let's say 1000 rows, I can easily hit
> the 10 000 tombstones threshold.
>
> Even if I would not hit the error threshold Spark requests would increase
> the heap pressure, because tombstones have to be collected and returned to
> coordinator.
>
> Are my assumptions correct ?
>
> On 4 Jan 2019, at 21:15, DuyHai Doan  wrote:
>
> The idea of storing your data as a single blob can be dangerous.
>
> Indeed, you lose the ability to perform atomic updates on each column.
>
> In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same
> row, 1st update changes column Firstname (let's say it's a Person record)
> and 2nd update changes column Lastname
>
> Now depending on the timestamp between the 2 updates, you'll have:
>
> - old Firstname, new Lastname
> - new Firstname, old Lastname
>
> having updates on columns atomically guarantees you to have new Firstname,
> new Lastname
>
> On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad  wrote:
>
>> Those are two different cases though.  It *sounds like* (again, I may be
>> missing the point) you're trying to overwrite a value with another value.
>> You're either going to serialize a blob and overwrite a single cell, or
>> you're going to overwrite all the cells and include a tombstone.
>>
>> When you do a read, reading a single tombstone vs a single value is
>> essentially the same thing, performance wise.
>>
>> In your description you said "~ 20-100 events", and you're overwriting
>> the event each time, so I don't know how you get to 10K tombstones either.
>> Compaction will bring multiple tombstones together for a cell in the same
>> way it compacts multiple values for a single cell.
>>
>> It sounds to me like you're taking some advice about tombstones out of
>> context and trying to apply the advice to a different problem.  Again, I
>> might be misunderstanding what you're doing.
>>
>>
>> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos 
>> wrote:
>>
>>> Hello Jon,
>>>
>>> I thought having tombstones is much higher overhead than just
>>> overwriting values. The compaction overhead can be similar, but I think
>>> the read performance is much worse.
>>>
>>> Tombstones accumulate and hang for 10 days (by default) before they are
>>> eligible for compaction.
>>>
>>> Also we have tombstone warning and error thresholds. If cassandra scans
>>> more than 10 000 tombstones, she will abort the query.
>>>
>>> According to this article:
>>> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>>>
>>> "The cassandra.yaml comments explain in perfectly: *“When executing a
>>> scan, within or across a partition, we need to keep the tombstones seen in
>>> memory so we can return them to the coordinator, which will use them to
>>> make sure other replicas also know about the deleted rows. With workloads
>>> that generate a lot of tombstones, this can cause performance problems and
>>> even exhaust the server heap. "*
>>>
>>> Regards,
>>> Tomas
>>>
>>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad >>
>>>> If you're overwriting values, it really doesn't matter much if it's a
>>>> tombstone or any other value, they still need to be compacted and have the
>>>> same overhead at read time.
>>>>
>>>> Tombstones are problematic when you try to use Cassandra as a queue (or
>>>> something like a queue) and you need to scan over thousands of tombstones
>>>> in order to get to the real data.  You're simply overwriting a row and
>>>> trying to avoid a single tombstone.
>>>>
>>>> Maybe I'm missing something here.  Why do you think overwriting a
>>>> single cell with a tombstone is any worse than overwriting a single cell
>>>> with a value?
>>>>
>>>> Jon
>>>>
>>>>
>>>> On 

Re: Cassandra and Apache Arrow

2019-01-09 Thread Jonathan Haddad
Not sure why they put that in there, it's definitely misleading.  There's
nothing Arrow-related in Cassandra.

There's an open JIRA, but nothing has been committed yet:
https://issues.apache.org/jira/browse/CASSANDRA-9259

On Wed, Jan 9, 2019 at 3:48 PM Tomas Bartalos 
wrote:

> There is a diagram on the homepage displaying Cassandra (with other
> storages) as source of data.
> https://arrow.apache.org/img/shared.png
>
> Which made me think there should be some integration...
>
> On Thu, 10 Jan 2019, 12:38 am Jonathan Haddad 
>> Where are you seeing that it works with Cassandra?  There's no mention of
>> it under https://arrow.apache.org/powered_by/, and on the homepage it
>> only says that a Cassandra developer worked on it.
>>
>> We (unfortunately) don't do anything with it at the moment.
>>
>> On Wed, Jan 9, 2019 at 3:24 PM Tomas Bartalos 
>> wrote:
>>
>>> I’ve read lot of nice things about Apache Arrow in-memory columnar
>>> format. On their homepage they mention Cassandra as a possible storage
>>> which could interoperate with Arrow. Unfortunately I was not able to find
>>> any working example which would demonstrate their cooperation.
>>>
>>> *My use case:* I’m doing OLAP processing of data stored in Cassandra
>>> with Spark. I need to deduplicate data with Cassandra’s upserts, so other
>>> (more-suitable) storages like HDFS + parquet, ORC didn’t seem like an
>>> option.
>>> *What I’d like to achieve: *speed-up spark’s data ingestion from
>>> Cassandra.
>>>
>>> Is it possible to query data from Cassandra in Arrow format ?
>>>
>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Cassandra and Apache Arrow

2019-01-09 Thread Jonathan Haddad
Where are you seeing that it works with Cassandra?  There's no mention of
it under https://arrow.apache.org/powered_by/, and on the homepage it
only says that a Cassandra developer worked on it.

We (unfortunately) don't do anything with it at the moment.

On Wed, Jan 9, 2019 at 3:24 PM Tomas Bartalos 
wrote:

> I’ve read lot of nice things about Apache Arrow in-memory columnar format.
> On their homepage they mention Cassandra as a possible storage which could
> interoperate with Arrow. Unfortunately I was not able to find any working
> example which would demonstrate their cooperation.
>
> *My use case:* I’m doing OLAP processing of data stored in Cassandra with
> Spark. I need to deduplicate data with Cassandra’s upserts, so other
> (more-suitable) storages like HDFS + parquet, ORC didn’t seem like an
> option.
> *What I’d like to achieve: *speed-up spark’s data ingestion from
> Cassandra.
>
> Is it possible to query data from Cassandra in Arrow format ?
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jonathan Haddad
I've done some gossip simulations in the past and found virtually no
difference in the time it takes for messages to propagate in almost any
sized cluster.  IIRC it always converges by 17 iterations.  Thus, I
completely agree with Jeff's comment here.  If you aren't pushing 800-1000
nodes, it's not even worth bothering with.  Just be sure you have seeds in
each DC.

Something to be aware of - there's only a chance to gossip with a seed.
That chance goes down as cluster size increases, meaning seeds have less
and less of an impact as the cluster grows.  Once you get to 100+ nodes, a
given node is very rarely talking to a seed.

Just make sure when you start a node it's not in its own seed list and
you're good.
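
For reference, the seed list itself is just the SimpleSeedProvider
parameter in cassandra.yaml, so rotating seeds is a config-management
change that gets picked up on the next restart (IPs below are examples):

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.1.10,10.0.2.10"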


On Tue, Jan 8, 2019 at 9:39 AM Jeff Jirsa  wrote:

>
>
> On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet  wrote:
>
>> Hi Jeff,
>>
>> thanks for answering to most of my points!
>> From the reloadseeds' ticket, I followed to
>> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very
>> instructive, although a bit old.
>>
>>
>> On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa  wrote:
>>
>>> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet 
>>> wrote:
>>> >
>>> [...]
>>>
>>> >   In essence, in my example that would be:
>>> >
>>> >   - decide that #2 and #3 will be the new seed nodes
>>> >   - update all the configuration files of all the nodes to write the
>>> IP addresses of #2 and #3
>>> >   - DON'T restart any node - the new seed configuration will be picked
>>> up only if the Cassandra process restarts
>>> >
>>> > * If I can manage to sort my Cassandra nodes by their age, could it be
>>> a strategy to have the seeds set to the 2 oldest nodes in the cluster?
>>> (This implies these nodes would change as the cluster's nodes get
>>> upgraded/replaced).
>>>
>>> You could do this, seems like a lot of headache for little benefit.
>>> Could be done with simple seed provider and config management
>>> (puppet/chef/ansible) laying  down new yaml or with your own seed provider
>>>
>>
>> So, just to make it clear: sorting by age isn't a goal in itself, it was
>> just an example on how I could get a stable list.
>>
>> Right now, we have a dedicated group of seed nodes + a dedicated group
>> for non-seeds: doing rolling-upgrade of the nodes from the second list is
>> relatively painless (although slow) whereas we are facing the issues
>> discussed in CASSANDRA-3829 for the first group which are non-seeds nodes
>> are not bootstrapping automatically and we need to operate them in a more
>> careful way.
>>
>>
> Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host
> should need a new bootstrap. That should be a new host in your list, so it
> seems like this should be fairly rare?
>
>
>> What I'm really looking for is a way to simplify adding and removing
>> nodes into our (small) cluster: I can easily provide a small list of nodes
>> from our cluster with our config management tool so that new nodes are
>> discovering the rest of the cluster, but the documentation seems to imply
>> that seed nodes also have other functions and I'm not sure what problems we
>> could face trying to simplify this approach.
>>
>> Ideally, what I would like to have would be:
>>
>> * Considering a stable cluster (no new nodes, no nodes leaving), the N
>> seeds should be always the same N nodes
>> * Adding new nodes should not change that list
>> * Stopping/removing one of these N nodes should "promote" another
>> (non-seed) node as a seed
>>   - that would not restart the already running Cassandra nodes but would
>> update their configuration files.
>>   - if a node restart for whatever reason it would pick up this new
>> configuration
>>
>> So: no node would start its life as a seed, only a few already existing
>> node would have this status. We would not have to deal with the "a seed
>> node doesn't bootstrap" problem and it would make our operation process
>> simpler.
>>
>>
>>> > I also have some more general questions about seed nodes and how they
>>> work:
>>> >
>>> > * I understand that seed nodes are used when a node starts and needs
>>> to discover the rest of the cluster's nodes. Once the node has joined and
>>> the cluster is stable, are seed nodes still playing a role in day to day
>>> operations?
>>>
>>> They’re used probabilistically in gossip to encourage convergence.
>>> Mostly useful in large clusters.
>>>
>>
>> How "large" are we speaking here? How many nodes would it start to be
>> considered "large"?
>>
>
> ~800-1000
>
>
>> Also, about the convergence: is this related to how fast/often the
>> cluster topology is changing? (new nodes, leaving nodes, underlying IP
>> addresses changing, etc.)
>>
>>
> New nodes, nodes going up/down, and schema propagation.
>
>
>> Thanks for your answers!
>>
>>  Jonathan
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: SSTableMetadata Util

2019-01-07 Thread Jonathan Haddad
Try installing the cassandra-tools package.
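
On Debian/Ubuntu that's roughly the following (the RPM packaging ships an
equivalent cassandra-tools package; the sstable path is a placeholder):

    sudo apt-get install cassandra-tools
    sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/mc-*-big-Data.db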

On Mon, Jan 7, 2019 at 1:20 AM Igor Zubchenok  wrote:

> Same issue with 3.11.3:
>
> # find / -name sstable*
> /usr/bin/sstableverify
> /usr/bin/sstableupgrade
> /usr/bin/sstableloader
> /usr/bin/sstableutil
> /usr/bin/sstablescrub
>
> only these sstable utilities are installed
>
> what have I done wrong?
>
>
> On Tue, 2 Oct 2018 at 08:11 kurt greaves  wrote:
>
>> Pranay,
>>
>> 3.11.3 should include all the C* binaries in /usr/bin. Maybe try
>> reinstalling? Sounds like something got messed up along the way.
>>
>> Kurt
>>
>> On Tue, 2 Oct 2018 at 12:45, Pranay akula 
>> wrote:
>>
>>> Thanks Christophe,
>>>
>>> I have installed using the rpm package. I actually ran the locate command to
>>> find the sstable utils and I could find only those 4.
>>>
>>> Probably I may need to manually copy them.
>>>
>>> Regards
>>> Pranay
>>>
>>> On Mon, Oct 1, 2018, 9:01 PM Christophe Schmitz <
>>> christo...@instaclustr.com> wrote:
>>>
 Hi Pranay,

 The sstablemetadata is still available in the tarball file
 ($CASSANDRA_HOME/tools/bin) in 3.11.3. Not sure why it is not available in
 your packaged installation, you might want to manually copy the one from
 the package to your /usr/bin/

 Additionally, you can have a look at
 https://github.com/instaclustr/cassandra-sstable-tools which will
 provide you with the desired info, plus more info you might find useful.


 Christophe Schmitz - Instaclustr  -
 Cassandra | Kafka | Spark Consulting





 On Tue, 2 Oct 2018 at 11:31 Pranay akula 
 wrote:

> Hi,
>
> I am testing apache 3.11.3 i couldn't find sstablemetadata util
>
> All i can see is only these utilities in /usr/bin
>
> -rwxr-xr-x.   1 root root2042 Jul 25 06:12 sstableverify
> -rwxr-xr-x.   1 root root2045 Jul 25 06:12 sstableutil
> -rwxr-xr-x.   1 root root2042 Jul 25 06:12 sstableupgrade
> -rwxr-xr-x.   1 root root2042 Jul 25 06:12 sstablescrub
> -rwxr-xr-x.   1 root root2034 Jul 25 06:12 sstableloader
>
>
> If this utility is no longer available how can i get sstable metadata
> like repaired_at, Estimated droppable tombstones
>
>
> Thanks
> Pranay
>
 --
> Regards,
> Igor Zubchenok
>
> CTO at Multi Brains LLC
> Founder of taxistartup.com saytaxi.com chauffy.com
> Skype: igor.zubchenok
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-04 Thread Jonathan Haddad
If you absolutely have to use Cassandra as the source of your data, I agree
with Dor.

That being said, if you're going to be doing a lot of analytics, I
recommend using something other than Cassandra with Spark.  The performance
isn't particularly wonderful and you'll likely get anywhere from 10-50x
improvement from putting the data in an analytics friendly format (parquet)
and on a block / blob store (DFS or S3) instead.

On Fri, Jan 4, 2019 at 1:43 PM Goutham reddy 
wrote:

> Thank you very much Dor for the detailed information, yes that should be
> the primary reason why we have to isolate from Cassandra.
>
> Thanks and Regards,
> Goutham Reddy
>
>
> On Fri, Jan 4, 2019 at 1:29 PM Dor Laor  wrote:
>
>> I strongly recommend option B, separate clusters. Reasons:
>>  - Networking of node-node is negligible compared to networking within
>> the node
>>  - Different scaling considerations
>>Your workload may require 10 Spark nodes and 20 database nodes, so why
>> bundle them?
>>This ratio may also change over time as your application evolves and
>> amount of data changes.
>>  - Isolation - If Spark has a spike in cpu/IO utilization, you wouldn't
>> want it to affect Cassandra and the opposite.
>>If you isolate it with cgroups, you may have too much idle time when
>> the above doesn't happen.
>>
>>
>> On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy 
>> wrote:
>>
>>> Hi,
>>> We have a heavy data lifting and analytics requirement and have
>>> decided to go with Apache Spark. In the process we have come up with two
>>> patterns
>>> a. Apache Spark and Apache Cassandra co-located and shared on same nodes.
>>> b. Apache Spark on one independent cluster and Apache Cassandra as one
>>> independent cluster.
>>>
>>> Need good pattern how to use the analytic engine for Cassandra. Thanks
>>> in advance.
>>>
>>> Regards
>>> Goutham.
>>>
>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
Those are two different cases though.  It *sounds like* (again, I may be
missing the point) you're trying to overwrite a value with another value.
You're either going to serialize a blob and overwrite a single cell, or
you're going to overwrite all the cells and include a tombstone.

When you do a read, reading a single tombstone vs a single value is
essentially the same thing, performance wise.

In your description you said "~ 20-100 events", and you're overwriting the
event each time, so I don't know how you get to 10K tombstones either.
Compaction will bring multiple tombstones together for a cell in the same
way it compacts multiple values for a single cell.

It sounds to me like you're taking some advice about tombstones out of
context and trying to apply the advice to a different problem.  Again, I
might be misunderstanding what you're doing.


On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos 
wrote:

> Hello Jon,
>
> I thought having tombstones is much higher overhead than just overwriting
> values. The compaction overhead can be similar, but I think the read
> performance is much worse.
>
> Tombstones accumulate and hang for 10 days (by default) before they are
> eligible for compaction.
>
> Also we have tombstone warning and error thresholds. If cassandra scans
> more than 10 000 tombstones, she will abort the query.
>
> According to this article:
> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>
> "The cassandra.yaml comments explain in perfectly: *“When executing a
> scan, within or across a partition, we need to keep the tombstones seen in
> memory so we can return them to the coordinator, which will use them to
> make sure other replicas also know about the deleted rows. With workloads
> that generate a lot of tombstones, this can cause performance problems and
> even exhaust the server heap. "*
>
> Regards,
> Tomas
>
> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad 
>> If you're overwriting values, it really doesn't matter much if it's a
>> tombstone or any other value, they still need to be compacted and have the
>> same overhead at read time.
>>
>> Tombstones are problematic when you try to use Cassandra as a queue (or
>> something like a queue) and you need to scan over thousands of tombstones
>> in order to get to the real data.  You're simply overwriting a row and
>> trying to avoid a single tombstone.
>>
>> Maybe I'm missing something here.  Why do you think overwriting a single
>> cell with a tombstone is any worse than overwriting a single cell with a
>> value?
>>
>> Jon
>>
>>
>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
>> wrote:
>>
>>> Hello,
>>>
>>> I believe your approach is the same as using Spark with "
>>> spark.cassandra.output.ignoreNulls=true"
>>> This will not cover the situation when a value has to be overwritten
>>> with null.
>>>
>>> I found one possible solution - change the schema to keep only primary
>>> key fields and move all other fields to a frozen UDT:
>>> create table (year, month, day, id, event frozen<...>, primary key((year,
>>> month, day), id) )
>>> In this way anything that is null inside event doesn't create tombstone,
>>> since event is serialized to BLOB.
>>> The penalty is in need of deserializing the whole Event when selecting
>>> only few columns.
>>> Can anyone confirm if this is good solution performance wise?
>>>
>>> Thank you,
>>>
>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan >>
>>>> "The problem is I can't know the combination of set/unset values" -->
>>>> Just for this requirement, Achilles has a working solution for many years
>>>> using INSERT_NOT_NULL_FIELDS strategy:
>>>>
>>>> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>>>>
>>>> Or you can use the Update API that by design only perform update on not
>>>> null fields:
>>>> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>>>>
>>>>
>>>> Behind the scene, for each new combination of INSERT INTO table(x,y,z)
>>>> statement, Achilles will check its prepared statement cache and if the
>>>> statement does not exist yet, create a new prepared statement and put it
>>>> into the cache for later re-use for you
>>>>
>>>> Disclaiment: I'm the creator of Achilles
>>>>
>>>>
>>>>
>>>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <
>>>> tomas.barta...@gma

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
If you're overwriting values, it really doesn't matter much if it's a
tombstone or any other value, they still need to be compacted and have the
same overhead at read time.

Tombstones are problematic when you try to use Cassandra as a queue (or
something like a queue) and you need to scan over thousands of tombstones
in order to get to the real data.  You're simply overwriting a row and
trying to avoid a single tombstone.

Maybe I'm missing something here.  Why do you think overwriting a single
cell with a tombstone is any worse than overwriting a single cell with a
value?

Jon


On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
wrote:

> Hello,
>
> I believe your approach is the same as using Spark with "
> spark.cassandra.output.ignoreNulls=true"
> This will not cover the situation when a value has to be overwritten with
> null.
>
> I found one possible solution - change the schema to keep only primary key
> fields and move all other fields to a frozen UDT:
> create table (year, month, day, id, event frozen<...>, primary key((year,
> month, day), id) )
> In this way anything that is null inside event doesn't create tombstone,
> since event is serialized to BLOB.
> The penalty is in need of deserializing the whole Event when selecting
> only few columns.
> Can anyone confirm if this is good solution performance wise?
>
> Thank you,
>
> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan 
>> "The problem is I can't know the combination of set/unset values" -->
>> Just for this requirement, Achilles has a working solution for many years
>> using INSERT_NOT_NULL_FIELDS strategy:
>>
>> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>>
>> Or you can use the Update API that by design only perform update on not
>> null fields:
>> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>>
>>
>> Behind the scene, for each new combination of INSERT INTO table(x,y,z)
>> statement, Achilles will check its prepared statement cache and if the
>> statement does not exist yet, create a new prepared statement and put it
>> into the cache for later re-use for you
>>
>> Disclaiment: I'm the creator of Achilles
>>
>>
>>
>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos 
>> wrote:
>>
>>> Hello,
>>>
>>> The problem is I can't know the combination of set/unset values. From my
>>> perspective every value should be set. The event from Kafka represents the
>>> complete state of the happening at certain point in time. In my table I
>>> want to store the latest event so the most recent state of the happening
>>> (in this table I don't care about the history). Actually I used wrong
>>> expression since its just the opposite of "incremental update", every event
>>> carries all data (state) for specific point of time.
>>>
>>> The event is represented with nested json structure. Top level elements
>>> of the json are table fields with type like text, boolean, timestamp, list
>>> and the nested elements are UDT fields.
>>>
>>> Simplified example:
>>> There is a new purchase for the happening, event:
>>> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
>>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
>>> I don't know what actually happened for this event, maybe there is a new
>>> item purchased, maybe some customer info have been changed, maybe the
>>> specials have been revoked and I have to reset them. I just need to store
>>> the state as it artived from Kafka, there might already be an event for
>>> this happening saved before, or maybe this is the first one.
>>>
>>> BR,
>>> Tomas
>>>
>>>
>>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens >>
 Depending on the use case, creating separate prepared statements for
 each combination of set / unset values in large INSERT/UPDATE statements
 may be prohibitive.

 Instead, you can look into driver level support for UNSET values.
 Requires Cassandra 2.2 or later IIRC.

 See:
 Java Driver:
 https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
 Python Driver:
 https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
 Node Driver:
 https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset

 On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
 sean_r_dur...@homedepot.com> wrote:

> You say the events are incremental updates. I am interpreting this to
> mean only some columns are updated. Others should keep their original
> values.
>
> You are correct that inserting null creates a tombstone.
>
> Can you only insert the columns that actually have new values? Just
> skip the columns with no information. (Make the insert generator a bit
> smarter.)
>
> Create table happening (id text primary key, event text, a text, b
> text, c text);
> Insert into table 

Re: Sub range repair

2019-01-01 Thread Jonathan Haddad
We (the last pickle) maintain an open source tool for dealing with this:
http://cassandra-reaper.io
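
If you'd rather do it by hand, nodetool repair does accept explicit token
ranges (keyspace/table and tokens below are placeholders):

    nodetool repair -st <start_token> -et <end_token> my_keyspace my_table

but you have to work out the sub-ranges yourself from the ring, which is
exactly the bookkeeping Reaper automates.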

On Tue, Jan 1, 2019 at 12:31 PM Rahul Reddy 
wrote:

> Hello,
>
> Is it possible to find the subranges needed for repair in Apache Cassandra,
> like DSE does with dsetool list_subranges in the doc below?
>
>
> https://docs.datastax.com/en/archived/datastax_enterprise/4.8/datastax_enterprise/srch/srchRepair.html?hl=repair
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Cassandra Integrated Auth for JMX

2018-12-16 Thread Jonathan Haddad
Jolokia is running as an agent, which means it runs in process and has
access to everything within the JVM.

JMX credentials are supplies to the JMX server, which Jolokia is bypassing.

You'll need to read up on Jolokia's security if you want to keep using it:
https://jolokia.org/reference/html/security.html

Jon

On Sun, Dec 16, 2018 at 7:26 AM Cyril Scetbon  wrote:

> Hey guys,
>
> I’ve followed
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureJmxAuthentication.html
>  to
> setup JMX with Cassandra’s internal auth using Cassandra 3.11.3
>
> However I still can connect to JMX without authenticating. You can see in
> the following attempts that authentication is set up :
>
> cassandra@ 2a1d064ce844 / $ cqlsh -u cassandra -p cassandra
> Connected to MyCluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> cassandra@cqlsh>
>
> cassandra@ 2a1d064ce844 / $ cqlsh -u cassandra -p cassandra2
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> AuthenticationFailed('Failed to authenticate to 127.0.0.1: Error from
> server: code=0100 [Bad credentials] message="Provided username cassandra
> and/or password are incorrect"',)})
>
> Here is my whole JVM's configuration :
>
> -Xloggc:/var/log/cassandra/gc.log, -XX:+UseThreadPriorities,
> -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k,
> -XX:StringTableSize=103, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking,
> -XX:+UseTLAB, -XX:+ResizeTLAB, -Djava.net.preferIPv4Stack=true, -Xms128M,
> -Xmx128M, -XX:+UseG1GC, -XX:G1RSetUpdatingPauseTimePercent=5,
> -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC,
> -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime,
> -XX:+PrintPromotionFailure,
> -javaagent:/usr/local/share/jolokia-agent.jar=host=0.0.0.0,executor=fixed,
> -javaagent:/usr/local/share/prometheus-agent.jar=1234:/etc/cassandra/prometheus.yaml,
> -XX:+PrintCommandLineFlags, -Xloggc:/var/lib/cassandra/log/gc.log,
> -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10,
> -XX:GCLogFileSize=10M, -Dcassandra.migration_task_wait_in_seconds=1,
> -Dcassandra.ring_delay_ms=3,
> -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler,
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar,
> -Dcassandra.jmx.remote.port=7199,
> -Dcom.sun.management.jmxremote.rmi.port=7199,
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin,
> -Dcom.sun.management.jmxremote.authenticate=true,
> -Dcassandra.jmx.remote.login.config=CassandraLogin,
> -Djava.security.auth.login.config=/etc/cassandra/cassandra-jaas.config,
> -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy,
> -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.ssl=false,
> -Dcom.sun.management.jmxremote.local.only=false,
> -Dcassandra.jmx.remote.port=7199,
> -Dcom.sun.management.jmxremote.rmi.port=7199, -Djava.rmi.server.hostname=
> 2a1d064ce844,
> -Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1,
> -XX:OnOutOfMemoryError=kill -9 %p, -Dlogback.configurationFile=logback.xml,
> -Dcassandra.logdir=/var/log/cassandra,
> -Dcassandra.storagedir=/var/lib/cassandra, -Dcassandra-foreground=yes
>
> But I still can query JMX without authenticating :
>
> echo '{"mbean": "org.apache.cassandra.db:type=StorageService",
> "attribute": "OperationMode", "type": "read"}' | http -a
> cassandra:cassandra POST http://localhost:8778/jolokia/
> HTTP/1.1 200 OK
> Cache-control: no-cache
> Content-type: text/plain; charset=utf-8
> Date: Sun, 16 Dec 2018 05:15:36 GMT
> Expires: Sun, 16 Dec 2018 04:15:36 GMT
> Pragma: no-cache
> Transfer-encoding: chunked
>
> {
>"request": {
>"attribute": "OperationMode",
>"mbean": "org.apache.cassandra.db:type=StorageService",
>"type": "read"
>},
>"status": 200,
>"timestamp": 1544937336,
>"value": "NORMAL"
> }
>
>
> I also have to add that I had to change permissions on the file
> $JAVA_HOME/lib/management/jmxremote.password which is weird as it should
> not be used in that case, but Cassandra was complaining before I did it.
>
> Is there anything I'm missing ?
>
> Thanks
> —
> Cyril Scetbon
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Jonathan Haddad
Seeing high kswapd usage means there's a lot of churn in the page cache.
It doesn't mean you're using swap, it means the box is spending time
clearing pages out of the page cache to make room for the stuff you're
reading now.  The machines don't have enough memory - they are way
undersized for a production workload.

Things that make it worse:
* high readahead (use 8kb on ssd)
* high compression chunk length when reading small rows / partitions.
Nobody specifies this, 64KB by default is awful.  I almost always switch to
4KB-16KB here but on these boxes you're kind of screwed since you're
already basically out of memory.
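
For reference, a sketch of both changes (the device, keyspace, and table names are placeholders; adjust to your data volume and tables):

blockdev --setra 16 /dev/xvdf    # readahead is in 512-byte sectors, so 16 = 8 KB

ALTER TABLE my_ks.my_table
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};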

I'd never put Cassandra in production with less than 30GB ram and 8 cores
per box.

On Wed, Dec 5, 2018 at 10:44 AM Jon Meredith  wrote:

> The kswapd issue is interesting, is it possible you're being affected by
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 - although I
> don't see a fix for Trusty listed on there?
>
> On Wed, Dec 5, 2018 at 11:34 AM Riccardo Ferrari 
> wrote:
>
>> Hi Alex,
>>
>> I saw that behaviour in the past. I can tell you the kswapd0 usage is
>> connected to the `disk_access_mode` property. On 64-bit systems it defaults to
>> mmap. That also explains why your virtual memory is so high (it somehow
>> matches the node load, right?). I cannot find a good reference, however
>> googling for "kswapd0 cassandra" you'll find quite some resources.
>>
>> HTH,
>>
>> On Wed, Dec 5, 2018 at 6:39 PM Oleksandr Shulgin <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> Hello,
>>>
>>> We are running the following setup on AWS EC2:
>>>
>>> Host system (AWS AMI): Ubuntu 14.04.4 LTS,
>>> Linux  4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5
>>> 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Cassandra process runs inside a docker container.
>>> Docker image is based on Ubuntu 18.04.1 LTS.
>>>
>>> Apache Cassandra 3.0.17, installed from .deb packages.
>>>
>>> $ java -version
>>> openjdk version "1.8.0_181"
>>> OpenJDK Runtime Environment (build
>>> 1.8.0_181-8u181-b13-1ubuntu0.18.04.1-b13)
>>> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
>>>
>>> We have a total of 36 nodes.  All are r4.large instances, they have 2
>>> vCPUs and ~15 GB RAM.
>>> On each instance we have:
>>> - 2TB gp2 SSD EBS volume for data and commit log,
>>> - 8GB gp2 SSD EBS for system (root volume).
>>>
>>> Non-default settings in cassandra.yaml:
>>> num_tokens: 16
>>> memtable_flush_writers: 1
>>> concurrent_compactors: 1
>>> snitch: Ec2Snitch
>>>
>>> JVM heap/stack size options: -Xms8G -Xmx8G -Xmn800M -Xss256k
>>> Garbage collection: CMS with default settings.
>>>
>>> We repair once a week using Cassandra Reaper: parallel, intensity 1, 64
>>> segments per node.  The issue also happens outside of repair time.
>>>
>>> The symptoms:
>>> 
>>>
>>> Sporadically a node becomes unavailable for a period of time between few
>>> minutes and a few hours.  According to our analysis and as pointed out by
>>> AWS support team, the unavailability is caused by exceptionally high read
>>> bandwidth on the *root* EBS volume.  I repeat, on the root volume, *not* on
>>> the data/commitlog volume.  Basically, the amount if IO exceeds instance's
>>> bandwidth (~52MB/s) and all other network communication becomes impossible.
>>>
>>> The root volume contains operating system, docker container with OpenJDK
>>> and Cassandra binaries, and the logs.
>>>
>>> Most of the time, whenever this happens it is too late to SSH into the
>>> instance to troubleshoot: it becomes completely unavailable within very
>>> short period of time.
>>> Rebooting the affected instance helps to bring it back to life.
>>>
>>> Starting from the middle of last week we have seen this problem
>>> repeatedly 1-3 times a day, affecting different instances in a seemingly
>>> random fashion.  Most of the time it affects only one instance, but we've
>>> had one incident when 9 nodes (3 from each of the 3 Availability Zones)
>>> were down at the same time due to this exact issue.
>>>
>>> Actually, we've had the same issue previously on the same Cassandra
>>> cluster around 3 months ago (beginning to mid of September 2018).  At that
>>> time we were running on m4.xlarge instances (these have 4 vCPUs and 16GB
>>> RAM).
>>>
>>> As a mitigation measure we have migrated away from those to r4.2xlarge.
>>> Then we didn't observe any issues for a few weeks, so we have scaled down
>>> two times: to r4.xlarge and then to r4.large.  The last migration was
>>> completed before Nov 13th.  No changes to the cluster or application
>>> happened since that time.
>>>
>>> Now, after some weeks the issue appears again...
>>>
>>> When we are not fast enough to react and reboot the affected instance,
>>> we can see that ultimately Linux OOM killer kicks in and kills the java
>>> process running Cassandra.  After that the instance becomes available
>>> almost immediately.  This allows us to rule out other processes running in
>>> background as 

Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-01 Thread Jonathan Haddad
Dmitry is right. Generally speaking always go with the latest bug fix
release.

On Sat, Dec 1, 2018 at 10:14 AM Dmitry Saprykin 
wrote:

> See more here
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13004
>
> On Sat, Dec 1, 2018 at 1:02 PM Dmitry Saprykin 
> wrote:
>
>> Even more, 3.0.9 is a terrible target choice by itself. It has a nasty
>> bug corrupting sstables on alter.
>>
>> On Sat, Dec 1, 2018 at 11:55 AM Marc Selwan 
>> wrote:
>>
>>> Hi Shravan,
>>>
>>> Did you upgrade Apache Cassandra 2.1.9 to the latest patch release
>>> before doing the major upgrade? It's generally favorable to go to the
>>> latest patch release as often times they include fixes that smooth over the
>>> upgrade process. There are hundreds of bug fixes between 2.1.9 and 2.1.20
>>> (current version)
>>>
>>> Best,
>>> Marc
>>>
>>> On Fri, Nov 30, 2018 at 3:13 PM Shravan R  wrote:
>>>
 Hello,

 I am planning to upgrade Apache Cassandra 2.1.9 to Apache
 Cassandra-3.0.9. I came up with the version based on [1]. I followed
 upgrade steps as in [2]. I was testing the same in the lab and encountered
 issues (streaming just fails and hangs for ever) with bootstrapping a 3.0.9
 node on a partially upgraded cluster. [50% of nodes on 2.1.9 and 50% on
 3.0.9]. The production cluster that I am supporting is pretty large and I
 am anticipating to end up in a situation like this (Hope not) and would
 like to be prepared.

 1) How do I deal with decommissioning a 2.1.9 node in a partially
 upgraded cluster?
 2) How do I bootstrap a 3.x node into a partially upgraded cluster?
 3) Is there an alternative approach to upgrading large clusters, i.e.
 instead of going through nodetool upgradesstables on each node in rolling
 fashion?


 As per [1] the general restriction is to avoid decommissioning or
 adding nodes, but in reality there can be failures or maintenance that
 warrant us to do so.


 Please point me in the right direction.


 Thanks,
 Shravan


 [1]
 https://docs.datastax.com/en/upgrade/doc/upgrade/datastax_enterprise/upgdDSE50.html#upgdDSE50__cstar-version-change

 [2]
 https://myopsblog.wordpress.com/2017/12/04/upgrade-cassandra-cluster-from-2-x-to-3-x/
 

 --
>>> Marc Selwan | DataStax | Product Management | (925) 413-7079
>>>
>>>
>>> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: multiple node bootstrapping

2018-11-28 Thread Jonathan Haddad
Agree with Jeff here, using auto_bootstrap:false is probably not what you
want.

Have you increased your streaming throughput?
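
For example (a sketch; the value is in megabits per second, so pick a number your network and disks can actually sustain):

nodetool setstreamthroughput 400

The persistent equivalent is stream_throughput_outbound_megabits_per_sec in cassandra.yaml.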

Upgrading to 3.11 might reduce the time by quite a bit:
https://issues.apache.org/jira/browse/CASSANDRA-9766

You'd be doing committers a huge favor if you grabbed some histograms and
flame graphs on both the sending and receiving nodes:
http://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html and
sent them to the dev mailing list.



On Wed, Nov 28, 2018 at 3:59 AM Jeff Jirsa  wrote:

> This violates any consistency guarantees you have and isn’t the right
> approach unless you know what you’re giving up (correctness, typically)
>
> --
> Jeff Jirsa
>
>
> On Nov 28, 2018, at 2:40 AM, Vitali Dyachuk  wrote:
>
> You can use auto_bootstrap set to false to add a new node to the ring, it
> will calculate the token range for the new node, but will not start
> streaming the data.
> In this case you can add several nodes into the ring quickly. After that
> you can start nodetool rebuild -- <source_dc> to start streaming data.
> In your case 50 TB of data per node is quite a large amount of data. I would
> recommend, based on my own experience, keeping about 1 TB per node, since
> streaming can be interrupted for some reason and it cannot be resumed, so
> you'll have to restart streaming. Also there will be compaction problems.
>
> Vitali.
> On Wed, Nov 28, 2018 at 12:03 PM Osman YOZGATLIOĞLU <
> osman.yozgatlio...@krontech.com> wrote:
>
>> Hello,
>>
>> I have a 2-DC Cassandra 3.0.14 setup. I need to add 2 new nodes to each dc.
>>
>> I started one node in dc1 and it's already joining. 3 TB of 50 TB finished
>> in 2 weeks. One year of TTL'd time series data with TWCS.
>>
>> I know it's not best practice..
>>
>> I wanted to start one node in dc2, but Cassandra refused to start,
>> complaining that one node is already in the joining state.
>>
>> I found a workaround with JMX directives, but I'm not sure if I broke
>> something along the way.
>>
>> Is it wise to bootstrap in both DCs at the same time?
>>
>>
>> Regards,
>>
>> Osman
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: system_auth keyspace replication factor

2018-11-23 Thread Jonathan Haddad
Any chance you’re logging in with the Cassandra user? It uses quorum reads.
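
Only the built-in cassandra role is authenticated with QUORUM reads; roles you create yourself are read at LOCAL_ONE. A common workaround is to create a different superuser and stop logging in as cassandra (a sketch; the role name and password are placeholders):

CREATE ROLE dba WITH PASSWORD = 'use-a-real-password' AND SUPERUSER = true AND LOGIN = true;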


On Fri, Nov 23, 2018 at 11:38 AM Vitali Dyachuk  wrote:

> Hi,
> We have recently met a problem when we added 60 nodes in 1 region to the
> cluster
> and set an RF=60 for the system_auth ks, following this documentation
> https://docs.datastax.com/en/cql/3.3/cql/cql_using/useUpdateKeyspaceRF.html
> However we've started to see increased login latencies in the cluster 5x
> bigger than before changing RF of system_auth ks.
> We have a Cassandra runner written in C# running against the cluster;
> when analyzing the logs we noticed that "Rebuilding token map" is taking
> most of the time, ~20s.
> When we changed RF to 3 the issue was resolved.
> We are using C* 3.0.17 , 4 DC, system_auth RF=3, "CassandraCSharpDriver"
> version="3.2.1"
> I've found a ticket somewhat related to my problem,
> https://datastax-oss.atlassian.net/browse/CSHARP-436 but it says in the
> related tickets, that the issue with the token map rebuild time has been
> fixed in the previous versions of the driver.
> So my question is what is the best recommendation of the setting
> system_auth ks RF?
>
> Regards,
> Vitali Djatsuk.
>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Jonathan Haddad
Just because Cassandra doesn't do it doesn't mean you aren't able to
encrypt your data at rest, and you definitely don't need DSE to do it.  I
recommend checking out the LUKS project.

https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md

This, IMO, is a better option than having the database do it since with
this you are able to encrypt everything.  Your logs, indexes, etc.
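
A minimal sketch of setting that up for a dedicated data volume (assuming the device is /dev/xvdf; in practice you would also sort out key management and crypttab/fstab entries so the volume survives a reboot):

cryptsetup luksFormat /dev/xvdf
cryptsetup luksOpen /dev/xvdf cassandra_data
mkfs.xfs /dev/mapper/cassandra_data
mount /dev/mapper/cassandra_data /var/lib/cassandra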

Jon



On Wed, Nov 14, 2018 at 10:47 AM Durity, Sean R 
wrote:

> I think you are asking about **encryption** at rest. To my knowledge,
> open source Cassandra does not support this natively. There are options,
> like encrypting the data in the application before it gets to Cassandra.
> Some companies offer other solutions. IMO, if you need the increased
> security, it is worth using something like DataStax Enterprise.
>
>
>
>
>
> Sean Durity
>
> *From:* Goutham reddy 
> *Sent:* Tuesday, November 13, 2018 1:22 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Is Apache Cassandra supports Data at rest
>
>
>
> Hi,
>
> Does Apache Cassandra support data at rest? DataStax Cassandra
> supports it. Can anybody help me?
>
>
>
> Thanks and Regards,
>
> Goutham.
>
> --
>
> Regards
>
> Goutham Reddy
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Multiple cluster for a single application

2018-11-07 Thread Jonathan Haddad
Interesting approach Eric, thanks for sharing that.

Regarding this:

> I've read documents recommended to use clusters with less than 50 or 100
nodes (Netflix got hundreds of clusters with less 100 nodes on each).

Not sure where you read that, but it's nonsense.  We work with quite a few
clusters that are several hundred nodes each.  Your problems can get a bit
amplified, for instance dynamic snitch can make a cluster perform
significantly worse than if you just flat out disable it, which is what I
usually recommend.
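
If you do decide to disable it, it's a one-line change in cassandra.yaml on every node, followed by a rolling restart:

dynamic_snitch: false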

I'm curious how you arrived at the estimate of needing > 100 nodes.  Is
that due to space constraints or performance ones?



On Wed, Nov 7, 2018 at 12:52 PM Eric Stevens  wrote:

> We are engaging in both strategies at the same time:
>
> 1) We call it functional sharding - we write to clusters targeted
> according to the type of data being written.  Because different data types
> often have different workloads this has the nice side effect of being able
> to tune each cluster according to its workload.  Your ability to grow in
> this dimension is limited by the number of business object types you're
> recording.
>
> 2) We write to clusters sharded by time.  Our objects are network security
> events, so there's always an element of time.  We encode that time into
> deterministic object IDs so that we are able to identify in the read path
> which shard to direct the request to by extracting the time component.
> This basic idea should be able to work any time you're able to use
> surrogate keys instead of natural keys.  If you are using natural keys, you
> may be facing an unpleasant migration should you need to increase the
> number of shards in this dimension.
>
> Our reason for engaging in the second strategy was not purely Cassandra's
> fault, rather we were using DSE with a search workload, and the cost of
> rebuilding Solr indexes on streaming operations (such as adding nodes to an
> existing cluster) required enough resources that we found it prohibitive.
> That's because the bootstrapping node was also taking a production write
> workload, and we didn't want to run our cluster with enough overhead that a
> node could bootstrap and take production workload at the same time.
>
> For vanilla Cassandra workloads we have run clusters with quite a bit more
> nodes than 100 without any appreciable trouble.  Curious if you can share
> documents about clusters over 100 nodes causing troubles for users.  I'm
> wondering if it's related to node failure rate combined with vnodes meaning
> that several concurrent node failures cause a part of the ring to go
> offline too reliably.
>
> On Mon, Nov 5, 2018 at 7:38 AM onmstester onmstester
>  wrote:
>
>> Hi,
>>
>> One of my applications requires to create a cluster with more than 100
>> nodes, I've read documents recommended to use clusters with less than 50 or
>> 100 nodes (Netflix got hundreds of clusters with less 100 nodes on each).
>> Is it a good idea to use multiple clusters for a single application, just
>> to decrease maintenance problems and system complexity/performance?
>> If So, which one of below policies is more suitable to distribute data
>> among clusters and Why?
>> 1. each cluster' would be responsible for a specific partial set of
>> tables only (table sizes are almost equal so easy calculations here) for
>> example inserts to table X would go to cluster Y
>> 2. shard data at loader level by some business logic grouping of data,
>> for example all rows with some column starting with X would go to cluster Y
>>
>> I would appreciate sharing your experiences working with big clusters,
>> problem encountered and solutions.
>>
>> Thanks in Advance
>>
>> Sent using Zoho Mail 
>>
>>
>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Well, generally speaking I like to understand the problem before trying to
fit a solution.  If you're looking to set up millions of appointments for a
business, that might quality for some amount of partitioning / bucketing.
That said, you might be better off using time based buckets, say monthly or
yearly, and as part of the process consider the worst case scenario for
data size.  Is there a chance that in a given month there will be more than
50MB of data associated with a single account / entity?

If you design the table using the startdatetime as the clustering key,
you'll get your events back in the order they are scheduled, which has
obvious advantages but does come at the cost of increased complexity when
updating the start time.  The short answer is - you can't update it, you
have to delete the record and re-insert it with the updated data (you can't
update a clustering key).
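
A sketch of what a reschedule looks like under that design, assuming a hypothetical variant of the table below with PRIMARY KEY ((inviteeid, bucket), startdatetime, objectid); all literal values are placeholders:

BEGIN BATCH
  DELETE FROM appointment_by_invitee
   WHERE inviteeid = 5f0e7f60-e054-11e8-9f32-f2801f1b9fd1
     AND bucket = 20181104
     AND startdatetime = '2018-11-04 10:00:00+0000'
     AND objectid = 7d8f2a10-e054-11e8-9f32-f2801f1b9fd1;
  INSERT INTO appointment_by_invitee
    (inviteeid, bucket, startdatetime, objectid, organizerid, status, location, enddatetime)
  VALUES (5f0e7f60-e054-11e8-9f32-f2801f1b9fd1, 20181110, '2018-11-10 14:00:00+0000',
          7d8f2a10-e054-11e8-9f32-f2801f1b9fd1, 9aa1c050-e054-11e8-9f32-f2801f1b9fd1,
          'rescheduled', 'Main St office', '2018-11-10 15:00:00+0000');
APPLY BATCH;

The logged batch keeps the delete and the re-insert atomic even though the old and new rows live in different partitions (the bucket changed).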

Hope this helps.
Jon



On Sun, Nov 4, 2018 at 2:28 PM I PVP  wrote:

> For people (invitees), you are correct. They will not have millions of
> appointments. But the organizer is a business.. a chain of businesses
> (franchisor and franchisees) that together across the country have tens
> of thousands of appointments per day.
>
> Do you suggest removing the bucket, making startdatetime the clustering
> key and querying against startdatetime with > and < ?
> Wouldn't I still have the issue of not being able to update startdatetime when an
> appointment gets rescheduled?
>
> thanks.
>
> IPVP
>
> On November 4, 2018 at 7:25:05 PM, Jonathan Haddad (j...@jonhaddad.com)
> wrote:
>
> Maybe I’m missing something, but it seems to me that the bucket might be a
> little overkill for a scheduling system. Do you expect people to have
> millions of appointments?
>
> On Sun, Nov 4, 2018 at 12:46 PM I PVP  wrote:
>
>> Could you please provide advice on the modeling approach for the
>> following   appointment scheduling scenario?
>>
>> I am struggling to model it in a way that satisfies the requirement
>> to be able to update an appointment, especially to be able to change the
>> start datetime and consequently the bucket.
>>
>> Queries/requirements:
>> 1)The ability to select all appointments by invitee and by date range on
>> the start date
>>
>> 2)The ability to select all appointments by organizer and by date range
>> on the start date
>>
>> 3) The ability to update (date, location, status) of a specific
>> appointment.
>>
>> 4) The ability to delete a specific appointment
>>
>> Note: The bucket column is intended to allow date querying and to help
>> spread data evenly around the cluster. The bucket value is composed by
>> year+month+day sample bucket value: 20181104 )
>>
>>
>> CREATE TABLE appointment_by_invitee(
>> objectid timeuuid,
>> organizerid timeuuid,
>> inviteeid timeuuid,
>> bucket bigint,
>> status text,
>> location text,
>> startdatetime timestamp,
>> enddatetime timestamp,
>> PRIMARY KEY ((inviteeid, bucket), objectid)
>> );
>>
>> CREATE TABLE appointment_by_organizer(
>> objectid timeuuid,
>> organizerid timeuuid,
>> inviteeid timeuuid,
>> bucket bigint,
>> status text,
>> location text,
>> startdatetime timestamp,
>> enddatetime timestamp,
>> PRIMARY KEY ((organizerid, bucket), objectid)
>> );
>>
>>
>> Any help will be appreciated.
>>
>> Thanks
>>
>> IPVP
>>
>>
>> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Maybe I’m missing something, but it seems to me that the bucket might be a
little overkill for a scheduling system. Do you expect people to have
millions of appointments?

On Sun, Nov 4, 2018 at 12:46 PM I PVP  wrote:

> Could you please provide advice on the modeling approach for the following
>   appointment scheduling scenario?
>
> I am struggling to model it in a way that satisfies the requirement
> to be able to update an appointment, especially to be able to change the
> start datetime and consequently the bucket.
>
> Queries/requirements:
> 1)The ability to select all appointments by invitee and by date range on
> the start date
>
> 2)The ability to select all appointments by organizer and by date range on
> the start date
>
> 3) The ability to update (date, location, status) of a specific appointment.
>
> 4) The ability to delete a specific appointment
>
> Note: The bucket column is intended to allow date querying and to help
> spread data evenly around the cluster. The bucket value is composed by
> year+month+day sample bucket value: 20181104 )
>
>
> CREATE TABLE appointment_by_invitee(
> objectid timeuuid,
> organizerid timeuuid,
> inviteeid timeuuid,
> bucket bigint,
> status text,
> location text,
> startdatetime timestamp,
> enddatetime timestamp,
> PRIMARY KEY ((inviteeid, bucket), objectid)
> );
>
> CREATE TABLE appointment_by_organizer(
> objectid timeuuid,
> organizerid timeuuid,
> inviteeid timeuuid,
> bucket bigint,
> status text,
> location text,
> startdatetime timestamp,
> enddatetime timestamp,
> PRIMARY KEY ((organizerid, bucket), objectid)
> );
>
>
> Any help will be appreciated.
>
> Thanks
>
> IPVP
>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [ANNOUNCE] StratIO's Lucene plugin fork

2018-10-30 Thread Jonathan Haddad
Very cool Ben, thanks for sharing!

On Tue, Oct 30, 2018 at 6:14 PM Ben Slater 
wrote:

> For anyone who is interested, we’ve published a blog with some more
> background on this and some more detail of our ongoing plans:
> https://www.instaclustr.com/instaclustr-support-cassandra-lucene-index/
>
> Cheers
> Ben
>
> On Fri, 19 Oct 2018 at 09:42 kurt greaves  wrote:
>
>> Hi all,
>>
>> We've had confirmation from Stratio that they are no longer maintaining
>> their Lucene plugin for Apache Cassandra. We've thus decided to fork the
>> plugin to continue maintaining it. At this stage we won't be making any
>> additions to the plugin in the short term unless absolutely necessary, and
>> as 4.0 nears we'll begin making it compatible with the new major release.
>> We plan on taking the existing PR's and issues from the Stratio repository
>> and getting them merged/resolved, however this likely won't happen until
>> early next year. Having said that, we welcome all contributions and will
>> dedicate time to reviewing bugs in the current versions if people lodge
>> them and can help.
>>
>> I'll note that this is new ground for us, we don't have much existing
>> knowledge of the plugin but are determined to learn. If anyone out there
>> has established knowledge about the plugin we'd be grateful for any
>> assistance!
>>
>> You can find our fork here:
>> https://github.com/instaclustr/cassandra-lucene-index
>> At the moment, the only difference is that there is a 3.11.3 branch which
>> just has some minor changes to dependencies to better support 3.11.3.
>>
>> Cheers,
>> Kurt
>>
> --
>
>
> *Ben Slater*
>
> *Chief Product Officer *
>
>    
>
>
> Read our latest technical blog posts here
> .
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Cassandra | Cross Data Centre Replication Status

2018-10-30 Thread Jonathan Haddad
You need to run "nodetool rebuild -- <existing_dc_name>" on each node in
the new DC to get the old data to replicate.  It doesn't do it
automatically because Cassandra has no way of knowing if you're done adding
nodes and if it were to migrate automatically, it could cause a lot of
problems. Imagine streaming 100 nodes' data to 3 nodes in the new DC, not
fun.
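
Going by the datacenter names in the keyspace change described below, that would be, on every node in the Mumbai DC:

nodetool rebuild -- AWS_Sgp

You can watch the streaming progress with nodetool netstats.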

On Tue, Oct 30, 2018 at 1:59 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:

> Hi Experts,
>
> I previously had 1 Cassandra data centre in AWS Singapore region with 5
> nodes, with my keyspace's replication factor as 3 in Network topology.
>
> After this cluster has been running smoothly for 4 months (500 GB of data
> on each node's disk), I added 2nd data centre in AWS Mumbai region with yet
> again 5 nodes in Network topology.
>
> After updating my keyspace's replication factor to
> {"AWS_Sgp":3,"AWS_Mum":3}, my expectation was that the data present in Sgp
> region will immediately start replicating on the Mum region's nodes.
> However even after 2 weeks I do not see historical data to be replicated,
> but new data being written on Sgp region is present in Mum region as well.
>
> Any help or suggestions to debug this issue will be highly appreciated.
>
> Regards
> Akshay Bhardwaj
> +91-97111-33849
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Best compaction strategy

2018-10-25 Thread Jonathan Haddad
To add to what Alex suggested, if you know what keys use what TTL you could
store them in different tables, with different window settings.
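
A sketch of what that could look like (hypothetical table names; the window sizes are picked so each table ends up with roughly 24-30 windows over its TTL):

ALTER TABLE my_ks.events_2d WITH
  compaction = {'class': 'TimeWindowCompactionStrategy',
                'compaction_window_unit': 'HOURS', 'compaction_window_size': '2'}
  AND default_time_to_live = 172800;

ALTER TABLE my_ks.events_60d WITH
  compaction = {'class': 'TimeWindowCompactionStrategy',
                'compaction_window_unit': 'DAYS', 'compaction_window_size': '2'}
  AND default_time_to_live = 5184000;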

Jon

On Fri, Oct 26, 2018 at 1:28 AM Alexander Dejanovski 
wrote:

> Hi Raman,
>
> TWCS is the best compaction strategy for TTL data, even if you have
> different TTLs (set the time window based on your largest TTL, so it would
> be 1 day in your case).
> Enable unchecked tombstone compaction to clear the data with 2 days TTL
> along the way. This is done by setting :
>
> ALTER TABLE my_table WITH compaction =
> {'class':'TimeWindowCompactionStrategy',
> 'unchecked_tombstone_compaction':'true', ...}
>
> If you're running 3.11.1 at least, you can turn on the
> unsafe_aggressive_sstable_expiration introduced by CASSANDRA-13418
> .
>
> Cheers,
>
> On Thu, Oct 25, 2018 at 2:59 PM raman gugnani 
> wrote:
>
>> Hi All,
>>
>> I have one table in which I have some data with a TTL of 2 days and
>> some data with a TTL of 60 days. Which compaction strategy will suit it
>> best?
>>
>>1. LeveledCompactionStrategy (LCS)
>>2. SizeTieredCompactionStrategy (STCS)
>>3. TimeWindowCompactionStrategy (TWCS)
>>
>>
>> --
>> Raman Gugnani
>>
>> 8588892293 <(858)%20889-2293>
>>
>> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Cassandra running Multiple JVM's

2018-10-24 Thread Jonathan Haddad
Another issue you'll need to consider is how the JVM allocates resources
towards GC, especially if you're using G1 with a pause time goal.
Specifically, if you let it pick it's own numbers for ParallelGCThreads &
ConcGCThreads they'll be based on the total number of CPUs, not the number
you've restricted access to, and you'll end up with GC taking way longer
than it should.  Anything that auto sizes based on available cores will
also be affected.  Set -XX:ParallelGCThreads=n and -XX:ConcGCThreads=n to
be no more than the number of cores you're going to allocate to each JVM
(by default the JVM sizes ParallelGCThreads as 8+(#procs-8)*(5/8) and CMS derives ConcGCThreads from that).
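
A sketch for the layout described below (two instances per box, 32 cores each; the core ranges and thread counts are illustrative, not tuned values):

taskset -c 0-31  <command that starts instance 1>
taskset -c 32-63 <command that starts instance 2>

and in each instance's jvm.options (or cassandra-env.sh on older versions):

-XX:ParallelGCThreads=20
-XX:ConcGCThreads=5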

This is also a problem when working with containers, the JVM sees all the
cores not the resource limit.

Some other good reading on the topic:
https://xmlandmore.blogspot.com/2014/06/oracle-has-published-some-tuning-guides.html

Jon


On Thu, Oct 25, 2018 at 7:51 AM Jeff Jirsa  wrote:

> I don't have time to reply to your stackoverflow post, but what you
> proposed is a great idea for a server that size.
>
> You can use taskset or numactl to bind each JVM to the appropriate
> cores/zones.
> Setup a data directory on each SSD for the data
>
> There are two caveats you need to think about:
>
> 1) You'll need an IP per instance - so bind a second IP to each NIC and
> you should be good to go.
> 2) You need to make sure you dont have 2 replicas on the same host.
> Cassandra has "rack aware" snitches that can help with this, but you need
> to make sure that you treat each server as it's own rack to force replicas
> to land on different physical machines.
>
> If you do both of those things, you'll probably be happy with the result.
>
>
>
>
> On Wed, Oct 24, 2018 at 12:33 PM Bobbie Haynes 
> wrote:
>
>>  I have three Physical servers. I want to run cassandra on multiple JVM's
>> i.e. each physical node contains 2 Cassandra nodes, so that I can run a
>> 6 node cluster. Could anyone point me to a setup guide?
>>
>> Each physical node's configuration: 256 GB RAM (I want to assign each
>> JVM 64 GB), 64 cores (32 cores per Cassandra node), and 6 TB of SSD, with a
>> separate 3 TB SSD disk for each node.
>>
>>
>>
>> https://stackoverflow.com/questions/52976646/cassandra-running-multiple-jvms
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: TWCS: Repair create new buckets with old data

2018-10-24 Thread Jonathan Haddad
Hey Meg, a couple thoughts.

>   Set a table level TTL with TWCS, and stop setting it with
inserts/updates (insert TTL overrides table level TTL). So, that your
entire sstable expires at the same time, as opposed to each insert expiring
at its own pace. So that for tombstone clean up, the system can just drop
the entire sstable at once.

Setting the TTL on a table or the query gives you the same exact result.
Setting it on the table is just there for convenience.  If it's not set at
the query level, it uses the default value.
See org.apache.cassandra.cql3.Attributes#getTimeToLive.  Generally speaking
I'd rather set it at the table level as well, but it's just to avoid weird
application bugs, not as an optimization.
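
In other words, for a 4 month TTL, setting the default on the table or putting the same value on each insert gives the same expiry for any write that doesn't carry its own TTL (a sketch with a hypothetical table and columns):

ALTER TABLE my_ks.events WITH default_time_to_live = 10368000;  -- roughly 4 months, in seconds

INSERT INTO my_ks.events (id, payload) VALUES (42, 'x') USING TTL 10368000;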

>   I’d suggest removing the -pr. Running incremental repair with TWCS is
better.

If incremental repair worked correctly I would agree with you, but
unfortunately it doesn't.  Incremental repair has some subtle bugs that can
result in massive overstreaming due to the fact that it will sometimes not
correctly mark data as repaired.  My coworker Alex wrote up a good summary
on the changes to incremental going into 4.0 to fix these issues, it's
worth a read.
http://thelastpickle.com/blog/2018/09/10/incremental-repair-improvements-in-cassandra-4.html
.

Reaper (OSS, maintained by us @ TLP, see http://cassandra-reaper.io/) has
the ability to schedule subrange repairs on one or more tables, or all
tables except those in a blacklist.  Doing frequent subrange repairs will
limit the amount of data that will get streamed in and should help keep
things pretty consistent unless you're dropping a lot of mutations.  It's
not perfect but should cause less headache than incremental repair will.

Hope this helps.
Jon



On Thu, Oct 25, 2018 at 4:21 AM Meg Mara  wrote:

> Hi Maik,
>
>
>
> I have a similar Cassandra env, with similar table requirements. So these
> would be my suggestions:
>
>
>
> ·   Set a table level TTL with TWCS, and stop setting it with
> inserts/updates (insert TTL overrides table level TTL). So, that your
> entire sstable expires at the same time, as opposed to each insert expiring
> at its own pace. So that for tombstone clean up, the system can just drop
> the entire sstable at once.
>
> ·   Since you’re on v3.0.9, nodetool repair command runs incremental
> repair by default. And with inc repair, -pr option is not recommended.
> (ref. link below)
>
> ·   I’d suggest removing the -pr. Running incremental repair with
> TWCS is better.
>
> ·   Here’s why I think so -> Full repair and Full repair with –PR
> option would include all  the sstables in the repair process, which means
> the chance of your oldest and newest data mixing is very high.
>
> ·   Whereas, if you run incremental repair every 5 days for example,
> only the last five days of data would be included in that repair operation.
> So, the maximum ‘damage’ it would do is mixing 5 day old data in a new
> sstable.
>
> ·   Your table level TTL would then tombstone this data on 4 month +
> 5 day mark instead of on the 4 month mark. Which shouldn’t be a big
> concern. At least in our case it isn’t!
>
> ·   I wouldn’t stop running repairs on our TWCS tables, because we
> are too concerned with data consistency.
>
>
>
>
>
> Please read the note here:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html
>
>
>
>
>
> Thank you,
>
> *Meg*
>
>
>
>
>
> *From:* Caesar, Maik [mailto:maik.cae...@dxc.com]
> *Sent:* Wednesday, October 24, 2018 2:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> Hi Meg,
>
> the ttl (4 month) is set during insert via insert statement with the
> application.
>
> The repair is started each day on one of ten hosts with the command: nodetool
> --host hostname_# repair -pr
>
>
>
> Regards
>
> Maik
>
>
>
> *From:* Meg Mara [mailto:mm...@digitalriver.com ]
> *Sent:* Dienstag, 23. Oktober 2018 17:05
> *To:* user@cassandra.apache.org
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> Hi Maik,
>
>
>
> I noticed in your table description that your default_time_to_live = 0,
> so where is your TTL property set? At what point do your sstables get
> tombstoned?
>
>
>
> Also, could you please mention what kind of repair you performed on this
> table? (Incremental, Full, Full repair with -pr option)
>
>
>
> Thank you,
>
> *Meg*
>
>
>
>
>
> *From:* Caesar, Maik [mailto:maik.cae...@dxc.com ]
> *Sent:* Monday, October 22, 2018 10:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> Ok, thanks.
>
> My conclusion:
>
> 1.   I will set unchecked_tombstone_compaction to true to get old
> data with tombstones removed
>
> 2.   I will exclude TWCS tables from repair
>
>
>
> Regarding excluding tables from repair, is there any easy way to do this?
> Nodetool repair does not support excludes.
>
>
>
> Regards
>
> Maik
>
>
>
> *From:* wxn...@zjqunshuo.com 

Re: openjdk for cassandra production cluster

2018-10-10 Thread Jonathan Haddad
The warning should be removed (if it hasn’t already), it’s unnecessary at
this point

On Wed, Oct 10, 2018 at 7:41 AM Prachi Rath  wrote:

> HI users,
> I have created a Cassandra cluster with OpenJDK 1.8.0_181
> (Cassandra 2.1.17).
> I started each node and the cluster looks healthy, but in the log files I saw
> the WARN
> message below:
>
> WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line 155) OpenJDK
> is not recommended. Please upgrade to the newest Oracle Java release
>
> Is this WARN message informational only, or can it be a real issue?
> Did anyone notice something like this (or is anyone using OpenJDK in a production
> environment)?
>
> Thanks ,
> Prachi
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: SNAPSHOT builds?

2018-09-29 Thread Jonathan Haddad
Hey James, you’ll have to build it. Java 11 is out  but the build
instructions still apply:

http://thelastpickle.com/blog/2018/08/16/java11.html


On Sat, Sep 29, 2018 at 7:01 AM James Carman 
wrote:

> I am trying to find 4.x SNAPSHOT builds.  Are they available anywhere
> handy?  I'm trying to work on Java 11 compatibility for a library.
>
> Thanks,
>
> James
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Large partitions

2018-09-13 Thread Jonathan Haddad
It depends on a number of factors, such as compaction strategy and read
patterns.  I recommend sticking to the 100MB per partition limit (and I aim
for significantly less than that).

If you're doing time series with TWCS & TTL'ed data and small enough
windows, and you're only querying for a small subset of the data, sure, you
could do it.  Outside of that, I don't see a reason why you'd want to.  I
wrote a blog post on how to scale time series workloads in Cassandra a ways
back, might be worth a read:
http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Regarding your write performance, since you're only bound by commit log
performance + memtable inserts, if your writes are slow there's a good
chance you're hitting long GC pauses.  Those *could* be caused by
compaction.  If your compaction throughput is too high you could see high
rates of object allocation which lead to long GC pauses, slowing down your
writes.  There's other things that can cause long GC pauses, sometimes you
just need some basic tuning. I recommend reading up on it:
http://thelastpickle.com/blog/2018/04/11/gc-tuning.html
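
If you want to test whether compaction is contributing, throttling it is a quick, reversible experiment (the value is in MB/s and takes effect without a restart):

nodetool setcompactionthroughput 16

The persistent setting is compaction_throughput_mb_per_sec in cassandra.yaml.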

Jon



On Thu, Sep 13, 2018 at 9:47 AM Mun Dega  wrote:

> I disagree.
>
> We had several over 150MB in 3.11 and we were able to break the cluster doing
> r/w from these partitions in a short period of time.
>
> On Thu, Sep 13, 2018, 12:42 Gedeon Kamga  wrote:
>
>> Folks,
>>
>> Based on the information found here
>> https://docs.datastax.com/en/dse-planning/doc/planning/planningPartitionSize.html
>>  ,
>> the recommended limit for a partition size is 100MB. Even though DataStax
>> clearly states that this is a rule of thumb, some team members are claiming
>> that our Cassandra *Write* is very slow because the partitions on some
>> tables are over 100MB. I know for a fact that this rule has changed since
>> 2.2. Starting Cassandra 2.2 and up, the new rule of thumb for partition
>> size is *a few hundred MB*, given the improvements in the architecture.
>> Now, I am unable to find the reference (maybe I got it at a Cassandra
>> training by DataStax). I would like to share it with my team. Did anyone
>> come across this information? If yes, can you please share it?
>>
>> Thanks!
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-09 Thread Jonathan Haddad
Your example only really applies if someone is using a 20 node cluster at
RF=1, something I've never seen, but I'm sure exists somewhere.
Realistically, RF=3 using racks (or AWS regions) and 21 nodes, means you'll
have 3 racks with 7 nodes per rack.  Adding a single node is an unlikely
operation, you'd probably add 3, one in each region / rack.  In that case
you'd distribute the load from 7 to 8 nodes, a 14% improvement, and for
that number it would help every node even with only 4 tokens.

Generally speaking I would expand cluster capacity by a percentage (say
15-20%), so you'd be looking at adding a handful of nodes to each rack,
which will still be beneficial when using only 4 tokens.

On Sat, Sep 8, 2018 at 1:00 PM Jeff Jirsa  wrote:

> Virtual nodes accomplish two primary goals
>
> 1) it makes it easier to gradually add/remove capacity to your cluster by
> distributing the new host capacity around the ring in smaller increments
>
> 2) it increases the number of sources for streaming, which speeds up
> bootstrap and decommission
>
> Whether or not either of these actually is true depends on a number of
> factors, like your cluster size (for #1) and your replication factor (for
> #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll
> probably add a neighbor near each existing host (#1) and stream from every
> other host (#2), so that’s great. If you have 20 hosts and add a new host
> with 4 tokens, most of your existing ranges won’t change at all - you’re
> nominally adding 5% of your cluster capacity but you won’t see a 5%
> improvement because you don’t have enough tokens to move 5% of your ranges.
> If you had 32 tokens, you’d probably actually see that 5% improvement,
> because you’d likely add a new range near each of the existing ranges.
>
> Going down to 1 token would mean you’d probably need to manually move
> tokens after each bootstrap to rebalance, which is fine, it just takes more
> operator awareness.
>
> I don’t know how DSE calculates which replication factor to use for their
> token allocation logic, maybe they guess or take the highest or something.
> Cassandra doesn’t - we require you to be explicit, but we could probably do
> better here.
>
>
>
> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad,  wrote:
>
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I
>> recommend folks use 4 tokens for new clusters,
>>
>
> I wonder why not setting it to all the way down to 1 then? What's the key
> difference once you have so few vnodes?
>
> with some caveats.
>>
>
> And those are?
>
> When you fire up a cluster, there's no way to make the initial tokens be
>> distributed evenly, you'll get random ones.  You'll want to set them
>> explicitly using:
>>
>> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>>
>>
>> After you fire up the first seed, create a keyspace using RF=3 (or
>> whatever you're planning on using) and set allocate_tokens_for_keyspace to
>> that keyspace in your config, and join the rest of the nodes.  That gives
>> even distribution.
>>
>
> Do you possibly know if the DSE-style option which doesn't require a
> keyspace to be there also works to allocate evenly distributed tokens for
> the very first seed node?
>
> Thanks,
> --
> Alex
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-09 Thread Jonathan Haddad
I'll be honest, I'm having a hard time wrapping my head around an
architecture where you use CDC to push data into Kafka.  I've worked on
plenty of systems that use Kafka as a means of communication, and one of
the consumers is a process that stores data in Cassandra.  That's pretty
normal.  Sending Cassandra mutations to Kafka, on the other hand, feels
backwards and for 99% of teams, more work than it's worth.

There may be some use cases for it.. but I'm not sure what they are.  It
might help if you shared the use cases where the extra complexity is
required?  When does writing to Cassandra which then dedupes and writes to
Kafka a preferred design then using Kafka and simply writing to Cassandra?

If the answer is "because it's fun to solve hard problems" that's OK too!

Jon


On Thu, Sep 6, 2018 at 1:53 PM Joy Gao  wrote:

> Hi all,
>
> We are fairly new to Cassandra. We began looking into the CDC feature
> introduced in 3.0. As we spent more time looking into it, the complexity
> began to add up (i.e. duplicated mutation based on RF, out of order
> mutation, mutation does not contain full row of data, etc). These
> limitations have already been mentioned in the discussion thread in
> CASSANDRA-8844, so we understand the design decisions around this. However,
> we do not want to push solving this complexity to every downstream
> consumers, where they each have to handle
> deduping/ordering/read-before-write to get full row; instead we want to
> solve them earlier in the pipeline, so the change message are
> deduped/ordered/complete by the time they arrive in Kafka. Dedupe can be
> solved with a cache, and ordering can be solved since mutations have
> timestamps, but the one we have the most trouble with is not having the
> full row of data.
>
> We had a couple discussions with some folks in other companies who are
> working on applying CDC feature for their real-time data pipelines. On a
> high-level, the common feedback we gathered is to use a stateful processing
> approach to maintain a separate db which mutations are applied to, which
> then allows them to construct the "before" and "after" data without having
> to query the original Cassandra db on each mutation. The downside of this
> is the operational overhead of having to maintain this intermediary db for
> CDC.
>
> We have an unconventional idea (inspired by DSE Advanced Replication) that
> eliminates some of the operational overhead, but with tradeoff of
> increasing code complexity and memory pressure. The high level idea is a
> stateless processing approach where we have a process in each C* node that
> parse mutation from CDC logs and query local node to get the "after" data,
> which avoid network hops and thus making reading full-row of data more
> efficient. We essentially treat the mutations in CDC log as change
> notifications. To solve dedupe/ordering, only the primary node for each
> token range will send the data to Kafka, but data are reconciled with peer
> nodes to prevent data loss.
>
> We have a* WIP design doc
> * that goes
> over this idea in details.
>
> We haven't sort out all the edge cases yet, but would love to get some
> feedback from the community on the general feasibility of this approach.
> Any ideas/concerns/questions would be helpful to us. Thanks!
>
> Joy
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
> I wonder why not setting it to all the way down to 1 then? What's the key
difference once you have so few vnodes?

4 tokens lets you have balanced clusters when they're small and imposes
very little overhead when they get big.  Using multiple tokens lets
multiple nodes stream data to the new node, which helps keep bootstrap
times down.

> And those are?

That using 4 tokens by itself isn't enough, you need to start the cluster
out in a manner that distributes the initial tokens, using the technique I
listed.

> Do you possibly know if the DSE-style option which doesn't require a
keyspace to be there also works to allocate evenly distributed tokens for
the very first seed node?

I have no idea what DSE does, sorry.  I don't have access to the source.
Sounds like a question for their slack channel.

On Sat, Sep 8, 2018 at 11:17 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad,  wrote:
>
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I
>> recommend folks use 4 tokens for new clusters,
>>
>
> I wonder why not setting it to all the way down to 1 then? What's the key
> difference once you have so few vnodes?
>
> with some caveats.
>>
>
> And those are?
>
> When you fire up a cluster, there's no way to make the initial tokens be
>> distributed evenly, you'll get random ones.  You'll want to set them
>> explicitly using:
>>
>> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>>
>>
>> After you fire up the first seed, create a keyspace using RF=3 (or
>> whatever you're planning on using) and set allocate_tokens_for_keyspace to
>> that keyspace in your config, and join the rest of the nodes.  That gives
>> even distribution.
>>
>
> Do you possibly know if the DSE-style option which doesn't require a
> keyspace to be there also works to allocate evenly distributed tokens for
> the very first seed node?
>
> Thanks,
> --
> Alex
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
Keep using whatever settings you've been using.  I'd still use allocate
tokens for keyspace but it probably won't make much of a difference with
256 tokens.

On Sat, Sep 8, 2018 at 10:40 AM onmstester onmstester 
wrote:

> Thanks Jon,
> But I was never concerned about the num_tokens config before, because no official
> cluster setup documents (on datastax:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html
> or other blogs) warned us beginners to be concerned about it.
> I always set up my clusters with nodes having the same hardware spec
> (homogeneous) and num_tokens = 256, and data seemed to be evenly
> distributed; at least nodetool status reported it that way, and after killing any
> node I still had all of my data and the application was working, so I assumed
> data was perfectly and evenly distributed among the nodes.
> So could you please explain more about why I should run that python command and
> configure allocate_tokens_for_keyspace? I only have one keyspace per cluster.
> I'm using the Network replication strategy and a rack-aware topology config.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>  On Sat, 08 Sep 2018 17:17:10 +0430 *Jonathan Haddad
> >* wrote 
>
> 256 tokens is a pretty terrible default setting especially post 3.0.  I
> recommend folks use 4 tokens for new clusters, with some caveats.
>
> When you fire up a cluster, there's no way to make the initial tokens be
> distributed evenly, you'll get random ones.  You'll want to set them
> explicitly using:
>
> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>
>
> After you fire up the first seed, create a keyspace using RF=3 (or
> whatever you're planning on using) and set allocate_tokens_for_keyspace to
> that keyspace in your config, and join the rest of the nodes.  That gives
> even distribution.
>
> On Sat, Sep 8, 2018 at 1:40 AM onmstester onmstester 
> wrote:
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
256 tokens is a pretty terrible default setting especially post 3.0.  I
recommend folks use 4 tokens for new clusters, with some caveats.

When you fire up a cluster, there's no way to make the initial tokens be
distributed evenly, you'll get random ones.  You'll want to set them
explicitly using:

python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'


After you fire up the first seed, create a keyspace using RF=3 (or whatever
you're planning on using) and set allocate_tokens_for_keyspace to that
keyspace in your config, and join the rest of the nodes.  That gives even
distribution.
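
Put together, a sketch of the relevant cassandra.yaml settings (the keyspace name is a placeholder; the initial_token list is just the output of the python one-liner above and goes only on the first seed):

# first seed only
num_tokens: 4
initial_token: -9223372036854775808,-4611686018427387904,0,4611686018427387904

# every node that joins afterwards
num_tokens: 4
allocate_tokens_for_keyspace: my_keyspace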

On Sat, Sep 8, 2018 at 1:40 AM onmstester onmstester 
wrote:

> Why not set the default vnodes count to that recommendation in the Cassandra
> installation files?
>
> Sent using Zoho Mail 
>
>
>  On Tue, 04 Sep 2018 17:35:54 +0430 *Durity, Sean R
> >* wrote 
>
>
>
> Longer term, I agree with Oleksandr, the recommendation for number of
> vnodes is now much smaller than 256. I am using 8 or 16.
>
>
>
>
>
> Sean Durity
>
>
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: JBOD disk failure - just say no

2018-08-22 Thread Jonathan Haddad
We recently helped a team deal with some JBOD issues, they can be quite
painful, and the experience depends a bit on the C* version in use.  We
wrote a blog post about it (published today):

http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html

Hope this helps.

Jon

On Mon, Aug 20, 2018 at 5:49 PM James Briggs 
wrote:

> Cassandra JBOD has a bunch of issues, so I don't recommend it for
> production:
>
> 1) disks fill up with load (data) unevenly, meaning you can run out on a
> disk while some are half-full
> 2) one bad disk can take out the whole node
> 3) instead of a small failure probability on an LVM/RAID volume, with JBOD
> you end up near 100% chance of failure after 3 years or so.
> 4) generally you will not have enough warning of a looming failure with
> JBOD compared to LVM/RAID. (Some
> companies take a week or two to replace a failed disk.)
>
> JBOD is easy to setup, but hard to manage.
>
> Thanks, James.
>
>
>
> --
> *From:* kurt greaves 
> *To:* User 
> *Sent:* Friday, August 17, 2018 5:42 AM
> *Subject:* Re: JBOD disk failure
>
> As far as I'm aware, yes. I recall hearing someone mention tying system
> tables to a particular disk but at the moment that doesn't exist.
>
> On Fri., 17 Aug. 2018, 01:04 Eric Evans, 
> wrote:
>
> On Wed, Aug 15, 2018 at 3:23 AM kurt greaves  wrote:
> > Yep. It might require a full node replace depending on what data is lost
> from the system tables. In some cases you might be able to recover from
> partially lost system info, but it's not a sure thing.
>
> Ugh, does it really just boil down to what part of `system` happens to
> be on the disk in question?  In my mind, that makes the only sane
> operational procedure for a failed disk to be: "replace the entire
> node".  IOW, I don't think we can realistically claim you can survive
> a failed a JBOD device if it relies on happenstance.
>
> > On Wed., 15 Aug. 2018, 17:55 Christian Lorenz, <
> christian.lor...@webtrekk.com > wrote:
> >>
> >> Thank you for the answers. We are using the current version 3.11.3 So
> this one includes CASSANDRA-6696.
> >>
> >> So if I get this right, losing system tables will need a full node
> rebuild. Otherwise repair will get the node consistent again.
> >
> > [ ... ]
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
> -
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> 
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Java 11 support in Cassandra 4.0 + Early Testing and Feedback

2018-08-16 Thread Jonathan Haddad
Hey folks,

As we start to get ready to feature freeze trunk for 4.0, it's going to be
important to get a lot of community feedback.  This is going to be a big
release for a number of reasons.

* Virtual tables.  Finally a nice way of querying for system metrics &
status
* Streaming optimizations (
https://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html
)
* Groundwork for strongly consistent schema changes
* Optimizations to internode communcation
* Experimental support for Java 11

I (and many others) would like Cassandra to be rock solid on day one of its
release.  The best way to ensure that happens is if people provide
feedback.  One of the areas we're going to need a lot of feedback on is on
how things work with Java 11, especially if you have a way of simulating a
real world workload on a staging cluster.  I've written up instructions
here on how to start testing:
http://thelastpickle.com/blog/2018/08/16/java11.html

Java 11 hasn't been released yet, but that doesn't mean it's not a good
time to test.  Any bugs we can identify now will help us get to a stable
release faster.  If you rely on Cassandra for your business, please take
some time to participate in the spirit of OSS by helping test & provide
feedback to the team.

Thanks everyone!
---
Jon Haddad
Principal Consultant, The Last Pickle


Re: Compression Tuning Tutorial

2018-08-09 Thread Jonathan Haddad
There's a discussion about direct I/O here you might find interesting:
https://issues.apache.org/jira/browse/CASSANDRA-14466

I suspect the main reason is that O_DIRECT wasn't added till Java 10, and
while it could be used with some workarounds, there's a lot of entropy
around changing something like this.  It's not a trivial task to do it
right, and mixing has some really nasty issues.

At least it means there's lots of room for improvement though :)


On Thu, Aug 9, 2018 at 5:36 AM Kyrylo Lebediev
 wrote:

> Thank you Jon, great article as usually!
>
>
> One topic that was discussed in the article is filesystem cache which is
> traditionally leveraged for data caching in Cassandra (with row-caching
> disabled by default).
>
> IIRC mmap() is used.
>
> Some RDBMS and NoSQL DB's as well use direct I/O + async I/O + maintain
> own, not kernel-managed, DB Cache thus improving overall performance.
>
> As Cassandra is designed to be a DB with low response time, this approach
> with DIO/AIO/DB Cache seems to be a really useful feature.
>
> Just out of curiosity, are there reasons why this advanced IO stack wasn't
> implemented, except lack of resources to do this?
>
>
> Regards,
>
> Kyrill
> --
> *From:* Eric Plowe 
> *Sent:* Wednesday, August 8, 2018 9:39:44 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Compression Tuning Tutorial
>
> Great post, Jonathan! Thank you very much.
>
> ~Eric
>
> On Wed, Aug 8, 2018 at 2:34 PM Jonathan Haddad  wrote:
>
> Hey folks,
>
> We've noticed a lot over the years that people create tables usually
> leaving the default compression parameters, and have spent a lot of time
> helping teams figure out the right settings for their cluster based on
> their workload.  I finally managed to write some thoughts down along with a
> high level breakdown of how the internals function that should help people
> pick better settings for their cluster.
>
> This post focuses on a mixed 50:50 read:write workload, but the same
> conclusions are drawn from a read heavy workload.  Hopefully this helps
> some folks get better performance / save some money on hardware!
>
> http://thelastpickle.com/blog/2018/08/08/compression_performance.html
>
>
> --
> Jon Haddad
> Principal Consultant, The Last Pickle
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Compression Tuning Tutorial

2018-08-08 Thread Jonathan Haddad
Hey folks,

We've noticed a lot over the years that people create tables usually
leaving the default compression parameters, and have spent a lot of time
helping teams figure out the right settings for their cluster based on
their workload.  I finally managed to write some thoughts down along with a
high level breakdown of how the internals function that should help people
pick better settings for their cluster.

This post focuses on a mixed 50:50 read:write workload, but the same
conclusions are drawn from a read heavy workload.  Hopefully this helps
some folks get better performance / save some money on hardware!

http://thelastpickle.com/blog/2018/08/08/compression_performance.html
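
If you just want to experiment with the main knob from the post, here's a
minimal sketch using the Java driver - keyspace/table names are placeholders,
and the option names are as they appear in the 3.x line:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CompressionTuning {
    public static void main(String[] args) {
        // Connect to a local node; adjust the contact point for your cluster.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Drop the chunk size from the 64KB default to 16KB: less read
            // amplification on small reads, slightly worse compression ratio.
            session.execute("ALTER TABLE my_ks.my_table WITH compression = "
                + "{'class': 'LZ4Compressor', 'chunk_length_in_kb': '16'}");
        }
    }
}

Existing sstables keep their old chunk size until they get compacted away or
you rewrite them with nodetool upgradesstables -a.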

-- 
Jon Haddad
Principal Consultant, The Last Pickle


Re: TWCS Compaction backed up

2018-08-07 Thread Jonathan Haddad
What's your window size?

When you say backed up, how are you measuring that?  Are there pending
tasks or do you just see more files than you expect?

On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler 
wrote:

> Hey guys, quick question:
>
> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on
> one drive, data on nvme.  That was working very well, it's a ts db and has
> been accumulating data for about 4weeks.
>
> The nodes have increased in load and compaction seems to be falling
> behind.  I used to get about 1 file per day for this column family, about
> ~30GB Data.db file per day.  I am now getting hundreds per day at  1mb -
> 50mb.
>
> How to recover from this?
>
> I can scale out to give some breathing room but will it go back and
> compact the old days into nicely packed files for the day?
>
> I tried setting compaction throughput to 1000 from 256 and it seemed to
> make things worse for the CPU, it's configured on i3.2xl with 8 compaction
> threads.
>
> -B
>
> Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to
> get rid of old tombstones, however running repairs in 2.1 on TWCS column
> families causes a very large spike in sstable counts due to anti-compaction
> which causes a lot of disruption, is there any other way?
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Jonathan Haddad
By default Cassandra is set to generate a heap dump on OOM. It can be a bit
tricky to figure out what’s going on exactly but it’s the best evidence you
can work with.

On Tue, Aug 7, 2018 at 6:30 AM Laszlo Szabo 
wrote:

> Hi,
>
> Thanks for the fast response!
>
> We are not using any materialized views, but there are several indexes.  I
> don't have a recent heap dump, and it will be about 24 before I can
> generate an interesting one, but most of the memory was allocated to byte
> buffers, so not entirely helpful.
>
> nodetool cfstats is also below.
>
> I also see a lot of flushing happening, but it seems like there are too
> many small allocations to be effective.  Here are the messages I see,
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459 ColumnFamilyStore.java:915
>> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.014KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,460
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,461 ColumnFamilyStore.java:915
>> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.011KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='tweets') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465 ColumnFamilyStore.java:915
>> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
>> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.024KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='tweets') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
>> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472 ColumnFamilyStore.java:915
>> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.013KiB (0%)
>> off-heap
>
>
>>
>
> Stack traces from errors are below.
>
>
>> java.io.IOException: Broken pipe
>
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.IOUtil.write(IOUtil.java:51) ~[na:1.8.0_181]
>
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>> ~[na:1.8.0_181]
>
> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.doFlush(BufferedDataOutputStreamPlus.java:323)
>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>
> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.flush(BufferedDataOutputStreamPlus.java:331)
>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>
> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:409)
>> [apache-cassandra-3.11.1.jar:3.11.1]
>
> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380)
>> [apache-cassandra-3.11.1.jar:3.11.1]
>
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
>
> ERROR [MutationStage-226] 2018-08-06 07:16:08,236
>> JVMStabilityInspector.java:142 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
> at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_181]
>
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
>> ~[na:1.8.0_181]
>
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>> ~[na:1.8.0_181]
>
> at
>> 

Re: Apache Cassandra 3.11.3 Question

2018-08-04 Thread Jonathan Haddad
This strategy is a lot more work than just replacing nodes one at a time.
For a large cluster it would be months of work instead of a couple days.

On Sat, Aug 4, 2018 at 7:04 AM R1 J1  wrote:

> Can a cluster having 3.11.0 node(s) accept a 3.11.3 node as a new node
> for  eventual migration  and  decom of  older 3.11.0 nodes ?
>
> Can a cluster having  3.11.2(s) node accept a 3.11.3 node as a new node
> for   eventual migration and decom of  older 3.11.2 nodes ?
>
>
> Regards
> R1J1
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Secure data

2018-08-01 Thread Jonathan Haddad
Ben has a good point here.  There's an advantage to encrypting in the
application: you can encrypt data per-account / user / [some other thing].
It's possible to revoke all access to all the data for a particular
[whatever] by simply deleting the encryption key.

Lots of options available.
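
For illustration, a bare-bones sketch of the per-account key idea using
AES-GCM from the JDK - key storage/lookup (KMS, vault, whatever) is left out,
and every name here is made up:

import java.nio.ByteBuffer;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class PerUserCrypto {

    // One key per account, kept in your key store.  Deleting the key is
    // effectively a revocation of everything encrypted under it.
    static SecretKey newAccountKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256); // assumes a JRE with unlimited-strength crypto (8u161+)
        return gen.generateKey();
    }

    // AES-GCM with a random 12-byte IV prepended to the ciphertext.  The
    // resulting ByteBuffer can be bound straight to a blob column.
    static ByteBuffer encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        ByteBuffer blob = ByteBuffer.allocate(iv.length + ciphertext.length);
        blob.put(iv);
        blob.put(ciphertext);
        blob.flip();
        return blob;
    }
}

Drop the account's key and the blobs sitting in Cassandra become unreadable
without touching a single row.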

On Wed, Aug 1, 2018 at 4:39 PM Ben Slater 
wrote:

> My recommendation is generally to look at encrypting in your application
> as it’s likely to be overall more secure than DB-level encryption anyway
> (generally the closer to the user you encrypt the better). I wrote a blog
> on this last year:
> https://www.instaclustr.com/securing-apache-cassandra-with-application-level-encryption/
>
> We also use encrypted GP2 EBS pretty widely without issue.
>
> Cheers
> Ben
>
> On Thu, 2 Aug 2018 at 05:38 Jonathan Haddad  wrote:
>
>> You can also get full disk encryption with LUKS, which I've used before.
>>
>> On Wed, Aug 1, 2018 at 12:36 PM Jeff Jirsa  wrote:
>>
>>> EBS encryption worked well on gp2 volumes (never tried it on any others)
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Aug 1, 2018, at 7:57 AM, Rahul Reddy 
>>> wrote:
>>>
>>> Hello,
>>>
>>> Any one tried aws ec2 volume encryption for Cassandra instances?
>>>
>>> On Tue, Jul 31, 2018, 12:25 PM Rahul Reddy 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm trying to find a good document on to enable encryption for Apache
>>>> Cassandra  (not on dse) tables and commilogs and store the keystore in kms
>>>> or vault. If any of you already configured please direct me to
>>>> documentation for it.
>>>>
>>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
> --
>
>
> *Ben Slater*
>
> *Chief Product Officer <https://www.instaclustr.com/>*
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
><https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Secure data

2018-08-01 Thread Jonathan Haddad
You can also get full disk encryption with LUKS, which I've used before.

On Wed, Aug 1, 2018 at 12:36 PM Jeff Jirsa  wrote:

> EBS encryption worked well on gp2 volumes (never tried it on any others)
>
> --
> Jeff Jirsa
>
>
> On Aug 1, 2018, at 7:57 AM, Rahul Reddy  wrote:
>
> Hello,
>
> Any one tried aws ec2 volume encryption for Cassandra instances?
>
> On Tue, Jul 31, 2018, 12:25 PM Rahul Reddy 
> wrote:
>
>> Hello,
>>
>> I'm trying to find a good document on to enable encryption for Apache
>> Cassandra  (not on dse) tables and commilogs and store the keystore in kms
>> or vault. If any of you already configured please direct me to
>> documentation for it.
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Reaper 1.2 released

2018-07-24 Thread Jonathan Haddad
Hey folks,

Just wanted to share with the list that after a bit of a long wait, we've
released Reaper 1.2.  We have a short blog post here outlining the new
features: https://twitter.com/TheLastPickle/status/1021830663605870592

With each release we've worked on performance improvements and stability as
our primary focus.  We're helping quite a few teams manage repair on
hundreds or thousands of nodes with Reaper so first and foremost we need it
to work well at that sort of scale :)

We also recognize the need for features other than repair, which is why
we've added support for taking & listing cluster wide snapshots.

Looking forward, we're planning on adding support for more operations and
reporting - we've already got some code to pull thread pool stats and we'd
like to expose a lot of table level information as well.

http://cassandra-reaper.io/

Looking forward to hearing your feedback!
-- 
Jon Haddad
Principal Consultant, The Last Pickle


Re: Timeout for only one keyspace in cluster

2018-07-23 Thread Jonathan Haddad
You don’t get this guarantee with counters.  Do not use them for unique
values. Use a UUID instead.
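
A rough sketch of what that looks like with the Java driver - table and column
names are invented for the example:

import java.util.Date;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public class RegisterDevice {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            // Assumes: CREATE TABLE devices (device_id timeuuid PRIMARY KEY,
            //                                registered_at timestamp)
            PreparedStatement insert = session.prepare(
                "INSERT INTO devices (device_id, registered_at) VALUES (?, ?)");
            // UUIDs.timeBased() is unique on its own - no counter update,
            // no read-modify-write, no contention.
            session.execute(insert.bind(UUIDs.timeBased(), new Date()));
        }
    }
}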

On Mon, Jul 23, 2018 at 9:11 AM learner dba 
wrote:

> James,
>
> Yes, counter is implemented due to valid reasons. We need this value
> column to  have unique values being used at the time of registering new
> devices.
> On Monday, July 23, 2018, 10:07:54 AM CDT, James Shaw 
> wrote:
>
>
> does your application really need counter ?  just an option.
>
> Thanks,
>
> James
> On Mon, Jul 23, 2018 at 10:57 AM, learner dba  invalid > wrote:
>
> Thanks a lot Ben. This makes sense but feel bad that we don't have a
> solution yet. We can try consistency level one but that will be against
> general rule for having local_quorum for production. Also, consistency ONE
> will not guarantee 0 race condition.
>
> Is there any better solution?
>
> On Saturday, July 21, 2018, 8:27:57 PM CDT, Ben Slater <
> ben.sla...@instaclustr.com> wrote:
>
>
> Note that that writetimeout exception can be C*'s way of telling you when
> there is contention on a LWT (rather than actually timing out). See
> https://issues.apache.org/jira/browse/CASSANDRA-9328
> 
>
> Cheers
> Ben
>
> On Sun, 22 Jul 2018 at 11:20 Goutham reddy 
> wrote:
>
> Hi,
> As it is a single partition key, try to update the key with only partition
> key instead of passing other columns. And try to set consistency level ONE.
>
> Cheers,
> Goutham.
>
> On Fri, Jul 20, 2018 at 6:57 AM learner dba  d> wrote:
>
> Anybody has any ideas about this? This is happening in production and we
> really need to fix it.
>
> On Thursday, July 19, 2018, 10:41:59 AM CDT, learner dba <
> cassandra...@yahoo.com.INVALID> wrote:
>
>
> Our foreignid is unique idetifier and we did check for wide partitions;
> cfhistorgrams show all partitions are evenly sized:
>
> Percentile  SSTables Write Latency  Read LatencyPartition Size
>   Cell Count
>
>   (micros)  (micros)   (bytes)
>
>
> 50% 0.00 29.52  0.00  1916
>   12
>
> 75% 0.00 42.51  0.00  2299
>   12
>
> 95% 0.00 61.21  0.00  2759
>   14
>
> 98% 0.00 73.46  0.00  2759
>   17
>
> 99% 0.00 88.15  0.00  2759
>   17
>
> Min 0.00  9.89  0.00   150
> 2
>
> Max 0.00 88.15  0.00   7007506
> 42510
> any thing else that we can check?
>
> On Wednesday, July 18, 2018, 10:44:29 PM CDT, wxn...@zjqunshuo.com <
> wxn...@zjqunshuo.com> wrote:
>
>
> Your partition key is foreignid. You may have a large partition. Why not
> use foreignid+timebucket as partition key?
>
> *From:* learner dba 
> *Date:* 2018-07-19 01:48
> *To:* User cassandra.apache.org 
> *Subject:* Timeout for only one keyspace in cluster
>
> Hi,
>
> We have a cluster with multiple keyspaces. All queries are performing good
> but write operation on few tables in one specific keyspace gets write
> timeout. Table has counter column and counter update query times out
> always. Any idea?
>
> CREATE TABLE x.y (
>
> foreignid uuid,
>
> timebucket text,
>
> key text,
>
> timevalue int,
>
> value counter,
>
> PRIMARY KEY (foreignid, timebucket, key, timevalue)
>
> ) WITH CLUSTERING ORDER BY (timebucket ASC, key ASC, timevalue ASC)
>
> AND bloom_filter_fp_chance = 0.01
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
> AND comment = ''
>
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
> Query and Error:
>
> UPDATE x.y SET value = value + 1 where foreignid = ? AND timebucket = ? AND 
> key = ? AND timevalue = ?, err = {s:\"gocql: no response 
> received from cassandra within timeout period
>
>
> I verified CL=local_serial
>
> We had been working on this issue for many days; any help will be much 
> appreciated.
>
>
>
> --
> Regards
> Goutham Reddy
>
> --
>
>
> *Ben Slater*
>
> *Chief Product Officer *
>
>    
>

Re: Incremental Backup Hardlinks

2018-07-19 Thread Jonathan Haddad
The hard links are created after the SSTables have finished writing.



On Thu, Jul 19, 2018 at 9:51 AM David Payne  wrote:

> Hello Cassandra Experts and Committers,
>
>
>
> Hopefully this is just a dumb question, but without the skill set to read
> the source code, I must ask.
>
>
>
> Consider incremental backups are enabled on 2.x or 3.x. As memtables flush
> to sstables on disk, hardlinks are created in backups for the column
> family. My question is about the sequencing of flush IO and the creation of
> these hardlinks. Is the file IO to the flushed sstables completed before
> hardlinks are created? In other words, if I see a hardlink, then can I know
> for sure that the hardlink is safe to copy?
>
> I suppose I could figure this out by using lsof, iostat, and pidstat, but
> I’d like to avoid this, if possible.
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Cassandra Client Program not Working with NettySSLOptions

2018-06-19 Thread Jonathan Haddad
Is the server configured to use encryption?

On Tue, Jun 19, 2018 at 3:59 AM Jahar Tyagi  wrote:

> Hi,
>
> I referred to this link
> https://docs.datastax.com/en/developer/java-driver/3.0/manual/ssl/
>   to
> implement a simple Cassandra client using datastax driver 3.0.0 on SSL with
> OpenSSL options but unable to run it.
>
> Getting generic exception as " 
> *com.datastax.driver.core.exceptions.NoHostAvailableException"
> *at line
> mySession = myCluster.connect();
>
> *Code snippet to setup cluster connection is below.*
>
> public void connectToCluster()
> {
> String[] theCassandraHosts = {"myip"};
> myCluster =
>
> Cluster.builder().withSSL(getSSLOption()).withReconnectionPolicy(new
> ConstantReconnectionPolicy(2000)).addContactPoints(theCassandraHosts).withPort(10742)
> .withCredentials("username",
> "password").withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder().build())
> .withSocketOptions(new
> SocketOptions().setConnectTimeoutMillis(800).setKeepAlive(true)).build();
> try {
> mySession = myCluster.connect();
> }
> catch(Exception e) {
> e.printStackTrace();
> }
> System.out.println("Session Established");
> }
>
>
>  private SSLOptions getSSLOption()
> {
> InputStream trustStore = null;
> try
> {
> String theTrustStorePath =
> "/var/opt/SecureInterface/myTrustStore.jks";
> String theTrustStorePassword = "mypassword";
> List<String> theCipherSuites = new ArrayList<String>();
> theCipherSuites.add("TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384");
> KeyStore ks = KeyStore.getInstance("JKS");
> *trustStore = new FileInputStream(theTrustStorePath);*
> ks.load(trustStore, theTrustStorePassword.toCharArray());
> TrustManagerFactory tmf =
> TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
> tmf.init(ks);
> SslContextBuilder builder =
> SslContextBuilder.forClient()
> .sslProvider(SslProvider.OPENSSL)
> .trustManager(tmf)
> .ciphers(theCipherSuites)
> // only if you use client authentication
> .keyManager(new
> File("/var/opt/SecureInterface/keystore/Cass.crt"),
> new
> File("/var/opt/vs/SecureInterface/keystore/Cass_enc.key"));
> SSLOptions sslOptions = new NettySSLOptions(builder.build());
> return sslOptions;
> }
> catch (Exception e)
> {
> e.printStackTrace();
> }
> finally
> {
> try
> {
> trustStore.close();
> }
> catch (IOException e)
> {
> e.printStackTrace();
> }
> }
> return null;
> }
>
> Cassandra server is running fine with client and server encryption
> options. Moreover  I am able to run my client using JdkSSLOptions but have
> problem with NettySSLOptions.
>
> Has anyone implemented the  NettySSLOptions for Cassandra client
> application?
>
>
> Regards,
> Jahar Tyagi
>
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Compaction strategy for update heavy workload

2018-06-13 Thread Jonathan Haddad
I wouldn't use TWCS if there are updates; you're going to risk having
data that's never deleted and really small sstables sticking around
forever.  If you use really large buckets, what's the point of TWCS?

Honestly this is such a small workload you could easily use STCS or
LCS and you'd likely never, ever see a problem.
On Wed, Jun 13, 2018 at 3:34 PM kurt greaves  wrote:
>
> TWCS is probably still worth trying. If you mean updating old rows in TWCS 
> "out of order updates" will only really mean you'll hit more SSTables on 
> read. This might add a bit of complexity in your client if your bucketing 
> partitions (not strictly necessary), but that's about it. As long as you're 
> not specifying "USING TIMESTAMP" you still get the main benefit of efficient 
> dropping of SSTables - C* only cares about the write timestamp of the data in 
> regards to TTL's, not timestamps stored in your partition/clustering key.
> Also keep in mind that you can specify the window size in TWCS, so if you can 
> increase it enough to cover the "out of order" updates then that will also 
> solve the problem w.r.t old buckets.
>
> In regards to LCS, the only way to really know if it'll be too much 
> compaction overhead is to test it, but for the most part you should consider 
> your read/write ratio, rather than the total number of reads/writes (unless 
> it's so small that it's irrelevant, which it may well be).
>
> On 13 June 2018 at 19:25, manuj singh  wrote:
>>
>> Hi all,
>> I am trying to determine compaction strategy for our use case.
>> In our use case we will have updates on a row a few times. And we have a ttl 
>> also defined on the table level.
>> Our typical workload is less then 1000 writes + reads per second. At the max 
>> it could go up to 2500 per second.
>> We use SSD and have around 64 gb of ram on each node. Our cluster size is 
>> around 70 nodes.
>>
>> I looked at time series but we cant guarantee that the updates will happen 
>> within a give time window. And if we have out of order updates it might 
>> impact on when we remove that data from the disk.
>>
>> So i was looking at level tiered, which supposedly is good when you have 
>> updates. However its io bound and will affect the writes. everywhere i read 
>> it says its not good for write heavy workload.
>> But Looking at our write velocity, is it really write heavy ?
>>
>> I guess what i am trying to find out is will level tiered compaction will 
>> impact the writes in our use case or it will be fine given our write rate is 
>> not that much.
>> Also is there anything else i should keep in mind while deciding on the 
>> compaction strategy.
>>
>> Thanks!!
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Mongo DB vs Cassandra

2018-05-31 Thread Jonathan Haddad
I haven’t seen any query requirements, which is going to be the thing that
makes Cassandra difficult.

If you can’t define your queries beforehand, Cassandra is a no-go. If you
just want to store data somewhere, and it’s just CSV, I’d go with a simple
blob store like S3 and pick a DB later when you understand the problem
better.

On Thu, May 31, 2018 at 9:06 AM daemeon reiydelle 
wrote:

> If you are starting with a modest amount of data (e.g. under .25 PB) and
> do not have extremely high availability requirements, then it is easier to
> start with MongoDB, avoiding HA clusters. I would suggest you start with
> MongoDB. Both are great, but C* scales far beyond MongoDB FOR A GIVEN LEVEL
> OF DBA ADMIN AND CONFIG.
>
>
> <==>
> "When I finish a project for a client, I have ... learned their issues
> with life,
> their personal secrets, I have come to care about them.
> Once the project is over, I lose them as if I lost family.
> For the client, however, they’ve just dismissed a service worker." ...
> "Thought on the Gig Economy" by Francine Brevetti
>
>
> *Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198/London 44 020 8144
> 9872/Skype daemeon.c.m.reiydelle*
>
>
> On Thu, May 31, 2018 at 4:49 AM, Sudhakar Ganesan <
> sudhakar.gane...@flex.com.invalid> wrote:
>
>> Team,
>>
>>
>>
>> I need to make a decision on Mongo DB vs Cassandra for loading the csv
>> file data and store csv file as well. If any of you did such study in last
>> couple of months, please share your analysis or observations.
>>
>>
>>
>> Regards,
>>
>> Sudhakar
>> Legal Disclaimer :
>> The information contained in this message may be privileged and
>> confidential.
>> It is intended to be read only by the individual or entity to whom it is
>> addressed
>> or by their designee. If the reader of this message is not the intended
>> recipient,
>> you are on notice that any distribution of this message, in any form,
>> is strictly prohibited. If you have received this message in error,
>> please immediately notify the sender and delete or destroy any copy of
>> this message!
>>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Time Series schema performance

2018-05-29 Thread Jonathan Haddad
I wrote a post on this topic a while ago, might be worth reading over:
http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
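
The short version of the bucketing idea from that post, sketched with the Java
driver (schema invented for the example):

import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class BucketedWrite {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("metrics")) {
            // (sensor_id, day) as the partition key keeps partitions bounded
            // and lines up nicely with TWCS windows.
            session.execute("CREATE TABLE IF NOT EXISTS readings ("
                + "sensor_id text, day int, ts timestamp, value double, "
                + "PRIMARY KEY ((sensor_id, day), ts))");
            PreparedStatement insert = session.prepare(
                "INSERT INTO readings (sensor_id, day, ts, value) VALUES (?, ?, ?, ?)");
            int day = Integer.parseInt(LocalDate.now(ZoneOffset.UTC)
                .format(DateTimeFormatter.BASIC_ISO_DATE)); // e.g. 20180529
            session.execute(insert.bind("sensor-42", day, new Date(), 21.5));
        }
    }
}

A read for a given id and day then touches a single bounded partition instead
of the id's entire history.
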
On Tue, May 29, 2018 at 8:02 AM Jeff Jirsa  wrote:

> There’s a third option which is doing bucketing by time instead of by
hash, which tends to perform quite well if you’re using TWCS as it makes it
quite likely that a read can be served by a single sstable

> --
> Jeff Jirsa


> On May 29, 2018, at 6:49 AM, sujeet jog  wrote:

> Folks,
> I have two alternatives for the time series schema i have, and wanted to
weigh of on one of the schema .

> The query is given id, & timestamp, read the metrics associated with the
id

> The records are inserted every 5 mins, and the number of id's = 2
million,
> so at every 5mins  it will be 2 million records that will be written.

> Bucket Range  : 0 - 5K.

> Schema 1 )

> create table (
> id timeuuid,
> bucketid Int,
> date date,
> timestamp timestamp,
> metricName1   BigInt,
> metricName2 BigInt.
> ...
> .
> metricName300 BigInt,

> Primary Key (( day, bucketid ) ,  id, timestamp)
> )

> BucketId is just a murmur3 hash of the id  which acts as a splitter to
group id's in a partition


> Pros : -

> Efficient write performance, since data is written to minimal partitions

> Cons : -

> While the first schema works best when queried programmatically, but is a
bit inflexible If it has to be integrated with 3rd party BI tools like
tableau, bucket-id cannot be generated from tableau as it's not part of the
view etc..


> Schema 2 )
> Same as above, without bucketid &  date.

> Primary Key (id, timestamp )

> Pros : -

> BI tools don't need to generate bucket id lookups,

> Cons :-
> Too many partitions are written every 5 mins,  say 2 million records
written in distinct 2 million partitions.



> I believe writing this data to commit log is same in case of Schema 1 &
Schema 2 ) , but the actual performance bottleneck could be compaction,
since the data from memtable is transformed to ssTables often based on the
memory settings, and
> the header for every SSTable would maintain partitionIndex with
  byteoffsets,

>   wanted to guage how bad can the performance of Schema-2 go with respect
to Write/Compaction having to do many diskseeks.

> compacting many tables but with too many partitionIndex entries because
of the high number of parititions ,  can this be a bottleneck ?..

> Any indept performance explanation of Schema-2 would be very much helpful


> Thanks,




-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: cassandra update vs insert + delete

2018-05-27 Thread Jonathan Haddad
What is a “soft delete”?

My 2 cents, if you want to update some information just update it. There’s
no need to overthink it.

Batches are good if they’re constrained to a single partition, not so hot
otherwise.
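
For the single-partition case, something like this - names are made up, the
point is just that every statement shares one partition key:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SinglePartitionBatch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("shop")) {
            // Assumes: CREATE TABLE orders_by_user (user_id text, order_id int,
            //          total double, PRIMARY KEY (user_id, order_id))
            PreparedStatement insert = session.prepare(
                "INSERT INTO orders_by_user (user_id, order_id, total) VALUES (?, ?, ?)");
            // Same user_id on every statement, so the whole batch lands on one
            // replica set as a single mutation.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            batch.add(insert.bind("user-1", 1001, 19.99));
            batch.add(insert.bind("user-1", 1002, 5.49));
            session.execute(batch);
        }
    }
}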


On Sun, May 27, 2018 at 8:19 AM Rahul Singh 
wrote:

> Deletes create tombstones — not really something to consider. Better to
> add / update or insert data and do a soft delete on old data and apply a
> TTL to remove it at a future time.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On May 27, 2018, 5:36 AM -0400, onmstester onmstester ,
> wrote:
>
> Hi
> I want to load all rows from many partitions and change a column value in
> each row, which of following ways is better concerning disk space and
> performance?
> 1. create a update statement for every row and batch update for each
> partitions
> 2. create an insert statement for every row and batch insert for each
> partition, then run a single statement to delete the whole old partition
>
> Thanks in advance
>
> Sent using Zoho Mail 
>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Reading from big partitions

2018-05-19 Thread Jonathan Haddad
What disks are you using? How many sstables are you hitting? Did you try
tracing the request?

On Sat, May 19, 2018 at 8:43 PM onmstester onmstester 
wrote:

> Hi,
> Due to some unpredictable behavior in input data i end up with some
> hundred partitions having more than 300MB size. Reading any sequence of data
> from these partitions took about 5 seconds while reading from other
> partitions (with less than 50MB sizes) took less than 10ms.
> Since i can't change the data model in sake of a few problematic
> partitions, Is there any tuning at Cassandra side that could boost up read
> performance from the big partitions?
> Thanks in advance
>
> Sent using Zoho Mail 
>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Solve Busy pool at Cassandra side

2018-05-13 Thread Jonathan Haddad
This error comes from com.datastax.driver.core.HostConnectionPool#enqueue,
which is the client-side pool.  Cassandra can handle more requests; the
application needs to be fixed.

As per the java docs:

/**
 * Indicates that a connection pool has run out of available connections.
 * 
 * This happens if the pool has no connections (for example if it's
currently reconnecting to its host), or if all
 * connections have reached their maximum number of in flight queries. The
query will be retried on the next host in the
 * {@link
com.datastax.driver.core.policies.LoadBalancingPolicy#newQueryPlan(String,
Statement) query plan}.
 * 
 * This exception is a symptom that the driver is experiencing a high
workload. If it happens regularly on all hosts,
 * you should consider tuning one (or a combination of) the following
pooling options:
 * 
 * {@link
com.datastax.driver.core.PoolingOptions#setMaxRequestsPerConnection(HostDistance,
int)}: maximum number of
 * requests per connection;
 * {@link
com.datastax.driver.core.PoolingOptions#setMaxConnectionsPerHost(HostDistance,
int)}: maximum number of
 * connections in the pool;
 * {@link
com.datastax.driver.core.PoolingOptions#setMaxQueueSize(int)}: maximum
number of enqueued requests before
 * this exception is thrown.
 * 
 */

Using a third party lib is a bit of a pain, but with a little effort you
can modify the session no matter where it is.  It's possible to access and
modify private variables in the JVM, the Apache Commons library has a
module just for that:
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/reflect/FieldUtils.html#readField-java.lang.reflect.Field-java.lang.Object-boolean-

I wrote an article with some detail here:
http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/

If you're not sure where the session is, fire up your program with a
debugger and go look for it.  You can also find it using the Reflections
library (https://github.com/ronmamo/reflections) by
using getSubTypesOf(com.datastax.driver.core.Session).

Once you have the session you can bump up setMaxRequestsPerConnection.
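
Roughly, that looks like this - "session" is a guess at the lib's private
field name, so substitute whatever you find with the debugger or Reflections
approach above:

import org.apache.commons.lang3.reflect.FieldUtils;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public final class PoolTuning {
    // Pull the Session the third-party lib holds privately and raise the
    // pool limits on the Cluster it belongs to.
    static void raiseLimits(Object thirdPartyClient) throws IllegalAccessException {
        Session session = (Session) FieldUtils.readField(thirdPartyClient, "session", true);
        PoolingOptions pooling = session.getCluster().getConfiguration().getPoolingOptions();
        pooling.setMaxRequestsPerConnection(HostDistance.LOCAL, 1024);
        pooling.setMaxQueueSize(1024);
    }
}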

This will allow you to circumvent the current issue, but you might run into
another - the client may be creating async requests without considering the
in flight queries, which isn't very smart.  Ideally it would either use a
Semaphore to limit the number of in flight or simply do SOME_SMALL_NUMBER
of queries at a time and wait for the ResultSetFuture to complete.  This is
in the FAQ of the driver:
https://github.com/datastax/java-driver/blob/3.x/faq/README.md#i-am-encountering-busypoolexception-what-does-this-mean-and-how-do-i-avoid-it
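
If you can touch the code that's issuing the writes at all, the Semaphore
version is only a few lines - a sketch with invented schema and numbers:

import java.util.concurrent.Semaphore;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;

public class ThrottledWriter {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            // Assumes: CREATE TABLE events (id bigint PRIMARY KEY, payload text)
            PreparedStatement insert = session.prepare(
                "INSERT INTO events (id, payload) VALUES (?, ?)");
            // Never allow more than 128 in-flight requests, comfortably under
            // the pool limits.
            Semaphore inFlight = new Semaphore(128);
            for (long i = 0; i < 1_000_000; i++) {
                inFlight.acquireUninterruptibly();
                ResultSetFuture f = session.executeAsync(insert.bind(i, "payload-" + i));
                Futures.addCallback(f, new FutureCallback<ResultSet>() {
                    public void onSuccess(ResultSet rs) { inFlight.release(); }
                    public void onFailure(Throwable t) { inFlight.release(); }
                }, MoreExecutors.directExecutor());
            }
        }
    }
}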

Hope this helps,
Jon


On Sun, May 13, 2018 at 8:29 AM onmstester onmstester 
wrote:

> Hi,
> I'm getting "Pool is Busy (limit is 256)", while connecting to a single
> node cassandra cluster. The whole client side application is a 3rd-party lib
> which i can't change it's source and its session builder is not using any
> PoolingOptions.
> Is there any config on cassandra side that could handle increasing max
> requests per connections from 256 to 32K?
>
> Sent using Zoho Mail 
>
>
>


Re: Switching to TWCS

2018-04-27 Thread Jonathan Haddad
TWCS uses the max timestamp in an sstable to determine what to compact
together; it won't anti-compact your data.  The goal is to minimize I/O.

You'll have to wait for all your mixed-timestamp sstable data to TTL out
before TWCS's windowing kicks in optimally.

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
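
For reference, this is roughly what the table options look like for the 3-day
windows you described - the keyspace/table name and the TTL are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class TwcsOptions {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // 3-day windows plus a table-level TTL, so whole windows expire
            // together and their sstables can eventually be dropped in one go.
            session.execute("ALTER TABLE my_ks.my_table WITH compaction = {"
                + " 'class': 'TimeWindowCompactionStrategy',"
                + " 'compaction_window_unit': 'DAYS',"
                + " 'compaction_window_size': '3' }"
                + " AND default_time_to_live = 2592000");
        }
    }
}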

On Fri, Apr 27, 2018 at 11:23 AM Pranay akula 
wrote:

> Yes the data is TTLed, but I don't think that's the criteria for the TWCS.
> My understanding is the data is divided into buckets based on written
> timestamp.
>
> Thanks
> Pranay
>
> On Fri, Apr 27, 2018, 1:17 PM Nitan Kainth  wrote:
>
>> Is old data TTLed already? If not, then I don't think TWCS will know when
>> to delete data.
>>
>> My understanding about TWCS is, data has to be written with TTL. (Please
>> correct me, if wrong)
>>
>>
>> Regards,
>> Nitan K.
>> Cassandra and Oracle Architect/SME
>> Datastax Certified Cassandra expert
>> Oracle 10g Certified
>>
>> On Fri, Apr 27, 2018 at 1:15 PM, Pranay akula > > wrote:
>>
>>> Hi,
>>>
>>> Testing to switch from sizetiered to Timewindow, did changed compaction
>>> strategy on a table with a buckets of 3 days
>>>
>>> After switching when I checked min and max timestamps on sstables I did
>>> see data older than 3 days range in my case 30-60 days
>>>
>>> So when we switch from sizetired to Timewindow, will the existing data
>>> will be rebucketed ??
>>>
>>> Thanks
>>> Pranay
>>>
>>
>>


Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Jonathan Haddad
Your compaction time won't improve immediately simply by adding nodes
because the old data still needs to be cleaned up.

What's your end goal?  Why is having a spike in pending compaction tasks
following a massive write an issue?  Are you seeing a dip in performance,
violating an SLA, or do you just not like it?


On Fri, Apr 27, 2018 at 10:54 AM Mikhail Tsaplin  wrote:

> The cluster has 5 nodes of d2.xlarge AWS type (32GB RAM, Attached SSD
> disks), Cassandra 3.0.9.
> Increased compaction throughput from 16 to 200 - active compaction
> remaining time decreased.
> What will happen if another node will join the cluster? - will former
> nodes move part of theirs SSTables to the new node unchanged and compaction
> time will be reduced?
>
>
>
> $ nodetool cfstats -H  dump_es
>
>
> Keyspace: table_b
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Flushes: 0
> Table: table_b
> SSTable count: 18155
> Space used (live): 1.2 TB
> Space used (total): 1.2 TB
> Space used by snapshots (total): 0 bytes
> Off heap memory used (total): 3.62 GB
> SSTable Compression Ratio: 0.20371982719658258
> Number of keys (estimate): 712032622
> Memtable cell count: 0
> Memtable data size: 0 bytes
> Memtable off heap memory used: 0 bytes
> Memtable switch count: 0
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 0
> Local write latency: NaN ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 2.22 GB
> Bloom filter off heap memory used: 2.56 GB
> Index summary off heap memory used: 357.51 MB
> Compression metadata off heap memory used: 724.97 MB
> Compacted partition minimum bytes: 771 bytes
> Compacted partition maximum bytes: 1.55 MB
> Compacted partition mean bytes: 3.47 KB
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
>
> 2018-04-27 22:21 GMT+07:00 Nicolas Guyomar :
>
>> Hi Mikhail,
>>
>> Could you please provide :
>> - your cluster version/topology (number of nodes, cpu, ram available etc)
>> - what kind of underlying storage you are using
>> - cfstat using -H option cause I'm never sure I'm converting bytes=>GB
>>
>> You are storing 1Tb per node, so long running compaction is not really a
>> surprise, you can play with concurrent compaction thread number, compaction
>> throughput to begin with
>>
>>
>> On 27 April 2018 at 16:59, Mikhail Tsaplin  wrote:
>>
>>> Hi,
>>> I have a five nodes C* cluster suffering from a big number of pending
>>> compaction tasks: 1) 571; 2) 91; 3) 367; 4) 22; 5) 232
>>>
>>> Initially, it was holding one big table (table_a). With Spark, I read
>>> that table, extended its data and stored in a second table_b. After this
>>> copying/extending process the number of compaction tasks in the cluster has
>>> grown up. From nodetool cfstats (see output at the bottom): table_a has 20
>>> SSTables and table_b has 18219.
>>>
>>> As I understand table_b has a big SSTables number because data was
>>> transferred from one table to another within a short time and eventually
>>> this tables will be compacted. But now I have to read whole data from this
>>> table_b and send it to Elasticsearch. When Spark reads this table some
>>> Cassandra nodes are dying because of OOM.
>>>
>>> I think that when compaction will be completed - the Spark reading job
>>> will work fine.
>>>
>>> The question is how can I speed up compaction process, what if I will
>>> add another two nodes to cluster - will compaction finish faster? Or data
>>> will be copied to new nodes but compaction will continue on the original
>>> set of SSTables?
>>>
>>>
>>> *Nodetool cfstats output:
>>>
>>> Table: table_a
>>> SSTable count: 20
>>> Space used (live): 1064889308052
>>> Space used (total): 1064889308052
>>> Space used by snapshots (total): 0
>>> Off heap memory used (total): 1118106937
>>> SSTable Compression Ratio: 0.12564594959566894
>>> Number of keys (estimate): 56238959
>>> Memtable cell count: 76824
>>> Memtable data size: 115531402
>>> Memtable off heap memory used: 0
>>> 

Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Jonathan Haddad
I can't say for sure, because I haven't measured it, but I've seen a
combination of readahead + large chunk size with compression cause serious
issues with read amplification, although I'm not sure if or how it would
apply here.  Likely depends on the size of your partitions and the
fragmentation of the sstables, although at only 5GB I'm really surprised to
hear 32GB read in; that seems a bit absurd.

Definitely something to dig deeper into.

On Thu, Apr 26, 2018 at 5:02 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> yet another question/issue with repair.
>
>
>
> Cassandra 2.1.18, 3 nodes, RF=3, vnode=256, data volume ~ 5G per node
> only. A repair (nodetool repair -par) issued on a single node at this data
> volume takes around 36min with an AVG of ~ 15MByte/s disk throughput
> (read+write) for the entire time-frame, thus processing ~ 32GByte from a
> disk perspective so ~ 6 times of the real data volume reported by nodetool
> status. Does this make any sense? This is with 4 compaction threads and
> compaction throughput = 64. Similar results doing this test a few times,
> where most/all inconsistent data should be already sorted out by previous
> runs.
>
>
>
> I know there is e.g. reaper, but the above is a simple use case simply
> after a single failed node recovers beyond the 3h hinted handoff window.
> How should this finish in a timely manner for > 500G on a recovering node?
>
>
>
> I have to admit this is with NFS as storage. I know, NFS might not be the
> best idea, but with the above test at ~ 5GB data volume, we see an IOPS
> rate at ~ 700 at a disk latency of ~ 15ms, thus I wouldn’t treat it as that
> bad. This all is using/running Cassandra on-premise (at the customer, so
> not hosted by us), so while we can make recommendations storage-wise (of
> course preferring local disks), it may and will happen that NFS is being in
> use then.
>
>
>
> Why we are using -par in combination with NFS is a different story and
> related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-8743. Without switching
> from sequential to parallel repair, we basically kill Cassandra.
>
>
>
> Throughput-wise, I also don’t think it is related to NFS, cause we see
> similar repair throughput values with AWS EBS (gp2, SSD based) running
> regular repairs on small-sized CFs.
>
>
>
> Thanks for any input.
>
> Thomas
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
>


Re: Version Upgrade

2018-04-25 Thread Jonathan Haddad
There's no harm in running it during any upgrade, and I always recommend
doing it just to be in the habit.

My 2 cents.

On Wed, Apr 25, 2018 at 3:39 PM Christophe Schmitz <
christo...@instaclustr.com> wrote:

> Hi Pranay,
>
> You only need to upgrade your SSTables when you perform a major Cassandra
> version upgrade, so you don't need to run it for upgrading in the 3.x.x
> series.
> One way to check which storage version your SSTables are using is to look
> at the SSTables name. It is structured as:
> --.db The version is a string that
> represents the SSTable storage format version.
> The version is "mc" in the 3.x.x series.
>
> Cheers,
> Christophe
>
>
>
> On 26 April 2018 at 06:06, Pranay akula 
> wrote:
>
>> When is it necessary to upgrade SSTables ?? For a minor upgrade do we
>> need to run upgrade stables??
>>
>> I knew when we are doing a major upgrade we have to run upgrade sstables
>> so that sstables will be re-written to newer version with additional meta
>> data.
>>
>> But do we need to run upgrade sstables for upgrading from let's say
>> 3.0.15 to 3.0.16 or 3.0.y to 3.11.y??
>>
>>
>> Thanks
>> Pranay
>>
>
>
>
> --
>
> *Christophe Schmitz - **VP Consulting*
>
> AU: +61 4 03751980 / FR: +33 7 82022899 <+33%207%2082%2002%2028%2099>
>
>
> 
> 
>
> Read our latest technical blog posts here
> . This email has been sent on behalf
> of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This
> email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>


Re: Reading Cassandra's Blob from Apache Ignite

2018-04-25 Thread Jonathan Haddad
I think you’ll have better luck with the ignite list, as this looks like an
ignite configuration problem.
On Wed, Apr 25, 2018 at 3:09 AM  wrote:

> Dear Community,
>
>
>
> I'm trying to read the contents of Cassandra table from Ignite(acting as
> cache). The table is given below::
>
> CREATE TABLE test.epc_table (
>
> imsi text PRIMARY KEY,
>
> data blob
>
> )
>
> The data blob is being used to store a C++ class object(Class name is
> 'RtCassEpcTableDataVo').
>
> I'm trying to use the following c++ program to pass the value of 'imsi'
> and get the corresponding 'data'.
>
> #include "ignite/ignite.h"
>
> #include "ignite/ignition.h"
>
>
>
>
>
> #include 
>
> #include
>
>
>
> #include "RtCassEpcTableDataVo.hpp"
>
>
>
> using namespace ignite;
>
> using namespace cache;
>
> using namespace std;
>
>
>
> int main()
>
> {
>
> IgniteConfiguration cfg;
>
>
>
> cfg.springCfgPath =
> "/home/ignite/apache-ignite-fabric-2.4.0-bin/config/cassandra-config.xml";
>
>
>
> try
>
> {
>
> // Start a node.
>
> Ignite ignite = Ignition::Start(cfg);
>
> Cache cache = ignite.GetCache RtCassEpcTableDataVo>("cache1");
>
>
>
> cout<
> string l_imsi;
>
> getline(cin>>ws,l_imsi);
>
>
>
> RtCassEpcTableDataVo l_blob=cache.Get(l_imsi);
>
> Ignition::StopAll(false);
>
> }
>
> catch (IgniteError& err)
>
> {
>
> std::cout << "An error occurred: " << err.GetText() <<
> std::endl;
>
>
>
> return err.GetCode();
>
> }
>
>
>
> std::cout << std::endl;
>
>
>
> return 0;
>
> }
>
> However, I'm getting compilation errors.
>
> In file included from
> /usr/local/include/ignite/impl/binary/binary_writer_impl.h:32:0,
>
>  from /usr/local/include/ignite/binary/binary_raw_writer.h:
> 30,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/cache/query/query_scan.h:29,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/impl/cache/cache_impl.h:21,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/impl/ignite_impl.h:27,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/ignite.h:26,
>
>  from sample.cpp:1:
>
> /usr/local/include/ignite/impl/binary/binary_utils.h: In instantiation of
> 'static T ignite::impl::binary::BinaryUtils::GetDefaultValue() [with T =
> RtCassEpcTableDataVo]':
>
> /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/impl/operations.h:349:62:
> required from 'void ignite::impl::Out1Operation::SetNull() [with T =
> RtCassEpcTableDataVo]'
>
> sample.cpp:72:1:   required from here
>
> /usr/local/include/ignite/impl/binary/binary_utils.h:475:59: error:
> 'GetNull' is not a member of 'ignite::binary::BinaryType<
> RtCassEpcTableDataVo>'
>
>  ignite::binary::BinaryType::GetNull(res);
>
>  ~~^
>
> In file included from
> /usr/local/include/ignite/impl/binary/binary_object_impl.h:31:0,
>
>  from
> /usr/local/include/ignite/impl/binary/binary_writer_impl.h:35,
>
>  from /usr/local/include/ignite/binary/binary_raw_writer.h:
> 30,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/cache/query/query_scan.h:29,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/impl/cache/cache_impl.h:21,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/impl/ignite_impl.h:27,
>
>  from /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/ignite.h:26,
>
>  from sample.cpp:1:
>
> /usr/local/include/ignite/impl/binary/binary_reader_impl.h: In
> instantiation of 'void ignite::impl::binary::BinaryReaderImpl::
> ReadTopObject0(T&) [with T = RtCassEpcTableDataVo]':
>
> /usr/local/include/ignite/impl/binary/binary_type_impl.h:100:17:
> required from 'static T ignite::binary::ReadHelper::Read(R&) [with R =
> ignite::impl::binary::BinaryReaderImpl; T = RtCassEpcTableDataVo]'
>
> /usr/local/include/ignite/impl/binary/binary_reader_impl.h:887:63:
> required from 'T ignite::impl::binary::BinaryReaderImpl::ReadTopObject()
> [with T = RtCassEpcTableDataVo]'
>
> /home/ignite/apache-ignite-fabric-2.4.0
> -bin/platforms/cpp/core/include/ignite/impl/operations.h:344:21:
> required from 'void ignite::impl::Out1Operation::ProcessOutput
> (ignite::impl::binary::BinaryReaderImpl&) [with T = RtCassEpcTableDataVo]'
>
> sample.cpp:72:1:   required from here
>
> 

Re: 答复: Time serial column family design

2018-04-17 Thread Jonathan Haddad
To add to what Nate suggested, we have an entire blog post on scaling time
series data models:

http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Jon


On Tue, Apr 17, 2018 at 7:39 PM Nate McCall  wrote:

> I disagree. Create date as a raw integer is an excellent surrogate for
> controlling time series "buckets" as it gives you complete control over the
> granularity. You can even have multiple granularities in the same table -
> remember that partition key "misses" in Cassandra are pretty lightweight as
> they won't make it past the bloom filter on the read path.
>
> On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja 
> wrote:
>
>> Hi David,
>>
>> Could you describe why you chose to include the create date in the
>> partition key? If the vin in enough "partitioning", meaning that the size
>> (number of rows x size of row) of each partition is less than 100MB, then
>> remove the date and just use the create_time, because the date is already
>> included in that column anyways.
>>
>> For example if columns "a" and "b" (from your table) are of max 256 UTF8
>> characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows
>> per partition. You can actually have many more but you don't want to go
>> much higher for performance reasons.
>>
>> If this is not enough you could use create_month instead of create_date,
>> for example, to reduce the partition size while not being too granular.
>>
>>
>> On Tue, 17 Apr 2018, 22:17 Nate McCall,  wrote:
>>
>>> Your table design will work fine as you have appropriately bucketed by
>>> an integer-based 'create_date' field.
>>>
>>> Your goal for this refactor should be to remove the "IN" clause from
>>> your code. This will move the rollup of multiple partition keys being
>>> retrieved into the client instead of relying on the coordinator assembling
>>> the results. You have to do more work and add some complexity, but the
>>> trade off will be much higher performance as you are removing the single
>>> coordinator as the bottleneck.
>>>
>>> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni 
>>> wrote:
>>>
 Hi Nate,

 Thanks for your reply!

 Is there other way to design this table to meet this requirement?



 Best Regards,



 倪项菲*/ **David Ni*

 中移德电网络科技有限公司

 Virtue Intelligent Network Ltd, co.

 Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

 Mob: +86 13797007811|Tel: + 86 27 5024 2516



 *From:* Nate McCall 
 *Sent:* April 17, 2018 7:12
 *To:* Cassandra Users 
 *Subject:* Re: Time serial column family design





 Select * from test where vin =“ZD41578123DSAFWE12313” and create_date
 in (20180416, 20180415, 20180414, 20180413, 20180412….);

 But this cause the cql query is very long,and I don’t know whether
 there is limitation for the length of the cql.

 Please give me some advice,thanks in advance.



 Using the SELECT ... IN syntax  means that:

 - the driver will not be able to route the queries to the nodes which
 have the partition

 - a single coordinator must scatter-gather the query and results



 Break this up into a series of single statements using the executeAsync
 method and gather the results via something like Futures in Guava or
 similar.

>>>
>>>
>>>
>>> --
>>> -
>>> Nate McCall
>>> Wellington, NZ
>>> @zznate
>>>
>>> CTO
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Cassandra datastax cerrification

2018-04-14 Thread Jonathan Haddad
The original question was about prepping. I think that might be a question
best suited for datastax, since you’re paying them for the cert.
On Sat, Apr 14, 2018 at 9:02 AM Ben Bromhead  wrote:

> Certification is only as good as the organizations that recognize it.
> Identify what you want to get out of certification, whether that
> certificate will get what you want and make a decision based on that. The
> delivered content can be the best in the world, but if no one recognizes it
> then the certification element becomes meaningless.
>
> I would be asking any certification provider which companies actually
> accept it or recognize it as a demonstration of skill. Especially if you
> are going to pay money for it.
>
> On the other hand if you are just looking to build knowledge / get some
> classroom experience then you can look at the various options with a more
> subjective approach.
>
> This is just my 2c based on a past life in the IT Security world where
> "certifications" are a massive business and acceptance of certifications
> can vary place to place and not a commentary on any of the certification
> providers you directly mentioned :)
>
>
> On Sat, Apr 14, 2018 at 11:09 AM Abdul Patel  wrote:
>
>> Hi All,
>>
>> I am preparing for cassandra certification(dba) orielly has stopped the
>> cassandra cerrification so the best bet is datastax now ..as per my
>> knwledge ds201 and 220 should be enough for cerrification and also i am
>> reading definitive guide on cassandra ..any other material required ? Any
>> practise test websites? As certification is costly and wanna clear in one
>> go ...
>>
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>


Re: Latest version and Features

2018-04-11 Thread Jonathan Haddad
I was going to say the same thing, but then I remembered 3.1 == 3.0.1.
There's nothing that makes 3.11 a requirement, so that means
3.0.latest is the safest bet, with 3.11.2 being the one I'd personally go
with also.

On Wed, Apr 11, 2018 at 4:13 PM Carlos Rolo <r...@pythian.com> wrote:

> If you are on 3.1.0 I would move forward to 3.11.2.
>
> I blogged about this decision recently here:
> https://blog.pythian.com/what-cassandra-version-should-i-use-2018/
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Mobile: +351 918 918 100 <+351%20918%20918%20100>
> www.pythian.com
>
> On Wed, Apr 11, 2018 at 4:27 PM, Nicolas Guyomar <
> nicolas.guyo...@gmail.com> wrote:
>
>> Everything is in the same document, you have a "New features" section
>> plus an "Upgrading" one
>>
>> On 11 April 2018 at 17:24, Abdul Patel <abd786...@gmail.com> wrote:
>>
>>> Nicolas,
>>> I do see all new features but instructions for upgrade are mentioned in
>>> next section ..not sure if i missed it ..can you share that section?
>>>
>>>
>>> On Wednesday, April 11, 2018, Abdul Patel <abd786...@gmail.com> wrote:
>>>
>>>> Thanks .this is perfect
>>>>
>>>> On Wednesday, April 11, 2018, Nicolas Guyomar <
>>>> nicolas.guyo...@gmail.com> wrote:
>>>>
>>>>> Sorry, I should have give you this link instead :
>>>>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>>>>>
>>>>> You'll find everything you need IMHO
>>>>>
>>>>> On 11 April 2018 at 17:05, Abdul Patel <abd786...@gmail.com> wrote:
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Is the upgrade process straightforward? Do we have any documentation
>>>>>> to upgrade?
>>>>>>
>>>>>>
>>>>>> On Wednesday, April 11, 2018, Jonathan Haddad <j...@jonhaddad.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Move to the latest 3.0, or if you're feeling a little more
>>>>>>> adventurous, 3.11.2.
>>>>>>>
>>>>>>> 4.0 discussion is happening now, nothing is decided.
>>>>>>>
>>>>>>> On Wed, Apr 11, 2018 at 7:35 AM Abdul Patel <abd786...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Our company is planning to upgrade Cassandra to maintain the
>>>>>>>> audit guidelines for the patch cycle.
>>>>>>>> We are currently on 3.1.0, what's the latest stable version and what
>>>>>>>> are the new features?
>>>>>>>> Will it be better to wait for 4.0? Any news on what will be new
>>>>>>>> features in 4.0 ?
>>>>>>>>
>>>>>>>
>>>>>
>>
>
> --
>
>
>
>


Re: JVM Tuning post

2018-04-11 Thread Jonathan Haddad
Re G1GC in Java 9, yes it's the default, but we explicitly specify the
collector when we start Cassandra.

Regarding load testing, some folks like cassandra-stress, but personally I
think second to production itself, there's nothing better than an
environment running the full applications stack with simulated load.  Yes,
it's a lot of setup, but imo it's necessary for any sort of performance /
acceptance testing.

On Wed, Apr 11, 2018 at 10:50 AM Pradeep Chhetri 
wrote:

> Thank you for writing this. The post is really very helpful.
>
> One question - My understanding is GC tuning depends a lot on the
> read/write workload and the data size. What will be the right way to
> simulate the production workload on a non-production environment in
> cassandra world.
>
> On Wed, Apr 11, 2018 at 8:54 PM, Russell Bateman 
> wrote:
>
>> Nice write-up. G1GC became the default garbage collection mechanism
>> beginning in Java 9, right?
>>
>>
>> On 04/11/2018 09:05 AM, Joao Serrachinha wrote:
>>
>> Many thanks to "The Last Pickle", also for the TWCS advice. Especially for
>> C* new features on version 3.11.1
>>
>> Regards,
>> João
>>
>> On 11/04/2018 16:00, Jon Haddad wrote:
>>
>> Hey folks,
>>
>> We (The Last Pickle) have helped a lot of teams with JVM tuning over
>> the years, finally managed to write some stuff down.  We’re hoping the
>> community finds it helpful.
>> http://thelastpickle.com/blog/2018/04/11/gc-tuning.html
>>
>> Jon
>>
>>
>>
>>
>>
>


Re: Latest version and Features

2018-04-11 Thread Jonathan Haddad
Move to the latest 3.0, or if you're feeling a little more adventurous,
3.11.2.

4.0 discussion is happening now, nothing is decided.

On Wed, Apr 11, 2018 at 7:35 AM Abdul Patel  wrote:

> Hi All,
>
> Our company is planning to upgrade Cassandra to maintain the audit
> guidelines for the patch cycle.
> We are currently on 3.1.0, what's the latest stable version and what are
> the new features?
> Will it be better to wait for 4.0? Any news on what will be new features
> in 4.0 ?
>


Re: Is Cassandra used in Medical industry?

2018-03-29 Thread Jonathan Haddad
If you require a full audit trail then you'll need to do this in your data
model.  I recommend looking to event sourcing, which is a way of tracking
all changes to an entity over its lifetime.

https://martinfowler.com/eaaDev/EventSourcing.html

Instead of thinking of data as global mutable state, think of it as a time
series where you save each change as a completely new object.  Then you can
go back in time to any point to see how it got to be the way it is.
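
As a concrete illustration, an event-sourced table in CQL might look roughly
like the sketch below. The keyspace, table and column names are made up; the
point is that every change is a new immutable row, so the full history (and
audit trail) of a record is a single partition read:

CREATE TABLE medical.patient_record_events (
    record_id  uuid,
    event_time timeuuid,
    event_type text,      -- e.g. 'created', 'field_updated', 'corrected'
    changed_by text,
    payload    text,      -- serialized change, e.g. JSON
    PRIMARY KEY (record_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

SELECT * FROM medical.patient_record_events WHERE record_id = ?;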

On Thu, Mar 29, 2018 at 9:59 AM sam sriramadhesikan <
sam.sriramadhesi...@oracle.com> wrote:

> Rahul,
>
> CFR 21 (part 11) is an FDA-mandated electronics records standard. For any
> software solution built for the life sciences / pharma industries,
> compliance with this standard is a must. There are three parts to this:
>
> (1) Controls and audit of user logins / forcing re-login when session
> times out
> (2) Tracking change history of key software records (for example, a work
> order)
> (3) Protecting the data from unauthorized access / establishing data was
> not tampered with
>
> Most of the compliance is built into the business application layer in the
> form of data validations, audit trails, and process workflows.
>
> Cassandra’s RBAC plus encryption at rest would satisfy (3). If there was a
> granular audit trail capability, that would address (2). (1) is a business
> application function, I think.
>
> Thanks,
>
> Sam
>
>
> On Mar 29, 2018, at 12:29 PM, Rahul Singh 
> wrote:
>
> Is that an encryption related policy? If you can clarify — maybe able to
> get better answers. There are products like Vormetrics (?) which can
> encrypt data at rest.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 29, 2018, 12:23 AM -0400, Sudhakar Ganesan <
> sudhakar.gane...@flex.com>, wrote:
>
> Hi,
>
>
> Did anyone use Cassandra in the medical industry, since the FDA enforces CFR 21
> (Part 11) compliance?
>
>
> Regards,
>
> Sudhakar
> Legal Disclaimer :
> The information contained in this message may be privileged and
> confidential.
> It is intended to be read only by the individual or entity to whom it is
> addressed
> or by their designee. If the reader of this message is not the intended
> recipient,
> you are on notice that any distribution of this message, in any form,
> is strictly prohibited. If you have received this message in error,
> please immediately notify the sender and delete or destroy any copy of
> this message!
>
>
>


Re: Is Cassandra used in Medical industry?

2018-03-29 Thread Jonathan Haddad
I haven't used Vormetric, but have worked with a couple of teams doing disk
encryption using LUKS:
https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md

I haven't read through that FDA guideline, and tbh I'm not going to - if
there's a specific question you have it would be better to ask it rather
than require everyone to get up to speed on your particular problem with
federal regulations.  Remember this is a list of volunteers trying to help
people out, not a support contract.  The more info you can provide to help
us understand your problem the better answers you will get.

On Thu, Mar 29, 2018 at 9:30 AM Rahul Singh 
wrote:

> Is that an encryption related policy? If you can clarify — maybe able to
> get better answers. There are products like Vormetrics (?) which can
> encrypt data at rest.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 29, 2018, 12:23 AM -0400, Sudhakar Ganesan <
> sudhakar.gane...@flex.com>, wrote:
>
> Hi,
>
>
>
> Did anyone use Cassandra in the medical industry, since the FDA enforces CFR 21
> (Part 11) compliance?
>
>
>
> Regards,
>
> Sudhakar
> Legal Disclaimer :
> The information contained in this message may be privileged and
> confidential.
> It is intended to be read only by the individual or entity to whom it is
> addressed
> or by their designee. If the reader of this message is not the intended
> recipient,
> you are on notice that any distribution of this message, in any form,
> is strictly prohibited. If you have received this message in error,
> please immediately notify the sender and delete or destroy any copy of
> this message!
>
>


Re: Can "data_file_directories" make use of multiple disks?

2018-03-27 Thread Jonathan Haddad
In Cassandra 3.2 and later, data is partitioned by token range, which
should give you even distribution of data.
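
For reference, the relevant piece of cassandra.yaml is just a list of paths
(the mount points below are made up):

data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data

Each table gets a directory under every entry, and on 3.2+ the sstables are
assigned to those directories by token range as described above.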

If you're going to go into 3.x, please use the latest 3.11, which at this
time is 3.11.2.

On Tue, Mar 27, 2018 at 8:05 AM Venkata Hari Krishna Nukala <
n.v.harikrishna.apa...@gmail.com> wrote:

> Hi,
>
> I am trying to replace machines having HDDs with slightly more powerful machines
> having SSDs in production. The data present on each node is around 300 GB.
> But the newer machines have 2 X 200GB SSDs instead of a single disk.
>
> "data_file_directories" looks like a multi-valued config which I can use.
> Am I looking at the right config?
>
> How does the data is distributed evenly? Leveled Compaction Strategy is
> used for the tables.
>
> Thanks!
>


Re: Update to C* 3.0.14 from 3.0.10

2018-03-23 Thread Jonathan Haddad
3.0.16 is the latest, I recommend going all the way up.  About a hundred
bug fixes:
https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt

Jon

On Fri, Mar 23, 2018 at 2:22 PM Dmitry Saprykin 
wrote:

> Hi,
>
> I successfully used 3.0.14 more than a year in production. And moreover
> 3.0.10 is definitely not stable and you need to upgrade ASAP. 3.0.10
> contains known bug which corrupts data during schema changes
>
> Regards,
> Dmitrii
>
> On Fri, Mar 23, 2018 at 5:01 PM Nitan Kainth 
> wrote:
>
>> Hi All,
>>
>> Our repairs are consuming CPU and some research shows that moving to
>> 3.0.14 will help us fix them. I just want to know community's experience
>> about 3.0.14 version.
>>
>>  Is it stable?
>> Anybody had any issues after upgrading this?
>>
>> Regards,
>> Nitan K.
>>
>>


Re: Using Spark to delete from Transactional Cluster

2018-03-23 Thread Jonathan Haddad
I'm confused as to what the difference between deleting with prepared
statements and deleting through spark is?  To the best of my knowledge
either way it's the same thing - normal deletion with tombstones
replicated.  Is it that you're doing deletes in the analytics DC instead of
your real time one?

On Fri, Mar 23, 2018 at 11:38 AM Charulata Sharma (charshar) <
chars...@cisco.com> wrote:

> Hi Rahul,
>
>  Thanks for your answer. Why do you say that deleting from spark
> is not elegant?? This is the exact feedback I want. Basically why is it not
> elegant?
>
> I can either delete using delete prepared statements or through spark. TTL
> approach doesn’t work for us
>
> Because first of all ttl is there at a column level and there are business
> rules for purge which make the TTL solution not very clean in our case.
>
>
>
> Thanks,
>
> Charu
>
>
>
> *From: *Rahul Singh 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Thursday, March 22, 2018 at 5:08 PM
> *To: *"user@cassandra.apache.org" , "
> user@cassandra.apache.org" 
> *Subject: *Re: Using Spark to delete from Transactional Cluster
>
>
>
> Short answer : it works. You can even run “delete” statements from within
> Spark once you know which keys to delete. Not elegant but it works.
>
> It will create a bunch of tombstones and you may need to spread your
> deletes over days. Another thing to consider is instead of deleting setting
> a TTL which will eventually get cleansed.
>
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
>
> On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar) <
> chars...@cisco.com>, wrote:
>
> Hi,
>
>Wanted to know the community’s experiences and feedback on using Apache
> Spark to delete data from C* transactional cluster.
>
> We have spark installed in our analytical C* cluster and so far we have
> been using Spark only for analytics purposes.
>
>
>
> However, now with advanced features of Spark 2.0, I am considering using
> spark-cassandra connector for deletes instead of a series of Delete
> Prepared Statements
>
> So essentially the deletes will happen on the analytical cluster and they
> will be replicated over to transactional cluster by means of our keyspace
> replication strategies.
>
>
>
> Are there any risks involved in this ??
>
>
>
> Thanks,
>
> Charu
>
>
>
>


Re: replace dead node vs remove node

2018-03-22 Thread Jonathan Haddad
Ah sorry - I misread the original post - for some reason I had it in my
head the question was about bootstrap.

Carry on.

On Thu, Mar 22, 2018 at 8:35 PM Jonathan Haddad <j...@jonhaddad.com> wrote:

> Under normal circumstances this is not true.
>
> Take a look at org.apache.cassandra.service.StorageProxy#performWrite, it
> grabs both the natural endpoints and the pending endpoints (new nodes).
> They're eventually passed through
> to 
> org.apache.cassandra.locator.AbstractReplicationStrategy#getWriteResponseHandler,
> which keeps track of both the current endpoints and the pending ones.
> Later, it gets to the actual work:
>
> performer.apply(mutation, Iterables.concat(naturalEndpoints, 
> pendingEndpoints), responseHandler, localDataCenter, consistency_level);
>
> The signature of this method is:
>
> public interface WritePerformer
> {
> public void apply(IMutation mutation,
>   Iterable<InetAddress> targets,
>   AbstractWriteResponseHandler<IMutation> responseHandler,
>   String localDataCenter,
>   ConsistencyLevel consistencyLevel) throws 
> OverloadedException;
> }
>
> Notice the targets?  That's the list of all current owners and pending
> owners.  The list is a concatenation of the natural endpoints and the
> pending ones.
>
> Pending owners are listed in org.apache.cassandra.locator.TokenMetadata
>
> // this is a cache of the calculation from {tokenToEndpointMap, 
> bootstrapTokens, leavingEndpoints}
> private final ConcurrentMap<String, PendingRangeMaps> pendingRanges = new 
> ConcurrentHashMap<String, PendingRangeMaps>();
>
>
> TL;DR: mutations are sent to nodes being bootstrapped.
>
> Jon
>
>
> On Thu, Mar 22, 2018 at 8:09 PM Peng Xiao <2535...@qq.com> wrote:
>
>> Hi Anthony,
>>
>> there is a problem with replacing a dead node as per the blog: if the
>> replacement process takes longer than max_hint_window_in_ms, we must run
>> repair to make the replaced node consistent again, since it missed ongoing
>> writes during bootstrapping. But for a large cluster, repair is a painful
>> process.
>>
>> Thanks,
>> Peng Xiao
>>
>>
>>
>> -- 原始邮件 --
>> *发件人:* "Anthony Grasso"<anthony.gra...@gmail.com>;
>> *发送时间:* 2018年3月22日(星期四) 晚上7:13
>> *收件人:* "user"<user@cassandra.apache.org>;
>> *主题:* Re: replace dead node vs remove node
>>
>> Hi Peng,
>>
>> Depending on the hardware failure you can do one of two things:
>>
>> 1. If the disks are intact and uncorrupted you could just use the disks
>> with the current data on them in the new node. Even if the IP address
>> changes for the new node that is fine. In that case all you need to do is
>> run repair on the new node. The repair will fix any writes the node missed
>> while it was down. This process is similar to the scenario in this blog
>> post:
>> http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html
>>
>> 2. If the disks are inaccessible or corrupted, then use the method as
>> described in the blogpost you linked to. The operation is similar to
>> bootstrapping a new node. There is no need to perform any other remove or
>> join operation on the failed or new nodes. As per the blog post, you
>> definitely want to run repair on the new node as soon as it joins the
>> cluster. In this case here, the data on the failed node is effectively lost
>> and replaced with data from other nodes in the cluster.
>>
>> Hope this helps.
>>
>> Regards,
>> Anthony
>>
>>
>> On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:
>>
>>> Dear All,
>>>
>>> when one node fails with hardware errors, it will be in DN status in
>>> the cluster. Then if we are not able to handle this error in three hours (max
>>> hints window), we will lose data, right? We have to run repair to keep
>>> consistency.
>>> And as per
>>> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, we
>>> can replace this dead node; is it the same as bootstrapping a new node? That means
>>> we don't need to remove the node and rejoin?
>>> Could anyone please advise?
>>>
>>> Thanks,
>>> Peng Xiao
>>>
>>>
>>>
>>>
>>>


Re: replace dead node vs remove node

2018-03-22 Thread Jonathan Haddad
Under normal circumstances this is not true.

Take a look at org.apache.cassandra.service.StorageProxy#performWrite, it
grabs both the natural endpoints and the pending endpoints (new nodes).
They're eventually passed through
to 
org.apache.cassandra.locator.AbstractReplicationStrategy#getWriteResponseHandler,
which keeps track of both the current endpoints and the pending ones.
Later, it gets to the actual work:

performer.apply(mutation, Iterables.concat(naturalEndpoints,
pendingEndpoints), responseHandler, localDataCenter,
consistency_level);

The signature of this method is:

public interface WritePerformer
{
public void apply(IMutation mutation,
  Iterable<InetAddress> targets,
  AbstractWriteResponseHandler<IMutation> responseHandler,
  String localDataCenter,
  ConsistencyLevel consistencyLevel) throws
OverloadedException;
}

Notice the targets?  That's the list of all current owners and pending
owners.  The list is a concatenation of the natural endpoints and the
pending ones.

Pending owners are listed in org.apache.cassandra.locator.TokenMetadata

// this is a cache of the calculation from {tokenToEndpointMap,
bootstrapTokens, leavingEndpoints}
private final ConcurrentMap<String, PendingRangeMaps> pendingRanges =
new ConcurrentHashMap<String, PendingRangeMaps>();


TL;DR: mutations are sent to nodes being bootstrapped.

Jon


On Thu, Mar 22, 2018 at 8:09 PM Peng Xiao <2535...@qq.com> wrote:

> Hi Anthony,
>
> there is a problem with replacing a dead node as per the blog: if the
> replacement process takes longer than max_hint_window_in_ms, we must run
> repair to make the replaced node consistent again, since it missed ongoing
> writes during bootstrapping. But for a large cluster, repair is a painful
> process.
>
> Thanks,
> Peng Xiao
>
>
>
> -- 原始邮件 --
> *发件人:* "Anthony Grasso";
> *发送时间:* 2018年3月22日(星期四) 晚上7:13
> *收件人:* "user";
> *主题:* Re: replace dead node vs remove node
>
> Hi Peng,
>
> Depending on the hardware failure you can do one of two things:
>
> 1. If the disks are intact and uncorrupted you could just use the disks
> with the current data on them in the new node. Even if the IP address
> changes for the new node that is fine. In that case all you need to do is
> run repair on the new node. The repair will fix any writes the node missed
> while it was down. This process is similar to the scenario in this blog
> post:
> http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html
>
> 2. If the disks are inaccessible or corrupted, then use the method as
> described in the blogpost you linked to. The operation is similar to
> bootstrapping a new node. There is no need to perform any other remove or
> join operation on the failed or new nodes. As per the blog post, you
> definitely want to run repair on the new node as soon as it joins the
> cluster. In this case here, the data on the failed node is effectively lost
> and replaced with data from other nodes in the cluster.
>
> Hope this helps.
>
> Regards,
> Anthony
>
>
> On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:
>
>> Dear All,
>>
>> when one node fails with hardware errors, it will be in DN status in the
>> cluster. Then if we are not able to handle this error in three hours (max
>> hints window), we will lose data, right? We have to run repair to keep
>> consistency.
>> And as per
>> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, we
>> can replace this dead node; is it the same as bootstrapping a new node? That means
>> we don't need to remove the node and rejoin?
>> Could anyone please advise?
>>
>> Thanks,
>> Peng Xiao
>>
>>
>>
>>
>>


Re: Fast Writes to Cassandra Failing Through Python Script

2018-03-15 Thread Jonathan Haddad
Generally speaking, you don't need to.  I almost never do.  I've only set
it in situations where I've had a large number of tables and I want to
avoid a lot of flushing when commit log segments are removed.

Setting it to 128 milliseconds means it's flushing 8 times per second,
which gives no benefit, and only hurts things, as you've discovered.
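
For reference, the setting is a per-table schema property, so it can be reset
with an ALTER (the table name below is just a placeholder):

ALTER TABLE my_keyspace.my_table WITH memtable_flush_period_in_ms = 0;

With 0, memtables are flushed based on memory pressure and commitlog size
rather than on a fixed timer.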

On Thu, Mar 15, 2018 at 10:15 AM Affan Syed  wrote:

> No, it did solve the problem, as Faraz mentioned, but I am still not sure
> about what the underlying cause is. Is 0ms really correct? How do we set up a
> flush period?
>
> - Affan
>
> On Thu, Mar 15, 2018 at 10:00 PM, Jon Haddad  wrote:
>
>> TWCS does SizeTieredCompaction within the window, so it’s not likely to
>> make a difference.  I’m +1’ing what Jeff said,
>> 128ms memtable_flush_period_in_ms is almost certainly your problem, unless
>> you’ve changed other settings and haven’t told us about them.
>>
>>
>> On Mar 15, 2018, at 9:54 AM, Affan Syed  wrote:
>>
>> Jeff,
>>
>> I think additionally the reason might also be that the keyspace was using 
>> TimeWindowCompactionStrategy
>> with a 1 day bucket; however the writes were quite rapid and no automatic
>> compaction was working.
>>
>> I would think changing strategy to SizeTiered would also solve this
>> problem?
>>
>>
>>
>> - Affan
>>
>> On Thu, Mar 15, 2018 at 12:11 AM, Jeff Jirsa  wrote:
>>
>>> The problem was likely more with the fact that it can’t flush in 128ms
>>> so you backup on flush
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Mar 14, 2018, at 12:07 PM, Faraz Mateen  wrote:
>>>
>>> I was able to overcome the timeout error by setting
>>> memtable_flush_period_in_ms to 0 for all my tables. Initially it was set to
>>> 128.
>>> Now I am able to write ~4 records/min in Cassandra and the script has
>>> been running for around 12 hours now.
>>>
>>> However, I am still curious why Cassandra was unable to hold data
>>> in memory for 128 ms, considering that I have 30 GB of RAM for each node.
>>>
>>> On Wed, Mar 14, 2018 at 2:24 PM, Faraz Mateen  wrote:
>>>
 Thanks for the response.

 Here is the output of "DESCRIBE" on my table

 https://gist.github.com/farazmateen/1c88f6ae4fb0b9f1619a2a1b28ae58c4

 I am getting two errors from the python script that I mentioned above.
 First one does not show any error or exception in server logs. Second 
 error:

 *"cassandra.OperationTimedOut: errors={'10.128.1.1': 'Client request
 timeout. See Session.execute[_async](timeout)'}, last_host=10.128.1.1"*

 shows JAVA HEAP Exception in server logs. You can look at the exception
 here:


 https://gist.githubusercontent.com/farazmateen/e7aa5749f963ad2293f8be0ca1ccdc22/raw/e3fd274af32c20eb9f534849a31734dcd33745b4/JVM-HEAP-EXCEPTION.txt

 My python code snippet can be viewed at the following link:
 https://gist.github.com/farazmateen/02be8bb59cdb205d6a35e8e3f93e27d5

 
 Here are the timeout-related arguments from (*/etc/cassandra/cassandra.yaml*):

 read_request_timeout_in_ms: 5000
 range_request_timeout_in_ms: 1
 write_request_timeout_in_ms: 1
 counter_write_request_timeout_in_ms: 5000
 cas_contention_timeout_in_ms: 1000
 truncate_request_timeout_in_ms: 6
 request_timeout_in_ms: 1
 cross_node_timeout: false


 On Wed, Mar 14, 2018 at 4:22 AM, Bruce Tietjen <
 bruce.tiet...@imatsolutions.com> wrote:

> The following won't address any server performance issues, but will
> allow your application to continue to run even if there are client or
> server timeouts:
>
> Your python code should wrap all Cassandra statement execution
> calls in a try/except block to catch any errors and handle them
> appropriately.
> For timeouts, you might consider re-trying the statement.
>
> You may also want to consider proactively setting your client
> and/or server timeouts so your application sees fewer failures.
>
>
> Any production code should include proper error handling and during
> initial development and testing, it may be helpful to allow your
> application to continue running
> so you get a better idea of if or when different timeouts occur.
>
> see:
>cassandra.Timeout
>cassandra.WriteTimeout
>cassandra.ReadTimeout
>
> also:
>https://datastax.github.io/python-driver/api/cassandra.html
>
>
>
>
>
> On Tue, Mar 13, 2018 at 5:17 PM, Goutham reddy <
> goutham.chiru...@gmail.com> wrote:
>
>> Faraz,
>> Can you share your code snippet, how you are trying to save the
>> entity objects into cassandra.
>>
>> 

Re: What versions should the documentation support now?

2018-03-13 Thread Jonathan Haddad
Yes, I agree, we should host versioned docs.  I don't think anyone is
against it, it's a matter of someone having the time to do it.

On Tue, Mar 13, 2018 at 6:14 PM kurt greaves  wrote:

> I’ve never heard of anyone shipping docs for multiple versions, I don’t
>> know why we’d do that.  You can get the docs for any version you need by
>> downloading C*, the docs are included.  I’m a firm -1 on changing that
>> process.
>
> We should still host versioned docs on the website however. Either that or
> we specify "since version x" for each component in the docs with notes on
> behaviour.
> ​
>


Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
Right now they can’t.
On Mon, Mar 12, 2018 at 9:03 AM Kenneth Brotman
<kenbrot...@yahoo.com.invalid> wrote:

> I see how that makes sense Jon but how does a user then select the
> documentation for the version they are running on the Apache Cassandra web
> site?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Monday, March 12, 2018 8:40 AM
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: What versions should the documentation support now?
>
>
>
> The docs are in tree, meaning they are versioned, and should be written
> for the version they correspond to. Trunk docs should reflect the current
> state of trunk, and shouldn’t have caveats for other versions.
>
> On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> If we use DataStax’s example, we would have instructions for v3.0 and
> v2.1.  How’s that?
>
>
>
> We should have instructions for the cloud platforms like AWS, but how
> do you do that and stay vendor neutral?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Hannu Kröger [mailto:hkro...@gmail.com]
> *Sent:* Monday, March 12, 2018 7:40 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: What versions should the documentation support now?
>
>
>
> In my opinion, a good documentation should somehow include version
> specific pieces of information. Whether it is nodetool command that came in
> certain version or parameter for something or something else.
>
>
>
> That would very useful. It’s confusing if I see documentation talking
> about 4.0 specifics and I try to find that in my 3.11.x
>
>
>
> Hannu
>
>
>
> On 12 Mar 2018, at 16:38, Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
> wrote:
>
>
>
> I’m unclear what versions are most popular right now? What version are you
> running?
>
>
>
> What version should still be supported in the documentation?  For example,
> I’m turning my attention back to writing a section on adding a data
> center.  What versions should I support in that information?
>
>
>
> I’m working on it right now.  Thanks,
>
>
>
> Kenneth Brotman
>
>
>
>


Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
The docs are in tree, meaning they are versioned, and should be written for
the version they correspond to. Trunk docs should reflect the current state
of trunk, and shouldn’t have caveats for other versions.
On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman
 wrote:

> If we use DataStax’s example, we would have instructions for v3.0 and
> v2.1.  How’s that?
>
>
>
> We should have instructions for the cloud platforms like AWS, but how
> do you do that and stay vendor neutral?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Hannu Kröger [mailto:hkro...@gmail.com]
> *Sent:* Monday, March 12, 2018 7:40 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: What versions should the documentation support now?
>
>
>
> In my opinion, a good documentation should somehow include version
> specific pieces of information. Whether it is nodetool command that came in
> certain version or parameter for something or something else.
>
>
>
> That would very useful. It’s confusing if I see documentation talking
> about 4.0 specifics and I try to find that in my 3.11.x
>
>
>
> Hannu
>
>
>
> On 12 Mar 2018, at 16:38, Kenneth Brotman 
> wrote:
>
>
>
> I’m unclear what versions are most popular right now? What version are you
> running?
>
>
>
> What version should still be supported in the documentation?  For example,
> I’m turning my attention back to writing a section on adding a data
> center.  What versions should I support in that information?
>
>
>
> I’m working on it right now.  Thanks,
>
>
>
> Kenneth Brotman
>
>
>


Re: Jon Haddad on Diagnosing Performance Problems in Production

2018-02-27 Thread Jonathan Haddad
There isn't a ton from that talk I'd consider "wrong" at this point, but
some of it is a little stale.  I always start off looking at system
metrics.  For a very thorough discussion on the matter check out Brendan
Gregg's USE [1] method.  I did a blog post on my own about the talk [2]
that has screenshots and might be helpful.  Generally speaking know your OS
and the tools to examine each component.  Learn how to interpret the
numbers you see, there's more information than a human can process in a
lifetime but understanding some fundamentals of throughput vs latency &
error rates and how to find out each of those metrics for cpu / memory /
network / disk is a good start.

More recently I did a talk at Data Day Texas, I posted the slides on
Slideshare [3].  The focus there was more on perf tuning and less on
performance troubleshooting, but I guess it's a matter of perspective which
point your at.  The tools have changed a little (Prometheus instead of
Graphite), and there's some new perf tuning tips like examining your read
ahead and compression settings, generating flame graphs and using tools
like YourKit and Java Flight Recorder, and the easiest win of all time,
disabling dynamic snitch if your hardware is fast and you want sub ms
p99s.  Turn up counter cache if you use counters (it still gets hit on the
write path), and row cache is way more effective than people give it credit
for under the right workloads.
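
For reference, a rough sketch of the cassandra.yaml knobs mentioned above;
the sizes are just placeholders, not recommendations:

dynamic_snitch: false            # skip dynamic snitch scoring entirely
counter_cache_size_in_mb: 256    # default is min(2.5% of heap, 50MB)
row_cache_size_in_mb: 1024       # 0 (off) by default; tables opt in via their caching option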

I've got a blog post in the works on JVM tuning, but for now I reference
CASSANDRA-8150 [4] and Blake Eggleston's blog post [5] from back in our
days at a small startup.

Lastly, I'm doing a performance tuning series on our blog at The Last
Pickle, with the first being on Flame Graphs [6].  I've got about 6 posts
in the pipeline, just need to find time to get to them.

Hope this helps,
Jon

[1] http://www.brendangregg.com/usemethod.html
[2] http://rustyrazorblade.com/post/2014/2014-09-18-diagnosing-production/
[3] https://www.slideshare.net/JonHaddad/performance-tuning-86995333
[4] https://issues.apache.org/jira/browse/CASSANDRA-8150
[5]
http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html
[6] http://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html



On Tue, Feb 27, 2018 at 8:56 AM Michael Shuler 
wrote:

> On 02/27/2018 10:20 AM, Nicolas Guyomar wrote:
> > Is Jon blog
> > post
> https://academy.datastax.com/planet-cassandra/blog/cassandra-summit-recap-diagnosing-problems-in-production
> > was relocated somewhere ?
>
>
> https://web.archive.org/web/20160322011022/planetcassandra.org/blog/cassandra-summit-recap-diagnosing-problems-in-production
>
> --
> Kind regards,
> Michael
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: How to Parse raw CQL text?

2018-02-25 Thread Jonathan Haddad
I had to do something similar recently.  Take a look at
org.apache.cassandra.cql3.QueryProcessor.parseStatement().  I've got some
sample code here [1] as well as a blog post [2] that explains how to access
the private variables, since there's no access provided.  It wasn't really
designed to be used as a library, so YMMV with future changes.

[1]
https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt
[2]
http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/

On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali  wrote:

> I just did some trial and error. Looks like this would work
>
> public class Test {
>
> public static void main(String[] args) throws Exception {
>
> String stmt = "create table if not exists test_keyspace.my_table 
> (field1 text, field2 int, field3 set, field4 map, primary 
> key (field1) );";
> ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
> CqlLexer cqlLexer = new CqlLexer(stringStream);
> CommonTokenStream token = new CommonTokenStream(cqlLexer);
> CqlParser parser = new CqlParser(token);
>
> ParsedStatement query = parser.cqlStatement();
>
>
> if (query.getClass().getDeclaringClass() == 
> CreateTableStatement.class) {
> CreateTableStatement.RawStatement cts = 
> (CreateTableStatement.RawStatement) query;
>
> CFMetaData
> .compile(stmt, cts.keyspace())
>
>
> .getColumnMetadata()
> .values()
> .stream()
> .forEach(cd -> System.out.println(cd));
>
> }
>
>}
>
> }
>
>
> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali  wrote:
>
>> Hi Anant,
>>
>> I just have CQL create table statement as a string I want to extract all
>> the parts like, tableName, KeySpaceName, regular Columns,  partitionKey,
>> ClusteringKey, Clustering Order and so on. Thats really  it!
>>
>> Thanks!
>>
>> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh > > wrote:
>>
>>> I think I understand what you are trying to do … but what is your goal?
>>> What do you mean “use it for different” queries… Maybe you want to do an
>>> event and have an event processor? Seems like you are trying to basically
>>> by pass that pattern and parse a query and split it into several actions?
>>>
>>> Did you look into this unit test folder?
>>>
>>>
>>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java
>>>
>>> --
>>> Rahul Singh
>>> rahul.si...@anant.us
>>>
>>> Anant Corporation
>>>
>>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali , wrote:
>>>
>>> Hi All,
>>>
>>> I have a need where I get a raw CQL create table statement as a String
>>> and I need to parse the keyspace, tablename, columns and so on..so I can
>>> use it for various queries and send it to C*. I used the example below
>>> from this link . I get
>>> the following error.  And I thought maybe someone in this mailing list will
>>> be more familiar with internals.
>>>
>>> Exception in thread "main"
>>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace
>>> test_keyspace doesn't exist
>>> at
>>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200)
>>> at com.hello.world.Test.main(Test.java:23)
>>>
>>>
>>> Here is my code.
>>>
>>> package com.hello.world;
>>>
>>> import org.antlr.runtime.ANTLRStringStream;
>>> import org.antlr.runtime.CommonTokenStream;
>>> import org.apache.cassandra.cql3.CqlLexer;
>>> import org.apache.cassandra.cql3.CqlParser;
>>> import org.apache.cassandra.cql3.statements.CreateTableStatement;
>>> import org.apache.cassandra.cql3.statements.ParsedStatement;
>>>
>>> public class Test {
>>>
>>> public static void main(String[] args) throws Exception {
>>> String stmt = "create table if not exists test_keyspace.my_table 
>>> (field1 text, field2 int, field3 set, field4 map, 
>>> primary key (field1) );";
>>> ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>>> CqlLexer cqlLexer = new CqlLexer(stringStream);
>>> CommonTokenStream token = new CommonTokenStream(cqlLexer);
>>> CqlParser parser = new CqlParser(token);
>>> ParsedStatement query = parser.query();
>>> if (query.getClass().getDeclaringClass() == 
>>> CreateTableStatement.class) {
>>> CreateTableStatement.RawStatement cts = 
>>> (CreateTableStatement.RawStatement) query;
>>> System.out.println(cts.keyspace());
>>> System.out.println(cts.columnFamily());
>>> ParsedStatement.Prepared prepared = cts.prepare();
>>> CreateTableStatement cts2 = (CreateTableStatement) 
>>> prepared.statement;
>>>   

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
Kenneth, if you want to take the JIRA, feel free to self-assign it to
yourself and put up a pull request or patch, and I'll review.  I'd be very
happy to get more people involved in the docs.

On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman
<kenbrot...@yahoo.com.invalid> wrote:

> That information would have saved me time too.  Thanks for making a JIRA
> for it Jon.  Perhaps this is a good JIRA for me to begin with.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Jon Haddad [mailto:jonathan.had...@gmail.com] *On Behalf Of *Jon
> Haddad
> *Sent:* Thursday, February 22, 2018 11:11 AM
> *To:* user
> *Subject:* Re: Initializing a multiple node cluster (multiple datacenters)
>
>
>
> Great question.  Unfortunately, our OSS docs lack a step by step process
> on how to add a DC, I’ve created a JIRA to do that:
> https://issues.apache.org/jira/browse/CASSANDRA-14254
>
>
>
> The datastax docs are pretty good for this though:
> https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
>
>
>
> Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it
> is calculated a little more intelligently.  in 3.11.2, which was just
> released, CASSANDRA-13080 was backported which will help out when you add
> your second DC.  If you go this route, you can drop your token count down
> to 16 and get all the benefits with no drawbacks.
>
>
>
> At this point I would go straight to 3.11.2 and skip 3.0 as there were
> quite a few improvements that make it worthwhile along the way, in my
> opinion.  We work with several customers that are running 3.11 and are
> pretty happy with it.
>
>
>
> Yes, if there’s no data, you can initialize the cluster with
> auto_bootstrap: true.  Be sure to change any keyspaces using simple
> strategy to NTS first, and replicate them to the new DC as well.
>
>
>
> Jon
>
>
>
>
>
> On Feb 22, 2018, at 10:53 AM, Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
>
>
> Hi jonathan
>
>
>
> Thank you for the answer. Do you know where to look to understand why this
> works. As i understood all the node then will chose ramdoms tokens. How can
> i assure the correctness of the ring?
>
>
>
> So as you said. Under the condition that there.is no data in the cluster.
> I can initialize a cluster multi dc without disable auto bootstrap.?
>
>
>
> On Feb 22, 2018 5:43 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
>
> If it's a new cluster, there's no need to disable auto_bootstrap.  That
> setting prevents the first node in the second DC from being a replica for
> all the data in the first DC.  If there's no data in the first DC, you can
> skip a couple steps and just leave it on.
>
>
>
> Leave it on, and enjoy your afternoon.
>
>
>
> Seeds don't bootstrap by the way, changing the setting on those nodes
> doesn't do anything.
>
>
>
> On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
> Hello
>
> I would like to clarify this,
>
>
>
> In order to initialize a Cassandra multi-DC cluster without data, if I
> follow the DataStax documentation
>
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
> It says
>
>- auto_bootstrap: false (Add this setting *only* when initializing a
>clean node with no data.)
>
> But I don't understand the way this works with regard to auto_bootstrap.
>
> If all the machines pick their own tokens in a random way using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will have
> tokens in common?
>
> Is it not better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
>
>
> Thank you for the help
>
>
>
> Jean Carlo
>
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
>
>
>


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
If it's a new cluster, there's no need to disable auto_bootstrap.  That
setting prevents the first node in the second DC from being a replica for
all the data in the first DC.  If there's no data in the first DC, you can
skip a couple steps and just leave it on.

Leave it on, and enjoy your afternoon.

Seeds don't bootstrap by the way, changing the setting on those nodes
doesn't do anything.
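
For the keyspace side of this (mentioned elsewhere in the thread: anything on
SimpleStrategy should be switched to NetworkTopologyStrategy and replicated to
the new DC), a minimal sketch with made-up keyspace and DC names:

ALTER KEYSPACE my_keyspace WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
};

On a brand new, empty cluster that is all that is needed, since there is no
data to stream.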

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> In order to initialize a Cassandra multi-DC cluster without data, if I
> follow the DataStax documentation
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
>
> It says
>
>- auto_bootstrap: false (Add this setting *only* when initializing a
>clean node with no data.)
>
> But I don't understand the way this works with regard to auto_bootstrap.
>
> If all the machines pick their own tokens in a random way using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will have
> tokens in common?
>
> Is it not better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
> Thank you for the help
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Jonathan Haddad
The easiest way to do this is replacing one node at a time by using rsync.
I don't know why it has to be more complicated than copying data to a new
machine and replacing it in the cluster.   Bringing up a new DC with
snapshots is going to be a nightmare in comparison.

On Wed, Feb 21, 2018 at 8:16 AM Carl Mueller 
wrote:

> DCs can be stood up with snapshotted data.
>
>
> Stand up a new cluster with your old cluster snapshots:
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html
>
> Then link the DCs together.
>
> Disclaimer: I've never done this in real life.
>
> On Wed, Feb 21, 2018 at 9:25 AM, Nitan Kainth 
> wrote:
>
>> New dc will be faster but may impact cluster performance due to streaming.
>>
>> Sent from my iPhone
>>
>> On Feb 21, 2018, at 8:53 AM, Leena Ghatpande 
>> wrote:
>>
>> We do use LOCAL_ONE and LOCAL_QUORUM currently. But these 8 nodes need to
>> be in 2 different DCs, so we would end up creating 2 additional new DCs and
>> dropping 2.
>>
>> Are there any advantages to adding a DC over one node at a time?
>>
>>
>> --
>> *From:* Jeff Jirsa 
>> *Sent:* Wednesday, February 21, 2018 1:02 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Best approach to Replace existing 8 smaller nodes in
>> production cluster with New 8 nodes that are bigger in capacity, without a
>> downtime
>>
>> You add the nodes with rf=0 so there’s no streaming, then bump it to rf=1
>> and run repair, then rf=2 and run repair, then rf=3 and run repair, then
>> you either change the app to use local quorum in the new dc, or reverse the
>> process by decreasing the rf in the original dc by 1 at a time
>>
>> --
>> Jeff Jirsa
>>
>>
>> > On Feb 20, 2018, at 8:51 PM, Kyrylo Lebediev 
>> wrote:
>> >
>> > I'd say, "add new DC, then remove old DC" approach is more risky
>> especially if they use QUORUM CL (in this case they will need to change CL
>> to LOCAL_QUORUM, otherwise they'll run into a lot of blocking read repairs).
>> > Also, if there is a chance to get rid of streaming, it worth doing as
>> usually direct data copy (not by means of C*) is more effective and less
>> troublesome.
>> >
>> > Regards,
>> > Kyrill
>> >
>> > 
>> > From: Nitan Kainth 
>> > Sent: Wednesday, February 21, 2018 1:04:05 AM
>> > To: user@cassandra.apache.org
>> > Subject: Re: Best approach to Replace existing 8 smaller nodes in
>> production cluster with New 8 nodes that are bigger in capacity, without a
>> downtime
>> >
>> > You can also create a new DC and then terminate old one.
>> >
>> > Sent from my iPhone
>> >
>> >> On Feb 20, 2018, at 2:49 PM, Kyrylo Lebediev 
>> wrote:
>> >>
>> >> Hi,
>> >> Consider using this approach, replacing nodes one by one:
>> https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/
>>
>> 
>> Cassandra instantaneous in place node replacement | Carlos ...
>> 
>> mrcalonso.com
>> At some point everyone using Cassandra faces the situation of having to
>> replace nodes. Either because the cluster needs to scale and some nodes are
>> too small or ...
>>
>> >>
>> >> Regards,
>> >> Kyrill
>> >>
>> >> 
>> >> From: Leena Ghatpande 
>> >> Sent: Tuesday, February 20, 2018 10:24:24 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Best approach to Replace existing 8 smaller nodes in
>> production cluster with New 8 nodes that are bigger in capacity, without a
>> downtime
>> >>
>> >> Best approach to replace the existing 8 smaller nodes in the production
>> cluster with 8 new nodes that are bigger in capacity, without downtime
>> >>
>> >> We have 4 nodes each in 2 DC, and we want to replace these 8 nodes
>> with new 8 nodes that are bigger in capacity in terms of RAM,CPU and
>> Diskspace without a downtime.
>> >> The RF is set to 3 currently, and we have 2 large tables with upto
>> 70Million rows
>> >>
>> >> What would be the best approach to implement this
>> >>- Add 1 new node and decommission 1 old node at a time?
>> >>- Add all new nodes to the cluster, and then decommission old nodes?
>> >>If we do this, can we still keep RF=3 while we have 16
>> >> nodes at a point in the cluster before we start decommissioning?
>> >>   - How long do we wait in between adding a node or decommissioning to
>> >> ensure the process is complete before we proceed?
>> >>   - Any tool that we can use to monitor if the add/decommission of a node is
>> >> done before we proceed to the next?
>> >>
>> >> Any other suggestion?
>> >>
>> >>
>> >> 

Re: LWT broken?

2018-02-09 Thread Jonathan Haddad
If you want consistent reads you have to use the CL that enforces it.
There’s no way around it.
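
For anyone following along, a minimal sketch of what that looks like with the
DataStax Java driver: CAS for every write, SERIAL for the reads. The
keyspace/table name and column are made up:

import com.datastax.driver.core.*;

class CasExample {
    static void insertAndRead(Session session, String hash) {
        // Conditional insert: only succeeds if the row does not exist yet.
        ResultSet rs = session.execute(
                "INSERT INTO ks.hashes (hash) VALUES (?) IF NOT EXISTS", hash);
        boolean applied = rs.wasApplied();

        // A read at SERIAL consistency completes any in-progress Paxos round
        // before returning, so it reflects the outcome of the CAS writes.
        Statement read = new SimpleStatement(
                "SELECT hash FROM ks.hashes WHERE hash = ?", hash)
                .setConsistencyLevel(ConsistencyLevel.SERIAL);
        Row row = session.execute(read).one();
        // use applied / row here
    }
}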
On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida  wrote:

> In this case, we only write using CAS (code guarantees that). We also
> never update, just insert if not exist. Once a hash exists, it never
> changes (it may get deleted later and that'll be a CAS delete as well).
>
> --
> Mahdi.
>
> On 2/9/18 1:38 PM, Jeff Jirsa wrote:
>
>
>
> On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida 
> wrote:
>
>>  Under what circumstances would we be reading inconsistent results ? Is
>> there a case where we end up reading a value that actually end up not being
>> written ?
>>
>>
>>
>
> If you ever write the same value with CAS and without CAS (different code
> paths both updating the same value), you're using CAS wrong, and
> inconsistencies can happen.
>
>
>
>


Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread Jonathan Haddad
That might be fine for a one off but is totally impractical at scale or
when using TWCS.
On Fri, Feb 9, 2018 at 8:39 AM DuyHai Doan  wrote:

> Or use the new user-defined compaction option recently introduced,
> provided you can determine over which SSTables a partition is spread
>
> On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad  wrote:
>
>> Give this a read through:
>>
>>
>> https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy
>>
>> Basically you write your own logic for how stuff gets forgotten, then you
>> can recompact every sstable with upgradesstables -a.
>>
>> Jon
>>
>>
>> On Feb 9, 2018, at 8:10 AM, Nicolas Guyomar 
>> wrote:
>>
>> Hi everyone,
>>
>> Because of GDPR we really face the need to support “Right to Be
>> Forgotten” requests => https://gdpr-info.eu/art-17-gdpr/  stating that *"the
>> controller shall have the obligation to erase personal data without undue
>> delay"*
>>
>> Because I usually meet customers that do not have that many clients,
>> modeling one partition per client is almost always possible, easing
>> deletion by partition key.
>>
>> Then, appart from triggering a manual compaction on impacted tables using
>> STCS, I do not see how I can be GDPR compliant.
>>
>> I'm kind of surprised not to find any thread on that matter on the ML, do
>> you guys have any modeling strategy that would make it easier to get rid of
>> data ?
>>
>> Thank you for any given advice
>>
>> Nicolas
>>
>>
>>
>


Re: index_interval

2018-02-03 Thread Jonathan Haddad
I would also optimize for your worst case, which is hitting zero caches.
If you're using the default settings when creating a table, you're going to
get compression settings that are terrible for reads.  If you've got memory
to spare, I suggest changing your chunk_length_in_kb to 4 and disabling
readahead on your drives entirely.  I've seen 50-100x improvement in read
latency and throughput just by changing those settings.  I just did a talk
on this topic last week, slides are here:
https://www.slideshare.net/JonHaddad/performance-tuning-86995333
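
A hedged example of the chunk size change (the table name is a placeholder;
existing sstables only pick the new value up once they are rewritten, e.g.
via nodetool upgradesstables -a):

ALTER TABLE my_keyspace.my_table
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};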

Jon

On Wed, Jul 12, 2017 at 2:03 PM Jeff Jirsa  wrote:

>
>
> On 2017-07-12 12:03 (-0700), Fay Hou [Storage Service] ­ <
> fay...@coupang.com> wrote:
> > First, a big thank you to Jeff who spent endless time to help this mailing
> list.
> > Agreed that we should tune the key cache. In my case, my key cache hit
> rate
> > is about 20%, mainly because we do random reads. We are just going to leave
> the
> > index_interval as is for now.
> >
>
> That's pretty painful. If you can up that a bit, it'll probably help you
> out. You can adjust the index intervals, too, but I'd significantly
> increase key cache size first if it were my cluster.
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Old tombstones not being cleaned up

2018-02-01 Thread Jonathan Haddad
Changing the default TTL doesn’t change the TTL on the existing data, only
new data. It’s only set if you don’t supply one yourself.

On Wed, Jan 31, 2018 at 11:35 PM Bo Finnerup Madsen 
wrote:

> Hi,
>
> We are running a small 9 node Cassandra v2.1.17 cluster. The cluster
> generally runs fine, but we have one table that is causing OOMs because of an
> enormous amount of tombstones.
> Looking at the data in the table (sstable2json), the oldest tombstones
> are almost a year old. The table was initially created with a
> gc_grace_period of 10 days, but I have now lowered it to 1 hour.
> I have run a full repair of the table across all nodes. I have forced
> several major compactions of the table by using "nodetool compact", and
> also tried to switch from LeveledCompaction to SizeTierCompaction and back.
>
> What could cause cassandra to keep these tombstones?
>
> sstable2json:
> {"key": "foo",
>  "cells":
> [["082f-25ef-4324-bb8a-8cf013c823c1:_","082f-25ef-4324-bb8a-8cf013c823c1:!",1507819135148000,"t",1507819135],
>
>  
> ["10f3-c05d-4ab9-9b8a-e6ebd8f5818a:_","10f3-c05d-4ab9-9b8a-e6ebd8f5818a:!",1503661731697000,"t",1503661731],
>
>  
> ["1d7a-ce95-4c74-b67e-f8cdffec4f85:_","1d7a-ce95-4c74-b67e-f8cdffec4f85:!",1509542102909000,"t",1509542102],
>
>  
> ["1dd3-ae22-4f6e-944a-8cfa147cde68:_","1dd3-ae22-4f6e-944a-8cfa147cde68:!",1512418006838000,"t",1512418006],
>
>  
> ["22cc-d69c-4596-89e5-3e976c0cb9a8:_","22cc-d69c-4596-89e5-3e976c0cb9a8:!",1497377448737001,"t",1497377448],
>
>  
> ["2777-4b1a-4267-8efc-c43054e63170:_","2777-4b1a-4267-8efc-c43054e63170:!",1491014691515001,"t",1491014691],
>
>  
> ["61e8-f48b-4484-96f1-f8b6a3ed8f9f:_","61e8-f48b-4484-96f1-f8b6a3ed8f9f:!",1500820300544000,"t",1500820300],
>
>  
> ["63da-f165-449b-b65d-2b7869368734:_","63da-f165-449b-b65d-2b7869368734:!",1512806634968000,"t",1512806634],
>
>  
> ["656f-f8b5-472b-93ed-1a893002f027:_","656f-f8b5-472b-93ed-1a893002f027:!",1514554716141000,"t",1514554716],
> ...
> {"key": "bar",
>  "metadata": {"deletionInfo":
> {"markedForDeleteAt":1517402198585982,"localDeletionTime":1517402198}},
>  "cells":
> [["000af8c2-ffe9-4217-9032-61a1cd21781d:_","000af8c2-ffe9-4217-9032-61a1cd21781d:!",1495094965916000,"t",1495094965],
>
>  
> ["005b96cb-7eb3-4ec3-bfa2-8573e46892f4:_","005b96cb-7eb3-4ec3-bfa2-8573e46892f4:!",1516360186865000,"t",1516360186],
>
>  
> ["005ec167-aa61-4868-a3ae-a44b00099eb6:_","005ec167-aa61-4868-a3ae-a44b00099eb6:!",1516671840920002,"t",1516671840],
> 
>
> sstablemetadata:
> stablemetadata
> /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741-Data.db
> SSTable:
> /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Bloom Filter FP chance: 0.10
> Minimum timestamp: 1488976211688000
> Maximum timestamp: 1517468644066000
> SSTable max local deletion time: 2147483647
> Compression ratio: 0.5121956624389545
> Estimated droppable tombstones: 18.00161766553587
> SSTable Level: 0
> Repaired at: 0
> ReplayPosition(segmentId=1517168739626, position=22690189)
> Estimated tombstone drop times:%n
> 1488976211: 1
> 1489906506:  4706
> 1490174752:  6111
> 1490449759:  6554
> 1490735410:  6559
> 1491016789:  6369
> 1491347982: 10216
> 1491680214: 13502
> ...
>
> desc:
> CREATE TABLE xxx.yyy (
> ti text,
> uuid text,
> json_data text,
> PRIMARY KEY (ti, uuid)
> ) WITH CLUSTERING ORDER BY (uuid ASC)
> AND bloom_filter_fp_chance = 0.1
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 3600
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
> jmx props(picture):
> [image: image.png]
>


Re: Problem adding a new node to a cluster

2017-12-18 Thread Jonathan Haddad
Definitely upgrade to 3.11.1.
On Sun, Dec 17, 2017 at 8:54 PM Pradeep Chhetri 
wrote:

> Hello Kurt,
>
> I realized it was because of RAM shortage which caused the issue. I bumped
> up the memory of the machine and node bootstrap started but this time i hit
> this bug of cassandra 3.9:
>
> https://issues.apache.org/jira/browse/CASSANDRA-12905
>
> I tried running nodetool bootstrap resume multiple times but every time it
> fails with exception after completing around 963%
>
> https://gist.github.com/chhetripradeep/93567ad24c44ba72d0753d4088a10ce4
>
> Do you think there is some workaround for this. Or do you suggest
> upgrading to v3.11 which has this fix.
>
> Also, can we just upgrade the cassandra from 3.9 -> 3.11 in rolling
> fashion or do we need to take care of something in case we have to upgrade.
>
> Thanks.
>
>
>
>
>
>
> On Mon, Dec 18, 2017 at 5:45 AM, kurt greaves 
> wrote:
>
>> You haven't provided enough logs for us to really tell what's wrong. I
>> suggest running *nodetool netstats* *| grep -v 100% *to see if any
>> streams are still ongoing, and also running *nodetool compactionstats -H* to
>> see if there are any index builds the node might be waiting for prior to
>> joining the ring.
>>
>> If neither of those provide any useful information, send us the full
>> system.log and debug.log
>>
>> On 17 December 2017 at 11:19, Pradeep Chhetri 
>> wrote:
>>
>>> Hello all,
>>>
>>> I am trying to add a 4th node to a 3-node cluster which is using
>>> SimpleSnitch. But this new node is stuck in Joining state for last 20
>>> hours. We have around 10GB data per node with RF as 3.
>>>
>>> Its mostly stuck in redistributing index summaries phase.
>>>
>>> Here are the logs:
>>>
>>> https://gist.github.com/chhetripradeep/37e4f232ddf0dd3b830091ca9829416d
>>>
>>> # nodetool status
>>> Datacenter: datacenter1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  AddressLoad   Tokens   Owns (effective)  Host ID
>>>Rack
>>> UJ  10.42.187.43   9.73 GiB   256  ?
>>>  36384dc5-a183-4a5b-ae2d-ee67c897df3d  rack1
>>> UN  10.42.106.184  9.95 GiB   256  100.0%
>>> 42cd09e9-8efb-472f-ace6-c7bb98634887  rack1
>>> UN  10.42.169.195  10.35 GiB  256  100.0%
>>> 9fcc99a1-6334-4df8-818d-b097b1920bb9  rack1
>>> UN  10.42.209.245  8.54 GiB   256  100.0%
>>> 9b99d5d8-818e-4741-9533-259d0fc0e16d  rack1
>>>
>>> Not sure what is going on here; it will be very helpful if someone can help in
>>> identifying the issue.
>>>
>>> Thank you.
>>>
>>>
>>>
>>
>


Re: Stress test cassandra

2017-11-26 Thread Jonathan Haddad
Have you read through the docs for stress? You can have it use your own
queries and data model.

http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html
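
If it helps, a user profile is just a small YAML file; here is a minimal sketch
(the keyspace, table, and generator settings below are made-up placeholders, so
adapt them to your real schema):

# stress-profile.yaml
keyspace: stress_ks
keyspace_definition: |
  CREATE KEYSPACE stress_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
table: sensor_data
table_definition: |
  CREATE TABLE sensor_data (
    sensor_id text,
    ts timestamp,
    value double,
    PRIMARY KEY (sensor_id, ts)
  ) WITH CLUSTERING ORDER BY (ts DESC);
columnspec:
  - name: sensor_id
    size: fixed(36)
    population: uniform(1..100000)   # ~100k distinct partitions
  - name: ts
    cluster: uniform(1..1000)        # rows per partition
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
queries:
  latest:
    cql: SELECT * FROM sensor_data WHERE sensor_id = ? LIMIT 100
    fields: samerow

# roughly 3 writes per read, 100 client threads, for 30 minutes:
cassandra-stress user profile=stress-profile.yaml "ops(insert=3,latest=1)" duration=30m -rate threads=100 -node <host>
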
On Sun, Nov 26, 2017 at 1:02 AM Akshit Jain  wrote:

> Hi,
> What is the best way to stress test a Cassandra cluster with real-life
> workloads, and what is being followed currently?
> Currently I am using the cassandra-stress tool, but it generates blob data;
> the YAML files provide the option to use a custom keyspace.
>
> But what are the different parameter values that can be set to test the
> Cassandra cluster in an extreme environment?
>
>


Re: Full repair use case

2017-11-21 Thread Jonathan Haddad
I wouldn’t recommend using incremental repair at all at this time due to
some bugs that can cause massive overstreaming.

Our advice at TLP is to do subrange repair, and we maintain Reaper to help
with that: http://cassandra-reaper.io
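
If you'd rather drive it by hand than through Reaper, subrange repair is just
repairing one token range at a time on 2.2+/3.x, e.g. (the token values below
are placeholders; take real ranges from your own ring):

  nodetool describering my_keyspace     # lists the token ranges and their endpoints
  nodetool repair -full -st -9187343239835811839 -et -9151314442816847872 my_keyspace

Reaper computes the subranges, schedules them, and retries failed segments for you.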

Jon
On Wed, Nov 22, 2017 at 2:18 AM Akshit Jain  wrote:

> Is there any use case where we need full repair and incremental repair
> will not help?
> Actually, I am performing incremental repair regularly; is there any need to
> run a full repair?
>


Re: Reaper 1.0

2017-11-17 Thread Jonathan Haddad
It should work with DSE, but we don’t explicitly test it.

Mind testing it and posting your results? If you could include the DSE
version it would be great.
On Thu, Nov 16, 2017 at 11:57 PM Anshu Vajpayee 
wrote:

> Thanks Jon for your efforts and for nicely putting it on the website & YouTube.
>
> Just a quick question - is it compatible with DSE versions? I know under
> the hood they have Cassandra only, but just wanted to hear your
> thoughts.
>
> On Thu, Nov 16, 2017 at 1:23 AM, Jon Haddad  wrote:
>
>> Apache 2 Licensed, just like Cassandra.
>> https://github.com/thelastpickle/cassandra-reaper/blob/master/LICENSE.txt
>>
>> Feel free to modify, put in prod, fork or improve.
>>
>> Unfortunately I had to re-upload the Getting Started video; we had
>> accidentally uploaded a first cut.  The correct link is here:
>> https://www.youtube.com/watch?v=0dub29BgwPI
>>
>> Jon
>>
>> On Nov 15, 2017, at 9:14 AM, Harika Vangapelli -T (hvangape - AKRAYA INC
>> at Cisco)  wrote:
>>
>> Open source, free to use in production? Any license constraints? Please
>> let me know.
>>
>> I experimented with it yesterday, really liked it.
>>
>> --
>> Harika Vangapelli
>> Engineer - IT
>> hvang...@cisco.com
>> Cisco Systems, Inc., United States
>>
>>
>> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
>> Sent: Tuesday, November 14, 2017 2:18 PM
>> To: user 
>> Subject: Reaper 1.0
>>
>> We’re excited to announce the release of the 1.0 version of Reaper for
>> Apache Cassandra!  We’ve made a lot of improvements to the flexibility of
>> managing repairs and simplified the UI based on feedback we’ve received.
>>
>> We’ve written a blog post discussing the changes in detail here:
>> http://thelastpickle.com/blog/2017/11/14/reaper-10-announcement.html
>>
>> We also have a new YouTube video to help folks get up and running
>> quickly: https://www.youtube.com/watch?v=YKJRRFa22T4
>>
>> The reaper site has all the docs should you have any questions:
>> http://cassandra-reaper.io/
>>
>> Thanks all,
>> Jon
>>
>>
>>
>
>
> --
> Cheers,
> Anshu V
>
>
>


Re: Node Failure Scenario

2017-11-14 Thread Jonathan Haddad
Anthony’s suggestion of using replace_address_first_boot lets you avoid that
requirement, and it’s specifically why it was added in 2.2.
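
For reference, it's just a JVM system property on the replacement node (the IP
below is a placeholder for the dead node's address):

  # either on the command line when starting the node:
  cassandra -Dcassandra.replace_address_first_boot=10.0.0.12

  # or appended in cassandra-env.sh:
  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.12"

Once the node has bootstrapped, the property is ignored on later restarts, which
is why it's safe to leave in place.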
On Tue, Nov 14, 2017 at 1:02 AM Anshu Vajpayee 
wrote:

> ​Thanks  guys ,
>
> I think it is better to pass replace_address on the command line rather than update
> the cassandra-env file, so that there would be no requirement to remove it
> later.
> ​
>
> On Tue, Nov 14, 2017 at 6:32 AM, Anthony Grasso 
> wrote:
>
>> Hi Anshu,
>>
>> To add to Erick's comment, remember to remove the *replace_address* method
>> from the *cassandra-env.sh* file once the node has rejoined
>> successfully. The node will fail the next restart otherwise.
>>
>> Alternatively, use the *replace_address_first_boot* method which works
>> exactly the same way as *replace_address*; the only difference is there
>> is no need to remove it from the *cassandra-env.sh* file.
>>
>> Kind regards,
>> Anthony
>>
>> On 13 November 2017 at 14:59, Erick Ramirez  wrote:
>>
>>> Use the replace_address method with its own IP address. Make sure you
>>> delete the contents of the following directories:
>>> - data/
>>> - commitlog/
>>> - saved_caches/
>>>
>>> Forget rejoining with repair -- it will just cause more problems. Cheers!
>>>
>>> On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee <
>>> anshu.vajpa...@gmail.com> wrote:
>>>
 Hi All ,

 There was a node failure in one of our production clusters due to disk
 failure.  After h/w recovery that node is now ready to be part of the cluster,
 but it doesn't have any data due to the disk crash.



 I can think of the following options:

 1. Replace the node with the same address, using replace_address.

 2. Set bootstrap=false, start the node and run repair to stream
 the data.



 Please suggest if both options are good and which is best as per your
 experience. This is a live production cluster.


 Thanks,


 --
 Cheers,
 Anshu V



>>>
>>
>
>
> --
> Cheers,
> Anshu V
>
>
>


Re: Alter table gc_grace_seconds

2017-10-01 Thread Jonathan Haddad
The TTL is applied to the cells on insert. Changing it doesn't change the
TTL on data that was inserted previously.
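
A quick way to see this, using a hypothetical table t (id int PRIMARY KEY, val int):

  ALTER TABLE t WITH default_time_to_live = 86400;
  INSERT INTO t (id, val) VALUES (1, 42);           -- cell is written with an 86400s TTL
  ALTER TABLE t WITH default_time_to_live = 3600;   -- only affects writes from this point on
  SELECT val, TTL(val) FROM t WHERE id = 1;         -- still counting down from 86400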

On Sun, Oct 1, 2017 at 6:23 AM Gábor Auth  wrote:

> Hi,
>
> Does `alter table number_item with gc_grace_seconds = 3600;` set the
> grace seconds of tombstones only for future modifications of the number_item
> column family, or does it affect all existing data?
>
> Bye,
> Gábor Auth
>
>


Re: Help in c* Data modelling

2017-07-23 Thread Jonathan Haddad
Using a different table to answer each query is the correct answer here
assuming there's a significant amount of data.

If you don't have that much data, maybe you should consider using a
database like Postgres which gives you query flexibility instead of
horizontal scalability.
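
To make the multi-table approach concrete with the schema in this thread: when a
status changes, both test.user and the test.user_index table Varun sketched below
can be updated in a single logged batch (the values here are made up):

  BEGIN BATCH
    UPDATE test.user SET status = 1 WHERE account_id = 10 AND pid = 42;
    DELETE FROM test.user_index WHERE account_id = 10 AND status = 0 AND disp_name = 'john';
    INSERT INTO test.user_index (account_id, status, disp_name, pid) VALUES (10, 1, 'john', 42);
  APPLY BATCH;

A logged batch guarantees that all of its mutations are eventually applied, so the
two tables can't permanently drift apart, at the cost of writing twice.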
On Sun, Jul 23, 2017 at 1:10 PM techpyaasa .  wrote:

> Hi vladyu/varunbarala
>
> Instead of creating a second table as you said, can I just have the one (first)
> table below and get all rows with status=0?
>
> CREATE TABLE IF NOT EXISTS test.user ( account_id bigint, pid bigint, 
> disp_name text, status int, PRIMARY KEY (account_id, pid) ) WITH CLUSTERING 
> ORDER BY (pid ASC);
>>
>
> I mean, get all rows within the same partition (account_id) whose status=0 (say,
> some value) using a *UDF/UDA* in C*?
>
>>
>> select group_by_status from test.user;
>
>
> where group_by_status is a UDA/UDF
>
>
> Thanks in advance
> TechPyaasa
>
>
> On Sun, Jul 23, 2017 at 10:42 PM, Vladimir Yudovin 
> wrote:
>
>> Hi,
>>
>> unfortunately ORDER BY is supported for clustering columns only...
>>
>> *Winguzone  - Cloud Cassandra Hosting*
>>
>>
>>  On Sun, 23 Jul 2017 12:49:36 -0400, techpyaasa . wrote:
>>
>> Hi Varun,
>>
>> Thanks a lot for your reply.
>>
>> In this case, if I want to update status (status can be updated for a given
>> account_id, pid), I need to delete the existing row in the 2nd table & add a new
>> one...  :( :(
>>
>> It's like hitting Cassandra twice for 1 change.. :(
>>
>>
>>
>> On Sun, Jul 23, 2017 at 8:42 PM, Varun Barala 
>> wrote:
>>
>> Hi,
>> You can create pseudo index table.
>>
>> IMO, structure can be:-
>>
>>
>> CREATE TABLE IF NOT EXISTS test.user ( account_id bigint, pid bigint, 
>> disp_name text, status int, PRIMARY KEY (account_id, pid) ) WITH CLUSTERING 
>> ORDER BY (pid ASC);
>> CREATE TABLE IF NOT EXISTS test.user_index ( account_id bigint, pid bigint, 
>> disp_name text, status int, PRIMARY KEY ((account_id, status), disp_name) ) 
>> WITH CLUSTERING ORDER BY (disp_name ASC);
>>
>> to support the query:  select * from site24x7.wm_current_status where
>> uid=1 order by dispName asc;
>> You can use an IN condition on the last partition key (status) in table
>> test.user_index.
>>
>>
>> It depends on your use case and the amount of data as well. It can be
>> optimized more...
>> Thanks!!
>>
>> On Sun, Jul 23, 2017 at 2:48 AM, techpyaasa . 
>> wrote:
>>
>> Hi ,
>>
>> We have a table like below :
>>
>> CREATE TABLE ks.cf ( accountId bigint, pid bigint, dispName text, status
>> int, PRIMARY KEY (accountId, pid) ) WITH CLUSTERING ORDER BY (pid ASC);
>>
>>
>>
>> We would like to have following queries possible on the above table:
>>
>> select * from site24x7.wm_current_status where uid=1 and mid=1;
>> select * from site24x7.wm_current_status where uid=1 order by dispName
>> asc;
>> select * from site24x7.wm_current_status where uid=1 and status=0 order
>> by dispName asc;
>>
>> I know the first query is possible by default, but I want the last 2 queries
>> to work as well.
>>
>> So can someone please let me know how I can achieve the same in
>> Cassandra (C* 2.1.17)? I'm OK with applying indexes etc.
>>
>> Thanks
>> TechPyaasa
>>
>>
>>
>


Re: write time for nulls is not consistent

2017-07-18 Thread Jonathan Haddad
Since you didn't show all the queries you used, I made an assumption that
you've inserted other data.  Where is the query that inserted a="v"?  I
don't know how to answer the question if you're not actually showing how to
reproduce the problem.


On Tue, Jul 18, 2017 at 12:24 PM Nitan Kainth <ni...@bamlabs.com> wrote:

> Jonathan,
>
> Please notice the last rows with partition key values (w, v and t). They were
> inserted the same way and have writetime values.
>
> On Jul 18, 2017, at 2:22 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> This looks like expected behavior to me.  You aren't inserting a value for
> b.  Since there's no value, there's also no writetime.
>
> On Tue, Jul 18, 2017 at 12:15 PM Nitan Kainth <ni...@bamlabs.com> wrote:
>
>> Hi,
>>
>> We see that null columns have writetime(column) populated for a few columns
>> and showing null for a few others. Is it a bug or something else?
>>
>>
>> CREATE KEYSPACE test WITH replication = {'class':
>> 'NetworkTopologyStrategy', 'us-east': '2'}  AND durable_writes = true;
>>
>> CREATE TABLE test.t (
>> a text PRIMARY KEY,
>> b text
>> ) WITH bloom_filter_fp_chance = 0.01
>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>> AND comment = ''
>> AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32', 'min_threshold': '4'}
>> AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>>
>> insert into test.t (a) values ('z');
>> insert into test.t (a) values ('w');
>> insert into test.t (a) values ('e');
>> insert into test.t (a) values ('r');
>> insert into test.t (a) values ('t');
>>
>>  select a, b, writetime(b) from test.t;
>>
>>  a | b    | writetime(b)
>> ---+------+------------------
>>  z | null | null
>>  a | null | null
>>  c | null | null
>>  e | null | null
>>  r | null | null
>>  d | null | 1500400745074499
>>  w | null | 1500400745074499
>>  v | null | 1500400745074499
>>  t | null | 1500400745074499
>>  x | y    | 1500400626266371
>>
>
>

