Re: Strange metadata being appended in some rows

2018-05-30 Thread Jeff Jirsa
Most of what you describe sounds pretty DSE-specific, so I suspect your best 
source for answers will be DataStax.

There were bugs in some versions of Cassandra that caused corruption during a 
race of a few milliseconds around ALTER TABLE, and if you’re not using compression 
with CRC checking enabled, it’s possible a bad disk or bit flip corrupted 
some of your data, but it's hard to say much beyond that.
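For reference, a minimal sketch of the table options being referred to here; the 
keyspace/table names and chunk size are placeholders, not anything from this 
thread. Compressed tables store per-chunk CRC32 checksums, and crc_check_chance 
controls how often they are verified on read:

    -- Hypothetical keyspace/table; illustrates the compression + CRC verification options.
    ALTER TABLE my_ks.my_table
      WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64}
      AND  crc_check_chance = 1.0;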

-- 
Jeff Jirsa


> On May 30, 2018, at 3:42 PM, Charulata Sharma (charshar)  
> wrote:
> 
> Hi,
>  
> I am observing very strange behavior in our cluster: metadata is being 
> prefixed to some rows.
> This metadata cannot have been sent by the application, primarily because the 
> application writing to C* does not have this data, and also because the 
> applications use custom Java objects and this metadata doesn’t fall into that 
> category.
>  
> I suspect that this is being added during multi-data-center replication 
> between the analytics and transactional clusters.
> Does anyone have any idea about this? I found no clue online either.
>  
> This is the metadata. The node mentioned here is an analytics node, and the 
> data directory is from DSEFS. We are experiencing some corruption in DSEFS, 
> which we understand is because of the version we are on (DSE 5.1.5), so I 
> suspect this could be related.
>  
> Thanks,
> Charu
>  
>  
> {
> "privateAddress": "333.33.333.333",
> "lastUpdate": "2018-04-25 05:05:40.665+",
> "readOnly": false,
> "up": true,
> "host": "cssdb-prd-07",
> "publicAddress": "333.33.333.333",
> "privatePort": 5599,
> "storageWeight": 1.0,
> "minFreeSpace": 5368709120,
> "publicPort": 5598,
> "dataCenter": "DC1-RPTG",
> "version": 2,
> "locationId": "498ff9f4-a989-461a-8ce7-17c41800",
> "rack": "RACK1",
> "estUsedSpace": 29771,
> "nodeId": "97494bd1-e783-4a8b-b180-a1946defc7cc",
> "estFreeSpace": 726682292224,
> "directory": "/cassandra/data/ccrcprd-cluster/data5/dsefs/data"
>   }


Strange metadata being appended in some rows

2018-05-30 Thread Charulata Sharma (charshar)
Hi,

I am observing very strange behavior in our cluster: metadata is being 
prefixed to some rows.
This metadata cannot have been sent by the application, primarily because the 
application writing to C* does not have this data, and also because the 
applications use custom Java objects and this metadata doesn’t fall into that 
category.

I suspect that this is being added during multi-data-center replication 
between the analytics and transactional clusters.
Does anyone have any idea about this? I found no clue online either.

This is the metadata. The node mentioned here is an analytics node, and the 
data directory is from DSEFS. We are experiencing some corruption in DSEFS, 
which we understand is because of the version we are on (DSE 5.1.5), so I 
suspect this could be related.

Thanks,
Charu


{
"privateAddress": "333.33.333.333",
"lastUpdate": "2018-04-25 05:05:40.665+",
"readOnly": false,
"up": true,
"host": "cssdb-prd-07",
"publicAddress": "333.33.333.333",
"privatePort": 5599,
"storageWeight": 1.0,
"minFreeSpace": 5368709120,
"publicPort": 5598,
"dataCenter": "DC1-RPTG",
"version": 2,
"locationId": "498ff9f4-a989-461a-8ce7-17c41800",
"rack": "RACK1",
"estUsedSpace": 29771,
"nodeId": "97494bd1-e783-4a8b-b180-a1946defc7cc",
"estFreeSpace": 726682292224,
"directory": "/cassandra/data/ccrcprd-cluster/data5/dsefs/data"
  }


Re: Fwd: Re: cassandra update vs insert + delete

2018-05-30 Thread Rahul Singh
Soft delete = logical delete, which is an update.

An update doesn't create a tombstone. It appends to the sstable, and when 
sstables are compacted, the latest write is what is kept as the definitive data.

A tombstone, by definition, is an update that tells C* to remove the value that 
was there before, but it doesn't do so immediately.
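To make the distinction concrete, a minimal sketch; the table and column names 
are hypothetical, not from this thread:

    -- "Soft delete": an ordinary update that marks the row as logically deleted.
    -- No tombstone is written by the update itself; the new cell simply supersedes
    -- the old one at compaction time. The optional TTL lets it expire later.
    UPDATE accounts USING TTL 2592000        -- 30 days
      SET deleted = true
      WHERE account_id = 42;

    -- A real delete: writes a tombstone that masks the old value on reads immediately,
    -- but the data is only physically removed by compaction after gc_grace_seconds.
    DELETE FROM accounts WHERE account_id = 42;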


On May 28, 2018, 2:32 AM -0400, onmstester onmstester , 
wrote:
> How does an update work underneath?
> Does it create a new row (because I'm changing a column of the partition key) and 
> add a tombstone to the old row?
>
> Sent using Zoho Mail
>
>
>  Forwarded message 
> From : Jonathan Haddad 
> To : 
> Date : Mon, 28 May 2018 00:07:36 +0430
> Subject : Re: cassandra update vs insert + delete
>  Forwarded message 
>
> > What is a “soft delete”?
> >
> > My 2 cents, if you want to update some information just update it. There’s 
> > no need to overthink it.
> >
> > Batches are good if they’re constrained to a single partition, not so hot 
> > otherwise.
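As a minimal illustration of the single-partition point (table and values are 
hypothetical): both statements below share the same partition key, so the batch 
is applied atomically to one partition, without the coordination overhead of a 
multi-partition batch.

    BEGIN BATCH
      UPDATE readings SET status = 'archived' WHERE sensor_id = 7 AND ts = '2018-05-01 00:00:00+0000';
      UPDATE readings SET status = 'archived' WHERE sensor_id = 7 AND ts = '2018-05-01 00:05:00+0000';
    APPLY BATCH;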
> >
> >
> > On Sun, May 27, 2018 at 8:19 AM Rahul Singh  
> > wrote:
> >
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
> > > Deletes create tombstones — not really something to consider. Better to 
> > > add / update or insert data and do a soft delete on old data and apply a 
> > > TTL to remove it at a future time.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On May 27, 2018, 5:36 AM -0400, onmstester onmstester 
> > > , wrote:
> > >
> > > > Hi
> > > > I want to load all rows from many partitions and change a column value 
> > > > in each row; which of the following ways is better concerning disk space 
> > > > and performance?
> > > > 1. create an update statement for every row and batch the updates for each 
> > > > partition
> > > > 2. create an insert statement for every row and batch the inserts for each 
> > > > partition, then run a single statement to delete the whole old partition
> > > >
> > > > Thanks in advance
> > > >
> > > > Sent using Zoho Mail
> > > >
>
>


Re: Time Series schema performance

2018-05-30 Thread Haris Altaf
Thanks Affan Syed! :)

On Wed, 30 May 2018 at 11:07 sujeet jog  wrote:

> Thanks Jeff & Jonathan,
>
>
> On Tue, May 29, 2018 at 10:41 PM, Jonathan Haddad 
> wrote:
>
>> I wrote a post on this topic a while ago, might be worth reading over:
>>
>> http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
>> On Tue, May 29, 2018 at 8:02 AM Jeff Jirsa  wrote:
>>
>> > There’s a third option which is doing bucketing by time instead of by
>> hash, which tends to perform quite well if you’re using TWCS as it makes
>> it
>> quite likely that a read can be served by a single sstable
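A minimal sketch of what such a time-bucketed table could look like with TWCS; 
the table name, bucket granularity, and window size are assumptions for 
illustration, not a schema from this thread:

    -- One day's samples for an id share a partition; TWCS windows align with the bucket.
    CREATE TABLE metrics_by_day (
        id      timeuuid,
        day     date,        -- time bucket
        ts      timestamp,
        metric1 bigint,
        metric2 bigint,
        PRIMARY KEY ((id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
      AND compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
      };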
>>
>> > --
>> > Jeff Jirsa
>>
>>
>> > On May 29, 2018, at 6:49 AM, sujeet jog  wrote:
>>
>> > Folks,
>> > I have two alternatives for the time series schema I have, and wanted to
>> > weigh in on one of them.
>>
>> > The query is: given an id & timestamp, read the metrics associated with
>> > that id.
>>
>> > The records are inserted every 5 minutes, and the number of ids = 2 million,
>> > so every 5 minutes 2 million records will be written.
>>
>> > Bucket Range  : 0 - 5K.
>>
>> > Schema 1 )
>>
>> > create table (
>> > id timeuuid,
>> > bucketid Int,
>> > date date,
>> > timestamp timestamp,
>> > metricName1   BigInt,
>> > metricName2   BigInt,
>> > ...
>> > metricName300 BigInt,
>>
>> > Primary Key (( date, bucketid ), id, timestamp)
>> > )
>>
>> > BucketId is just a murmur3 hash of the id, which acts as a splitter to
>> > group ids into a partition.
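For readers following along, a rough sketch of the read path against Schema 1; 
the table name and literal values are hypothetical, and the client is assumed to 
compute the bucket from the id before querying:

    -- Equality on the full partition key (date, bucketid) plus the id lets the
    -- timestamp range be served from a single partition.
    SELECT timestamp, metricName1, metricName2
    FROM   metrics
    WHERE  date = '2018-05-29'
      AND  bucketid = 1234            -- murmur3(id) mapped into the 0-5K bucket range, computed client-side
      AND  id = 50554d6e-29bb-11e5-b345-feff819cdc9f
      AND  timestamp >= '2018-05-29 00:00:00+0000'
      AND  timestamp <  '2018-05-29 01:00:00+0000';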
>>
>>
>> > Pros : -
>>
>> > Efficient write performance, since data is written to minimal partitions
>>
>> > Cons : -
>>
>> > The first schema works best when queried programmatically, but it is a bit
>> > inflexible if it has to be integrated with 3rd-party BI tools like Tableau:
>> > the bucket-id cannot be generated from Tableau, as it's not part of the
>> > view, etc.
>>
>>
>> > Schema 2 )
>> > Same as above, without bucketid &  date.
>>
>> > Primary Key (id, timestamp )
>>
>> > Pros : -
>>
>> > BI tools don't need to generate bucket id lookups,
>>
>> > Cons :-
>> > Too many partitions are written every 5 minutes: say 2 million records
>> > written to 2 million distinct partitions.
>>
>>
>>
>> > I believe writing this data to the commit log is the same for Schema 1 and
>> > Schema 2, but the actual performance bottleneck could be compaction, since
>> > data from the memtable is flushed to SSTables frequently, depending on the
>> > memory settings, and the header of every SSTable maintains a partition index
>> > with byte offsets.
>>
>> > I wanted to gauge how bad the performance of Schema 2 can get with respect
>> > to writes/compaction having to do many disk seeks.
>>
>> > Compacting many SSTables, each with a very large number of partition-index
>> > entries because of the high number of partitions: can this be a bottleneck?
>>
>> > Any in-depth performance explanation of Schema 2 would be very much helpful.
>>
>>
>> > Thanks,
>>
>>
>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>>
>>
> --
regards,
Haris


Re: Snapshot SSTable modified??

2018-05-30 Thread Max C.
Oh, thanks Elliott for the explanation!  I had no idea about that little tidbit 
concerning ctime.   Now it all makes sense!

- Max

> On May 28, 2018, at 10:24 pm, Elliott Sims  wrote:
> 
> Unix timestamps are a bit odd.  "mtime/Modify" is file changes, 
> "ctime/Change/(sometimes called create)" is file metadata changes, and a link 
> count change is a metadata change.  This seems like an odd decision on the 
> part of GNU tar, but presumably there's a good reason for it.
> 
> When the original sstable is compacted away, it's removed and therefore the 
> link count on the snapshot file is decremented.  The file's contents haven't 
> changed so mtime is identical, but ctime does get updated.  BSDtar doesn't 
> seem to interpret link count changes as a file change, so it's pretty 
> effective as a workaround.
> 
> 
> 
> On Fri, May 25, 2018 at 8:00 PM, Max C  > wrote:
> I looked at the source code for GNU tar, and it looks for a change in the 
> create time or (more likely) a change in the size.
> 
> This seems very strange to me — I would think that creating a snapshot would 
> cause a flush and then once the SSTables are written, hardlinks would be 
> created and the SSTables wouldn't be written to after that.
> 
> Our solution is to wait 5 minutes and retry the tar if an error occurs.  This 
> isn't ideal - but it's the best I could come up with.  :-/
> 
> Thanks Jeff & others for your responses.
> 
> - Max
> 
>> On May 25, 2018, at 5:05pm, Elliott Sims > > wrote:
>> 
>> I've run across this problem before - it seems like GNU tar interprets 
>> changes in the link count as changes to the file, so if the file gets 
>> compacted mid-backup it freaks out even if the file contents are unchanged.  
>> I worked around it by just using bsdtar instead.
>> 
>> On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth > > wrote:
>> Jeff,
>> 
>> Shouldn't a snapshot get a consistent state of the sstables? A -tmp file shouldn't 
>> impact the backup operation, right?
>> 
>> 
>> Regards,
>> Nitan K.
>> Cassandra and Oracle Architect/SME
>> Datastax Certified Cassandra expert
>> Oracle 10g Certified
>> 
>> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa > > wrote:
>> In versions before 3.0, sstables were written with a -tmp filename and 
>> copied/moved to the final filename when complete. This changed in 3.0: we 
>> write into the file with the final name, and have a journal/log to let us 
>> know when it's done/final/live.
>> 
>> Therefore, you can no longer just watch for a -Data.db file to be created 
>> and uploaded - you have to watch the log to make sure it's not being written.
>> 
>> 
>> On Wed, May 23, 2018 at 2:18 PM, Max C. > > wrote:
>> Hi Everyone,
>> 
>> We’ve noticed a few times in the last few weeks that when we’re doing 
>> backups, tar has complained with messages like this:
>> 
>> tar: 
>> /var/lib/cassandra/data/mars/test_instances_by_test_id-6a9440a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
>>  file changed as we read it
>> 
>> Any idea what might be causing this?
>> 
>> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our 
>> backup process:
>> 
>> 
>> SNAPSHOT_NAME=backup_YYYYMMDD_HHMMSS
>> nodetool snapshot -t $SNAPSHOT_NAME
>> 
>> for each keyspace
>> - dump schema to "schema.cql"
>> - tar -czf /file_server/backup_${HOSTNAME}_${KEYSPACE}_MMDD_HHMMSS.tgz \
>>   schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>> 
>> nodetool clearsnapshot -t $SNAPSHOT_NAME
>> 
>> Thanks.
>> 
>> - Max
>> 
>> 
>> 
>> 
> 
> 



Re: Time Series schema performance

2018-05-30 Thread sujeet jog
Thanks Jeff & Jonathan,


On Tue, May 29, 2018 at 10:41 PM, Jonathan Haddad  wrote:

> I wrote a post on this topic a while ago, might be worth reading over:
> http://thelastpickle.com/blog/2017/08/02/time-series-data-
> modeling-massive-scale.html
> On Tue, May 29, 2018 at 8:02 AM Jeff Jirsa  wrote:
>
> > There’s a third option which is doing bucketing by time instead of by
> hash, which tends to perform quite well if you’re using TWCS as it makes it
> quite likely that a read can be served by a single sstable
>
> > --
> > Jeff Jirsa
>
>
> > On May 29, 2018, at 6:49 AM, sujeet jog  wrote:
>
> > Folks,
> > I have two alternatives for the time series schema I have, and wanted to
> > weigh in on one of them.
>
> > The query is: given an id & timestamp, read the metrics associated with
> > that id.
>
> > The records are inserted every 5 minutes, and the number of ids = 2 million,
> > so every 5 minutes 2 million records will be written.
>
> > Bucket Range  : 0 - 5K.
>
> > Schema 1 )
>
> > create table (
> > id timeuuid,
> > bucketid Int,
> > date date,
> > timestamp timestamp,
> > metricName1   BigInt,
> > metricName2   BigInt,
> > ...
> > metricName300 BigInt,
>
> > Primary Key (( date, bucketid ), id, timestamp)
> > )
>
> > BucketId is just a murmur3 hash of the id, which acts as a splitter to
> > group ids into a partition.
>
>
> > Pros : -
>
> > Efficient write performance, since data is written to minimal partitions
>
> > Cons : -
>
> > The first schema works best when queried programmatically, but it is a bit
> > inflexible if it has to be integrated with 3rd-party BI tools like Tableau:
> > the bucket-id cannot be generated from Tableau, as it's not part of the
> > view, etc.
>
>
> > Schema 2 )
> > Same as above, without bucketid &  date.
>
> > Primary Key (id, timestamp )
>
> > Pros : -
>
> > BI tools don't need to generate bucket id lookups,
>
> > Cons :-
> > Too many partitions are written every 5 minutes: say 2 million records
> > written to 2 million distinct partitions.
>
>
>
> > I believe writing this data to the commit log is the same for Schema 1 and
> > Schema 2, but the actual performance bottleneck could be compaction, since
> > data from the memtable is flushed to SSTables frequently, depending on the
> > memory settings, and the header of every SSTable maintains a partition index
> > with byte offsets.
>
> > I wanted to gauge how bad the performance of Schema 2 can get with respect
> > to writes/compaction having to do many disk seeks.
>
> > Compacting many SSTables, each with a very large number of partition-index
> > entries because of the high number of partitions: can this be a bottleneck?
>
> > Any in-depth performance explanation of Schema 2 would be very much helpful.
>
>
> > Thanks,
>
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>
>