A question about sstable2json

2016-09-07 Thread Lu, Boying
Hi, All,



We use Cassandra 2.1.11 in our product and I tried its sstable2json tool to dump 
an SSTable file like this:

sstable2json full-path-to-sstable-file (e.g. xxx-Data.db).



But I got an assertion error at "assert initialized || 
keyspaceName.equals(SYSTEM_KS);" (Keyspace.java:97).

The 'keyspaceName' is our keyspace, but SYSTEM_KS is "system" (defined 
inside the Keyspace class).



This error is related to the following statement in SSTableExport.java:

Keyspace keyspace = Keyspace.open(descriptor.ksname); (SSTableExport.java:432)



Adding "Keyspace.setInitialized()" before this statement solves the issue.



Is this a bug in Cassandra 2.1.11, or am I misusing this command?



Thanks



Boying



Is the blob storage cost in Cassandra the same as the bigint storage cost for long variables?

2016-09-07 Thread Alexandr Porunov
Hello,

I need to store a "Long" Java variable.
The question is: is the storage cost the same for storing the hex
representation of the "Long" variable in a blob as for storing the "Long"
variable in a bigint?
Are there any performance pros or cons?
Is it OK to use a blob as a primary key?

Sincerely,
Alexandr
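
For comparison, a minimal sketch (not from the thread) of writing the same Java
long either into a bigint column or into a blob column as 8 raw bytes. It
assumes the DataStax Java driver and two hypothetical tables,
test_ks.ids_bigint(id bigint PRIMARY KEY) and test_ks.ids_blob(id blob PRIMARY KEY):

    // Sketch: a long stored as bigint vs. as an 8-byte blob.
    import java.nio.ByteBuffer;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class LongStorageSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            long value = 1234567890123L;

            // bigint: the driver serializes the long as 8 bytes on the wire.
            PreparedStatement asBigint =
                    session.prepare("INSERT INTO test_ks.ids_bigint (id) VALUES (?)");
            session.execute(asBigint.bind().setLong(0, value));

            // blob: serialize the same long to an 8-byte ByteBuffer yourself.
            ByteBuffer raw = (ByteBuffer) ByteBuffer.allocate(8).putLong(value).flip();
            PreparedStatement asBlob =
                    session.prepare("INSERT INTO test_ks.ids_blob (id) VALUES (?)");
            session.execute(asBlob.bind().setBytes(0, raw));

            cluster.close();
        }
    }

Either way the value itself is 8 bytes; a hex string representation of the long
stored as text would take 16 bytes per value instead.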


Re: Finding records that exist on Cassandra but not externally

2016-09-07 Thread Jens Rantil
Hi Chris,

Without fully knowing your use case: can't you keep track of which keys have
changed in the external system somehow? Otherwise, 2) sounds like the way to
go to me.

Cheers,
Jens

On Wed, Sep 7, 2016 at 9:47 AM  wrote:

> First off, I hope this is appropriate here - I couldn't decide whether this
> was a question for Cassandra users or Spark users, so if you think it's in
> the wrong place feel free to redirect me.
>
> I have a system that does a load of data manipulation using Spark. The
> output of this program is effectively the new state that I want my
> Cassandra table to be in, and the final step is to update Cassandra so that
> it matches this state.
>
> At present I'm currently inserting all rows in my generated state into
> Cassandra. This works for new rows and also for updating existing rows but
> doesn't of course delete any rows that were already in Cassandra but not in
> my new state.
>
> The problem I have now is how best to delete these missing rows. Options I
> have considered are:
>
> 1. Setting a ttl on inserts which is roughly the same as my data refresh
> period. This would probably be pretty performant but I really don't want to
> do this because it would mean that all data in my database would disappear
> if I had issues running my refresh task!
>
> 2. Every time I refresh the data I would first have to fetch all primary
> keys from Cassandra and compare them to primary keys locally to create a
> list of pks to delete before the insert. This seems the most logically
> correct option but is going to result in reading vast amounts of data from
> Cassandra.
>
> 3. Truncating the entire table before refreshing Cassandra. This has the
> benefit of being pretty simple in code but I'm not sure of the performance
> implications of this and what will happen if I truncate while a node is
> offline.
>
> For reference the table is on the order of 10s of millions of rows and for
> any data refresh only a very small fraction (<.1%) will actually need
> deleting. 99% of the time I'll just be overwriting existing keys.
>
> I'd be grateful if anyone could shed some advice on the best solution here
> or whether there's some better way I haven't thought of.
>
> Thanks,
>
> Chris
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


Re: Is it possible to replay hints after running nodetool drain?

2016-09-07 Thread jerome
Hi Romain,


I see. Thanks for the feedback, it's much appreciated. Since the only way to 
force it is through JMX, I think we'll continue to use our current method.


Best,

Jerome


From: Romain Hardouin 
Sent: Monday, September 5, 2016 1:30:24 PM
To: user@cassandra.apache.org
Subject: Re: Is it possible to replay hints after running nodetool drain?

Hi,

You don't have to worry about that unless you write with CL = ANY.
The only method to force hint delivery that I know of is to invoke scheduleHintDelivery on 
"org.apache.cassandra.db:type=HintedHandoffManager" via JMX, but it takes an 
endpoint as an argument.
If you have lots of nodes and several DCs, make sure to properly set 
hinted_handoff_throttle_in_kb and max_hints_delivery_threads.

Best,

Romain
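
For illustration, a minimal sketch (not from the thread) of invoking that
operation from plain Java over JMX; the host, JMX port (7199 by default) and
the target endpoint IP are placeholders, and no JMX authentication is assumed:

    // Sketch (untested): force hint delivery to one endpoint by invoking
    // scheduleHintDelivery on the HintedHandoffManager MBean over JMX.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ForceHintDelivery {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName hhm = new ObjectName(
                        "org.apache.cassandra.db:type=HintedHandoffManager");
                // scheduleHintDelivery takes the target endpoint as its single argument
                mbs.invoke(hhm, "scheduleHintDelivery",
                        new Object[] { "10.0.0.12" },
                        new String[] { String.class.getName() });
            } finally {
                connector.close();
            }
        }
    }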



On Saturday, 3 September 2016 at 2:59, jerome wrote:


Hi Matija,

Thanks for your help! The downtime is minimal, usually less than five minutes. 
Since it is so short, we're not so concerned about the node that's down missing 
data; we just want to make sure that before it goes down it replays all the 
hints it holds, so that there won't be any gaps in data on the other nodes 
while it's down.

Thanks,
Jerome

From: Matija Gobec 
Sent: Friday, September 2, 2016 6:05:01 PM
To: user@cassandra.apache.org
Subject: Re: Is it possible to replay hints after running nodetool drain?

Hi Jerome,

The node being drained stops listening for requests, but the other nodes, 
acting as coordinators for the given requests, will store hints for that downed 
node for a configured period of time (max_hint_window_in_ms is 3 hours by 
default). If the downed node is back online within this time window it will 
receive hints from the other nodes in the cluster and eventually catch up.
What is your typical maintenance downtime?

Regards,
Matija

On Fri, Sep 2, 2016 at 10:53 PM, jerome 
> wrote:
Hello,

As part of routine maintenance for our cluster, my colleagues and I will run a 
nodetool drain before stopping a Cassandra node, performing maintenance, and 
bringing it back up. We run maintenance as a cron job with a lock stored in a 
different cluster to ensure only one node is ever down at a time. We would like 
to make sure the node has replayed all its hints before bringing it down, to 
minimize the potential window in which users might read out-of-date data (we 
read at a consistency level of ONE). Is it possible to replay hints after 
performing a nodetool drain? The documentation leads me to believe it's not, 
since Cassandra will stop listening for connections from other nodes, but I was 
unable to find anything definitive either way. If a node won't replay hints 
after a nodetool drain, is there perhaps another way to tell Cassandra to stop 
listening for client connections but continue to replay hints to other nodes?

Thanks,
Jerome





Re: Read timeouts on primary key queries

2016-09-07 Thread Romain Hardouin
Is it still fast if you specify CONSISTENCY LOCAL_QUORUM in cqlsh?

Romain

On Wednesday, 7 September 2016 at 13:56, Joseph Tech wrote:
 

Thanks, Romain, for the detailed explanation. We use log4j 2 and I have added 
the driver logging for slow/error queries; will see if it helps to provide any 
pattern once in Prod.
I tried getendpoints and getsstables for some of the timed-out keys and most of 
them listed only 1 SSTable. There were a few which showed 2 SSTables. There is 
no specific trend on the keys; it's completely based on the user access, and 
the same keys return results instantly from cqlsh.


On Tue, Sep 6, 2016 at 1:57 PM, Romain Hardouin  wrote:

There is nothing special in the two sstablemetadata outputs but if the 
timeouts are due to a network split or an overwhelmed node or something like 
that you won't see anything here. That said, if you have the keys which 
produced the timeouts then, yes, you can look for a regular pattern (i.e. 
always the same keys?).

You can find the sstables for a given key with nodetool:
    nodetool getendpoints <keyspace> <table> <key>
Then you can run the following command on one/each node of the endpoints:
    nodetool getsstables <keyspace> <table> <key>

If many sstables are shown in the previous command it means that your data is 
fragmented, but thanks to LCS this number should be low.

I think the most useful actions now would be:

1) Enable DEBUG for o.a.c.db.ConsistencyLevel. It won't spam your log file; 
you will see the following when errors occur:
    - Local replicas [<endpoints>, ...] are insufficient to satisfy 
      LOCAL_QUORUM requirement of X live nodes in '<DC>'

   You are using C* 2.1 but you can have a look at the C* 2.2 logback.xml: 
   https://github.com/apache/cassandra/blob/cassandra-2.2/conf/logback.xml
   I'm using it in production; it's better because it creates a separate 
   debug.log file with an asynchronous appender.

   Watch out when enabling the ASYNCDEBUGLOG appender, because the default 
   logback configuration sets all of o.a.c in DEBUG:
       <logger name="org.apache.cassandra" level="DEBUG"/>
   Instead you can set:
       <logger name="org.apache.cassandra.db.ConsistencyLevel" level="DEBUG"/>

   Also, if you want to restrict debug.log to the DEBUG level only (instead of 
   DEBUG+INFO+...) you can add a LevelFilter to ASYNCDEBUGLOG in logback.xml:
       <filter class="ch.qos.logback.classic.filter.LevelFilter">
         <level>DEBUG</level>
         <onMatch>ACCEPT</onMatch>
         <onMismatch>DENY</onMismatch>
       </filter>
   Thus, the debug.log file will be empty unless some consistency issues happen.

2) Enable slow queries logging at the driver level with a QueryLogger:
       Cluster cluster = ...
       // log queries longer than 1 second, see also withDynamicThreshold
       QueryLogger queryLogger = 
           QueryLogger.builder(cluster).withConstantThreshold(1000).build();
       cluster.register(queryLogger);
   Then in your driver logback file:
       <logger name="com.datastax.driver.core.QueryLogger.SLOW" level="DEBUG" />

3) And/or: you mentioned that you use DSE, so you can enable slow queries 
logging in dse.yaml (cql_slow_log_options).

Best,

Romain

On Monday, 5 September 2016 at 20:05, Joseph Tech wrote:
 

Attached are the sstablemetadata outputs from 2 SSTables of size 28 MB and 52 MB 
(out2). The records are inserted with different TTLs based on their nature; 
test records with 1 day, typeA records with 6 months, typeB records with 1 year, 
etc. There are also explicit DELETEs from this table, though their rate is much 
lower than the rate of inserts.
I am not sure how to interpret this output, or if these are the right SSTables 
that were picked. Please advise. Is there a way to get the sstables corresponding 
to the keys that timed out, though they are accessible later?
On Mon, Sep 5, 2016 at 10:58 PM, Anshu Vajpayee  
wrote:

We have seen read timeout issues in Cassandra due to a high droppable tombstone 
ratio for a repository. 
Please check for a high droppable tombstone ratio for your repo.
On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin  wrote:

Yes, dclocal_read_repair_chance will reduce the cross-DC traffic and latency, so 
you can swap the values (https://issues.apache.org/jira/browse/CASSANDRA-7320). 
I guess the sstable_size_in_mb was set to 50 because back in the day (C* 1.0) 
the default size was way too small: 5 MB. So maybe someone in your company 
tried "10 * the default", i.e. 50 MB. Now the default is 160 MB. I'm not saying 
to change the value, but just keep in mind that you're using a small value here; 
it could help you someday.
Regarding the cells, the histograms show an *estimation* of the min, p50, ..., 
p99, max of cells based on SSTables metadata. On your screenshot, the Max is 
4768. So you have a partition key with ~ 4768 cells. The p99 is 1109, so 99% of 
your partition keys have less than (or equal to) 1109 cells. You can see these 
data for a given sstable with the sstablemetadata tool.
Best,
Romain
 

On Monday, 5 September 2016 at 15:17, Joseph Tech wrote:
 

Thanks, Romain. We will try to enable the DEBUG logging (assuming it won't 
clog the logs much). Regarding the table configs, read_repair_chance must be 
carried over from older versions - mostly 

Re: Read timeouts on primary key queries

2016-09-07 Thread Joseph Tech
Thanks, Romain, for the detailed explanation. We use log4j 2 and I have
added the driver logging for slow/error queries; will see if it helps to
provide any pattern once in Prod.

I tried getendpoints and getsstables for some of the timed-out keys and
most of them listed only 1 SSTable. There were a few which showed 2
SSTables. There is no specific trend on the keys; it's completely based on
the user access, and the same keys return results instantly from cqlsh.

On Tue, Sep 6, 2016 at 1:57 PM, Romain Hardouin  wrote:

> There is nothing special in the two sstablemetadata outputs but if the
> timeouts are due to a network split or overwhelmed node or something like
> that you won't see anything here. That said, if you have the keys which
> produced the timeouts then, yes, you can look for a regular pattern (i.e.
> always the same keys?).
>
> You can find sstables for a given key with nodetool:
> nodetool getendpoints <keyspace> <table> <key>
> Then you can run the following command on one/each node of the endpoints:
> nodetool getsstables <keyspace> <table> <key>
>
> If many sstables are shown in the previous command it means that your data
> is fragmented but thanks to LCS this number should be low.
>
> I think the most useful actions now would be:
>
> 1) Enable DEBUG for o.a.c.db.ConsistencyLevel, it won't spam your log
> file, you will see the following when errors occur:
> - Local replicas [<endpoints>, ...] are insufficient to satisfy
> LOCAL_QUORUM requirement of X live nodes in '<DC>'
>
> You are using C* 2.1 but you can have a look at the C* 2.2 logback.xml:
> https://github.com/apache/cassandra/blob/cassandra-2.2/conf/logback.xml
> I'm using it in production, it's better because it creates a separate
> debug.log file with an asynchronous appender.
>
> Watch out when enabling the ASYNCDEBUGLOG appender, because the default
> logback configuration sets all of o.a.c in DEBUG:
>
> <logger name="org.apache.cassandra" level="DEBUG"/>
>
> Instead you can set:
>
> <logger name="org.apache.cassandra.db.ConsistencyLevel" level="DEBUG"/>
>
> Also, if you want to restrict debug.log to DEBUG level only (instead
> of DEBUG+INFO+...) you can add a LevelFilter to ASYNCDEBUGLOG in
> logback.xml:
>
> <filter class="ch.qos.logback.classic.filter.LevelFilter">
>   <level>DEBUG</level>
>   <onMatch>ACCEPT</onMatch>
>   <onMismatch>DENY</onMismatch>
> </filter>
>
>   Thus, the debug.log file will be empty unless some Consistency issues
> happen.
>
> 2) Enable slow queries logging at the driver level with a QueryLogger:
>
> Cluster cluster = ...
> // log queries longer than 1 second, see also withDynamicThreshold
> QueryLogger queryLogger =
>     QueryLogger.builder(cluster).withConstantThreshold(1000).build();
> cluster.register(queryLogger);
>
> Then in your driver logback file:
>
> <logger name="com.datastax.driver.core.QueryLogger.SLOW" level="DEBUG" />
>
> 3) And/or: you mentioned that you use DSE, so you can enable slow
> queries logging in dse.yaml (cql_slow_log_options).
>
> Best,
>
> Romain
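
For reference, a self-contained version of the QueryLogger snippet above; a
sketch assuming the DataStax Java driver 2.1-style QueryLogger.builder(cluster)
API, with a placeholder contact point and the 1-second threshold from the example:

    // Sketch: register a QueryLogger that reports queries slower than 1 second.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryLogger;
    import com.datastax.driver.core.Session;

    public class SlowQueryLogging {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // placeholder contact point
                    .build();

            QueryLogger queryLogger = QueryLogger.builder(cluster)
                    .withConstantThreshold(1000)    // milliseconds; see also withDynamicThreshold
                    .build();
            cluster.register(queryLogger);          // slow queries are then logged at DEBUG

            Session session = cluster.connect();
            // ... run the application's queries here ...
            session.close();
            cluster.close();
        }
    }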
>
>
> On Monday, 5 September 2016 at 20:05, Joseph Tech wrote:
>
>
> Attached are the sstablemetadata outputs from 2 SSTables of size 28 MB and
> 52 MB (out2). The records are inserted with different TTLs based on their
> nature; test records with 1 day, typeA records with 6 months, typeB
> records with 1 year, etc. There are also explicit DELETEs from this table,
> though their rate is much lower than the rate of inserts.
>
> I am not sure how to interpret this output, or if these are the right SSTables
> that were picked. Please advise. Is there a way to get the sstables
> corresponding to the keys that timed out, though they are accessible later?
>
> On Mon, Sep 5, 2016 at 10:58 PM, Anshu Vajpayee 
> wrote:
>
> We have seen read timeout issues in Cassandra due to a high droppable
> tombstone ratio for a repository.
>
> Please check for a high droppable tombstone ratio for your repo.
>
> On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin 
> wrote:
>
> Yes, dclocal_read_repair_chance will reduce the cross-DC traffic and
> latency, so you can swap the values
> (https://issues.apache.org/jira/browse/CASSANDRA-7320). I guess the
> sstable_size_in_mb was set to 50 because back in the day (C* 1.0) the
> default size was way too small: 5 MB. So maybe someone in your company
> tried "10 * the default", i.e. 50 MB. Now the default is 160 MB. I'm not
> saying to change the value, but just keep in mind that you're using a small
> value here; it could help you someday.
>
> Regarding the cells, the histograms show an *estimation* of the min, p50,
> ..., p99, max of cells based on SSTables metadata. On your screenshot, the
> Max is 4768. So you have a partition key with ~ 4768 cells. The p99 is
> 1109, so 99% of your partition keys have less than (or equal to) 1109
> cells.
> You can see these data for a given sstable with the sstablemetadata tool.
>
> Best,
>
> Romain
>
>
>
> On Monday, 5 September 2016 at 15:17, Joseph Tech wrote:
>
>
> Thanks, Romain. We will try to enable the DEBUG logging (assuming it
> won't clog the logs much). Regarding the table configs, 

Re: Incremental repairs in 3.0

2016-09-07 Thread Jean Carlo
Well, I did a small test on my cluster and I didn't get the results I was
expecting.

I truncated an LCS table, then I inserted one row and used nodetool flush to
write out the sstables. With RF 3, I ran a repair -inc directly and I
observed that the value of "Repaired at" was equal to 0.

So I started to think that if there are no changes (no diff in the Merkle
trees) the repair will not go on to the streaming phase, and it is there
that the sstables are marked as repaired.

I did another test to confirm my assumptions and I saw the sstables marked
as repaired ("Repaired at" value isn't 0), but only those sstables that were
not in sync.

So my question is: if we migrate to incremental repair in prod and we don't
use the migration procedure, for tables where some sstables are never
mutated, will those sstables stay in an unrepaired state?

Probably there is something I am not able to see.





Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Tue, Sep 6, 2016 at 8:19 PM, Bryan Cheng  wrote:

> HI Jean,
>
> This blog post is a pretty good resource:
> http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
>
> I believe in 2.1.x you don't need to do the manual migration procedure,
> but if you run regular repairs and the data set under LCS is fairly large
> (what this means will probably depend on your data model and
> hardware/cluster makeup) you can take advantage of a full repair to make
> anticompaction a bit easier. What we observed was the anticompaction
> procedure taking longer than a standard full repair and with a higher load
> on the cluster while running.
>
> On Tue, Sep 6, 2016 at 2:00 AM, Jean Carlo 
> wrote:
>
>> Hi @Bryan
>>
>> When you said "sizable amount of data" you meant a huge amount of data
>> right? Our big table is in LCS and if we use the migration process we will
>> need to run a repair seq over this table for a long time.
>>
>> We are planning to go to repairs inc using the version 2.1.14
>>
>>
>> Saludos
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Tue, Jun 21, 2016 at 4:34 PM, Vlad  wrote:
>>
>>> Thanks for answer!
>>>
>>> >It may still be a good idea to manually migrate if you have a sizable
>>> amount of data
>>> No, it would be brand new ;-) 3.0 cluster
>>>
>>>
>>>
>>> On Tuesday, June 21, 2016 1:21 AM, Bryan Cheng 
>>> wrote:
>>>
>>>
>>> Sorry, meant to say "therefore manual migration procedure should be
>>> UNnecessary"
>>>
>>> On Mon, Jun 20, 2016 at 3:21 PM, Bryan Cheng 
>>> wrote:
>>>
>>> I don't use 3.x so hopefully someone with operational experience can
>>> chime in, however my understanding is: 1) Incremental repairs should be the
>>> default in the 3.x release branch and 2) sstable repairedAt is now properly
>>> set in all sstables as of 2.2.x for standard repairs and therefore manual
>>> migration procedure should be necessary. It may still be a good idea to
>>> manually migrate if you have a sizable amount of data and are using LCS as
>>> anticompaction is rather painful.
>>>
>>> On Sun, Jun 19, 2016 at 6:37 AM, Vlad  wrote:
>>>
>>> Hi,
>>>
>>> assuming I have a new, empty Cassandra cluster, how should I start using
>>> incremental repairs? Is incremental repair the default now (as I don't see
>>> an *-inc* option in nodetool) and nothing is needed to use it, or should
>>> we perform the migration procedure
>>> anyway? And what happens to new column families?
>>>
>>> Regards.
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


unsubscribe

2016-09-07 Thread Mahmoud Younes
-- 
Best Regards
Mahmoud K. Younes

Disclaimer:
This Electronic Mail and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to which they are
addressed. If you are not an addressee, or have received the message by
error, please notify the sender via E-Mail or over the telephone and delete
this e-mail. You are not authorized to read, copy, disseminate, distribute
or use this E-Mail or any of its attachment in any way. The recipient
should check this email and any attachments for the presence of
viruses/worms. Mahmoud K. Younes accepts no liability for any damage caused
by any virus/worms transmitted by this email.


unsubscribe

2016-09-07 Thread Mike Yeap



Finding records that exist on Cassandra but not externally

2016-09-07 Thread chris
First off, I hope this is appropriate here - I couldn't decide whether this was 
a question for Cassandra users or Spark users, so if you think it's in the wrong 
place feel free to redirect me.

I have a system that does a load of data manipulation using Spark. The output 
of this program is effectively the new state that I want my Cassandra table 
to be in, and the final step is to update Cassandra so that it matches this 
state.

At present I'm currently inserting all rows in my generated state into 
Cassandra. This works for new rows and also for updating existing rows but 
doesn't of course delete any rows that were already in Cassandra but not in my 
new state. 
 
The problem I have now is how best to delete these missing rows. Options I have 
considered are:

1. Setting a ttl on inserts which is roughly the same as my data refresh 
period. This would probably be pretty performant but I really don't want to do 
this because it would mean that all data in my database would disappear if I 
had issues running my refresh task!

2. Every time I refresh the data I would first have to fetch all primary keys 
from Cassandra and compare them to primary keys locally to create a list of 
pks to delete before the insert. This seems the most logically correct option 
but is going to result in reading vast amounts of data from Cassandra (see the 
sketch below).

3. Truncating the entire table before refreshing Cassandra. This has the 
benefit of being pretty simple in code but I'm not sure of the performance 
implications of this and what will happen if I truncate while a node is offline.

For reference the table is on the order of 10s of millions of rows and for any 
data refresh only a very small fraction (<.1%) will actually need deleting. 99% 
of the time I'll just be overwriting existing keys. 

I'd be grateful if anyone could shed some advice on the best solution here or 
whether there's some better way I haven't thought of.

Thanks,

Chris
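
For what it's worth, a minimal sketch of option 2 using the plain DataStax Java
driver (keyspace, table and column names are placeholders; the same key scan
could equally be driven through the spark-cassandra-connector instead):

    // Sketch: page through all primary keys in Cassandra, diff them against the
    // locally generated state, and delete only the keys that disappeared.
    // Assumes a hypothetical table my_ks.my_table(pk text PRIMARY KEY, ...).
    import java.util.HashSet;
    import java.util.Set;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class DeleteMissingRows {
        public static void deleteMissing(Session session, Set<String> newStateKeys) {
            Set<String> toDelete = new HashSet<>();

            // Full key scan; the driver pages through the result set automatically.
            Statement scan = new SimpleStatement("SELECT pk FROM my_ks.my_table")
                    .setFetchSize(5000);
            for (Row row : session.execute(scan)) {
                String pk = row.getString("pk");
                if (!newStateKeys.contains(pk)) {
                    toDelete.add(pk);
                }
            }

            // Delete only the small fraction of keys missing from the new state.
            PreparedStatement delete =
                    session.prepare("DELETE FROM my_ks.my_table WHERE pk = ?");
            for (String pk : toDelete) {
                session.execute(delete.bind(pk));
            }
        }
    }

The scan still reads every key once, which is the cost mentioned above for this
option; only the deletes are limited to the small fraction of vanished keys.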